Search results - "Zosa, Elaine"

Report

Poro 34B and the Blessing of Multilinguality

Authors: Luukkonen, Risto; Burdge, Jonathan; Zosa, Elaine; Talman, Aarne; Komulainen, Ville; Hatanpää, Väinö; Sarlin, Peter; Pyysalo, Sampo

The pretraining of state-of-the-art large language models now requires trillions of words of text, which is orders of magnitude more than available for the vast majority of languages. While including text in more than one language is an obvious way t

External link: http://arxiv.org/abs/2404.01856

Report

SemEval-2024 Shared Task 6: SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes

Authors: Mickus, Timothee; Zosa, Elaine; Vázquez, Raúl; Vahtola, Teemu; Tiedemann, Jörg; Segonne, Vincent; Raganato, Alessandro; Apidianaki, Marianna

This paper presents the results of the SHROOM, a shared task focused on detecting hallucinations: outputs from natural language generation (NLG) systems that are fluent, yet inaccurate. Such cases of overgeneration put in jeopardy many NLG applicatio

External link: http://arxiv.org/abs/2403.07726

Report

Grounded and Well-rounded: A Methodological Approach to the Study of Cross-modal and Cross-lingual Grounding

Authors: Mickus, Timothee; Zosa, Elaine; Paperno, Denis

Grounding has been argued to be a crucial component towards the development of more complete and truly semantically competent artificial intelligence systems. Literature has divided into two camps: While some argue that grounding allows for qualitati

External link: http://arxiv.org/abs/2310.11938

Report

Multilingual and Multimodal Topic Modelling with Pretrained Embeddings

Authors: Zosa, Elaine; Pivovarova, Lidia

This paper presents M3L-Contrast -- a novel multimodal multilingual (M3L) neural topic model for comparable data that maps texts from multiple languages and images into a shared topic space. Our model is trained jointly on texts and images and takes

External link: http://arxiv.org/abs/2211.08057

Report

Not All Comments are Equal: Insights into Comment Moderation from a Topic-Aware Model

Authors: Zosa, Elaine; Shekhar, Ravi; Karan, Mladen; Purver, Matthew

Moderation of reader comments is a significant problem for online news platforms. Here, we experiment with models for automatic moderation, using a dataset of comments from a popular Croatian newspaper. Our analysis shows that while comments that vio

External link: http://arxiv.org/abs/2109.10033

Report

Topic modelling discourse dynamics in historical newspapers

Authors: Marjanen, Jani; Zosa, Elaine; Hengchen, Simon; Pivovarova, Lidia; Tolonen, Mikko

This paper addresses methodological issues in diachronic data analysis for historical research. We apply two families of topic models (LDA and DTM) on a relatively large set of historical newspapers, with the aim of capturing and understanding discou

External link: http://arxiv.org/abs/2011.10428

Report

Capturing Evolution in Word Usage: Just Add More Clusters?

Authors: Martinc, Matej; Montariol, Syrielle; Zosa, Elaine; Pivovarova, Lidia

Published in: WWW '20: Companion Proceedings of the Web Conference 2020 (April 2020), pp. 343-349

The way the words are used evolves through time, mirroring cultural or technological evolution of society. Semantic change detection is the task of detecting and analysing word evolution in textual data, even in short periods of time. In this paper w

External link: http://arxiv.org/abs/2001.06629

Analysis of News Media with Topic Models

Author: Zosa, Elaine

The news is a detailed record of events, issues, and opinions published daily in every country around the world. In addition to daily news content, many national libraries are digitising their historical newspaper collections. This wealth of material

External links: https://explore.openaire.eu/search/publication?articleId=od______1593::9df38c444becd403ac3aa4afb6b5eb49
http://hdl.handle.net/10138/353011

Evaluating the Robustness of Embedding-Based Topic Models to OCR Noise

Authors: Zosa, Elaine; Mutuvi, Stephen; Granroth-Wilding, Mark; Doucet, Antoine

Published in: International Conference on Asian Digital Libraries (ICADL)
Hao-Ren Ke; Chei Sian Lee; Kazunari Sugiyama. Towards Open and Trustworthy Digital Societies. 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021, Virtual Event, December 1–3, 2021, Proceedings. Lecture Notes in Computer Science 13133, Springer, pp. 392-400, 2021. ISBN 978-3-030-91668-8. doi:10.1007/978-3-030-91669-5_30

Unsupervised topic models such as Latent Dirichlet Allocation (LDA) are popular tools to analyse digitised corpora. However, the performance of these tools has been shown to degrade with OCR noise. Topic models that incorpora

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::12c482759de5d67b0e63baea47b23e61
