Showing 1 - 10 of 33 for search: '"Zosa, Elaine"'
Author:
Luukkonen, Risto, Burdge, Jonathan, Zosa, Elaine, Talman, Aarne, Komulainen, Ville, Hatanpää, Väinö, Sarlin, Peter, Pyysalo, Sampo
The pretraining of state-of-the-art large language models now requires trillions of words of text, which is orders of magnitude more than available for the vast majority of languages. While including text in more than one language is an obvious way t
External link:
http://arxiv.org/abs/2404.01856
Author:
Mickus, Timothee, Zosa, Elaine, Vázquez, Raúl, Vahtola, Teemu, Tiedemann, Jörg, Segonne, Vincent, Raganato, Alessandro, Apidianaki, Marianna
This paper presents the results of the SHROOM, a shared task focused on detecting hallucinations: outputs from natural language generation (NLG) systems that are fluent, yet inaccurate. Such cases of overgeneration put in jeopardy many NLG applicatio
External link:
http://arxiv.org/abs/2403.07726
Grounding has been argued to be a crucial component towards the development of more complete and truly semantically competent artificial intelligence systems. Literature has divided into two camps: While some argue that grounding allows for qualitati
External link:
http://arxiv.org/abs/2310.11938
Author:
Zosa, Elaine, Pivovarova, Lidia
This paper presents M3L-Contrast -- a novel multimodal multilingual (M3L) neural topic model for comparable data that maps texts from multiple languages and images into a shared topic space. Our model is trained jointly on texts and images and takes
External link:
http://arxiv.org/abs/2211.08057
Moderation of reader comments is a significant problem for online news platforms. Here, we experiment with models for automatic moderation, using a dataset of comments from a popular Croatian newspaper. Our analysis shows that while comments that vio
External link:
http://arxiv.org/abs/2109.10033
This paper addresses methodological issues in diachronic data analysis for historical research. We apply two families of topic models (LDA and DTM) on a relatively large set of historical newspapers, with the aim of capturing and understanding discou
External link:
http://arxiv.org/abs/2011.10428
Published in:
WWW '20: Companion Proceedings of the Web Conference 2020 (April 2020), pp. 343-349
The way words are used evolves over time, mirroring the cultural or technological evolution of society. Semantic change detection is the task of detecting and analysing word evolution in textual data, even in short periods of time. In this paper w
External link:
http://arxiv.org/abs/2001.06629
Author:
Zosa, Elaine
The news is a detailed record of events, issues, and opinions published daily in every country around the world. In addition to daily news content, many national libraries are digitising their historical newspaper collections. This wealth of material
External link:
https://explore.openaire.eu/search/publication?articleId=od______1593::9df38c444becd403ac3aa4afb6b5eb49
http://hdl.handle.net/10138/353011
Published in:
International Conference on Asian Digital Libraries (ICADL)
Towards Open and Trustworthy Digital Societies. 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021, Virtual Event, December 1–3, 2021, Proceedings
Hao-Ren Ke; Chei Sian Lee; Kazunari Sugiyama. Towards Open and Trustworthy Digital Societies. 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021, Virtual Event, December 1–3, 2021, Proceedings, 13133, Springer, pp.392-400, 2021, Lecture Notes in Computer Science, 978-3-030-91668-8. ⟨10.1007/978-3-030-91669-5_30⟩
Lecture Notes in Computer Science
Lecture Notes in Computer Science-Towards Open and Trustworthy Digital Societies
Lecture Notes in Computer Science ISBN: 9783030916688
Unsupervised topic models such as Latent Dirichlet Allocation (LDA) are popular tools to analyse digitised corpora. However, the performance of these tools has been shown to degrade with OCR noise. Topic models that incorpora
Newspapers have been a rich source of information for historians for the past hundred years or so. In the past twenty years, digitization of newspapers has made it possible to do simple tasks such as keyword searches or more elaborate text mining ana
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::c49231437fc85a2b210fadfc79b89189
http://hdl.handle.net/10138/314883