Showing 1 - 10 of 114 for search: '"Lindén, Krister"'
This chapter discusses the use of digital tools—in particular, language technology—to study the history of emotions. There are a growing number of annotated text corpora for ancient languages large enough to benefit from computational analysis. T
External link:
https://library.oapen.org/handle/20.500.12657/59177
Author:
Moisio, Anssi, Porjazovski, Dejan, Rouhe, Aku, Getman, Yaroslav, Virkkunen, Anja, Grósz, Tamás, Lindén, Krister, Kurimo, Mikko
The Donate Speech campaign has so far succeeded in gathering approximately 3600 hours of ordinary, colloquial Finnish speech into the Lahjoita puhetta (Donate Speech) corpus. The corpus includes over twenty thousand speakers from all the regions of F
External link:
http://arxiv.org/abs/2203.12906
Sentiment analysis and opinion mining is an important task with obvious application areas in social media, e.g. when indicating hate speech and fake news. In our survey of previous work, we note that there is no large-scale social media data set with
External link:
http://arxiv.org/abs/2012.02613
This article introduces the Wanca 2017 corpus of texts crawled from the internet from which the sentences in rare Uralic languages for the use of the Uralic Language Identification (ULI) 2020 shared task were collected. We describe the ULI dataset an
External link:
http://arxiv.org/abs/2008.12169
Author:
Rehm, Georg, Marheinecke, Katrin, Hegele, Stefanie, Piperidis, Stelios, Bontcheva, Kalina, Hajič, Jan, Choukri, Khalid, Vasiļjevs, Andrejs, Backfried, Gerhard, Prinz, Christoph, Pérez, José Manuel Gómez, Meertens, Luc, Lukowicz, Paul, van Genabith, Josef, Lösch, Andrea, Slusallek, Philipp, Irgens, Morten, Gatellier, Patrick, Köhler, Joachim, Bars, Laure Le, Anastasiou, Dimitra, Auksoriūtė, Albina, Bel, Núria, Branco, António, Budin, Gerhard, Daelemans, Walter, De Smedt, Koenraad, Garabík, Radovan, Gavriilidou, Maria, Gromann, Dagmar, Koeva, Svetla, Krek, Simon, Krstev, Cvetana, Lindén, Krister, Magnini, Bernardo, Odijk, Jan, Ogrodniczuk, Maciej, Rögnvaldsson, Eiríkur, Rosner, Mike, Pedersen, Bolette Sandford, Skadiņa, Inguna, Tadić, Marko, Tufiş, Dan, Váradi, Tamás, Vider, Kadri, Way, Andy, Yvon, François
Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. La
External link:
http://arxiv.org/abs/2003.13833
We present a corpus of Finnish news articles with a manually prepared named entity annotation. The corpus consists of 953 articles (193,742 word tokens) with six named entity classes (organization, location, person, product, event, and date). The art
External link:
http://arxiv.org/abs/1908.04212
This article describes an unsupervised language model adaptation approach that can be used to enhance the performance of language identification methods. The approach is applied to a current version of the HeLI language identification method, which i
External link:
http://arxiv.org/abs/1903.10915
This article introduces a corpus of cuneiform texts from which the dataset for the use of the Cuneiform Language Identification (CLI) 2019 shared task was derived as well as some preliminary language identification experiments conducted using that co
External link:
http://arxiv.org/abs/1903.01891
Language identification (LI) is the problem of determining the natural language that a document or part thereof is written in. Automatic LI has been extensively researched for over fifty years. Today, LI is a key part of many text processing pipeline
External link:
http://arxiv.org/abs/1804.08186