Search results - "Lindén, Krister"

E-book

Chapter 3 Digital Approaches to Analyzing and Translating Emotion

Authors: Alstola, Tero, Jauhiainen, Heidi, Svärd, Saana, Sahala, Aleksi, Lindén, Krister

This chapter discusses the use of digital tools—in particular, language technology—to study the history of emotions. There are a growing number of annotated text corpora for ancient languages large enough to benefit from computational analysis. T

External link: https://library.oapen.org/handle/20.500.12657/59177

Report

Lahjoita puhetta -- a large-scale corpus of spoken Finnish with some benchmarks

Authors: Moisio, Anssi, Porjazovski, Dejan, Rouhe, Aku, Getman, Yaroslav, Virkkunen, Anja, Grósz, Tamás, Lindén, Krister, Kurimo, Mikko

The Donate Speech campaign has so far succeeded in gathering approximately 3600 hours of ordinary, colloquial Finnish speech into the Lahjoita puhetta (Donate Speech) corpus. The corpus includes over twenty thousand speakers from all the regions of F

External link: http://arxiv.org/abs/2203.12906

Report

FinnSentiment -- A Finnish Social Media Corpus for Sentiment Polarity Annotation

Authors: Lindén, Krister, Jauhiainen, Tommi, Hardwick, Sam

Sentiment analysis and opinion mining is an important task with obvious application areas in social media, e.g. when indicating hate speech and fake news. In our survey of previous work, we note that there is no large-scale social media data set with

External link: http://arxiv.org/abs/2012.02613

Report

Uralic Language Identification (ULI) 2020 shared task dataset and the Wanca 2017 corpus

Authors: Jauhiainen, Tommi, Jauhiainen, Heidi, Partanen, Niko, Lindén, Krister

This article introduces the Wanca 2017 corpus of texts crawled from the internet from which the sentences in rare Uralic languages for the use of the Uralic Language Identification (ULI) 2020 shared task were collected. We describe the ULI dataset an

External link: http://arxiv.org/abs/2008.12169

Report

The European Language Technology Landscape in 2020: Language-Centric and Human-Centric AI for Cross-Cultural Communication in Multilingual Europe

Authors: Rehm, Georg, Marheinecke, Katrin, Hegele, Stefanie, Piperidis, Stelios, Bontcheva, Kalina, Hajič, Jan, Choukri, Khalid, Vasiļjevs, Andrejs, Backfried, Gerhard, Prinz, Christoph, Pérez, José Manuel Gómez, Meertens, Luc, Lukowicz, Paul, van Genabith, Josef, Lösch, Andrea, Slusallek, Philipp, Irgens, Morten, Gatellier, Patrick, Köhler, Joachim, Bars, Laure Le, Anastasiou, Dimitra, Auksoriūtė, Albina, Bel, Núria, Branco, António, Budin, Gerhard, Daelemans, Walter, De Smedt, Koenraad, Garabík, Radovan, Gavriilidou, Maria, Gromann, Dagmar, Koeva, Svetla, Krek, Simon, Krstev, Cvetana, Lindén, Krister, Magnini, Bernardo, Odijk, Jan, Ogrodniczuk, Maciej, Rögnvaldsson, Eiríkur, Rosner, Mike, Pedersen, Bolette Sandford, Skadiņa, Inguna, Tadić, Marko, Tufiş, Dan, Váradi, Tamás, Vider, Kadri, Way, Andy, Yvon, François

Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. La

External link: http://arxiv.org/abs/2003.13833

Report

A Finnish News Corpus for Named Entity Recognition

Authors: Ruokolainen, Teemu, Kauppinen, Pekka, Silfverberg, Miikka, Lindén, Krister

We present a corpus of Finnish news articles with a manually prepared named entity annotation. The corpus consists of 953 articles (193,742 word tokens) with six named entity classes (organization, location, person, product, event, and date). The art

External link: http://arxiv.org/abs/1908.04212

Report

Language Model Adaptation for Language and Dialect Identification of Text

Authors: Jauhiainen, Tommi, Lindén, Krister, Jauhiainen, Heidi

This article describes an unsupervised language model adaptation approach that can be used to enhance the performance of language identification methods. The approach is applied to a current version of the HeLI language identification method, which i

External link: http://arxiv.org/abs/1903.10915

Report

Language and Dialect Identification of Cuneiform Texts

Authors: Jauhiainen, Tommi, Jauhiainen, Heidi, Alstola, Tero, Lindén, Krister

This article introduces a corpus of cuneiform texts from which the dataset for the use of the Cuneiform Language Identification (CLI) 2019 shared task was derived as well as some preliminary language identification experiments conducted using that co

External link: http://arxiv.org/abs/1903.01891

Report

Automatic Language Identification in Texts: A Survey

Authors: Jauhiainen, Tommi, Lui, Marco, Zampieri, Marcos, Baldwin, Timothy, Lindén, Krister

Language identification (LI) is the problem of determining the natural language that a document or part thereof is written in. Automatic LI has been extensively researched for over fifty years. Today, LI is a key part of many text processing pipeline

External link: http://arxiv.org/abs/1804.08186
