Estimating the Quality of Crowdsourced Translations Based on the Characteristics of Source and Target Words and Participants
Author: | Jong Gun Lee, Rajius Idzalika, Pamungkas Jutta, Muhammad Rizal Khaefi, George Hodge, Imaduddin Amin, Yulistina Riyadi, Zakiya Pramestri |
---|---|
Year of publication: | 2018 |
Subject: |
Computer science; Supervised learning; Vernacular; Filter (software); Empirical research; Metric (mathematics); Social media; Quality (business); Artificial intelligence; Set (psychology); Natural language processing |
Source: | ASONAM |
DOI: | 10.1109/asonam.2018.8508319 |
Description: | Text-based media hold a wealth of insights that can be mined to understand perceptions and actions. Researchers and public officials can use these data to inform development policy and humanitarian action. An important step in analyzing text-based data sources, such as social media, is the creation of taxonomies that are used to filter information relevant to topics of interest. We worked with thousands of online volunteers to translate 2,137 English keywords or phrases into formal or vernacular expressions in 29 different languages, with the aim of understanding human responses to natural disasters and of developing corpora for less widely studied languages (non-English and non-EU languages), on which research is still limited. In processing the data set, we faced the challenge of selecting a set of quality translations for each language. This paper aims to estimate the quality of crowdsourced translations produced by non-professional translators. It presents an extensive empirical study using 91 features drawn from corpora in 29 languages to describe (a) translators, (b) source expressions, and (c) translated expressions. Our results show that our approach, which explores two regression models and two supervised learning methods, produces better results than a baseline approach based on a commonly used metric, namely peer-review scores. |
Database: | OpenAIRE |