Curating and extending data for language comparison in Concepticon and NoRaRe

Authors: Tjuka, Annika; Forkel, Robert; List, Johann-Mattis
Publication year: 2022
Subject:
Source: Open Research Europe
ISSN: 2732-5121
DOI: 10.12688/openreseurope.15380.1
Description: Over the past decade, there have been several attempts to standardize cross-linguistic datasets. Since language comparison is a notoriously difficult endeavor, it requires tools that facilitate standardization and are convenient to use. The Concepticon is based on a toolkit provided for cross-linguistic comparison and offers a reference catalog for comparable concepts that appear in concept lists. While curating the Concepticon, we found that a variety of studies in distinct research fields had collected information on word properties. However, until recently, no resource existed that brought these data together to enable the comparison of different word properties across languages. This gap was filled by the Database of Norms, Ratings, and Relations (NoRaRe), which is an extension of the Concepticon. Here, we present the major release of both resources, Concepticon Version 3.0 and NoRaRe Version 1.0, which represents an important step in our data development. We show that extending and adapting the data curation workflow in Concepticon to NoRaRe is useful for the standardization of cross-linguistic datasets. In addition, combining datasets from different research fields enables studies grounded in language comparison. Concepticon and NoRaRe include lexical data for various languages, tools for test-driven data curation, and options for data reuse. The first major release of NoRaRe is also accompanied by a new web application that allows convenient access to the data.
Database: OpenAIRE