Computer-Assisted Language Comparison: State of the Art
Authors: | Nathan W. Hill, Nathanael E. Schweikhard, Mei Shin Wu, Timotheus A. Bodt, Johann-Mattis List |
---|---|
Publication year: | 2020 |
Subject: | lcsh:Language and Literature; lcsh:AZ20-999; lcsh:History of scholarship and learning. The humanities; lcsh:P; Computer science; Comparative method; Computational linguistics; Historical linguistics; Computer-assisted language comparison; Hmong-Mien language family; Rule-based machine translation; Digital humanities; Information retrieval; Language and languages; Python (programming language); Southeast Asia; Workflow; Raw data; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; 06 humanities and the arts; 0602 languages and literature; 060201 languages & linguistics; computer; computer.programming_language |
Source: | Journal of Open Humanities Data, Vol 6 (2020), 2; Journal of Open Humanities Data, Vol 6, Iss 1 (2020); BASE-Bielefeld Academic Search Engine |
ISSN: | 2059-481X |
Description: | Historical language comparison opens windows onto a human past, long before the availability of written records. Since traditional language comparison within the framework of the comparative method is largely based on manual data comparison, requiring the meticulous sifting through dictionaries, word lists, and grammars, the framework is difficult to apply, especially in times where more and more data have become available in digital form. Unfortunately, it is not possible to simply automate the process of historical language comparison, not only because computational solutions lag behind human judgments in historical linguistics, but also because they lack the flexibility that would allow them to integrate various types of information from various kinds of sources. A more promising approach is to integrate computational and classical approaches within a 'computer-assisted framework', “neither completely computer-driven nor ignorant of the assistance computers afford” [1, p. 4]. In this paper, we will illustrate what we consider the current state of the art of computer-assisted language comparison by presenting a workflow that starts with raw data and leads up to a stage where sound correspondence patterns across multiple languages have been identified and can be readily presented, inspected, and discussed. We illustrate this workflow with the help of a newly prepared dataset on Hmong-Mien languages. Our illustration is accompanied by Python code and instructions on how to use additional web-based tools we developed so that users can apply our workflow for their own purposes. |
Database: | OpenAIRE |
External link: |