Discovering Lexical Similarity Using Articulatory Feature-Based Phonetic Edit Distance
Author: | Muhammad Yaseen Khan, Alessandro Bogliolo, Tafseer Ahmed, Muhammad Suffian |
---|---|
Language: | English |
Year of publication: | 2022 |
Subject: | FOS: Computer and information sciences; Computer Science - Computation and Language; General Computer Science; cognates; General Engineering; edit distance; TK1-9971; computational linguistics; General Materials Science; lexical similarity; Electrical engineering. Electronics. Nuclear engineering; Electrical and Electronic Engineering; natural language processing; Computation and Language (cs.CL); Articulatory features |
Source: | IEEE Access, Vol 10, Pp 1533-1544 (2022) |
ISSN: | 2169-3536 |
Description: | Lexical Similarity (LS) between two languages uncovers many interesting linguistic insights, such as phylogenetic relationships, mutual intelligibility, common etymology, and loan words. There are various methods through which LS is evaluated. This paper presents a method of Phonetic Edit Distance (PED) that uses a soft comparison of letters using the articulatory features associated with their International Phonetic Alphabet (IPA) transcription. In particular, the comparison between the articulatory features of two letters taken from words belonging to different languages is used to compute the cost of replacement in the inner loop of the edit distance computation. As an example, PED gives an edit distance of 0.82 between the German word ‘vater’ ([fa:tər]) and the Persian word ([pedær]), both meaning ‘father,’ and, similarly, a PED of 0.93 between the Hebrew word ([ʃəɭam]) and the Arabic word ([səɭa:m]), both meaning ‘peace,’ whereas the classical edit distances would be 4 and 2, respectively. We report the results of systematic experiments conducted on six languages: Arabic, Hindi, Marathi, Persian, Sanskrit, and Urdu. Universal Dependencies (UD) corpora were used to restrict the comparison to lists of words belonging to the same part of speech. The LS based on the average PED between pairs of words was then computed for each pair of languages, unveiling similarities otherwise masked by the adoption of different alphabets, grammars, and pronunciation rules. |
Database: | OpenAIRE |
External link: |
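
The idea summarized in the description, an edit distance whose replacement cost comes from comparing articulatory features, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not taken from the paper: the tiny `FEATURES` table, the three features per sound, and the "fraction of differing features" cost. It is not expected to reproduce the 0.82 or 0.93 values quoted in the record.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical toy feature table: (place, manner, voicing) for a few consonants.
# The paper's actual feature inventory and weighting are not given in this record.
FEATURES: Dict[str, Tuple[str, str, str]] = {
    "p": ("bilabial", "plosive", "voiceless"),
    "b": ("bilabial", "plosive", "voiced"),
    "f": ("labiodental", "fricative", "voiceless"),
    "t": ("alveolar", "plosive", "voiceless"),
    "d": ("alveolar", "plosive", "voiced"),
    "s": ("alveolar", "fricative", "voiceless"),
    "ʃ": ("postalveolar", "fricative", "voiceless"),
}


def substitution_cost(a: str, b: str) -> float:
    """Soft replacement cost in [0, 1]: share of features on which a and b differ."""
    if a == b:
        return 0.0
    fa, fb = FEATURES.get(a), FEATURES.get(b)
    if fa is None or fb is None:
        return 1.0  # unknown symbol: fall back to the classical unit cost
    differing = sum(x != y for x, y in zip(fa, fb))
    return differing / len(fa)


def phonetic_edit_distance(src: Sequence[str], tgt: Sequence[str]) -> float:
    """Levenshtein-style dynamic programme with a feature-based substitution cost.

    src and tgt are sequences of IPA symbols; insertions and deletions cost 1.
    """
    prev = [float(j) for j in range(len(tgt) + 1)]
    for i, a in enumerate(src, start=1):
        curr = [float(i)]
        for j, b in enumerate(tgt, start=1):
            curr.append(min(
                prev[j] + 1.0,                          # delete a
                curr[j - 1] + 1.0,                      # insert b
                prev[j - 1] + substitution_cost(a, b),  # soft replacement
            ))
        prev = curr
    return prev[-1]
```

Under these assumptions, `phonetic_edit_distance(list("tap"), list("dab"))` comes to about 0.67, because each of the two replacements differs only in voicing, whereas the classical edit distance for the same pair is 2. Passing sequences of symbols rather than raw strings leaves room for multi-character IPA units such as long vowels.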