Discovering Lexical Similarity Using Articulatory Feature-Based Phonetic Edit Distance

Authors:	Muhammad Yaseen Khan, Alessandro Bogliolo, Tafseer Ahmed, Muhammad Suffian
Language:	English
Publication year:	2022
Subject:	FOS: Computer and information sciences; Computer Science - Computation and Language; General Computer Science; cognates; General Engineering; edit distance; TK1-9971; computational linguistics; General Materials Science; lexical similarity; Electrical engineering. Electronics. Nuclear engineering; Electrical and Electronic Engineering; natural language processing; Computation and Language (cs.CL); Articulatory features
Source:	IEEE Access, Vol 10, Pp 1533-1544 (2022)
ISSN:	2169-3536
Description:	Lexical Similarity (LS) between two languages uncovers many linguistic insights, such as phylogenetic relationships, mutual intelligibility, common etymology, and loan words. LS can be evaluated by various methods. This paper presents Phonetic Edit Distance (PED), a method that performs a soft comparison of letters using the articulatory features associated with their International Phonetic Alphabet (IPA) transcription. In particular, the comparison between the articulatory features of two letters, taken from words belonging to different languages, is used to compute the replacement cost in the inner loop of the edit distance computation. As an example, PED gives an edit distance of 0.82 between the German word ‘vater’ ([fa:tər]) and its Persian counterpart ([pedær]), both meaning ‘father,’ and, similarly, a PED of 0.93 between the Hebrew ([ʃəɭam]) and Arabic ([səɭa:m]) words meaning ‘peace,’ whereas the classical edit distances would be 4 and 2, respectively. We report the results of systematic experiments conducted on six languages: Arabic, Hindi, Marathi, Persian, Sanskrit, and Urdu. Universal Dependencies (UD) corpora were used to restrict the comparison to lists of words belonging to the same part of speech. The LS based on the average PED between pairs of words was then computed for each pair of languages, unveiling similarities otherwise masked by the adoption of different alphabets, grammars, and pronunciation rules.
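The idea of replacing the fixed substitution cost of classical edit distance with a feature-based soft cost can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `FEATURES` table, the Jaccard-style `substitution_cost`, and the unit insertion/deletion costs are all assumptions made here for illustration, and the resulting numbers are not expected to match the values reported in the abstract.

```python
from typing import Dict, FrozenSet, Sequence

# Hypothetical, heavily simplified articulatory feature sets.
# These are illustrative assumptions, not the feature inventory used in the paper.
FEATURES: Dict[str, FrozenSet[str]] = {
    "f": frozenset({"consonant", "labiodental", "fricative", "voiceless"}),
    "p": frozenset({"consonant", "bilabial", "plosive", "voiceless"}),
    "t": frozenset({"consonant", "alveolar", "plosive", "voiceless"}),
    "d": frozenset({"consonant", "alveolar", "plosive", "voiced"}),
    "r": frozenset({"consonant", "alveolar", "trill", "voiced"}),
    "a": frozenset({"vowel", "open", "front", "unrounded"}),
    "e": frozenset({"vowel", "close-mid", "front", "unrounded"}),
    "ə": frozenset({"vowel", "mid", "central", "unrounded"}),
    "æ": frozenset({"vowel", "near-open", "front", "unrounded"}),
}


def substitution_cost(a: str, b: str) -> float:
    """Soft replacement cost in [0, 1] (assumed: 1 - Jaccard overlap of features)."""
    if a == b:
        return 0.0
    fa, fb = FEATURES.get(a), FEATURES.get(b)
    if fa is None or fb is None:
        # Unknown symbols fall back to the classical hard cost.
        return 1.0
    return 1.0 - len(fa & fb) / len(fa | fb)


def phonetic_edit_distance(s: Sequence[str], t: Sequence[str]) -> float:
    """Edit distance with unit insert/delete costs and a soft substitution cost."""
    m, n = len(s), len(t)
    dist = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = float(i)
    for j in range(1, n + 1):
        dist[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dist[i][j] = min(
                dist[i - 1][j] + 1.0,  # deletion
                dist[i][j - 1] + 1.0,  # insertion
                dist[i - 1][j - 1] + substitution_cost(s[i - 1], t[j - 1]),
            )
    return dist[m][n]


# IPA segments from the abstract's example; the length mark is dropped for simplicity.
german = ["f", "a", "t", "ə", "r"]
persian = ["p", "e", "d", "æ", "r"]
print(phonetic_edit_distance(german, persian))  # below the classical distance of 4
```

Because phonetically close segments such as [t] and [d] share most of their features, each replacement costs less than 1, so the total falls below the classical edit distance of 4 for this pair. How the distance is normalized or weighted in the paper is not stated in this record.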
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::3e0170b76d553472a5e8695c5f85dc85 https://ieeexplore.ieee.org/document/9662078/