Data-driven spell checking: The synergy of two algorithms for spelling error detection and correction

Authors:	Asanka Wasala, Ruvan Weerasinghe, Eranga Jayalatharachchi
Year of publication:	2012
Subject:	Computer science; Group method of data handling; Spell; Rule-based system; Spelling; Data-driven; Quality (business); Edit distance; Artificial intelligence; Error detection and correction; Natural language processing
Source:	International Conference on Advances in ICT for Emerging Regions (ICTer2012).
DOI:	10.1109/icter.2012.6422063
Description:	Natural language processing research and applications for Sinhala, the majority language of Sri Lanka, are still in their infancy. Spell checking is an important application that has received inadequate attention. One of the major obstacles to implementing a Sinhala spell checker is the lack of resources such as morphological analyzers, tagged corpora and comprehensive lexica. Because Sinhala morphology is so rich, an entirely rule-based approach is inadequate. An interesting alternative is to use data-driven approaches. This research attempts to improve the quality of Subasa, an existing n-gram-based, data-driven spell checker, by using minimum edit distance techniques, and to make the system freely available online. Our empirical results show that the proposed design improvements succeeded in improving spell-checking coverage. We also compare the performance of this system with that of others in the literature.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::c0e53ad14d73523e485ebee2d4268f8d https://doi.org/10.1109/icter.2012.6422063
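
The description mentions minimum edit distance techniques for spelling correction. As a rough illustration only, the sketch below shows one common way such a technique can rank correction candidates: compute the Levenshtein distance between a misspelled word and every entry in a lexicon, then keep the closest entries. This is not the Subasa implementation, which the record does not describe in detail. The function names `edit_distance` and `suggest`, the parameters `max_distance` and `limit`, and the toy English lexicon are all hypothetical, and the n-gram detection stage is not covered.

```python
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (unit-cost insert, delete, substitute).

    Uses two rows of the dynamic-programming table to keep memory at O(len(b)).
    """
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def suggest(word: str, lexicon: Iterable[str],
            max_distance: int = 2, limit: int = 5) -> List[str]:
    """Return up to `limit` lexicon entries within `max_distance` edits of `word`.

    Hypothetical helper: the thresholds are illustrative, not taken from the paper.
    Returns an empty list when the word is already in the lexicon.
    """
    entries = set(lexicon)
    if word in entries:
        return []
    scored = [(edit_distance(word, cand), cand) for cand in entries]
    # Sort by distance first, then alphabetically for a stable order.
    scored = sorted((d, c) for d, c in scored if d <= max_distance)
    return [cand for _, cand in scored[:limit]]


if __name__ == "__main__":
    toy_lexicon = ["spelling", "spell", "selling", "checking"]  # assumed data
    print(suggest("speling", toy_lexicon))
```

A linear scan over the lexicon is acceptable for a small word list; for a large lexicon, a practical system would usually prune candidates first, for example by length or by shared character n-grams, before computing distances.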