Data-driven spell checking: The synergy of two algorithms for spelling error detection and correction
Author: | Asanka Wasala, Ruvan Weerasinghe, Eranga Jayalatharachchi |
---|---|
Year of publication: | 2012 |
Subject: | Computer science; Group method of data handling; Spell; Rule-based system; Spelling; Data-driven; Quality (business); Edit distance; Artificial intelligence; Error detection and correction; Natural language processing |
Source: | International Conference on Advances in ICT for Emerging Regions (ICTer2012). |
DOI: | 10.1109/icter.2012.6422063 |
Description: | Sinhala, the majority language of Sri Lanka, is still in its infancy with respect to natural language processing research and applications. Spell checking is an important application that has received inadequate attention. One of the major obstacles to implementing a Sinhala spell checker is the lack of resources such as morphological analyzers, tagged corpora and comprehensive lexica. Because Sinhala morphology is so rich, an entirely rule-based approach is insufficient. An interesting alternative is to use data-driven approaches. This research attempts to improve the quality of Subasa, an existing n-gram-based, data-driven spell checker, using minimum edit distance techniques, and to make the system freely available online. Our empirical results show that the proposed design improvements succeeded in improving the spell checking coverage. We also compare the performance of this system with others in the literature. |
Database: | OpenAIRE |
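
The description names minimum edit distance as the technique used for correction but gives no detail. The following is a minimal sketch of that general idea, not the Subasa implementation: it computes the standard Levenshtein distance and ranks lexicon entries by their distance to a misspelled word. The function names `edit_distance` and `suggest`, the `max_distance` and `limit` parameters, and the toy English lexicon are all assumptions made for illustration. Python compares strings code point by code point, so Sinhala text would probably need consistent Unicode normalization before use.

```python
from typing import Iterable, List


def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn source into target."""
    # previous[j] = distance between the processed prefix of source
    # and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # turning i characters into "" takes i deletions
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def suggest(word: str, lexicon: Iterable[str],
            max_distance: int = 2, limit: int = 5) -> List[str]:
    """Hypothetical helper: return lexicon entries within max_distance
    of word, closest first (ties broken alphabetically)."""
    scored = [(edit_distance(word, entry), entry) for entry in lexicon]
    in_range = sorted(pair for pair in scored if pair[0] <= max_distance)
    return [entry for _, entry in in_range[:limit]]


if __name__ == "__main__":
    toy_lexicon = ["spelling", "spewing", "selling", "apple"]  # illustrative only
    print(suggest("speling", toy_lexicon))
```

Scanning the whole lexicon for every query is only practical for small word lists. A data-driven system would likely narrow the candidates first, for example with n-gram statistics, and apply the distance computation to the survivors.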