Automatic Extraction of Locations from News Articles Using Domain Knowledge
Author: | Loitongbam Sanayai Meetei, Ringki Das, Thoudam Doren Singh, Sivaji Bandyopadhyay |
---|---|
Year of publication: | 2020 |
Subject: |
Word embedding; Computer science; Digital data; Lingua franca; Task (project management); Semantic similarity; Assamese language; Domain knowledge; Word2vec; Artificial intelligence; Natural language processing |
Source: | Communications in Computer and Information Science ISBN: 9783030626242 |
DOI: | 10.1007/978-3-030-62625-9_4 |
Description: | With the growing volume of digital data, it is becoming increasingly hard to extract useful information from text, especially for resource-constrained languages. In this work, we report on the task of language-independent automatic extraction of locations from news articles using domain knowledge. The work is tested on four languages: English and three resource-constrained languages, Assamese, Manipuri, and Mizo, the lingua francas of three neighboring North-Eastern states of India, namely Assam, Manipur, and Mizoram, respectively. Our architecture relies on the semantic similarity between words, computed with the popular word2vec word embedding model, coupled with domain knowledge of the aforementioned regions. The model is able to detect the most detailed locations possible. |
Database: | OpenAIRE |
External link: |