Automatic Extraction of Locations from News Articles Using Domain Knowledge

Authors: Loitongbam Sanayai Meetei, Ringki Das, Thoudam Doren Singh, Sivaji Bandyopadhyay
Year of publication: 2020
Subject:
Source: Communications in Computer and Information Science ISBN: 9783030626242
DOI: 10.1007/978-3-030-62625-9_4
Description: As the volume of digital data grows, extracting useful information from text becomes increasingly difficult, especially for resource-constrained languages. In this work, we report on language-independent automatic extraction of locations from news articles using domain knowledge. The approach is tested on four languages: English and three resource-constrained languages, Assamese, Manipuri, and Mizo, the lingua francas of the three neighboring North-Eastern Indian states of Assam, Manipur, and Mizoram, respectively. Our architecture relies on semantic similarity between words, computed with the popular word2vec word embedding model, coupled with domain knowledge of these regions. The model is able to detect locations at the most detailed level possible.
Database: OpenAIRE
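
Illustrative sketch: the abstract above says only that the method combines word2vec-based semantic similarity with regional domain knowledge; it does not give the algorithm. The snippet below is one possible reading, not the authors' implementation. It assumes the domain knowledge is a small seed list of known place names and flags any token whose embedding is close (by cosine similarity) to a seed. The vectors, the seed list, the threshold, and the name `extract_locations` are all hypothetical, and the toy vectors stand in for embeddings that would normally come from a trained word2vec model.

```python
import math

# Toy 3-dimensional vectors standing in for trained word2vec embeddings.
# The values are invented for illustration only.
EMBEDDINGS = {
    "imphal":   [0.90, 0.10, 0.05],
    "guwahati": [0.85, 0.15, 0.10],
    "aizawl":   [0.88, 0.12, 0.02],
    "thoubal":  [0.80, 0.20, 0.10],
    "minister": [0.10, 0.90, 0.20],
    "flood":    [0.05, 0.20, 0.95],
}

# Hypothetical domain knowledge: a seed list of known place names.
KNOWN_LOCATIONS = {"imphal", "guwahati", "aizawl"}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def extract_locations(tokens, threshold=0.95):
    """Return tokens that are seed locations or embed close to one.

    The threshold is an assumed value, not taken from the paper.
    """
    found = []
    for token in tokens:
        word = token.lower()
        if word in KNOWN_LOCATIONS:
            found.append(token)
            continue
        vec = EMBEDDINGS.get(word)
        if vec is None:
            continue  # out-of-vocabulary tokens are skipped
        if any(cosine(vec, EMBEDDINGS[loc]) >= threshold for loc in KNOWN_LOCATIONS):
            found.append(token)
    return found


tokens = ["Flood", "hits", "Thoubal", "near", "Imphal"]
print(extract_locations(tokens))
```

Because the sketch relies only on embeddings and a seed list, nothing in it is tied to a particular language, which is consistent with the language-independent framing of the abstract; how the actual system ranks or refines candidate locations is not stated in this record.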