Automatic Extraction of Locations from News Articles Using Domain Knowledge

Authors:	Loitongbam Sanayai Meetei, Ringki Das, Thoudam Doren Singh, Sivaji Bandyopadhyay
Year of publication:	2020
Subject:	Word embedding; Computer science; Digital data; Lingua franca; Task (project management); Semantic similarity; Assamese language; Domain knowledge; Word2vec; Artificial intelligence; Natural language processing
Source:	Communications in Computer and Information Science ISBN: 9783030626242
DOI:	10.1007/978-3-030-62625-9_4
Description:	As the volume of digital data grows, extracting useful information from text becomes increasingly hard, especially for resource-constrained languages. In this work, we report on the task of language-independent automatic extraction of locations from news articles using domain knowledge. The work is tested on four languages: English and three resource-constrained languages, Assamese, Manipuri and Mizo, which are the lingua francas of three neighboring North-Eastern states of India, namely Assam, Manipur and Mizoram respectively. Our architecture relies on the semantic similarity between words, computed with the popular word2vec word embedding model, coupled with domain knowledge of the aforementioned regions. The model is able to detect the most detailed location possible.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::cada6ca129f3cede6f1cd61cb1d8587b https://doi.org/10.1007/978-3-030-62625-9_4
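
The description mentions combining word-embedding similarity with regional domain knowledge. The record gives no implementation details, so the following is only a minimal sketch of that general idea and not the authors' architecture. Everything in it is an assumption made for illustration: the toy three-dimensional vectors stand in for trained word2vec embeddings, the `GAZETTEER` set stands in for the domain knowledge, and the function name `extract_locations` and the `threshold` value are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical toy embeddings; a real system would load trained word2vec vectors.
EMBEDDINGS = {
    "imphal": [0.9, 0.1, 0.0],
    "guwahati": [0.8, 0.2, 0.1],
    "aizawl": [0.85, 0.15, 0.05],
    "thoubal": [0.88, 0.12, 0.02],
    "election": [0.1, 0.9, 0.2],
    "rainfall": [0.0, 0.3, 0.9],
}

# Hypothetical domain knowledge: a small list of known place names (seed locations).
GAZETTEER = {"imphal", "guwahati", "aizawl"}

def extract_locations(tokens, threshold=0.95):
    """Return tokens that are known locations or are close to one in embedding space."""
    found = []
    for token in tokens:
        word = token.lower()
        if word in GAZETTEER:
            # Direct hit in the domain knowledge.
            found.append(word)
            continue
        vector = EMBEDDINGS.get(word)
        if vector is None:
            # Out-of-vocabulary words cannot be scored in this sketch.
            continue
        # Compare against every seed location that has an embedding.
        best = max(cosine(vector, EMBEDDINGS[seed])
                   for seed in GAZETTEER if seed in EMBEDDINGS)
        if best >= threshold:
            found.append(word)
    return found

print(extract_locations(["Rainfall", "in", "Thoubal", "and", "Imphal"]))
```

In this sketch the gazetteer lookup is language-independent in the sense that it only needs a list of place names and embeddings trained on the target language; how the published system actually ranks or refines candidate locations is not stated in the record.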