Generic Feature Selection Methodology to Named Entity Detection from Indian and European Languages
Author: | S. L. Devi, C. S. Malarkodi |
---|---|
Publication year: | 2019 |
Subject: |
lcsh:Computer engineering. Computer hardware; General Computer Science; Computer science; Feature extraction; lcsh:TK7885-7895; Feature selection; computer.software_genre; Fuzzy logic; Domain (software engineering); Named-entity recognition; Electrical and Electronic Engineering; Named entity detection; signal processing; Signal processing; business.industry; feature extraction; ComputingMethodologies_PATTERNRECOGNITION; classification; fuzzy logic; lcsh:Electrical engineering. Electronics. Nuclear engineering; Artificial intelligence; business; optimization; lcsh:TK1-9971; computer; Natural language processing |
Source: | Advances in Electrical and Computer Engineering, Vol 19, Iss 1, Pp 79-88 (2019) |
ISSN: | 1844-7600 1582-7445 |
DOI: | 10.4316/aece.2019.01011 |
Description: | This paper describes the development of a language- and domain-independent Named Entity Recognition (NER) system that can identify named entities in any given dataset, irrespective of language and domain. The main novelty of the present work is a generic feature selection methodology, which has been applied to 7 Indian languages and 5 European languages. The feature selection was carried out in two ways: first, using a frequency-based approach; second, using the k-means++ clustering algorithm to validate the patterns obtained with the frequency-based approach. The datasets used for the experiments belong to different genres. To the best of our knowledge, we are the first to develop a cross-lingual Named Entity (NE) system covering 12 languages that belong to different language families. We performed 10-fold cross-validation, analyzed the system output for all the languages, and discuss the causes of errors in the error analysis section. The performance of our system is also compared with existing systems. |
Database: | OpenAIRE |
External link: |