Quality of Word Vectors and Its Impact on Named Entity Recognition in Czech

Authors:	Martin Süss, František Dařena
Year of publication:	2020
Subject:	Czech; word embeddings; Economics and Econometrics; Czech language; Computer science; Process (engineering); Management Information Systems; Task (project management); Named-entity recognition; Management of Technology and Innovation; Preprocessor; Quality (business); Business and International Management; Natural language processing; Marketing; Artificial neural network; word vectors training; Artificial intelligence; Finance; Word (computer architecture)
Description:	Named Entity Recognition (NER) finds named entities in text and classifies each into one of a set of entity types. Modern state-of-the-art NER approaches avoid hand-crafted features and rely instead on neural network systems that infer features from word embeddings. The paper analyzes how different aspects of word embeddings affect the process and results of the named entity recognition task in Czech, which has not been investigated so far. Various aspects of word vector preparation were examined experimentally to draw practical conclusions. Suitable settings were determined for each step, including the corpus used, the number of word vector dimensions, the text preprocessing techniques, the context window size, the number of training epochs, and the algorithms used to infer word vectors together with their specific parameters. The paper demonstrates that attention to the word vector preparation process can bring a significant improvement for NER in Czech, even without additional language-independent or language-dependent resources.
Access:	Open access
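
The preparation settings named in the description (algorithm, number of dimensions, context window size, number of training epochs, preprocessing) can be thought of as a grid of configurations to compare. The following is a minimal sketch of enumerating such a grid using only the Python standard library. The dictionary `SEARCH_SPACE`, the helper `iter_configs`, and every value in the grid are hypothetical placeholders chosen for illustration; they are not the settings or the procedure reported in the paper.

```python
from itertools import product

# Hypothetical search space: every value below is an illustrative
# placeholder, not a setting taken from the paper.
SEARCH_SPACE = {
    "algorithm": ["skipgram", "cbow"],
    "dimensions": [100, 300],
    "window": [2, 5],
    "epochs": [5, 10],
    "lowercase": [False, True],
}


def iter_configs(space):
    """Yield one dict per combination of the listed settings."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


configs = list(iter_configs(SEARCH_SPACE))
print(len(configs))  # 2 * 2 * 2 * 2 * 2 = 32 combinations
print(configs[0])
```

Each resulting dictionary could then be passed to whatever embedding trainer and NER evaluation are in use, so that configurations are compared on the same downstream task.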
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::8756b40bc5160cd9f2862a37cad32de2 https://repozitar.mendelu.cz/xmlui/handle/20.500.12698/1545