Optimizing Natural Language Processing Pipelines: Opinion Mining Case Study

Authors: Yudivián Almeida-Cruz, Andrés Montoyo, Yoan Gutiérrez, Suilan Estevez-Velarde
Year of publication: 2019
Subject:
Source: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications ISBN: 9783030339036
CIARP
DOI: 10.1007/978-3-030-33904-3_15
Description: This research presents NLP-Opt, an Auto-ML technique for optimizing pipelines of machine learning algorithms that can be applied to different Natural Language Processing tasks. The process of selecting the algorithms and their parameters is modelled as an optimization problem, and a technique based on the metaheuristic Population-Based Incremental Learning (PBIL) is proposed to find an optimal combination. For validation purposes, this approach is applied to a standard opinion mining problem. NLP-Opt effectively optimizes the algorithms and parameters of pipelines. Additionally, NLP-Opt outputs probabilistic information about the optimization process, revealing the most relevant components of pipelines. The proposed technique can be applied to different Natural Language Processing problems, and the information provided by NLP-Opt can be used by researchers to gain insights into the characteristics of the best-performing pipelines. The source code is made available for other researchers. In contrast with other Auto-ML approaches, NLP-Opt provides a flexible mechanism for designing generic pipelines that can be applied to NLP problems. Furthermore, the use of the probabilistic model provides a more comprehensive approach to the Auto-ML problem that enriches researchers' understanding of the possible solutions.
Database: OpenAIRE
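
The description says that pipeline selection is modelled as an optimization problem solved with Population-Based Incremental Learning (PBIL), and that the resulting probabilistic model reveals the most relevant components. The sketch below is only a generic illustration of that idea, not the NLP-Opt implementation: the search space, the stage and option names, the `toy_score` function and every hyperparameter value are assumptions made up for this example. It keeps one categorical distribution per pipeline stage, samples candidate pipelines, and shifts each distribution toward the options chosen by the best candidates.

```python
import random

# Hypothetical search space: each pipeline stage offers a few categorical options.
SEARCH_SPACE = {
    "tokenizer": ["whitespace", "ngram"],
    "vectorizer": ["bow", "tfidf"],
    "classifier": ["naive_bayes", "svm", "logreg"],
}


def toy_score(pipeline):
    """Stand-in for a real evaluation (e.g. validation accuracy); values are made up."""
    score = 0.5
    score += 0.1 if pipeline["tokenizer"] == "ngram" else 0.0
    score += 0.15 if pipeline["vectorizer"] == "tfidf" else 0.0
    score += {"naive_bayes": 0.0, "svm": 0.1, "logreg": 0.05}[pipeline["classifier"]]
    return score


def sample_pipeline(probs, rng):
    """Draw one option per stage according to the current probabilities."""
    return {
        stage: rng.choices(options, weights=probs[stage])[0]
        for stage, options in SEARCH_SPACE.items()
    }


def pbil(score_fn, generations=30, pop_size=20, n_best=5, learning_rate=0.2, seed=0):
    rng = random.Random(seed)
    # Start from a uniform distribution over the options of every stage.
    probs = {
        stage: [1.0 / len(options)] * len(options)
        for stage, options in SEARCH_SPACE.items()
    }
    best, best_score = None, float("-inf")

    for _ in range(generations):
        population = [sample_pipeline(probs, rng) for _ in range(pop_size)]
        population.sort(key=score_fn, reverse=True)
        elite = population[:n_best]

        top_score = score_fn(elite[0])
        if top_score > best_score:
            best, best_score = elite[0], top_score

        # Move each probability toward the option frequencies among the elite.
        for stage, options in SEARCH_SPACE.items():
            for i, option in enumerate(options):
                freq = sum(p[stage] == option for p in elite) / n_best
                probs[stage][i] = (1 - learning_rate) * probs[stage][i] + learning_rate * freq

    return best, best_score, probs


if __name__ == "__main__":
    best, best_score, probs = pbil(toy_score)
    print("best pipeline:", best, "score:", round(best_score, 3))
    for stage, options in SEARCH_SPACE.items():
        print(stage, dict(zip(options, (round(p, 3) for p in probs[stage]))))
```

The final `probs` dictionary plays the role of the probabilistic information mentioned in the description: options whose probability has grown are the ones the search found most useful under the assumed scoring function.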