Optimizing performance of sentiment analysis through design of experiments

Authors: Gary S. W. Goh, Andy J. L. Ang, Allan N. Zhang
Year of publication: 2016
Subject:
Source: IEEE BigData
DOI: 10.1109/bigdata.2016.7841042
Description: Traditional manual design of analytical processes is challenging because it requires a general analyst to have a good grasp of numerous algorithms and of the interaction effects between each technique and the data across multiple domains. In today's increasingly multi-domain, high-variety data environments, this design process can be especially laborious. In this paper, we describe a design optimization approach that uses design of experiments to determine a suitable design, with high classification performance, for a standardized text classification process. We focus on sentiment analysis as a use case for this approach, as standard analytical methods have been established for each phase of the sentiment analysis process, from data pre-processing through feature selection to classification. In our proposed approach, we present an automatic and domain-free technique for applying design of experiments to this design process, with the sentiment classification evaluation metrics as the performance criteria for optimization. In addition, we show that several interpretable analyses can be made to better understand the complex interaction effects of various analytical techniques with the data, which can then guide a general analyst in selecting more appropriate process design parameters for better text classification performance.
Database: OpenAIRE
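
Illustrative note: the description above outlines the idea of treating each phase of the pipeline (pre-processing, feature selection, classification) as an experimental factor and using an evaluation metric as the response. A minimal Python sketch of that idea follows, assuming a two-level full factorial design. The factor names, the levels, and the `placeholder_score` function are hypothetical stand-ins introduced here for illustration; they are not taken from the paper, and the scores are synthetic, not reported results.

```python
from itertools import product
from statistics import mean

# Hypothetical factors and levels, one per pipeline phase named in the description.
FACTORS = {
    "preprocessing": ["none", "stemming"],
    "feature_selection": ["chi2", "mutual_info"],
    "classifier": ["naive_bayes", "svm"],
}


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def main_effects(runs, scores, factors):
    """Mean response at each level of each factor, averaged over the other factors."""
    return {
        name: {
            level: mean(s for r, s in zip(runs, scores) if r[name] == level)
            for level in levels
        }
        for name, levels in factors.items()
    }


def placeholder_score(run):
    """Synthetic response standing in for 'train the pipeline and measure accuracy'.

    In a real study this would fit and evaluate a classifier; the numbers
    below are made up purely so the sketch runs end to end.
    """
    score = 0.70
    if run["preprocessing"] == "stemming":
        score += 0.02
    if run["classifier"] == "svm":
        score += 0.05
    # A synthetic interaction between two factors.
    if run["feature_selection"] == "chi2" and run["classifier"] == "svm":
        score += 0.03
    return score


runs = full_factorial(FACTORS)
scores = [placeholder_score(r) for r in runs]
effects = main_effects(runs, scores, FACTORS)
best_run = max(zip(runs, scores), key=lambda pair: pair[1])[0]

print(len(runs), "runs; best configuration:", best_run)
for factor, by_level in effects.items():
    print(factor, by_level)
```

Comparing the per-level means gives a rough view of each factor's main effect, while differences that only appear for particular level combinations hint at interactions. A fractional design or a formal analysis of variance would be natural next steps, but which design the authors actually used is not stated in this record.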