Feature selection for helpfulness prediction of online product reviews: An empirical study

Authors: Jia Rong, Sandra Michalska, Hua Wang, Jiahua Du, Yanchun Zhang
Language: English
Year of publication: 2019
Subject:
Vocabulary
Computer science
Social Sciences
02 engineering and technology
Empirical Research
0202 electrical engineering, electronic engineering, information engineering
Feature (machine learning)
Data Management
Marketing
Grammar
Multidisciplinary
05 social sciences
Commerce
Research Assessment
Reproducibility
Semantics
Identification (information)
Helpfulness
Medicine
Algorithms
Research Article
Linguistic Morphology
Computer and Information Sciences
Science
Feature extraction
Feature selection
Research and Analysis Methods
Phonology
020204 information systems
0502 economics and business
Humans
Syntax
Selection (genetic algorithm)
Lexicons
Metadata
Internet
Information retrieval
Deep learning
Linguistics
050211 marketing
Artificial intelligence
Software
Source: PLoS ONE
PLoS ONE, Vol 14, Iss 12, p e0226902 (2019)
ISSN: 1932-6203
Description: Online product reviews underpin nearly all e-shopping activities. The high volume of data, together with the varying quality of online reviews, puts growing pressure on automated approaches to prioritizing informative content. Despite a substantial body of literature on review helpfulness prediction, the rationale behind specific feature selection is largely under-studied. Current work also tends to concentrate on domain- and/or platform-dependent feature curation, which limits wider generalization. Moreover, result comparability and reproducibility suffer because data and source code are frequently unavailable. This study addresses these gaps through comprehensive feature identification, evaluation, and selection. To this end, the 30 most frequently used content-based features are first identified from 149 relevant research papers and grouped into five coherent categories. The features are then selected to perform helpfulness prediction on six domains of the largest publicly available Amazon 5-core dataset. Three scenarios for feature selection are considered: (i) individual features, (ii) features within each category, and (iii) all features. Empirical results demonstrate that semantics plays a dominant role in predicting informative reviews, followed by sentiment and then the remaining features. Finally, feature combination patterns and selection guidelines across domains are summarized to enhance customer experience in today's prevalent e-commerce environment. The computational framework for helpfulness prediction used in the study has been released to facilitate result comparability and reproducibility.
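The three feature-selection scenarios named in the description can be illustrated with a minimal sketch. It is not the authors' released framework: the category and feature names below are hypothetical placeholders, and scenario (ii) is read in its simplest form, as evaluating each category's features together.

```python
from typing import Dict, List

# Hypothetical category -> feature mapping. These names are illustrative
# placeholders, not the 30 features or five categories used in the study.
CATEGORIES: Dict[str, List[str]] = {
    "semantics": ["unigram_tfidf", "topic_distribution"],
    "sentiment": ["polarity_score", "emotion_lexicon_counts"],
    "structure": ["word_count", "sentence_count"],
}


def candidate_feature_sets(
    categories: Dict[str, List[str]],
) -> Dict[str, List[List[str]]]:
    """Return the feature subsets to evaluate under each scenario."""
    all_features = [f for feats in categories.values() for f in feats]
    return {
        # (i) each feature evaluated on its own
        "individual": [[f] for f in all_features],
        # (ii) the features of each category evaluated together
        "per_category": [list(feats) for feats in categories.values()],
        # (iii) every feature evaluated at once
        "all": [all_features],
    }


if __name__ == "__main__":
    for scenario, subsets in candidate_feature_sets(CATEGORIES).items():
        print(f"{scenario}: {len(subsets)} candidate subset(s)")
```

Each subset would then be passed to whatever helpfulness predictor is being evaluated, once per product domain, so that scores can be compared across scenarios.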
Database: OpenAIRE