Popis: |
Product reviews describe customers' opinions of and experiences with products. The better the opinions and experiences described in a review, the more it attracts and helps people who want to buy the product. Reviews with these qualities are called helpful reviews. Many studies have sought to detect helpful reviews and have proposed many useful factors, such as review-related, product-related, and reviewer-related factors. The elapsed time of a review has been used as a factor in detecting helpful reviews, but it has never been considered as a basis for sampling, even though it is essential in determining the freshness of reviews, which influences people interested in the product. In this paper, we propose time-based sampling methods that keep the sample size as small as possible while detecting helpful reviews with high accuracy. To investigate the effect of the time-based sampling methods on detecting helpful reviews, we conducted extensive experiments comparing them with total sampling and simple random sampling, using two machine learning methods, XGBoost and CNN, which involve text and numerical factors. Experimental results illustrate the validity of the proposed methods. Notably, on large datasets, our proposed sampling methods outperform the other sampling methods. |