Product Classification Using Partially Abbreviated Product Names, Brands and Dimensions

Authors: Rolf Krieger, Christian Schorr, Oliver Allweyer, Andreas Mohr
Publication year: 2021
Subject:
Source: Communications in Computer and Information Science ISBN: 9783030830137
DATA (Revised Selected Papers)
DOI: 10.1007/978-3-030-83014-4_11
Description: Retail companies are looking for ways to support or automate data entry for product master data, which is currently often a time-consuming and cost-intensive manual process. Many attributes and business processes in master data management depend on the classification of articles into a product category. In this paper we propose a machine learning approach to classify articles according to the Global Product Classification schema by their product name, brand name and other attributes such as product weight and dimensions. One challenge in our data set is that the product names contain a significant number of abbreviations, for which we implement several preprocessing strategies. Additionally, the data set suffers from class imbalance and missing values, which must be taken into account. Different classification algorithms, data imputation methods and feature combination strategies are evaluated. We show that automatic classification can be performed successfully based on the partly abbreviated product names despite the challenges mentioned. A simple Support Vector Machine model outperforms more sophisticated models, and the brand names, product dimensions and other additional attributes did not increase prediction quality.
Database: OpenAIRE