Description: |
Wines with a clear geographical origin are of interest to consumers and to the food industry. This paper presents a data mining study of South American Merlot wines aimed at identifying a fingerprint of their geographical origin. Samples from Argentina (n = 17), Brazil (n = 12), Chile (n = 48), and Uruguay (n = 6) were analyzed. Twenty chemical variables were determined by high-performance liquid chromatography (HPLC), including antioxidant activity, total polyphenols, total anthocyanins, individual anthocyanins, and color. Four binary classification problems (Brazil versus non-Brazil, Argentina versus non-Argentina, Chile versus non-Chile, and Uruguay versus non-Uruguay) were defined to investigate the geographic characteristics of each country, and evaluating these binary classifications on our dataset made it possible to identify the main chemical variables that discriminate between the countries. The following algorithms were used: the Synthetic Minority Over-sampling Technique (SMOTE) together with under-sampling to balance the dataset of each classification problem; the Relief algorithm to rank variable importance; and three classifiers, Support Vector Machines (SVM), Multilayer Perceptron (MLP), and a Radial Basis Function Network with Dynamic Decay Adjustment (RBF-DDA). The SVM model achieved the highest performance of the three classifiers on every dataset, with accuracies of 93.73% for Brazil versus non-Brazil, 91.18% for Argentina versus non-Argentina, 79.16% for Chile versus non-Chile, and 91.67% for Uruguay versus non-Uruguay. These accuracies were obtained by searching over the variable subsets suggested by the Relief ranking for each classification problem. We found that some variables, such as DPPH (antioxidant activity), wine color, and individual anthocyanins, are among the most important for characterizing Merlot wines.
Keywords: Support Vector Machine, Multilayer Perceptron, Anthocyanins, Feature selection, Merlot wines, South American wines |
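The preprocessing steps named in the abstract (one-versus-rest labeling, SMOTE plus under-sampling, and a Relief variable ranking followed by a subset search) can be illustrated with a small plain-Python sketch. This is a minimal, hedged illustration under stated assumptions, not the authors' implementation. It uses the original two-class Relief algorithm (the abstract does not say which Relief variant was used), basic SMOTE with Euclidean nearest neighbours, random under-sampling, and a hypothetical nested top-k search over the Relief ranking. All function names (`one_vs_rest`, `undersample`, `smote`, `relief`, `nested_subsets`) and the two-feature toy rows are made up for illustration and are not data from the study.

```python
import math
import random
from typing import Iterator, List, Sequence

Vector = List[float]


def one_vs_rest(countries: Sequence[str], target: str) -> List[int]:
    """Label 1 for the target country and 0 for all others (e.g. Brazil vs non-Brazil)."""
    return [1 if c == target else 0 for c in countries]


def undersample(majority: List[Vector], n_keep: int, seed: int = 0) -> List[Vector]:
    """Random under-sampling: keep n_keep majority rows, drawn without replacement."""
    return random.Random(seed).sample(majority, n_keep)


def smote(minority: List[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Basic SMOTE: each synthetic row lies on the segment between a random
    minority row and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base` (excluding itself), Euclidean distance
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def relief(X: List[Vector], y: List[int], n_iter: int = 100, seed: int = 0) -> List[float]:
    """Original two-class Relief: a feature's weight falls when it differs from the
    nearest hit (same class) and rises when it differs from the nearest miss."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    # Min-max scale each feature so per-feature differences lie in [0, 1]
    lo = [min(row[f] for row in X) for f in range(n_feat)]
    span = [(max(row[f] for row in X) - lo[f]) or 1.0 for f in range(n_feat)]
    Z = [[(row[f] - lo[f]) / span[f] for f in range(n_feat)] for row in X]
    w = [0.0] * n_feat
    for _ in range(n_iter):
        i = rng.randrange(len(Z))
        hit = min((j for j in range(len(Z)) if j != i and y[j] == y[i]),
                  key=lambda j: math.dist(Z[i], Z[j]))
        miss = min((j for j in range(len(Z)) if y[j] != y[i]),
                   key=lambda j: math.dist(Z[i], Z[j]))
        for f in range(n_feat):
            w[f] += (abs(Z[i][f] - Z[miss][f]) - abs(Z[i][f] - Z[hit][f])) / n_iter
    return w


def nested_subsets(weights: List[float]) -> Iterator[List[int]]:
    """Hypothetical search scheme: yield the top-1, top-2, ... feature indices
    by descending Relief weight; each subset would then be scored by a classifier."""
    ranking = sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)
    for size in range(1, len(ranking) + 1):
        yield ranking[:size]


# Toy, made-up rows [feature_a, feature_b]; NOT measurements from the study
X = [[0.1, 0.5], [0.2, 0.9], [0.15, 0.1], [0.9, 0.4], [0.8, 0.8], [0.85, 0.2]]
countries = ["Brazil", "Brazil", "Brazil", "Chile", "Chile", "Uruguay"]

y = one_vs_rest(countries, "Brazil")  # Brazil vs non-Brazil
weights = relief(X, y, n_iter=50)
subsets = list(nested_subsets(weights))

# Chile vs non-Chile is imbalanced here (2 vs 4): under-sample to 3, add 1 synthetic Chile row
y_chile = one_vs_rest(countries, "Chile")
minority = [x for x, t in zip(X, y_chile) if t == 1]
majority = [x for x, t in zip(X, y_chile) if t == 0]
balanced_majority = undersample(majority, 3)
balanced_minority = minority + smote(minority, 1)
```

In this toy data `feature_a` separates the two classes and `feature_b` does not, so Relief gives `feature_a` a positive weight and `feature_b` a negative one. Each subset in `subsets` would then be passed to a classifier (the study used SVM, MLP, and RBF-DDA) and scored, keeping the best-performing subset; the sketch deliberately omits the classifiers.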