CLUSTER ANALYSIS FOR DATABASES TYPOLOGIZATION CHARACTERISTICS

Authors: Ya. M. Uzakov, Irina M. Chernukha, Marina A. Nikitina, D. E. Nurmukhanbetova
Year of publication: 2021
Source: Series of Geology and Technical Sciences. 2:114-121
ISSN: 2224-5278
DOI: 10.32014/2021.2518-170x.42
Description: The article covers the basic concepts of cluster analysis and data clustering. The authors briefly outline the history of cluster analysis and its first applications, and classify cluster-analysis methods by the way they process and analyze data. The popular non-hierarchical k-means algorithm is described in detail.

When food composition databases are developed, their structure should allow products to be divided into clusters by various characteristics. Clustering should also consider other characteristics, such as allergenicity (whether a product contains an allergenic component) or carbohydrate content (important for diabetics). Protein, potassium, and phosphate content must be taken into account when planning diets for people with kidney disease, and the presence of specific amino acids matters for metabolic diseases. Food composition data combined with product clustering across categories thus allow nutritionists to compile lists of interchangeable meals with portion sizes, or lists of permitted and prohibited foods for various diseases.

As an example, the authors cluster a fragment of a food chemical composition database (cottage cheese products and confectionery) by a single characteristic, carbohydrate content, using k-means in the R software environment. Carbohydrate-based food clusters are very important when shaping a diet for diabetics. The grouping of products into clusters is visualized as a dendrogram showing how close the individual clusters are to one another. The resulting dendrogram contains 5 clusters. Cluster 4 contains the largest number of products (170 items), with an average carbohydrate content of 1.8 g and a range of 0 to 7.1 g. Foods and dishes in this cluster are the least dangerous for people with diabetes. Cluster 5 contains only 8 products, with carbohydrate content ranging from 62.60 to 80.40 g. Foods in this category should be excluded when preparing a diet for people with diabetes.
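The abstract applies k-means in R to one characteristic, carbohydrate content. The following is a minimal Python sketch of that idea, not the authors' code: one-dimensional k-means (Lloyd's algorithm), a per-cluster summary of item count, mean, and range like the one reported for clusters 4 and 5, and selection of the highest-carbohydrate cluster as the one to exclude. The function names `kmeans_1d` and `summarize`, the deterministic initialization at evenly spaced positions in the sorted data, k = 3, and all product names and values are hypothetical. R's `kmeans()` uses the Hartigan-Wong algorithm with randomly chosen starting centres by default, so its cluster numbering (such as the article's clusters 4 and 5) need not match the ascending order used here.

```python
from statistics import mean


def kmeans_1d(values, k, max_iter=100):
    """Lloyd-style k-means on a single feature (here: carbohydrate content).

    Initial centroids sit at evenly spaced positions in the sorted data so the
    run is deterministic (an assumption of this sketch). Returns
    (centroids, labels), renumbered so cluster 0 has the lowest mean.
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    data = sorted(values)
    n = len(data)
    centroids = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(mean(members) if members else centroids[c])
        if updated == centroids:  # assignments are stable: converged
            break
        centroids = updated
    # Renumber clusters in ascending order of mean carbohydrate content.
    order = sorted(range(k), key=lambda c: centroids[c])
    rank = {old: new for new, old in enumerate(order)}
    return [centroids[c] for c in order], [rank[lab] for lab in labels]


def summarize(names, values, labels, k):
    """Per-cluster member names, item count, mean, and (min, max) range."""
    summary = []
    for c in range(k):
        members = [(nm, v) for nm, v, lab in zip(names, values, labels) if lab == c]
        vals = [v for _, v in members]
        summary.append({
            "cluster": c,
            "items": [nm for nm, _ in members],
            "count": len(vals),
            "mean": mean(vals) if vals else None,
            "range": (min(vals), max(vals)) if vals else None,
        })
    return summary


# Hypothetical carbohydrate values (g per 100 g), invented for illustration;
# they are NOT taken from the authors' database.
products = {
    "item_a": 0.4, "item_b": 1.1, "item_c": 2.0,
    "item_d": 12.5, "item_e": 14.0, "item_f": 15.5,
    "item_g": 55.0, "item_h": 61.0,
}
names = list(products)
carbs = list(products.values())
k = 3
centroids, labels = kmeans_1d(carbs, k)
summary = summarize(names, carbs, labels, k)
for row in summary:
    print(row["cluster"], row["count"], round(row["mean"], 2), row["range"])

# Treat the highest-carbohydrate cluster as the one to exclude from a
# diabetic diet, as the abstract does for its cluster 5.
excluded = summary[-1]["items"]
print("exclude:", excluded)
```

Sorting clusters by centroid makes the low-carbohydrate group (comparable to the article's cluster 4) and the high-carbohydrate group (comparable to its cluster 5) easy to pick out programmatically; to request a five-cluster split like the article's, pass `k=5` with a suitably large dataset.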
Database: OpenAIRE