Scalable Data Analysis Application to Web Usage Data

Autor: Hocine Chebi
Rok vydání: 2021
Předmět:
Zdroj: Multimedia and Sensory Input for Augmented, Mixed, and Virtual Reality
DOI: 10.4018/978-1-7998-4703-8.ch014
Popis: The number of hits to web pages continues to grow. The web has become one of the most popular platforms for disseminating and retrieving information. Consequently, many website operators are encouraged to analyze the use of their sites in order to improve their response to the expectations of internet users. However, the way a website is visited can change depending on a variety of factors. Usage models must therefore be continuously updated in order to accurately reflect visitor behavior. This remains difficult when the time dimension is neglected or simply introduced as an additional numeric attribute in the description of the data. Data mining is defined as the application of data analysis and discovery algorithms on large databases with the goal of discovering non-trivial models. Several algorithms have been proposed in order to formalize the new models discovered, to build more efficient models, to process new types of data, and to measure the differences between the data sets. However, the most traditional algorithms of data mining assume that the models are static and do not take into account the possible evolution of these models over time. These considerations have motivated significant efforts in the analysis of temporal data as well as the adaptation of static data mining methods to data that evolves over time. The review of the main aspects of data mining dealt with in this thesis constitutes the body of this chapter, followed by a state of the art of current work in this field as well as a discussion of the major issues that exist there. Interest in temporal databases has increased considerably in recent years, for example in the fields of finance, telecommunications, surveillance, etc. A growing number of prototypes and systems are being implemented to take into account the time dimension of data explicitly, for example to study the variability over time of analysis results. To model an application, it is necessary to choose a common language, precise and known by all members of a team. UML (unified modeling language, in English, or unified modeling language, in French) is an object-oriented modeling language standardized by the OMG. This chapter aims to present the modeling with the diagrams of packages and classes built using UML. This chapter presents the conceptual model of the data, and finally, the authors specify the SQL queries used for the extraction of descriptive statistical variables of the navigations from a warehouse containing the preprocessed usage data.
Databáze: OpenAIRE