Data pre-processing for web log mining: Case study of commercial bank website usage analysis

Authors: Jozef Kapusta, Anna Pilková, Michal Munk, Peter Švec
Language: English
Year of publication: 2013
Subject:
Source: Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, Vol 61, Iss 4, Pp 973-979 (2013)
Document type: article
ISSN: 1211-8516, 2464-8310
DOI: 10.11118/actaun201361040973
Description: Data cleaning, integration, reduction and conversion are applied at the pre-processing stage of data analysis, and these techniques improve the overall quality of the patterns mined. The paper describes the use of standard pre-processing methods to prepare data from a commercial bank's website, supplied as a log file obtained from the web server. Data cleaning, usually the simplest pre-processing step, is non-trivial here because the analysed content is highly specific. We had to deal with frequent changes to the content and even to the structure of the site; regular structural changes make the use of a sitemap impossible. We present approaches to this problem and were able to create the sitemap dynamically, based solely on the content of the log file. In this case study we also examined only one part of the website, rather than performing the standard analysis of the entire site, because we did not have access to all log files for security reasons. As a result, traditional practices had to be adapted to this special case. Analysing only a small fraction of the website resulted in short session times for regular visitors, and we were not able to use the recommended methods to determine the optimal session time value. We therefore propose new methods, based on outlier identification, to improve the accuracy of the session length.
Database: Directory of Open Access Journals
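
Note: The description mentions deriving a session time value from outlier identification but does not state the rule used. The following is a minimal sketch of one possible reading, not the authors' method: it assumes the upper Tukey fence (Q3 + 1.5 × IQR) over the gaps between a visitor's consecutive requests as the session timeout. The function names, the `(visitor_id, timestamp)` record layout and the constant `k` are all hypothetical.

```python
from collections import defaultdict
from statistics import quantiles


def gap_threshold(gaps, k=1.5):
    """Upper Tukey fence (Q3 + k * IQR) over inter-request gaps in seconds.

    Assumption: gaps above this fence are treated as outliers, i.e. as
    pauses long enough to mark the start of a new session.
    """
    q1, _, q3 = quantiles(gaps, n=4)
    return q3 + k * (q3 - q1)


def sessionize(records, threshold):
    """Split each visitor's requests into sessions at gaps above threshold.

    records: iterable of (visitor_id, timestamp_seconds) tuples, as they
    might be extracted from a cleaned web server log.
    Returns a list of sessions, each a sorted list of timestamps.
    """
    by_visitor = defaultdict(list)
    for visitor, ts in records:
        by_visitor[visitor].append(ts)

    sessions = []
    for times in by_visitor.values():
        times.sort()
        current = [times[0]]
        for prev, cur in zip(times, times[1:]):
            if cur - prev > threshold:
                # Gap is an outlier: close the current session.
                sessions.append(current)
                current = []
            current.append(cur)
        sessions.append(current)
    return sessions
```

In use, the gaps would first be collected from the whole log, passed to `gap_threshold`, and the resulting value given to `sessionize`. A fixed timeout can be substituted for comparison by passing a constant as `threshold`.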