Description: |
Data mining on web log files is called Web Usage Mining (WUM), and clustering users by their access patterns is an important part of it. Most existing work considers only web page hits and ignores the order in which pages are visited. This paper therefore proposes a new user similarity measure that takes into account both page hits and the succession of pages. It also introduces a new clustering algorithm, DBSCAN&Chameleon, which is based on DBSCAN and Chameleon. Experiments show that the clustering quality of this algorithm is much higher than that of either DBSCAN or Chameleon. |