Description: |
With the huge volume of data now available as input, modern statistical analysis leverages clustering techniques to limit the volume of data to be processed. These input data are mainly sourced from social media channels and typically have high dimensionality because of the diverse features they represent. This problem, commonly referred to as the curse of dimensionality, makes the clustering process computationally intensive and less efficient. Dimensionality reduction techniques have been proposed as a solution to this issue. This paper presents an empirical analysis of the impact of applying dimensionality reduction during the data transformation phase of the clustering process. We measured the impact in terms of clustering quality and clustering performance for three of the most common clustering algorithms: k-means clustering, clustering large applications (CLARA), and agglomerative hierarchical clustering (AGNES). Clustering quality is compared using four internal evaluation criteria, namely the Silhouette index, Dunn index, Calinski-Harabasz index, and Davies-Bouldin index, and average execution time is used as the measure of clustering performance. |
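
To make two of the measures named in the abstract concrete, the following is a minimal sketch in plain Python of one common formulation of the Dunn index (smallest distance between points in different clusters divided by the largest within-cluster diameter) together with a simple average execution time helper. The function names `dunn_index` and `average_runtime`, the toy points, and the labels are assumptions made for illustration only; they are not taken from the paper, which may use a different Dunn variant or tooling.

```python
import math
import time
from itertools import combinations


def dunn_index(points, labels):
    """One common Dunn index variant: min separation / max diameter."""
    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("Dunn index needs at least two clusters")

    # Largest Euclidean distance between two points in the same cluster.
    max_diameter = max(
        (math.dist(p, q)
         for members in clusters.values()
         for p, q in combinations(members, 2)),
        default=0.0,
    )
    # Smallest Euclidean distance between two points in different clusters.
    min_separation = min(
        math.dist(p, q)
        for a, b in combinations(clusters.values(), 2)
        for p in a
        for q in b
    )
    # All clusters are single points (or duplicates): treat as unbounded.
    if max_diameter == 0.0:
        return math.inf
    return min_separation / max_diameter


def average_runtime(func, repeats=5):
    """Mean wall-clock seconds over `repeats` calls of a zero-argument callable."""
    start = time.perf_counter()
    for _ in range(repeats):
        func()
    return (time.perf_counter() - start) / repeats


# Hypothetical toy data: two well-separated groups in two dimensions.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels = [0, 0, 0, 1, 1, 1]

score = dunn_index(points, labels)
seconds = average_runtime(lambda: dunn_index(points, labels))
```

A higher Dunn index indicates compact, well-separated clusters. In a comparison like the one described, the same score and timing would be collected once on the original features and once on the reduced features, for each clustering algorithm.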