Data Clustering on Taiwan Crop Sales under Hadoop Platform
Author: | Mohammad Riza Nurtam, 牛仁正 |
---|---|
Year of publication: | 2013 |
Document type: | Thesis |
Description: | Hadoop is one of the most promising cloud computing platforms for executing Big Data analytics tasks, that is, discovering hidden patterns, unknown correlations, and other valuable information in extremely large distributed datasets. In this thesis, data clustering was implemented on the Hadoop platform to study a large crop sales dataset collected from distributed sources in Taiwan. A Hadoop infrastructure was established to provide access to the distributed data centers. An online clustering algorithm built on Mahout, a scalable machine learning library, was used to analyze crop price and yield data from the distributed datasets. Such clustering analysis is typically computationally demanding and time-consuming when a single machine handles the whole process; therefore, in this research, the clustering jobs were run in an experimental distributed Hadoop environment. The experimental results show that price and sales volume can be grouped into a small number of clusters. These results can support crop-planning decisions by helping to forecast or detect changes in market demand as early as possible. |
Database: | Networked Digital Library of Theses & Dissertations |
External link: |
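The abstract does not name the clustering algorithm that was run in Mahout. Purely as an illustration, the sketch below assumes k-means, one of the clustering algorithms Mahout offers, and shows in plain Python how a single k-means iteration splits into a map step (assign each record to its nearest centroid) and a reduce step (average the records assigned to each cluster). This split is what lets Hadoop spread the work across machines. The function names, the `(price, volume)` tuples, and the toy values are hypothetical; this is not the thesis's code and not Mahout's API.

```python
import math
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def map_phase(records, centroids):
    """Mapper (hypothetical): emit (cluster_id, (point, 1)) for each record in one input split."""
    for point in records:
        yield nearest(point, centroids), (point, 1)


def reduce_phase(pairs, dim):
    """Reducer (hypothetical): sum the points per cluster and divide by the count to get new centroids."""
    sums = defaultdict(lambda: [0.0] * dim)
    counts = defaultdict(int)
    for cid, (point, n) in pairs:
        for d in range(dim):
            sums[cid][d] += point[d]
        counts[cid] += n
    return {cid: tuple(s / counts[cid] for s in sums[cid]) for cid in sums}


def kmeans(splits, centroids, max_iter=20, tol=1e-6):
    """Iterate map and reduce until the centroids stop moving (assumed k-means, not the thesis's method)."""
    dim = len(centroids[0])
    for _ in range(max_iter):
        # Each split stands in for an HDFS block processed by a separate mapper.
        pairs = [kv for split in splits for kv in map_phase(split, centroids)]
        updated = reduce_phase(pairs, dim)
        # A cluster that received no points keeps its previous centroid.
        new = [updated.get(i, c) for i, c in enumerate(centroids)]
        shift = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift < tol:
            break
    return centroids


# Synthetic (price, volume) records split across two "nodes"; values are invented.
splits = [[(10.0, 100.0), (11.0, 110.0)], [(50.0, 20.0), (52.0, 22.0)]]
print(kmeans(splits, [(10.0, 100.0), (50.0, 20.0)]))
```

Because price and sales volume sit on very different scales, a real analysis would normally standardize both features before clustering. This sketch omits that step to keep the map/reduce structure visible.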