Evaluation of a Web Mining Application Based on Cloud Computing Architecture

Author: TANG, CHIA-CHUN, 唐嘉駿
Year of publication: 2016
Document type: thesis (學位論文)
Description: 105
With the improvement of cloud computing and big data technologies, data collection, storage, and analysis have become increasingly efficient. This research uses cloud big data infrastructure to reduce the time cost of web crawling and link analysis. We shorten the construction time of a Wikipedia-based topic map (i.e., WikiMap+, a visualization tool developed by our laboratory). The main problem in building a Wikipedia-based topic map is the time required to conduct link analysis. Link analysis entails an evaluation of relationships between articles; the number of articles to examine grows exponentially as the system follows links deeper and deeper. Further semantic analysis between articles also takes time. Thus, we simulate the cloud environment with several virtual machines. We adopt a Hadoop cloud platform and write MapReduce programs that tackle the efficiency problem of link mining (i.e., in-link, out-link, and co-citation link analysis, and semantic analysis). We aim to generate a dynamic topic map in real time that helps users search Wikipedia and accomplish tasks easily. We will compare the time cost of a single machine, Apache Spark, and a Hadoop MapReduce cloud platform running on several machines. The results will provide a reference for big data research in Web mining that adopts a Hadoop cloud platform.
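
The link-mining step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it imitates the map, shuffle, and reduce phases in plain Python on one machine, and the toy graph `OUT_LINKS` and the helper names (`map_in_links`, `map_co_citation`, `shuffle`, `run_job`) are hypothetical. It assumes out-links are already known for each article, derives in-links by inverting them, and counts co-citation as the number of articles that link to both members of a pair.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy link graph: article -> articles it links to (out-links).
OUT_LINKS = {
    "A": ["C", "D"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": [],
}


def map_in_links(source, targets):
    """Map step: emit (target, source) for every out-link of source."""
    for target in targets:
        yield target, source


def map_co_citation(source, targets):
    """Map step: emit ((t1, t2), 1) for each pair of articles linked from source."""
    for pair in combinations(sorted(set(targets)), 2):
        yield pair, 1


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle phase."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def run_job(mapper, reducer, records):
    """Run one map/shuffle/reduce pass over the records in a single process."""
    emitted = (kv for src, tgts in records.items() for kv in mapper(src, tgts))
    return {key: reducer(values) for key, values in shuffle(emitted).items()}


# In-links: the reducer collects and sorts the sources that point at each article.
in_links = run_job(map_in_links, sorted, OUT_LINKS)
# Co-citation: the reducer sums how many articles link to both members of a pair.
co_citation = run_job(map_co_citation, sum, OUT_LINKS)
```

Because each mapper call only looks at one article's out-links, the same two functions could in principle be distributed across machines, which is the property the Hadoop and Spark comparison relies on.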
Database: Networked Digital Library of Theses & Dissertations