Computing a Similarity Coefficient for Mining Massive Data Sets

Authors: Mihai Gabroveanu, Adriana Sbircea, Mirel Cosulschi
Year of publication: 2016
Source: Intelligent Computing Systems. ISBN: 9783662491775
DOI: 10.1007/978-3-662-49179-9_15
Description: Large amounts of data are generated today in almost every area by processes such as e-commerce transactions, banking and credit card transactions, and web navigation sessions (recorded in web server logs). Developing and running algorithms that process such volumes of data has become more affordable thanks to cloud computing and the MapReduce programming model, which in turn enabled open-source frameworks such as Apache Hadoop. In this paper, we compute the Jaccard similarity coefficients for two very large graphs and, based on these values, analyse the connections between nodes and the influence that certain nodes have over others. We also illustrate how the Apache Hadoop framework and the MapReduce programming model can be used for large-scale computations.
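The Jaccard coefficient of two nodes u and v compares their neighbor sets: J(u, v) = |N(u) ∩ N(v)| / |N(u) ∪ N(v)|. A minimal sketch of how this can be split into map, shuffle, and reduce steps is shown below, simulated in plain Python on a single machine rather than on Hadoop. It assumes N(u) is the set of nodes u links to, and it uses a common two-job pattern (invert the adjacency lists, then count shared neighbors per node pair). This is not necessarily the decomposition used in the paper, and all function names (`map_invert`, `map_pairs`, `shuffle`, `jaccard_mapreduce`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Pair = Tuple[str, str]


def map_invert(node: str, neighbors: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Job 1 map (hypothetical): emit (neighbor, node) so nodes are grouped by a shared neighbor."""
    for w in neighbors:
        yield w, node


def map_pairs(nodes: List[str]) -> Iterator[Tuple[Pair, int]]:
    """Job 2 map (hypothetical): every pair grouped under one neighbor shares that neighbor once."""
    for u, v in combinations(sorted(nodes), 2):
        yield (u, v), 1


def shuffle(records: Iterable[Tuple]) -> Dict:
    """Single-machine stand-in for the framework's shuffle: group emitted values by key."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return groups


def jaccard_mapreduce(adjacency: Dict[str, Set[str]]) -> Dict[Pair, float]:
    """Jaccard similarity of neighbor sets for every node pair that has a common neighbor."""
    degree = {u: len(ns) for u, ns in adjacency.items()}
    # Job 1: inverted index, shared neighbor -> nodes that link to it.
    by_neighbor = shuffle(kv for u, ns in adjacency.items() for kv in map_invert(u, ns))
    # Job 2: emit one count per shared neighbor, then group the counts by node pair.
    by_pair = shuffle(kv for us in by_neighbor.values() for kv in map_pairs(us))
    # Job 2 reduce: |A ∩ B| is the sum of counts; |A ∪ B| = |A| + |B| - |A ∩ B|.
    sims = {}
    for (u, v), ones in by_pair.items():
        inter = sum(ones)
        sims[(u, v)] = inter / (degree[u] + degree[v] - inter)
    return sims


if __name__ == "__main__":
    graph = {"a": {"x", "y", "z"}, "b": {"y", "z"}, "c": {"z", "w"}}
    print(jaccard_mapreduce(graph))
```

For the small example graph, the sketch yields 2/3 for the pair (a, b), 0.25 for (a, c), and 1/3 for (b, c). Pairs with no common neighbor are never emitted, which corresponds to a coefficient of 0 and avoids enumerating all node pairs. The trade-off is that a neighbor linked from d nodes generates d(d - 1)/2 pair records, so highly connected nodes can dominate the shuffle on real graphs.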
Database: OpenAIRE