Computing a Similarity Coefficient for Mining Massive Data Sets

Authors:	Mihai Gabroveanu, Adriana Sbircea, Mirel Cosulschi
Year of publication:	2016
Subject:	Web server; Jaccard index; Computer science; Big data; Cloud computing; Virtualization; Credit card; Programming paradigm; Web navigation; Data mining
Source:	Intelligent Computing Systems ISBN: 9783662491775
DOI:	10.1007/978-3-662-49179-9_15
Description:	Large amounts of data are generated today in all areas by processes such as e-commerce transactions, banking and credit card transactions, and web navigation user sessions (recorded in web server logs). Developing and implementing algorithms able to process such volumes of data has become more affordable thanks to cloud computing and the MapReduce programming model, which in turn enabled the development of open-source frameworks such as Apache Hadoop. In this paper, we compute the Jaccard similarity coefficients for two very large graphs and use the resulting values to analyse the connections and influences that certain nodes have over other nodes. We also illustrate how the Apache Hadoop framework and the MapReduce programming model can be used to carry out a large number of computations.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::73e40a3e35bfb612a4a26f7249641ee5 https://doi.org/10.1007/978-3-662-49179-9_15
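
Illustration:	The description names the Jaccard similarity coefficient and the MapReduce model but does not reproduce the paper's algorithm. The sketch below is only one plausible reading, under stated assumptions: the graph is undirected, each node is represented by its set of neighbours, and the coefficient for nodes u and v is |N(u) ∩ N(v)| / |N(u) ∪ N(v)|. The map and reduce phases are simulated in plain Python rather than run on Hadoop, and all function names and the sample graph are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_common_neighbors(node, neighbors):
    """Map step (assumed design): every pair of neighbours of `node`
    has `node` as a common neighbour, so emit ((u, v), 1) for each pair."""
    for u, v in combinations(sorted(neighbors), 2):
        yield (u, v), 1


def reduce_counts(mapped):
    """Shuffle and reduce step: sum the emitted counts per node pair,
    giving the size of the neighbourhood intersection."""
    totals = defaultdict(int)
    for key, value in mapped:
        totals[key] += value
    return totals


def jaccard_all_pairs(adjacency):
    """Return the Jaccard coefficient for every pair of nodes that
    shares at least one neighbour (other pairs have coefficient 0)."""
    mapped = (
        pair
        for node, neighbors in adjacency.items()
        for pair in map_common_neighbors(node, neighbors)
    )
    common = reduce_counts(mapped)
    degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
    # |A ∪ B| = |A| + |B| - |A ∩ B|
    return {
        (u, v): c / (degree[u] + degree[v] - c)
        for (u, v), c in common.items()
    }


# Hypothetical toy graph, stored as an undirected adjacency map.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}

scores = jaccard_all_pairs(graph)
for pair in sorted(scores):
    print(pair, round(scores[pair], 3))
```

Emitting neighbour pairs from each node avoids comparing every pair of nodes directly; pairs with no common neighbour never appear in the output and implicitly have a coefficient of 0. On a real cluster the two helper functions would correspond to a mapper and a reducer, with node degrees supplied by a separate pass.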