Author: |
Greg O'Shea, Dushyanth Narayanan, Antony Rowstron, Andrew Douglas, Austin Donnelly |
Year of Publication: |
2012 |
Subject: |
|
Source: |
Proceedings of the 1st International Workshop on Hot Topics in Cloud Data Processing (HotCDP '12) |
Description: |
The norm in data analytics is now to run jobs on commodity clusters with MapReduce-like abstractions. One need only read the popular blogs to see the evidence of this. We believe we could now say that "nobody ever got fired for using Hadoop on a cluster"! We completely agree that Hadoop on a cluster is the right solution for jobs whose input data is multi-terabyte or larger. However, in this position paper we ask whether this is the right path for general-purpose data analytics. Evidence suggests that many MapReduce-like jobs process relatively small input data sets (less than 14 GB). Memory has reached a GB-per-dollar price point at which it is now technically and financially feasible to build servers with hundreds of gigabytes of DRAM. We therefore ask: should we scale by using single machines with very large memories rather than clusters? We conjecture that, in terms of both hardware cost and programmer time, this may be a better option for the majority of data-processing jobs. |
Database: |
OpenAIRE |
External Link: |
|