Autor: |
Mukherjee, Anirban, Datta, Joydip, Jorapur, Raghavendra, Singhvi, Ravi, Haloi, Saurav, Akram, Wasim |
Zdroj: |
2012 19th International Conference on High Performance Computing; 2012, p1-6, 6p |
Abstrakt: |
Big Data is a term applied to data sets whose size is beyond the ability of traditional software technologies to capture, store, manage and process within a tolerable elapsed time. The popular assumption around Big Data analytics is that it requires internet scale scalability: over hundreds of compute nodes with attached storage. In this paper., we debate on the need of a massively scalable distributed computing platform for Big Data analytics in traditional businesses. For organizations which don't need a horizontal., internet order scalability in their analytics platform., Big Data analytics can be built on top of a traditional POSIX Cluster File Systems employing a shared storage model. In this study., we compared a widely used clustered file system: VERITAS Cluster File System (SF-CFS) with Hadoop Distributed File System (HDFS) using popular Map-reduce benchmarks like Terasort., DFS-IO and Gridmix on top of Apache Hadoop. In our experiments VxCFS could not only match the performance of HDFS., but also outperformed in many cases. This way., enterprises can fulfill their Big Data analytics need with a traditional and existing shared storage model without migrating to a different storage model in their data centers. This also includes other benefits like stability & robustness., a rich set of features and compatibility with traditional analytics applications. [ABSTRACT FROM PUBLISHER] |
Databáze: |
Complementary Index |
Externí odkaz: |
|