BlueDBM
Author: | Jamey Hicks, Ming Liu, Sang-Woo Jun, John Ankcorn, Myron King, Arvind, Sungjin Lee, Shuotao Xu |
---|---|
Year of publication: | 2016 |
Subject: | 010302 applied physics; Sequential access memory; Hardware_MEMORYSTRUCTURES; General Computer Science; business.industry; Computer science; Big data; 02 engineering and technology; computer.software_genre; 01 natural sciences; Flash memory; 020202 computer hardware & architecture; Flash (photography); Analytics; Universal memory; 0103 physical sciences; 0202 electrical engineering, electronic engineering, information engineering; Operating system; business; computer; Auxiliary memory; Flash file system |
Source: | ACM Transactions on Computer Systems. 34:1-31 |
ISSN: | 1557-7333, 0734-2071 |
DOI: | 10.1145/2898996 |
Description: | Complex data queries, because of their need for random accesses, have proven to be slow unless all the data can be accommodated in DRAM. There are many domains, such as genomics, geological data, and daily Twitter feeds, where the datasets of interest are 5TB to 20TB. For such a dataset, one would need a cluster with 100 servers, each with 128GB to 256GB of DRAM, to accommodate all the data in DRAM. On the other hand, such datasets could be stored easily in the flash memory of a rack-sized cluster. Flash storage has much better random access performance than hard disks, which makes it desirable for analytics workloads. However, currently available off-the-shelf SSDs do not make effective use of flash storage because they incur a great amount of additional overhead during flash device management and network access. In this article, we present BlueDBM, a new system architecture that has flash-based storage with in-store processing capability and a low-latency, high-throughput intercontroller network between storage devices. We show that BlueDBM outperforms a flash-based system without these features by a factor of 10 for some important applications. While the performance of a DRAM-centric system falls sharply even if only 5% to 10% of the references are to secondary storage, this sharp performance degradation is not an issue in BlueDBM. BlueDBM presents an attractive point in the cost/performance tradeoff for Big Data analytics. |
Database: | OpenAIRE |
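The record's description makes two quantitative claims: that 100 servers with 128GB to 256GB of DRAM each are needed to hold a 5TB to 20TB dataset, and that a DRAM-centric system degrades sharply once 5% to 10% of references go to secondary storage. The sketch below checks both with a simple weighted-average latency model. It is an illustration only, not a method from the BlueDBM article; the function names and the assumed latencies (0.1 µs for DRAM, 100 µs for flash) are hypothetical round numbers chosen for the example.

```python
"""Back-of-the-envelope model of the sizing and latency claims in the description.

All latency figures are illustrative assumptions, not measurements from BlueDBM.
"""

GB = 10**9
TB = 10**12


def cluster_dram_bytes(servers: int, dram_per_server_gb: int) -> int:
    """Total DRAM capacity of a cluster, in bytes."""
    return servers * dram_per_server_gb * GB


def effective_latency_us(miss_fraction: float, dram_us: float, secondary_us: float) -> float:
    """Average access latency (µs) when a fraction of references go to secondary storage."""
    if not 0.0 <= miss_fraction <= 1.0:
        raise ValueError("miss_fraction must be in [0, 1]")
    # Weighted average: hits are served from DRAM, misses from secondary storage.
    return (1.0 - miss_fraction) * dram_us + miss_fraction * secondary_us


# Hypothetical latencies, picked only to show the shape of the curve.
DRAM_US = 0.1
FLASH_US = 100.0

if __name__ == "__main__":
    # Capacity of a 100-server cluster at both ends of the quoted DRAM range.
    for gb in (128, 256):
        total = cluster_dram_bytes(100, gb)
        print(f"100 servers x {gb} GB = {total / TB:.1f} TB of DRAM")
    # Average latency as the fraction of secondary-storage references grows.
    for f in (0.0, 0.05, 0.10):
        lat = effective_latency_us(f, DRAM_US, FLASH_US)
        print(f"{f:.0%} secondary references: {lat:.3f} us average, "
              f"{lat / DRAM_US:.1f}x the all-DRAM latency")
```

Under these assumed numbers, a 100-server cluster holds 12.8TB to 25.6TB of DRAM, and sending just 5% of references to flash makes the average access roughly 50 times slower than pure DRAM. That cliff is the kind of degradation the description says BlueDBM avoids; real latencies and access patterns will shift the exact figures.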