Towards Analytics-as-a-Service Using an In-Memory Column Database.

Author: Schaffner, Jan; Eckart, Benjamin; Schwarz, Christian; Brunnert, Jan; Jacobs, Dean; Zeier, Alexander
Source: New Frontiers in Information & Software as Services; 2011, p257-282, 26p
Abstract: Traditional data warehouses mostly run on large, expensive server and storage systems. For small and medium-sized companies, implementing and running such systems is often too expensive. The SaaS model comes in handy here, since these companies can opt to run their OLAP as a service. The challenge for the analytics service provider is then to minimize TCO by consolidating as many tenants onto as few servers as possible, a technique often referred to as multi-tenancy. In this article, we report three results from our research on building a cluster of multi-tenant main-memory column databases for analytics as a service. For this purpose, we ported SAP's in-memory column database TREX to run in the Amazon cloud. We evaluated the relation between a tenant's data size and the number of queries per second, and created a formula that allows us to estimate how many tenants of different sizes and request rates can be placed on one instance of our main-memory database. We discuss findings on cost/performance trade-offs between reliably storing a tenant's data on a single node backed by highly available network-attached storage, such as Amazon EBS, and replicating tenant data to a secondary node where the data resides on less resilient storage. We also describe a mechanism that supports historical queries across older snapshots of tenant data, which is lazy-loaded from Amazon's S3 near-line archiving storage and cached on the local VM disks. [ABSTRACT FROM AUTHOR]
Database: Complementary Index