Popis: |
Since their inception, computer systems for managing and processing data have been in constant need of evolution to meet the requirements of an increasingly connected world and to efficiently utilize rapidly changing hardware. Driven by the internet and the growing popularity of utility computing, a current challenge is handling workloads from data processing services that are exposed to many users. In this setting, what is desirable is robust performance across a wide spectrum of workloads, both in terms of size and complexity. For instance, as part of strict service level agreements, response time guarantees for all requests must be met even at peak times under high query and update loads. Moreover, with a large and diverse set of users, the format and shape of user requests evolve more rapidly. This entails the need for data processing systems and methods that provide high and robust performance for dynamic and unpredictable workloads. Making efficient use of available hardware resources is also becoming an increasingly difficult and important challenge, as we are at the onset of the end of an era of predictable yearly gains in performance.

This dissertation addresses these challenges and presents several techniques for efficiently handling large and complex workloads with analytical and transactional parts on modern hardware. The first part presents and evaluates a method for shared evaluation of relational join operations that enables efficient handling of large analytical workloads. This is achieved by using techniques that: 1) minimize redundant work by aggressively batching and sharing data and computation across a large set of concurrent analytical operations; and 2) make efficient use of hardware by minimizing the amount of computation and memory transfers performed per processed data item.
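The batching-and-sharing idea can be illustrated with a minimal shared-scan sketch: a batch of concurrent operations, each with its own predicate, is evaluated in a single pass over the data, so each data item is read from memory once and the work of the scan is shared across the whole batch. This is a simplified illustration, not the dissertation's actual join algorithm; the names `Query` and `shared_scan` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass
class Query:
    """One concurrent operation in the batch (illustrative)."""
    qid: int
    predicate: Callable[[Dict[str, Any]], bool]

def shared_scan(table: List[Dict[str, Any]],
                queries: List[Query]) -> Dict[int, List[Dict[str, Any]]]:
    """Evaluate all queries' predicates in a single pass over the table.

    Each row is fetched once (one memory transfer per data item) and
    then tested against every query in the batch, instead of running
    one full scan per query.
    """
    results: Dict[int, List[Dict[str, Any]]] = {q.qid: [] for q in queries}
    for row in table:           # the data item is touched once ...
        for q in queries:       # ... and shared across the whole batch
            if q.predicate(row):
                results[q.qid].append(row)
    return results
```

For example, two queries filtering the same table (`v > 5` and `id == 2`) complete with one scan instead of two; with hundreds of concurrent queries, the redundant scans eliminated grow accordingly.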
The second part presents how large analytical and transactional workloads are handled in isolation, both within a machine and across multiple machines, by minimizing the effect that the workloads have on each other's performance. This is achieved by combining the following techniques: primary-secondary replication, dedicated hardware resources, an efficient and lightweight method of update propagation, and batch scheduling of updates and queries. This allows for strong performance guarantees for both the transactional and analytical parts of complex workloads. Finally, this dissertation investigates ways of dealing with the increased parallelism of multicore hardware by analyzing methods for concurrent access to the indexing data structures used in data management systems. In particular, the use of hardware transactional memory is considered as a way to simplify the development of concurrent indexes, as opposed to the existing error-prone approaches of lock-free programming. It is shown that hardware transactional memory cannot replace lock-free approaches; however, it can be used to simplify their development. Implemented in a single system called BatchDB, these techniques allow for efficient use of hardware and high and robust performance for large analytical and transactional workloads, with strong guarantees for performance isolation.
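The combination of primary-secondary replication with batched update propagation can be sketched as follows: transactional updates are applied immediately on a primary copy and buffered in a lightweight log, while analytical queries run against a secondary replica that is refreshed by applying the pending updates in batches between query batches. This is a toy illustration under simplified assumptions (a single key-value table, no concurrency control); the class name `BatchScheduler` and its methods are hypothetical, not BatchDB's actual interface.

```python
from collections import deque
from typing import Any, Deque, Dict, Optional, Tuple

class BatchScheduler:
    """Illustrative primary-secondary replication with batched propagation.

    Updates hit the primary copy immediately and are logged; the
    analytical replica is refreshed a whole batch at a time, so queries
    and updates never interleave on the same copy and each workload's
    performance is isolated from the other's.
    """

    def __init__(self, initial: Dict[str, Any]) -> None:
        self.primary: Dict[str, Any] = dict(initial)   # transactional copy
        self.replica: Dict[str, Any] = dict(initial)   # analytical copy
        self.pending: Deque[Tuple[str, Any]] = deque() # lightweight update log

    def update(self, key: str, value: Any) -> None:
        """Transactional write: applied on the primary, logged for the replica."""
        self.primary[key] = value
        self.pending.append((key, value))

    def propagate(self) -> None:
        """Apply the whole pending batch to the replica in one step."""
        while self.pending:
            k, v = self.pending.popleft()
            self.replica[k] = v

    def query(self, key: str) -> Optional[Any]:
        """Analytical read: served from the replica's batch-consistent state."""
        return self.replica.get(key)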