Showing 1 - 10 of 202 for search: '"KRASKA, TIM"'
Author:
Yu, Geoffrey X., Wu, Ziniu, Kossmann, Ferdi, Li, Tianyu, Markakis, Markos, Ngom, Amadou, Madden, Samuel, Kraska, Tim
Modern organizations manage their data with a wide variety of specialized cloud database engines (e.g., Aurora, BigQuery, etc.). However, designing and managing such infrastructures is hard. Developers must consider many possible designs with non-obv
External link:
http://arxiv.org/abs/2407.15363
Author:
Liu, Chunwei, Russo, Matthew, Cafarella, Michael, Cao, Lei, Chen, Peter Baille, Chen, Zui, Franklin, Michael, Kraska, Tim, Madden, Samuel, Vitagliano, Gerardo
A long-standing goal of data management systems has been to build systems which can compute quantitative insights over large corpora of unstructured data in a cost-effective manner. Until recently, it was difficult and expensive to extract facts from
External link:
http://arxiv.org/abs/2405.14696
Retrieval-augmented generation (RAG) can enhance the generation quality of large language models (LLMs) by incorporating external token databases. However, retrievals from large databases can constitute a substantial portion of the overall generation
External link:
http://arxiv.org/abs/2403.05676
Author:
Wu, Ziniu, Marcus, Ryan, Liu, Zhengchun, Negi, Parimarjan, Nathan, Vikram, Pfeil, Pascal, Saxena, Gaurav, Rahman, Mohammad, Narayanaswamy, Balakrishnan, Kraska, Tim
Query performance (e.g., execution time) prediction is a critical component of modern DBMSes. As a pioneering cloud data warehouse, Amazon Redshift relies on an accurate execution time prediction for many downstream tasks, ranging from high-level opt
External link:
http://arxiv.org/abs/2403.02286
Author:
Kossmann, Ferdinand, Wu, Ziniu, Lai, Eugenie, Tatbul, Nesime, Cao, Lei, Kraska, Tim, Madden, Samuel
Published in:
Proc. VLDB Endow. 16, 9 (May 2023), 2302-2315
Social media, self-driving cars, and traffic cameras produce video streams at large scales and cheap cost. However, storing and querying video at such scales is prohibitively expensive. We propose to treat large-scale video analytics as a data wareho
External link:
http://arxiv.org/abs/2310.04830
Author:
Chen, Zui, Cao, Lei, Madden, Sam, Kraska, Tim, Shang, Zeyuan, Fan, Ju, Tang, Nan, Gu, Zihui, Liu, Chunwei, Cafarella, Michael
Data curation tasks that prepare data for analytics are critical for turning data into actionable insights. However, due to the diverse requirements of applications in different domains, generic off-the-shelf tools are typically insufficient. As a re
External link:
http://arxiv.org/abs/2310.00749
Author:
Kristo, Ani, Kraska, Tim
External sorting is at the core of many operations in large-scale database systems, such as ordering and aggregation queries for large result sets, building indexes, sort-merge joins, duplicate removal, sharding, and record clustering. Unlike in-memo
External link:
http://arxiv.org/abs/2305.05671
Cardinality estimation is one of the most fundamental and challenging problems in query optimization. Neither classical nor learning-based methods yield satisfactory performance when estimating the cardinality of the join queries. They either rely on
External link:
http://arxiv.org/abs/2212.05526
Learned index structures have been shown to achieve favorable lookup performance and space consumption compared to their traditional counterparts such as B-trees. However, most learned index studies have focused on the primary indexing setting, where
External link:
http://arxiv.org/abs/2205.05769
We introduce the RadixStringSpline (RSS) learned index structure for efficiently indexing strings. RSS is a tree of radix splines each indexing a fixed number of bytes. RSS approaches or exceeds the performance of traditional string indexes while usi
External link:
http://arxiv.org/abs/2111.14905