Showing 1-10 of 13 results for search: "Dixin Tang"
Many tools empower analysts and data scientists to consume analysis results in a visual interface, such as a dashboard. When the underlying data changes, these results need to be updated, but this update can take a long time, all while the user con…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::089265ae4d76bd96fb2de85adff9bbf3
Authors:
Doris Jung-Lin Lee, Dixin Tang, Kunal Agarwal, Thyne Boonmark, Caitlyn Chen, Jake Kang, Ujjaini Mukhopadhyay, Jerry Song, Micah Yong, Marti A. Hearst, Aditya G. Parameswaran
Published in:
Proceedings of the VLDB Endowment. 15:727-738
Exploratory data science largely happens in computational notebooks with dataframe APIs, such as pandas, that support flexible means to transform, clean, and analyze data. Yet, visually exploring data in dataframes remains tedious, requiring substant…
Authors:
Devin Petersohn, Dixin Tang, Rehan Durrani, Areg Melik-Adamyan, Joseph E. Gonzalez, Anthony D. Joseph, Aditya G. Parameswaran
Published in:
Proceedings of the VLDB Endowment. 15:739-751
Dataframes have become universally popular as a means to represent data in various stages of structure and to manipulate it using a rich set of operators, thereby becoming an essential tool in the data scientist's toolbox. However, dataframe systems,…
Published in:
Proceedings of the VLDB Endowment. 13:2937-2940
Existing stream processing and continuous query processing systems eagerly maintain standing queries by consuming all available resources to finish the jobs at hand, which can be a major source of wasted CPU cycles and memory. However, use…
Published in:
SIGMOD Conference
Shared query execution can reduce resource consumption by sharing common sub-expressions across concurrent queries. We show that this is not always the case when regularly querying a dataset under change. Depending on latency goals, how eagerly to in…
Published in:
Proceedings of the VLDB Endowment. 12:1427-1441
Many applications ingest data in an intermittent, yet largely predictable, pattern. Existing systems tend to ignore how data arrives when making decisions about how to update (or refresh) an ongoing query. To address this shortcoming, we propose a new…
Published in:
ICDE
Aaron Elmore
Data loading has been one of the most common performance bottlenecks for many big data applications, especially when they run on inefficient human-readable formats such as JSON or CSV. Parsing, validating, integrity checking and data structu…
Published in:
SIGMOD Conference
Many applications schedule queries before all data is ready. To return fast query results, database systems can eagerly process existing data and incrementally incorporate new data into prior intermediate results, which often relies on incremental vi…
Authors:
Krystyna Reisteter, Cristian Diaconu, Sandeep Lingam, Dixin Tang, Umar Farooq Minhas, Jack Hu, Vijendra Purohit, Alejandro Hernandez Saenz, Naveen Prakash, Hugh Qu, Sheetal Shrotri, Chaitanya Sreenivas Ravella, Alex Budovski, Hanuma Kodavalla, Vikram Wakade, Donald Kossmann, Panagiotis Antonopoulos
Published in:
SIGMOD Conference
The database-as-a-service (DBaaS) paradigm in the cloud is becoming increasingly popular. Organizations adopt this paradigm because they expect higher security, higher availability, and lower, more flexible costs while maintaining high performance. It has becom…
Published in:
NAS
With the growing demand for manipulating large scientific datasets, scientists need flexible cluster-level software to execute fast scientific data analysis. In this paper, we discuss whether the Apache Spark framework is suit…