Showing 1 - 10 of 441 for search: '"MILLER, RENEE"'
The widespread use of machine learning and data-driven algorithms for decision making has been steadily increasing over many years. The areas in which this is happening are diverse: healthcare, employment, finance, education, the legal system to name
External link:
http://arxiv.org/abs/2405.12312
We introduce the problem of Table Reclamation. Given a Source Table and a large table repository, reclamation finds a set of tables that, when integrated, reproduce the source table as closely as possible. Unlike query discovery problems like Query-b
External link:
http://arxiv.org/abs/2403.14128
Given a set of deep learning models, it can be hard to find models appropriate to a task, understand the models, and characterize how models are different one from another. Currently, practitioners rely on manually-written documentation to understand
External link:
http://arxiv.org/abs/2403.02327
Most research on data discovery has so far focused on improving individual discovery operators such as join, correlation, or union discovery. However, in practice, a combination of these techniques and their corresponding indexes may be necessary to
External link:
http://arxiv.org/abs/2310.02656
Data management has traditionally relied on synthetic data generators to generate structured benchmarks, like the TPC suite, where we can control important parameters like data size and its distribution precisely. These benchmarks were central to the
External link:
http://arxiv.org/abs/2308.03883
We demonstrate a novel table discovery pipeline called DIALITE that allows users to discover, integrate and analyze open data tables. DIALITE has three main stages. First, it allows users to discover tables from open data platforms using state-of-the
External link:
http://arxiv.org/abs/2304.08285
Author:
Shraga, Roee, Miller, Renée J.
In multi-user environments in which data science and analysis is collaborative, multiple versions of the same datasets are generated. While managing and storing data versions has received some attention in the research literature, the semantic nature
External link:
http://arxiv.org/abs/2301.13095
Dataset discovery from data lakes is essential in many real application scenarios. In this paper, we propose Starmie, an end-to-end framework for dataset discovery from data lakes (with table union search as the main use case). Our proposed framework
External link:
http://arxiv.org/abs/2210.01922
Author:
Khatiwada, Aamod, Fan, Grace, Shraga, Roee, Chen, Zixuan, Gatterbauer, Wolfgang, Miller, Renée J., Riedewald, Mirek
Existing techniques for unionable table search define unionability using metadata (tables must have the same or similar schemas) or column-based metrics (for example, the values in a table should be drawn from the same domain). In this work, we intro
External link:
http://arxiv.org/abs/2209.13589
Author:
Leventidis, Aristotelis, Di Rocco, Laura, Gatterbauer, Wolfgang, Miller, Renée J., Riedewald, Mirek
Modern data lakes are deeply heterogeneous in the vocabulary that is used to describe data. We study a problem of disambiguation in data lakes: how can we determine if a data value occurring more than once in the lake has different meanings and is th
External link:
http://arxiv.org/abs/2103.09940