Search results - "MILLER, RENÉE J."

Report

A Principled Approach for a New Bias Measure

Authors: Scarone, Bruno; Viola, Alfredo; Miller, Renée J.; Baeza-Yates, Ricardo

The widespread use of machine learning and data-driven algorithms for decision making has been steadily increasing over many years. The areas in which this is happening are diverse: healthcare, employment, finance, education, the legal system to name

External link: http://arxiv.org/abs/2405.12312

Report

Gen-T: Table Reclamation in Data Lakes

Authors: Fan, Grace; Shraga, Roee; Miller, Renée J.

We introduce the problem of Table Reclamation. Given a Source Table and a large table repository, reclamation finds a set of tables that, when integrated, reproduce the source table as closely as possible. Unlike query discovery problems like Query-b

External link: http://arxiv.org/abs/2403.14128

Report

Model Lakes

Authors: Pal, Koyena; Bau, David; Miller, Renée J.

Given a set of deep learning models, it can be hard to find models appropriate to a task, understand the models, and characterize how models are different one from another. Currently, practitioners rely on manually-written documentation to understand

External link: http://arxiv.org/abs/2403.02327

Report

Blend: A Unified Data Discovery System

Authors: Esmailoghli, Mahdi; Schnell, Christoph; Miller, Renée J.; Abedjan, Ziawasch

Data discovery is an iterative and incremental process that necessitates the execution of multiple data discovery queries to identify the desired tables from large and diverse data lakes. Current methodologies concentrate on single discovery tasks su

External link: http://arxiv.org/abs/2310.02656

Report

Generative Benchmark Creation for Table Union Search

Authors: Pal, Koyena; Khatiwada, Aamod; Shraga, Roee; Miller, Renée J.

Data management has traditionally relied on synthetic data generators to generate structured benchmarks, like the TPC suite, where we can control important parameters like data size and its distribution precisely. These benchmarks were central to the

External link: http://arxiv.org/abs/2308.03883

Report

DIALITE: Discover, Align and Integrate Open Data Tables

Authors: Khatiwada, Aamod; Shraga, Roee; Miller, Renée J.

We demonstrate a novel table discovery pipeline called DIALITE that allows users to discover, integrate and analyze open data tables. DIALITE has three main stages. First, it allows users to discover tables from open data platforms using state-of-the

External link: http://arxiv.org/abs/2304.08285

Report

Explaining Dataset Changes for Semantic Data Versioning with Explain-Da-V (Technical Report)

Authors: Shraga, Roee; Miller, Renée J.

In multi-user environments in which data science and analysis is collaborative, multiple versions of the same datasets are generated. While managing and storing data versions has received some attention in the research literature, the semantic nature

External link: http://arxiv.org/abs/2301.13095

Report

SANTOS: Relationship-based Semantic Table Union Search

Authors: Khatiwada, Aamod; Fan, Grace; Shraga, Roee; Chen, Zixuan; Gatterbauer, Wolfgang; Miller, Renée J.; Riedewald, Mirek

Existing techniques for unionable table search define unionability using metadata (tables must have the same or similar schemas) or column-based metrics (for example, the values in a table should be drawn from the same domain). In this work, we intro

External link: http://arxiv.org/abs/2209.13589

Report

DomainNet: Homograph Detection for Data Lake Disambiguation

Authors: Leventidis, Aristotelis; Di Rocco, Laura; Gatterbauer, Wolfgang; Miller, Renée J.; Riedewald, Mirek

Modern data lakes are deeply heterogeneous in the vocabulary that is used to describe data. We study a problem of disambiguation in data lakes: how can we determine if a data value occurring more than once in the lake has different meanings and is th

External link: http://arxiv.org/abs/2103.09940

Report

Knowledge Translation: Extended Technical Report

Authors: Bashardoost, Bahar Ghadiri; Miller, Renée J.; Lyons, Kelly; Nargesian, Fatemeh

We introduce Kensho, a tool for generating mapping rules between two Knowledge Bases (KBs). To create the mapping rules, Kensho starts with a set of correspondences and enriches them with additional semantic information automatically identified from

External link: http://arxiv.org/abs/2008.01208
