Showing 1 - 10 of 255 for search: '"Raúl, Castro"'
Author:
Wang, Qiming, Fernandez, Raul Castro
Reading comprehension models answer questions posed in natural language when provided with a short passage of text. They present an opportunity to address a long-standing challenge in data management: the extraction of structured data from unstructur
External link:
http://arxiv.org/abs/2408.09226
Data sharing is central to a wide variety of applications such as fraud detection, ad matching, and research. The lack of data sharing abstractions makes the solution to each data sharing problem bespoke and cost-intensive, hampering value generation
External link:
http://arxiv.org/abs/2408.04092
Author:
Zhu, Zhiru, Fernandez, Raul Castro
The data-driven economy has created tremendous value in our society. Individuals share their data with platforms in exchange for services such as search, social networks, and health recommendations. Platforms use the data to provide those services an
External link:
http://arxiv.org/abs/2408.01580
As users migrate their analytical workloads to cloud databases, it is becoming just as important to reduce monetary costs as it is to optimize query runtime. In the cloud, a query is billed based on either its compute time or the amount of data it pr
External link:
http://arxiv.org/abs/2408.00253
Author:
Han, Minbiao, Light, Jonathan, Xia, Steven, Galhotra, Sainyam, Fernandez, Raul Castro, Xu, Haifeng
Data fuels machine learning (ML) - rich and high-quality training data is essential to the success of ML. However, to transform ML from the race among a few large corporations to an accessible technology that serves numerous normal users' data analys
External link:
http://arxiv.org/abs/2310.17843
Author:
Zhu, Zhiru, Fernandez, Raul Castro
Differential privacy (DP) enables private data analysis but is hard to use in practice. For data controllers who decide what output to release, choosing the amount of noise to add to the output is a non-trivial task because of the difficulty of inter
External link:
http://arxiv.org/abs/2310.13104
Published in:
VLDB 2023
Recent data search platforms use ML task-based utility measures rather than metadata-based keywords, to search large dataset corpora. Requesters submit a training dataset and these platforms search for augmentations (join or union compatible datasets
External link:
http://arxiv.org/abs/2307.00432
High-quality machine learning models are dependent on access to high-quality training data. When the data are not already available, it is tedious and costly to obtain them. Data markets help with identifying valuable training data: model consumers p
External link:
http://arxiv.org/abs/2306.02543
AutoML services provide a way for non-expert users to benefit from high-quality ML models without worrying about model design and deployment, in exchange for a charge per hour ($21.252 for VertexAI). However, existing AutoML services are model-centri
External link:
http://arxiv.org/abs/2305.10419
Author:
Xia, Siyuan, Zhu, Zhiru, Zhu, Chris, Zhao, Jinjin, Chard, Kyle, Elmore, Aaron J., Foster, Ian, Franklin, Michael, Krishnan, Sanjay, Fernandez, Raul Castro
Pooling and sharing data increases and distributes its value. But since data cannot be revoked once shared, scenarios that require controlled release of data for regulatory, privacy, and legal reasons default to not sharing. Because selectively contr
External link:
http://arxiv.org/abs/2305.03842