Search results - "A. Varoquaux"

Report

Hype, Sustainability, and the Price of the Bigger-is-Better Paradigm in AI

Authors: Varoquaux, Gaël; Luccioni, Alexandra Sasha; Whittaker, Meredith

With the growing attention and investment in recent AI approaches such as large language models, the narrative that the larger the AI system the more valuable, powerful and interesting it is is increasingly seen as common sense. But what is this assu

External link: http://arxiv.org/abs/2409.14160

Report

What is the Role of Small Models in the LLM Era: A Survey

Authors: Chen, Lihu; Varoquaux, Gaël

Large Language Models (LLMs) have made significant progress in advancing artificial general intelligence (AGI), leading to the development of increasingly large models such as GPT-4 and LLaMA-405B. However, scaling up model sizes results in exponenti

External link: http://arxiv.org/abs/2409.06857

Report

Imputation for prediction: beware of diminishing returns

Authors: Le Morvan, Marine; Varoquaux, Gaël

Missing values are prevalent across various fields, posing challenges for training and deploying predictive models. In this context, imputation is a common practice, driven by the hope that accurate imputations will enhance predictions. However, rece

External link: http://arxiv.org/abs/2407.19804

Report

Teaching Models To Survive: Proper Scoring Rule and Stochastic Optimization with Competing Risks

Authors: Alberge, Julie; Maladière, Vincent; Grisel, Olivier; Abécassis, Judith; Varoquaux, Gaël

When data are right-censored, i.e. some outcomes are missing due to a limited period of observation, survival analysis can compute the "time to event". Multiple classes of outcomes lead to a classification variant: predicting the most likely event, k

External link: http://arxiv.org/abs/2406.14085

Report

CARTE: Pretraining and Transfer for Tabular Learning

Authors: Kim, Myung Jun; Grinsztajn, Léo; Varoquaux, Gaël

Pretrained deep-learning models are the go-to solution for images or text. However, for tabular data the standard is still to train tree-based models. Indeed, transfer learning on tables hits the challenge of data integration: finding correspondences

External link: http://arxiv.org/abs/2402.16785

Report

Retrieve, Merge, Predict: Augmenting Tables with Data Lakes

Authors: Cappuzzo, Riccardo; Coelho, Aimee; Lefebvre, Felix; Papotti, Paolo; Varoquaux, Gaël

We present an in-depth analysis of data discovery in data lakes, focusing on table augmentation for given machine learning tasks. We analyze alternative methods used in the three main steps: retrieving joinable tables, merging information, and predic

External link: http://arxiv.org/abs/2402.06282

Report

Reconfidencing LLMs from the Grouping Loss Perspective

Authors: Chen, Lihu; Perez-Lebel, Alexandre; Suchanek, Fabian M.; Varoquaux, Gaël

Large Language Models (LLMs), including ChatGPT and LLaMA, are susceptible to generating hallucinated answers in a confident tone. While efforts to elicit and calibrate confidence scores have proven useful, recent findings show that controlling uncer

External link: http://arxiv.org/abs/2402.04957

Report

Learning High-Quality and General-Purpose Phrase Representations

Authors: Chen, Lihu; Varoquaux, Gaël; Suchanek, Fabian M.

Phrase representations play an important role in data science and natural language processing, benefiting various tasks like Entity Alignment, Record Linkage, Fuzzy Joins, and Paraphrase Classification. The current state-of-the-art method involves fi

External link: http://arxiv.org/abs/2401.10407

Report

Vectorizing string entries for data processing on tables: when are larger language models better?

Authors: Grinsztajn, Léo; Oyallon, Edouard; Kim, Myung Jun; Varoquaux, Gaël

There are increasingly efficient data processing pipelines that work on vectors of numbers, for instance most machine learning models, or vector databases for fast similarity search. These require converting the data to numbers. While this conversion

External link: http://arxiv.org/abs/2312.09634
