Showing 1 - 10 of 51 for search: '"Seedat, Nabeel"'
Real-world machine learning systems often encounter model performance degradation due to distributional shifts in the underlying data generating process (DGP). Existing approaches to addressing shifts, such as concept drift adaptation, are limited by…
External link:
http://arxiv.org/abs/2411.00186
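As a rough illustration of the distribution-shift setting described in the entry above, the following sketch flags per-feature shift between a training sample and a deployment sample with a two-sample Kolmogorov-Smirnov test (Python with NumPy/SciPy). The data, threshold, and choice of test are assumptions for illustration, not the paper's method.

    # Minimal sketch: flag per-feature distributional shift between a
    # reference (training) sample and a deployment sample.
    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(0)
    train_X = rng.normal(loc=0.0, scale=1.0, size=(1000, 3))
    deploy_X = rng.normal(loc=[0.0, 0.8, 0.0], scale=1.0, size=(500, 3))  # feature 1 has drifted

    for j in range(train_X.shape[1]):
        stat, p_value = ks_2samp(train_X[:, j], deploy_X[:, j])
        shifted = p_value < 0.01  # arbitrary significance threshold
        print(f"feature {j}: KS={stat:.3f}, p={p_value:.3g}, shifted={shifted}")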
Schema matching -- the task of finding matches between attributes across disparate data sources with different tables and hierarchies -- is critical for creating interoperable machine learning (ML)-ready data. Addressing this fundamental data-centric…
External link:
http://arxiv.org/abs/2410.24105
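To make the schema-matching task above concrete, here is a deliberately naive sketch that pairs attributes across two schemas by fuzzy name similarity using Python's standard-library difflib. Real matchers also exploit types, values, and hierarchy; the column names below are invented for illustration.

    # Minimal sketch: match columns of a source schema to a target schema
    # by string similarity of their names.
    from difflib import SequenceMatcher

    source_cols = ["patient_id", "systolic_bp", "date_of_birth"]
    target_cols = ["PatientID", "sys_bp", "birth_date", "weight_kg"]

    def similarity(a: str, b: str) -> float:
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    for src in source_cols:
        best = max(target_cols, key=lambda tgt: similarity(src, tgt))
        print(f"{src:15s} -> {best:12s} (score={similarity(src, best):.2f})")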
The predominant de facto paradigm of testing ML models relies on either using only held-out data to compute aggregate evaluation metrics or by assessing the performance on different subgroups. However, such data-only testing methods operate under the…
External link:
http://arxiv.org/abs/2410.24005
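For reference, the "data-only" paradigm the abstract contrasts itself with looks roughly like the sketch below: one aggregate held-out metric plus per-subgroup metrics (Python with pandas/scikit-learn). The data and subgroup labels are made up; this is not the evaluation method proposed in the paper.

    # Minimal sketch: aggregate accuracy plus per-subgroup accuracy.
    import pandas as pd
    from sklearn.metrics import accuracy_score

    df = pd.DataFrame({
        "y_true":   [1, 0, 1, 1, 0, 0, 1, 0],
        "y_pred":   [1, 0, 0, 1, 0, 1, 1, 0],
        "subgroup": ["A", "A", "A", "B", "B", "B", "C", "C"],
    })

    print("aggregate accuracy:", accuracy_score(df["y_true"], df["y_pred"]))
    for name, grp in df.groupby("subgroup"):
        print(f"subgroup {name}: accuracy={accuracy_score(grp['y_true'], grp['y_pred']):.2f}")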
Pseudo-labeling is a popular semi-supervised learning technique to leverage unlabeled data when labeled samples are scarce. The generation and selection of pseudo-labels heavily rely on labeled data. Existing approaches implicitly assume that the labeled…
External link:
http://arxiv.org/abs/2406.13733
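The following is a minimal sketch of confidence-thresholded pseudo-labeling as described in the entry above: fit on the small labeled set, pseudo-label high-confidence unlabeled points, and refit on the union (Python with scikit-learn). The model, threshold, and synthetic data are assumptions for illustration.

    # Minimal sketch: one round of confidence-thresholded pseudo-labeling.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_lab, y_lab, X_unlab = X[:50], y[:50], X[50:]            # only 50 labeled samples

    model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)

    proba = model.predict_proba(X_unlab)
    confident = proba.max(axis=1) > 0.9                        # keep high-confidence predictions
    pseudo_y = model.classes_[proba.argmax(axis=1)][confident]

    X_aug = np.vstack([X_lab, X_unlab[confident]])
    y_aug = np.concatenate([y_lab, pseudo_y])
    model = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)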
Constructing valid prediction intervals rather than point estimates is a well-established approach for uncertainty quantification in the regression setting. Models equipped with this capacity output an interval of values in which the ground truth target…
External link:
http://arxiv.org/abs/2406.03258
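As an illustration of the prediction-interval setting above, here is a sketch of split conformal prediction: calibrate the interval half-width on held-out absolute residuals so the interval covers the target with roughly 1 - alpha probability under exchangeability (Python with scikit-learn/NumPy). The base model, alpha, and data are illustrative choices, not the paper's specific method.

    # Minimal sketch: split conformal prediction intervals for regression.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=1000, n_features=5, noise=10.0, random_state=0)
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
    X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

    model = Ridge().fit(X_train, y_train)

    alpha = 0.1
    residuals = np.abs(y_cal - model.predict(X_cal))
    n = len(residuals)
    q = np.quantile(residuals, np.ceil((n + 1) * (1 - alpha)) / n)   # finite-sample correction

    pred = model.predict(X_test)
    lower, upper = pred - q, pred + q
    coverage = np.mean((y_test >= lower) & (y_test <= upper))
    print(f"empirical coverage: {coverage:.2f} (target {1 - alpha:.2f})")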
Characterizing samples that are difficult to learn from is crucial to developing highly performant ML models. This has led to numerous Hardness Characterization Methods (HCMs) that aim to identify "hard" samples. However, there is a lack of consensus…
External link:
http://arxiv.org/abs/2403.04551
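To make the idea of a hardness score concrete, the sketch below uses one generic proxy: the out-of-fold probability a model assigns to each sample's true label, with low probability read as "hard" (Python with scikit-learn). This is a simple stand-in, not any specific HCM from the paper.

    # Minimal sketch: score training samples by out-of-fold true-class probability.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict

    X, y = make_classification(n_samples=300, n_features=10, flip_y=0.1, random_state=0)

    proba = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                              cv=5, method="predict_proba")
    true_class_proba = proba[np.arange(len(y)), y]
    hardness = 1.0 - true_class_proba            # higher = harder to learn

    hardest = np.argsort(hardness)[-10:][::-1]
    print("10 hardest sample indices:", hardest)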
Author:
Huynh, Nicolas, Berrevoets, Jeroen, Seedat, Nabeel, Crabbé, Jonathan, Qian, Zhaozhi, van der Schaar, Mihaela
Identification and appropriate handling of inconsistencies in data at deployment time is crucial to reliably use machine learning models. While recent data-centric methods are able to identify such inconsistencies with respect to the training set, …
External link:
http://arxiv.org/abs/2402.17599
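As a toy illustration of the kind of deployment-time inconsistency the entry above refers to, the sketch below flags records whose feature values fall outside the range seen during training (Python with NumPy). Real data-centric checks are far richer; the data and range rule are assumptions.

    # Minimal sketch: flag deployment records with out-of-range feature values.
    import numpy as np

    rng = np.random.default_rng(0)
    train_X = rng.normal(size=(1000, 4))
    deploy_X = rng.normal(size=(20, 4))
    deploy_X[3, 2] = 15.0                                  # inject an out-of-range value

    lo, hi = train_X.min(axis=0), train_X.max(axis=0)
    inconsistent = (deploy_X < lo) | (deploy_X > hi)       # per-record, per-feature flags

    for i in np.flatnonzero(inconsistent.any(axis=1)):
        bad_features = np.flatnonzero(inconsistent[i])
        print(f"record {i}: out-of-range features {bad_features.tolist()}")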
Bayesian optimization (BO) is a powerful approach for optimizing complex and expensive-to-evaluate black-box functions. Its importance is underscored in many applications, notably including hyperparameter tuning, but its efficacy depends on efficient…
External link:
http://arxiv.org/abs/2402.03921
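For readers unfamiliar with BO, the sketch below shows one iteration: fit a Gaussian process surrogate to past evaluations and pick the next point by expected improvement over a candidate grid (Python with scikit-learn/SciPy). The kernel, grid, and toy objective are illustrative assumptions, not the paper's contribution.

    # Minimal sketch: one Bayesian-optimization step with a GP surrogate
    # and an expected-improvement acquisition (maximization).
    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    def objective(x):                          # expensive black-box stand-in
        return -np.sin(3 * x) - x**2 + 0.7 * x

    rng = np.random.default_rng(0)
    X_obs = rng.uniform(-2, 2, size=(5, 1))
    y_obs = objective(X_obs).ravel()

    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X_obs, y_obs)

    candidates = np.linspace(-2, 2, 400).reshape(-1, 1)
    mu, sigma = gp.predict(candidates, return_std=True)
    best = y_obs.max()
    z = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)   # expected improvement

    x_next = candidates[np.argmax(ei)]
    print("next point to evaluate:", x_next)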
Machine Learning (ML) in low-data settings remains an underappreciated yet crucial problem. Hence, data augmentation methods to increase the sample size of datasets needed for ML are key to unlocking the transformative potential of ML in data-deprived…
External link:
http://arxiv.org/abs/2312.12112
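As a very simple illustration of augmenting a small tabular dataset, the sketch below jitters existing rows with Gaussian noise scaled by each feature's standard deviation (Python with NumPy). Modern generative augmenters are far more sophisticated; the data, noise scale, and copy count here are assumptions.

    # Minimal sketch: enlarge a tiny dataset by adding noisy copies of its rows.
    import numpy as np

    rng = np.random.default_rng(0)
    X_small = rng.normal(size=(30, 5))             # a deliberately tiny dataset
    y_small = rng.integers(0, 2, size=30)

    def augment(X, y, n_copies=3, noise_scale=0.05, rng=rng):
        feat_std = X.std(axis=0, keepdims=True)
        X_new = [X] + [X + rng.normal(scale=noise_scale * feat_std, size=X.shape)
                       for _ in range(n_copies)]
        y_new = np.tile(y, n_copies + 1)
        return np.vstack(X_new), y_new

    X_aug, y_aug = augment(X_small, y_small)
    print(X_small.shape, "->", X_aug.shape)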
Evaluating the value of a hypothetical target policy with only a logged dataset is important but challenging. On the one hand, it brings opportunities for safe policy improvement under high-stakes scenarios like clinical guidelines. On the other hand…
External link:
http://arxiv.org/abs/2311.14110
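To ground the off-policy evaluation problem described above, here is a sketch of the classical importance-sampling estimator in a toy bandit setting: reweight logged rewards by the ratio of target to behavior action probabilities (Python with NumPy). The policies, rewards, and bandit simplification are assumptions; the paper's own estimator may differ.

    # Minimal sketch: importance-sampling estimate of a target policy's value
    # from data logged under a different behavior policy.
    import numpy as np

    rng = np.random.default_rng(0)
    n, n_actions = 10_000, 3

    behavior_probs = np.array([0.5, 0.3, 0.2])        # logging policy
    target_probs = np.array([0.2, 0.2, 0.6])          # hypothetical policy to evaluate
    true_mean_reward = np.array([0.2, 0.5, 0.8])      # per-action reward means

    actions = rng.choice(n_actions, size=n, p=behavior_probs)
    rewards = rng.binomial(1, true_mean_reward[actions])

    weights = target_probs[actions] / behavior_probs[actions]
    is_estimate = np.mean(weights * rewards)
    print(f"IS estimate: {is_estimate:.3f} "
          f"(true value {np.dot(target_probs, true_mean_reward):.3f})")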