Universal Feature Selection for Simultaneous Interpretability of Multitask Datasets

Authors: Raymond, Matt; Saldinger, Jacob Charles; Elvati, Paolo; Scott, Clayton; Violi, Angela
Year of publication: 2024
Subject:
Document type: Working Paper
Description: Extracting meaningful features from complex, high-dimensional datasets across scientific domains remains challenging. Current methods often struggle with scalability, limiting their applicability to large datasets, or make restrictive assumptions about feature-property relationships, hindering their ability to capture complex interactions. BoUTS's general and scalable feature selection algorithm surpasses these limitations to identify both universal features relevant to all datasets and task-specific features predictive for specific subsets. Evaluated on seven diverse chemical regression datasets, BoUTS achieves state-of-the-art feature sparsity while maintaining prediction accuracy comparable to specialized methods. Notably, BoUTS's universal features enable domain-specific knowledge transfer between datasets and suggest deep connections in seemingly disparate chemical datasets. We expect these results to have important repercussions in manually guided inverse problems. Beyond its current application, BoUTS holds immense potential for elucidating data-poor systems by leveraging information from similar data-rich systems. BoUTS represents a significant leap in cross-domain feature selection, potentially leading to advancements in various scientific fields.
Comment: Main text: 14 pages, 3 figures, 1 table; SI: 7 pages, 1 figure, 4 tables, 3 algorithms
Database: arXiv