Autor: |
Scott H. Snyder, Patricia A. Vignaux, Mustafa Kemal Ozalp, Jacob Gerlach, Ana C. Puhl, Thomas R. Lane, John Corbett, Fabio Urbina, Sean Ekins |
Jazyk: |
angličtina |
Rok vydání: |
2024 |
Předmět: |
|
Zdroj: |
Communications Chemistry, Vol 7, Iss 1, Pp 1-11 (2024) |
Druh dokumentu: |
article |
ISSN: |
2399-3669 |
DOI: |
10.1038/s42004-024-01220-4 |
Popis: |
Abstract Recent advances in machine learning (ML) have led to newer model architectures including transformers (large language models, LLMs) showing state of the art results in text generation and image analysis as well as few-shot learning (FSLC) models which offer predictive power with extremely small datasets. These new architectures may offer promise, yet the ‘no-free lunch’ theorem suggests that no single model algorithm can outperform at all possible tasks. Here, we explore the capabilities of classical (SVR), FSLC, and transformer models (MolBART) over a range of dataset tasks and show a ‘goldilocks zone’ for each model type, in which dataset size and feature distribution (i.e. dataset “diversity”) determines the optimal algorithm strategy. When datasets are small ( |
Databáze: |
Directory of Open Access Journals |
Externí odkaz: |
|
Nepřihlášeným uživatelům se plný text nezobrazuje |
K zobrazení výsledku je třeba se přihlásit.
|