Investigating Data Hierarchies in Multifidelity Machine Learning for Excitation Energies

Author:	Vinod, Vivin; Zaspel, Peter
Year of publication:	2024
Subject:	Physics - Chemical Physics; Computer Science - Machine Learning; Physics - Computational Physics
Document type:	Working Paper
Description:	Recent progress in machine learning (ML) has made high-accuracy quantum chemistry (QC) calculations more accessible. Of particular interest are multifidelity machine learning (MFML) methods, which use training data of differing accuracies, or fidelities. These methods usually employ a fixed scaling factor, $\gamma$, to relate the number of training samples across different fidelities, which reflects the cost and assumed sparsity of the data. This study investigates the impact of modifying $\gamma$ on model efficiency and accuracy for the prediction of vertical excitation energies using the QeMFi benchmark dataset. Further, this work introduces scaling factors informed by QC compute time, denoted $\theta$, that vary with the QC compute times at different fidelities. A novel error metric, error contours of MFML, is proposed to provide a comprehensive view of the contribution of each fidelity to the model error. The results indicate that high model accuracy can be achieved with just 2 training samples at the target fidelity when a larger number of samples from lower fidelities are used. This is further illustrated through a novel concept, the $\Gamma$-curve, which compares model error against the time-cost of generating training samples, demonstrating that multifidelity models can achieve high accuracy while minimizing training data costs.
Database:	arXiv
External link:	http://arxiv.org/abs/2410.11392
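
The role of the scaling factor $\gamma$ described in the abstract can be illustrated with a small sketch. It assumes the common geometric convention in which each lower fidelity uses $\gamma$ times as many training samples as the fidelity above it; the record itself does not state this formula. The function names, the number of fidelities, and the per-sample compute times below are hypothetical placeholders, not values from the paper or from QeMFi.

```python
def samples_per_fidelity(n_target, gamma, n_fidelities):
    """Training-set sizes ordered from lowest to highest (target) fidelity.

    Assumption: sizes grow geometrically, so the fidelity k levels below
    the target uses n_target * gamma**k samples.
    """
    return [n_target * gamma ** (n_fidelities - 1 - f) for f in range(n_fidelities)]


def total_time_cost(sizes, time_per_sample):
    """Total time-cost of generating the training data across all fidelities."""
    return sum(n * t for n, t in zip(sizes, time_per_sample))


# Hypothetical setup: 5 fidelities, 2 samples at the target fidelity, gamma = 2.
sizes = samples_per_fidelity(n_target=2, gamma=2, n_fidelities=5)

# Hypothetical per-sample compute times (arbitrary units), cheapest fidelity first.
times = [1.0, 2.0, 4.0, 8.0, 16.0]

cost = total_time_cost(sizes, times)
print(sizes)  # [32, 16, 8, 4, 2]
print(cost)   # 160.0
```

Pairing a cost computed this way with a measured model error for each choice of $\gamma$ gives one point on a curve of error against time-cost, which is the kind of comparison the abstract attributes to the $\Gamma$-curve.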