Adaptive atomic basis sets

Authors: Khan, Danish; Ach, Maximilian L.; von Lilienfeld, O. Anatole
Year of publication: 2024
Document type: Working Paper
Abstract: Atomic basis sets are widely employed in quantum-mechanics-based simulations of matter. We introduce a machine learning model that adapts the basis set to the local chemical environment of each atom before the self-consistent field (SCF) calculation starts. As a proof of principle, and because of their historic popularity, we study Gaussian-type orbitals from the Pople basis sets, namely STO-3G, 3-21G, 6-31G, and 6-31G*. We adapt the basis by scaling the variance of the radial Gaussian functions, which contracts or expands the atomic orbitals. A data set of optimal scaling factors for C, H, O, N, and F was obtained by variationally minimizing the Hartree-Fock (HF) energy of the 2500 smallest organic molecules in the QM9 database. Prediction errors of kernel ridge regression (KRR)-based machine learning (ML) models for the change in scaling decay rapidly with training set size, typically falling below 1% at a training set size of 2000. Overall, we find systematically lower variance, and consequently higher training efficiency, when going from hydrogen to carbon to nitrogen to oxygen. Using the scaled basis functions predicted by the ML model, we performed HF calculations for the subsequent 30,000 molecules in QM9. Compared with the corresponding default Pople basis set results, we observed improved energetics in up to 99% of all cases. Relative to the larger 6-311G(2df,2pd) basis set, atomization energy errors are lowered on average by ~31, 107, 11, and 11 kcal/mol for STO-3G, 3-21G, 6-31G, and 6-31G*, respectively, with negligible computational overhead. We illustrate the high transferability of adaptive basis sets for larger out-of-domain molecules relevant to addiction, diabetes, pain, and aging.
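The central operation in the abstract, scaling the variance of a radial Gaussian so that the orbital contracts or expands, can be illustrated on the simplest case where the variational energy is known in closed form: a hydrogen atom described by one normalized s-type Gaussian, for which E(α) = 3α/2 − 2√(2α/π) hartree. The sketch below is a toy, not the authors' procedure. The single-primitive basis, the hypothetical reference exponent `ALPHA_REF = 0.5`, the assumed convention that a variance scale factor s maps the exponent α to α/s, and the helper names are all illustrative. The paper instead optimizes scaling factors for full Pople basis sets through HF calculations on QM9 molecules.

```python
import math

# Hypothetical reference exponent for a single s-type Gaussian on hydrogen
# (chosen for illustration, not taken from any Pople basis set).
ALPHA_REF = 0.5


def energy_h_atom(alpha: float) -> float:
    """Exact energy (hartree) of the H atom with the normalized trial function
    (2*alpha/pi)**0.75 * exp(-alpha * r**2): <T> = 3*alpha/2, <V> = -2*sqrt(2*alpha/pi)."""
    return 1.5 * alpha - 2.0 * math.sqrt(2.0 * alpha / math.pi)


def scaled_exponent(alpha: float, s: float) -> float:
    """Assumed convention: multiplying the Gaussian's variance by s divides the
    exponent by s, so s > 1 expands the orbital and s < 1 contracts it."""
    return alpha / s


def optimize_scale(alpha: float, lo: float = 0.1, hi: float = 10.0, tol: float = 1e-10) -> float:
    """Golden-section search for the scale factor s that minimizes the energy.
    Relies on E(alpha / s) being unimodal on [lo, hi], which holds here."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        # Keep the sub-interval that contains the lower of the two probe energies.
        if energy_h_atom(scaled_exponent(alpha, c)) < energy_h_atom(scaled_exponent(alpha, d)):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


if __name__ == "__main__":
    s_opt = optimize_scale(ALPHA_REF)
    e_ref = energy_h_atom(ALPHA_REF)
    e_opt = energy_h_atom(scaled_exponent(ALPHA_REF, s_opt))
    print(f"optimal scale s = {s_opt:.4f}")
    print(f"E(reference) = {e_ref:.5f} Ha, E(scaled) = {e_opt:.5f} Ha")
```

For this toy case the optimum is analytic, α/s = 8/(9π), so the search should return s = 9π/16 ≈ 1.767 (an expansion, because the reference exponent is too tight) and lower the energy from about −0.3784 to −0.4244 hartree. Golden-section search is used only because this one-dimensional energy curve is unimodal; optimizing many scale factors jointly for a molecule would instead call for a multidimensional minimizer wrapped around an SCF code.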
Database: arXiv
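The abstract reports that KRR prediction errors for the change in scaling decay rapidly with training set size. The sketch below shows how such a learning curve is typically measured: fit KRR with a Gaussian kernel on increasing training subsets and record the mean absolute error (MAE) on a fixed held-out set. Everything here is a hypothetical stand-in: the 2-D random descriptors replace the paper's atomic-environment representation, the smooth synthetic target replaces the computed scaling changes, and the kernel width `sigma` and regularizer `lam` are not the paper's hyperparameters.

```python
import numpy as np


def gaussian_kernel(a: np.ndarray, b: np.ndarray, sigma: float) -> np.ndarray:
    """K[i, j] = exp(-|a_i - b_j|^2 / (2 sigma^2)) for row-wise feature vectors."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma**2))


def krr_fit(x_train: np.ndarray, y_train: np.ndarray, sigma: float, lam: float) -> np.ndarray:
    """Solve (K + lam * I) w = y for the regression weights w."""
    k = gaussian_kernel(x_train, x_train, sigma)
    return np.linalg.solve(k + lam * np.eye(len(x_train)), y_train)


def krr_predict(x_new: np.ndarray, x_train: np.ndarray, weights: np.ndarray, sigma: float) -> np.ndarray:
    """Prediction is a kernel-weighted sum over the training points."""
    return gaussian_kernel(x_new, x_train, sigma) @ weights


def learning_curve(sizes: list[int], sigma: float = 0.3, lam: float = 1e-6, seed: int = 0) -> list[float]:
    """MAE on a fixed 500-point held-out set for each training set size."""
    rng = np.random.default_rng(seed)
    # Hypothetical stand-ins: 2-D "environment descriptors" and a smooth target
    # playing the role of the change in scaling factor.
    x = rng.uniform(0.0, 1.0, size=(max(sizes) + 500, 2))
    y = 0.1 * np.sin(2 * np.pi * x[:, 0]) * np.cos(np.pi * x[:, 1])
    x_test, y_test = x[-500:], y[-500:]  # never overlaps the training rows
    errors = []
    for n in sizes:
        w = krr_fit(x[:n], y[:n], sigma, lam)
        errors.append(float(np.mean(np.abs(krr_predict(x_test, x[:n], w, sigma) - y_test))))
    return errors


if __name__ == "__main__":
    for n, mae in zip([16, 64, 256], learning_curve([16, 64, 256])):
        print(f"N = {n:4d}  MAE = {mae:.2e}")
```

On real data, the MAE is usually plotted against training set size on log-log axes, where a roughly linear decay indicates that the model keeps improving as more data are added.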