COMPAS-2: a dataset of cata-condensed hetero-polycyclic aromatic systems

Authors: Eduardo Mayo Yanes, Sabyasachi Chakraborty, Renana Gershoni-Poranne
Language: English
Publication year: 2024
Subject:
Source: Scientific Data, Vol 11, Iss 1, Pp 1-11 (2024)
Document type: article
ISSN: 2052-4463
DOI: 10.1038/s41597-024-02927-8
Description: Polycyclic aromatic systems are highly important to numerous applications, in particular to organic electronics and optoelectronics. High-throughput screening and generative models that can help to identify new molecules to advance these technologies require large amounts of high-quality data, which is expensive to generate. In this report, we present the largest freely available dataset of geometries and properties of cata-condensed poly(hetero)cyclic aromatic molecules calculated to date. Our dataset contains ~500k molecules comprising 11 types of aromatic and antiaromatic building blocks calculated at the GFN1-xTB level and is representative of a highly diverse chemical space. We detail the structure enumeration process and the methods used to provide various electronic properties (including HOMO-LUMO gap, adiabatic ionization potential, and adiabatic electron affinity). Additionally, we benchmark against a ~50k dataset calculated at the CAM-B3LYP-D3BJ/def2-SVP level and develop a fitting scheme to correct the xTB values to higher accuracy. These new datasets represent the second installment in the COMputational database of Polycyclic Aromatic Systems (COMPAS) Project.
Database: Directory of Open Access Journals