Description: |
The search for new high-performance organic semiconducting molecules is challenging because of the vastness of chemical space. Machine learning methods, particularly deep learning models such as graph neural networks (GNNs), have shown promise in addressing this challenge. However, practical applications of GNNs in chemistry are often limited by the availability of labelled data. Unlabelled molecular data, by contrast, is abundant and could be used to alleviate this scarcity. Here, we advocate the use of self-supervised learning to improve the performance of GNNs by pre-training them on unlabelled molecular data. We investigate regression problems involving ground-state and excited-state properties, both relevant to the optoelectronic properties of organic semiconductors. Additionally, we extend the self-supervised learning strategy to molecules in non-equilibrium configurations, which are important for studying the effects of disorder. In all cases, we obtain considerable performance improvements over results without pre-training, particularly when labelled training data is limited. We attribute this improvement to the ability of self-supervised learning to identify structural similarity among unlabelled molecules. |