Author: |
Kearnes SM; Relay Therapeutics, Cambridge, Massachusetts 02139, United States., Maser MR; Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States., Wleklinski M; Chemistry Capabilities Accelerating Therapeutics, Merck & Co., Inc., Kenilworth, New Jersey 07033, United States., Kast A; Google LLC, Mountain View, California 94043, United States., Doyle AG; Department of Chemistry & Biochemistry, University of California at Los Angeles, Los Angeles, California 90095, United States., Dreher SD; Chemistry Capabilities Accelerating Therapeutics, Merck & Co., Inc., Kenilworth, New Jersey 07033, United States., Hawkins JM; Chemical Research and Development, Pfizer Worldwide Research and Development, Groton, Connecticut 06340, United States., Jensen KF; Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States., Coley CW; Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.; Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States. |
Abstract: |
Chemical reaction data in journal articles, patents, and even electronic laboratory notebooks are currently stored in various formats, often unstructured, which presents a significant barrier to downstream applications, including the training of machine-learning models. We present the Open Reaction Database (ORD), an open-access schema and infrastructure for structuring and sharing organic reaction data, including a centralized data repository. The ORD schema supports conventional and emerging technologies, from benchtop reactions to automated high-throughput experiments and flow chemistry. The data, schema, supporting code, and web-based user interfaces are all publicly available on GitHub. Our vision is that a consistent data representation and infrastructure to support data sharing will enable downstream applications that will greatly improve the state of the art with respect to computer-aided synthesis planning, reaction prediction, and other predictive chemistry tasks. |