Image2SMILES: Transformer-based Molecular Optical Recognition Engine
Author: | Sergey Sosnin, Maxim Fedorov, Lev Krasnov, Ivan Khokhlov |
---|---|
Publication year: | 2021 |
Subject: | Artificial neural network; Computer science; Deep learning; Task (project management); Variety (cybernetics); Key (cryptography); Artificial intelligence; Data mining; Representation (mathematics); Transformer (machine learning model); Generator (mathematics) |
DOI: | 10.26434/chemrxiv.14602716.v1 |
Description: | The rise of deep learning across scientific and technological fields is driving the development of AI-based tools for information retrieval. Optical recognition of organic structures is a key part of the automated extraction of chemical information. The task is challenging, however, because structures are drawn in a wide variety of representation styles. In this research, we present a Transformer-based artificial neural network that converts images of organic structures into molecular structures. To train the model, we created a comprehensive data generator that stochastically simulates various drawing styles, functional groups, functional group placeholders (R-groups), and visual contamination. We demonstrate that the Transformer-based architecture can gather chemical insights from our generator with almost absolute confidence. This means that, with a Transformer, one can concentrate fully on data simulation to build a good recognition model. A web demo of our optical recognition engine is available online on the Syntelly platform. |
Database: | OpenAIRE |
External link: |