Rapid Computation of the Assembly Index of Molecular Graphs

Authors: Seet, Ian; Patarroyo, Keith Y.; Siebert, Gage; Walker, Sara I.; Cronin, Leroy
Year of publication: 2024
Subject:
Document type: Working Paper
Description: The assembly index of a molecule is the least number of steps required to build its molecular graph by recursively reusing previously made structures. Determining it is a novel problem that seeks to quantify the minimum number of constraints required to build a given molecular graph, with wide applications from biosignature detection to cheminformatics, including drug discovery. In this article, we consider this problem from an algorithmic perspective and propose an exact algorithm to efficiently find the assembly indices of large molecules, including some natural products. To achieve this, we start by identifying the largest possible duplicate sub-graphs during the sub-graph enumeration process and subsequently implement a dynamic programming strategy with a branch-and-bound heuristic to exploit already used duplicates and reject impossible states in the enumeration. To do so efficiently, we introduce the assembly state data structure, an array of edge lists that keeps track of the graph fragmentation by keeping the last fragmented sub-graph as its first element. By precise manipulation of this data structure, we can efficiently perform each fragmentation step and reconstruct an exact minimal construction pathway for the molecular graph. These techniques are shown to compute the assembly indices of many large molecules with speed and memory efficiency. Finally, we demonstrate the strength of our approach with different benchmarks, including calculating the assembly indices of hundreds of thousands of molecules from the COCONUT natural product database.
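As a rough illustration of the data structure the description mentions, the sketch below models an assembly state as a list of edge lists in which the most recently fragmented sub-graph sits at index 0. It is only one plausible reading of the abstract, not the authors' implementation: the names `AssemblyState`, `connected_components`, and `fragment`, and the choice to split the leftover edges into connected components, are all assumptions made for this example.

```python
from typing import List, Tuple

Edge = Tuple[int, int]
EdgeList = List[Edge]
# Assumption: index 0 always holds the most recently fragmented sub-graph.
AssemblyState = List[EdgeList]


def connected_components(edges: EdgeList) -> List[EdgeList]:
    """Split an edge list into groups of edges connected through shared vertices."""
    remaining = list(edges)
    components: List[EdgeList] = []
    while remaining:
        component = [remaining.pop(0)]
        vertices = set(component[0])
        grew = True
        while grew:
            grew = False
            for edge in remaining[:]:  # iterate over a copy while removing
                if edge[0] in vertices or edge[1] in vertices:
                    component.append(edge)
                    vertices.update(edge)
                    remaining.remove(edge)
                    grew = True
        components.append(component)
    return components


def fragment(state: AssemblyState, index: int, subgraph: EdgeList) -> AssemblyState:
    """Cut `subgraph` out of state[index] and return a new state.

    The removed sub-graph becomes the first element, followed by the
    leftover pieces of the fragment it came from, then the untouched fragments.
    """
    target = state[index]
    removed = set(subgraph)
    if not removed <= set(target):
        raise ValueError("subgraph edges must all belong to state[index]")
    leftover = [edge for edge in target if edge not in removed]
    untouched = state[:index] + state[index + 1:]
    return [list(subgraph)] + connected_components(leftover) + untouched


# Hypothetical usage: a 5-vertex path graph, removing its second edge.
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
new_state = fragment([path], 0, [(1, 2)])
print(new_state)  # [[(1, 2)], [(0, 1)], [(2, 3), (3, 4)]]
```

Returning a new list rather than mutating the input keeps earlier states intact, which is convenient if a search needs to backtrack; whether the paper's algorithm does this is not stated in the description.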
Comment: 30 pages, 7 figures, 33 references
Database: arXiv