RNA secondary structure packages evaluated and improved by high-throughput experiments.

Authors: Wayment-Steele HK; Department of Chemistry, Stanford University, Stanford, CA, USA.; Eterna Massive Open Laboratory, Stanford, CA, USA., Kladwang W; Eterna Massive Open Laboratory, Stanford, CA, USA.; Department of Biochemistry, Stanford University, Stanford, CA, USA., Strom AI; Department of Biochemistry, Stanford University, Stanford, CA, USA.; Department of Chemistry and Biochemistry, San Diego State University, San Diego, CA, USA., Lee J; Eterna Massive Open Laboratory, Stanford, CA, USA.; Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA., Treuille A; Eterna Massive Open Laboratory, Stanford, CA, USA.; Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA., Becka A; Department of Biochemistry, Stanford University, Stanford, CA, USA., Das R; Eterna Massive Open Laboratory, Stanford, CA, USA. rhiju@stanford.edu.; Department of Biochemistry, Stanford University, Stanford, CA, USA. rhiju@stanford.edu.; Department of Physics, Stanford University, Stanford, CA, USA. rhiju@stanford.edu.
Language: English
Source: Nature methods [Nat Methods] 2022 Oct; Vol. 19 (10), pp. 1234-1242. Date of Electronic Publication: 2022 Oct 03.
DOI: 10.1038/s41592-022-01605-0
Abstract: Despite the popularity of computer-aided study and design of RNA molecules, little is known about the accuracy of commonly used structure modeling packages in tasks sensitive to ensemble properties of RNA. Here, we demonstrate that the EternaBench dataset, a set of more than 20,000 synthetic RNA constructs designed on the RNA design platform Eterna, provides incisive discriminative power in evaluating current packages in ensemble-oriented structure prediction tasks. We find that CONTRAfold and RNAsoft, packages with parameters derived through statistical learning, achieve consistently higher accuracy than more widely used packages in their standard settings, which derive parameters primarily from thermodynamic experiments. We hypothesized that training a multitask model with the varied data types in EternaBench might improve inference on ensemble-based prediction tasks. Indeed, the resulting model, named EternaFold, demonstrated improved performance that generalizes to diverse external datasets including complete messenger RNAs, viral genomes probed in human cells and synthetic designs modeling mRNA vaccines.
(© 2022. The Author(s), under exclusive licence to Springer Nature America, Inc.)
Database: MEDLINE