Bipartite networks represent causality better than simple networks: evidence, algorithms, and applications

Authors:	Bingran Shen, Gloria M. Coruzzi, Dennis Shasha
Language:	English
Year of publication:	2024
Subject:	RNA sequencing; gene regulatory network; causal inference; random forest; bipartite network; Genetics; QH426-470
Source:	Frontiers in Genetics, Vol 15 (2024)
Document type:	article
ISSN:	1664-8021
DOI:	10.3389/fgene.2024.1371607
Description:	A network whose nodes are genes and whose directed edges represent positive or negative influences of a regulatory gene on its targets is often used as a representation of causality. To infer a network, researchers often develop a machine learning model and then evaluate the model based on its match with experimentally verified “gold standard” edges. The desired result of such a model is a network that may extend the gold standard edges. Since networks are a form of visual representation, one can compare their utility with architectural or machine blueprints. Blueprints are clearly useful because they provide precise guidance to builders in construction. If the primary role of gene regulatory networks is to characterize causality, then such networks should be good tools of prediction, because prediction is the actionable benefit of knowing causality. But are they? In this paper, we compare the prediction quality of models based on “gold standard” regulatory edges from previous experimental work with that of non-linear models inferred from time series data across four different species. We show that non-linear machine learning models inferred from time series data have better predictive performance, reducing the root mean square error (RMSE) by 5.3% to 25.3% compared with the same models based on the gold standard edges. Having established that networks fail to characterize causality properly, we suggest that causality research should focus on four goals: (i) predictive accuracy; (ii) a parsimonious enumeration of predictive regulatory genes for each target gene g; (iii) the identification of disjoint sets of predictive regulatory genes for each target g of roughly equal accuracy; and (iv) the construction of a bipartite network (whose node types are genes and models) representation of causality. We provide algorithms for all four goals.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/a8f8250e63a84b3fb9aca64f164d09fa
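
Illustrative note: the description above mentions two ideas that a short sketch may make more concrete, namely the percentage reduction in RMSE and a bipartite network whose node types are genes and models. The Python below is only a minimal sketch under stated assumptions and does not reproduce the authors' algorithms. The class name `BipartiteCausalNetwork`, the helper functions, and the gene and model identifiers (`TF1`, `TF2`, `TF3`, `G1`, `M1`, `M2`) are all hypothetical, and the reduction is assumed to be measured relative to the baseline RMSE.

```python
import math
from collections import defaultdict


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def rmse_reduction_percent(baseline_rmse, new_rmse):
    """Percentage by which new_rmse is lower than baseline_rmse (assumed definition)."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


class BipartiteCausalNetwork:
    """Hypothetical container with two node types: genes and models.

    Each model node has incoming edges from its regulator genes and one
    outgoing edge to the target gene it predicts.
    """

    def __init__(self):
        self.regulators_of_model = {}              # model id -> frozenset of genes
        self.target_of_model = {}                  # model id -> target gene
        self.models_of_target = defaultdict(list)  # target gene -> model ids

    def add_model(self, model_id, regulators, target):
        self.regulators_of_model[model_id] = frozenset(regulators)
        self.target_of_model[model_id] = target
        self.models_of_target[target].append(model_id)

    def edges(self):
        """Directed edges: (regulator gene, model) and (model, target gene)."""
        result = []
        for model_id, regulators in self.regulators_of_model.items():
            result.extend((gene, model_id) for gene in sorted(regulators))
            result.append((model_id, self.target_of_model[model_id]))
        return result

    def has_disjoint_regulator_sets(self, target):
        """True if no regulator gene is shared between models of this target."""
        seen = set()
        for model_id in self.models_of_target.get(target, []):
            regulators = self.regulators_of_model[model_id]
            if seen & regulators:
                return False
            seen |= regulators
        return True


# Two alternative models for the same hypothetical target gene G1,
# each using a different (disjoint) set of regulator genes.
network = BipartiteCausalNetwork()
network.add_model("M1", ["TF1", "TF2"], "G1")
network.add_model("M2", ["TF3"], "G1")
```

In this sketch a target gene can be reached through several model nodes, which is one way to record alternative regulator sets that a plain gene-to-gene edge list would merge together.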