BioVDB: biological vector database for high-throughput gene expression meta-analysis.
Author: | Winnicki MJ; Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK, United States., Brown CA; Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK, United States.; Oklahoma Center for Neuroscience, University of Oklahoma Health Sciences Center, Oklahoma City, OK, United States., Porter HL; Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK, United States., Giles CB; Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK, United States., Wren JD; Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK, United States.; Oklahoma Center for Neuroscience, University of Oklahoma Health Sciences Center, Oklahoma City, OK, United States.; Department of Biochemistry and Molecular Biology, University of Oklahoma Health Sciences Center, Oklahoma City, OK, United States.; Oklahoma Nathan Shock Center, Oklahoma City, OK, United States. |
---|---|
Language: | English |
Source: | Frontiers in artificial intelligence [Front Artif Intell] 2024 Mar 08; Vol. 7, pp. 1366273. Date of Electronic Publication: 2024 Mar 08 (Print Publication: 2024). |
DOI: | 10.3389/frai.2024.1366273 |
Abstract: | High-throughput sequencing has created an exponential increase in the amount of gene expression data, much of which is freely, publicly available in repositories such as NCBI's Gene Expression Omnibus (GEO). Querying this data for patterns such as similarity and distance, however, becomes increasingly challenging as the total amount of data increases. Furthermore, vectorization of the data is commonly required in Artificial Intelligence and Machine Learning (AI/ML) approaches. We present BioVDB, a vector database for storage and analysis of gene expression data, which enhances the potential for integrating biological studies with AI/ML tools. We used a previously developed approach called Automatic Label Extraction (ALE) to extract sample labels from metadata, including age, sex, and tissue/cell-line. BioVDB stores 438,562 samples from eight microarray GEO platforms. We show that it allows for efficient querying of data using similarity search, which can also be useful for identifying and inferring missing labels of samples, and for rapid similarity analysis. Competing Interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The author(s) declared that they were an editorial board member of Frontiers, at the time of submission. This had no impact on the peer review process and the final decision. (Copyright © 2024 Winnicki, Brown, Porter, Giles and Wren.) |
Database: | MEDLINE |
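
The abstract mentions similarity search over expression vectors and its use for inferring missing sample labels. The sketch below illustrates that general idea only; it is not BioVDB's implementation or API. It assumes cosine similarity as the metric and a majority vote over the nearest labeled neighbours, and every name in it (`cosine_similarity`, `top_k_similar`, `infer_label`) and every expression value is hypothetical toy data.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k_similar(query, samples, k=3):
    """Return the k (sample_id, score) pairs most similar to the query."""
    scored = [(cosine_similarity(query, vec), sid) for sid, vec in samples.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(sid, score) for score, sid in scored[:k]]


def infer_label(query, samples, labels, k=3):
    """Majority vote over the labels of the k nearest samples.

    Neighbours whose label is missing (None) do not vote.
    Returns None when no neighbour carries a label.
    """
    neighbours = top_k_similar(query, samples, k)
    votes = Counter(
        labels[sid] for sid, _ in neighbours if labels.get(sid) is not None
    )
    return votes.most_common(1)[0][0] if votes else None


# Toy expression vectors (hypothetical values, four "genes" per sample).
samples = {
    "s1": [9.0, 1.0, 0.5, 0.2],
    "s2": [8.5, 1.2, 0.4, 0.3],
    "s3": [0.3, 0.5, 7.9, 8.8],
    "s4": [0.2, 0.4, 8.1, 9.0],
    "s5": [8.8, 0.9, 0.6, 0.1],
}
# Tissue labels; s5 has no label in its metadata.
labels = {"s1": "liver", "s2": "liver", "s3": "brain", "s4": "brain", "s5": None}

query = [9.1, 1.1, 0.5, 0.2]
neighbours = top_k_similar(query, samples, k=3)
inferred = infer_label(query, samples, labels, k=3)
print(neighbours)
print(inferred)
```

This brute-force scan compares the query against every stored vector, which is fine for a handful of samples. A collection of the size reported in the abstract would more plausibly rely on an index built for nearest-neighbour search, a detail the abstract does not specify.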