Pheno-Ranker: a toolkit for comparison of phenotypic data stored in GA4GH standards and beyond.
Authors: | Leist IC; Centro Nacional de Análisis Genómico, C/Baldiri Reixac 4, 08028, Barcelona, Spain.; Universitat de Barcelona (UB), Barcelona, Spain., Rivas-Torrubia M; Pfizer-University of Granada-Junta de Andalucía Centre for Genomics and Oncological Research, Granada, Spain., Alarcón-Riquelme ME; Pfizer-University of Granada-Junta de Andalucía Centre for Genomics and Oncological Research, Granada, Spain.; Institute of Environmental Medicine, Karolinska Institute, Stockholm, Sweden., Barturen G; Pfizer-University of Granada-Junta de Andalucía Centre for Genomics and Oncological Research, Granada, Spain.; Department of Genetics, Faculty of Science, University of Granada, 18071, Granada, Spain.; Bioinformatics Laboratory, Centro de Investigación Biomédica, Biotechnology Institute, PTS, Avda del Conocimiento S/N, 18100, Granada, Spain., Consortium PC; Pfizer-University of Granada-Junta de Andalucía Centre for Genomics and Oncological Research, Granada, Spain., Gut IG; Centro Nacional de Análisis Genómico, C/Baldiri Reixac 4, 08028, Barcelona, Spain.; Universitat de Barcelona (UB), Barcelona, Spain., Rueda M; Centro Nacional de Análisis Genómico, C/Baldiri Reixac 4, 08028, Barcelona, Spain. manuel.rueda@cnag.eu.; Universitat de Barcelona (UB), Barcelona, Spain. manuel.rueda@cnag.eu. |
---|---|
Language: | English |
Source: | BMC bioinformatics [BMC Bioinformatics] 2024 Dec 04; Vol. 25 (1), pp. 373. Date of Electronic Publication: 2024 Dec 04. |
DOI: | 10.1186/s12859-024-05993-2 |
Abstract: | Background: Phenotypic data comparison is essential for disease association studies, patient stratification, and genotype-phenotype correlation analysis. To support these efforts, the Global Alliance for Genomics and Health (GA4GH) established Phenopackets v2 and Beacon v2 standards for storing, sharing, and discovering genomic and phenotypic data. These standards provide a consistent framework for organizing biological data, simplifying their transformation into computer-friendly formats. However, matching participants using GA4GH-based formats remains challenging, as current methods are not fully compatible, limiting their effectiveness. Results: Here, we introduce Pheno-Ranker, an open-source software toolkit for individual-level comparison of phenotypic data. As input, it accepts JSON/YAML data exchange formats from Beacon v2 and Phenopackets v2 data models, as well as any data structure encoded in JSON, YAML, or CSV formats. Internally, the hierarchical data structure is flattened to one dimension and then transformed through one-hot encoding. This allows for efficient pairwise (all-to-all) comparisons within cohorts or for matching of a patient's profile in cohorts. Users have the flexibility to refine their comparisons by including or excluding terms, applying weights to variables, and obtaining statistical significance through Z-scores and p-values. The output consists of text files, which can be further analyzed using unsupervised learning techniques, such as clustering or multidimensional scaling (MDS), and with graph analytics. Pheno-Ranker's performance has been validated with simulated and synthetic data, showing its accuracy, robustness, and efficiency across various health data scenarios. A real data use case from the PRECISESADS study highlights its practical utility in clinical research. Conclusions: Pheno-Ranker is a user-friendly, lightweight software for semantic similarity analysis of phenotypic data in Beacon v2 and Phenopackets v2 formats, extendable to other data types. It enables the comparison of a wide range of variables beyond HPO or OMIM terms while preserving full context. The software is designed as a command-line tool with additional utilities for CSV import, data simulation, summary statistics plotting, and QR code generation. For interactive analysis, it also includes a web-based user interface built with R Shiny. Links to the online documentation, including a Google Colab tutorial, and the tool's source code are available on the project home page: https://github.com/CNAG-Biomedical-Informatics/pheno-ranker . Competing Interests: Declarations. Ethics approval and consent to participate: The Ethical Review Boards of the 18 participating institutions approved the protocol of the cross-sectional study (see Additional File 4: Part B for the names of the ethics committees). In addition, the boards of the 6 sites involved approved the inception study protocol. The studies adhered to the standards set by the International Conference on Harmonization and Good Clinical Practice (ICH-GCP), and to the ethical principles that have their origin in the Declaration of Helsinki (2013). All study participants provided written informed consent prior to their enrolment in the PRECISESADS project. The protection of the confidentiality of records that could identify the included subjects is ensured as defined by the EU Directive 2001/20/EC and the applicable national and international requirements relating to data protection in each participating country. The CS study is registered with number NCT02890121, and the inception study with number NCT02890134 in ClinicalTrials.gov. Consent for publication: Not applicable. Competing interests: The authors declare that they have no competing interest. (© 2024. The Author(s).) |
Database: | MEDLINE |
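
The abstract describes flattening a hierarchical record to one dimension, one-hot encoding it, and then comparing individuals pairwise. The following is a minimal Python sketch of that general idea, not Pheno-Ranker's own code. The function names (`flatten`, `one_hot`, `hamming`), the `path=value` key format, the toy records and their made-up identifiers (`D1`, `D2`, `D3`), and the choice of Hamming distance as the comparison metric are all assumptions made for illustration; the abstract does not specify them.

```python
from itertools import combinations


def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into a set of 'path=value' strings (hypothetical format)."""
    if isinstance(obj, dict):
        items = set()
        for key, value in obj.items():
            items |= flatten(value, f"{prefix}.{key}" if prefix else key)
        return items
    if isinstance(obj, list):
        items = set()
        for value in obj:
            # List positions are ignored here, so element order does not matter.
            items |= flatten(value, prefix)
        return items
    return {f"{prefix}={obj}"}


def one_hot(records):
    """Build a shared vocabulary and one binary vector per record."""
    flat = {rid: flatten(rec) for rid, rec in records.items()}
    vocabulary = sorted(set().union(*flat.values()))
    vectors = {
        rid: [1 if term in terms else 0 for term in vocabulary]
        for rid, terms in flat.items()
    }
    return vocabulary, vectors


def hamming(a, b):
    """Number of positions at which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


# Toy cohort with made-up identifiers, for illustration only.
cohort = {
    "P1": {"sex": "female", "diseases": [{"id": "D1"}, {"id": "D2"}]},
    "P2": {"sex": "female", "diseases": [{"id": "D1"}]},
    "P3": {"sex": "male", "diseases": [{"id": "D3"}]},
}

vocabulary, vectors = one_hot(cohort)

# All-to-all comparison within the cohort.
distances = {
    (a, b): hamming(vectors[a], vectors[b])
    for a, b in combinations(sorted(vectors), 2)
}

for pair, dist in distances.items():
    print(pair, dist)
```

In this toy cohort, P1 and P2 differ in a single term, so their distance is the smallest. Matching one patient against a cohort would reuse the same vectors and compare a single row against all the others.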