A system for phenotype harmonization in the National Heart, Lung, and Blood Institute Trans-Omics for Precision Medicine (TOPMed) program

Authors:	Tanika N. Kelly, May E Montasser, Alyna T. Khan, Laura M. Raffield, Carla Wilson, Elizabeth C. Oelsner, Kerri L. Wiggins, Ming-Huei Chen, Gina M. Peloso, Adolfo Correa, Andrew D. Johnson, Donna K. Arnett, Xiuqing Guo, Jai G. Broome, Daniel E. Weeks, Rebecca D. Jackson, Lucia Juarez, Stephen T. McGarvey, Pradeep Natarajan, Braxton D. Mitchell, Kent D. Taylor, Bruce M. Psaty, Santhi K Ganesh, Cathy C. Laurie, Nicola L. Hawley, Leslie S. Emery, Adrienne M. Stilp, Alanna C. Morrison, Jennifer A Smith, Charles Kooperberg, Catherine M. D’Augustine, Jan Graffelman, Paul S. de Vries, Chancellor Hohensee, Sharon L R Kardia, Patricia A Peyser, Wan-Ling Hsu, Erin J Buth, Kathleen C. Barnes, Susan R. Heckbert, Ramachandran S. Vasan, Nathan Pankratz, Karen M. Mutalik, Quenna Wong, Brian E. Cade, Jingmin Liu, Joshua C. Bis, Cecelia A. Laurie, Kari E. North, Fei Fei Wang, Mariza de Andrade, Nancy L. Heard-Costa, William Craig Johnson, L. Adrienne Cupples, Scott T. Weiss, Seyed Mehdi Nouraie, Patrick T. Ellinor, Jerome I. Rotter, Weiniu Gan, Shannon Kelly, Stephen S. Rich, Cashell E. Jaquish, Dongquan Chen, Nora Franceschini, Lisa R. Yanek, Jiwon Lee, Alexander P. Reiner, Megan L. Grove, Stella Aslibekyan, Myriam Fornage, Lawrence F Bielak, Rasika A. Mathias
Contributors:	Universitat Politècnica de Catalunya. Departament d'Estadística i Investigació Operativa, Universitat Politècnica de Catalunya. COSDA-UPC - COmpositional and Spatial Data Analysis
Language:	English
Publication year:	2021
Subject:	0301 basic medicine; 0302 clinical medicine; 030212 general & internal medicine; 030104 developmental biology; 03 medical and health sciences; Medical and Health Sciences; Mathematical Sciences; Computer science; Data science; Epidemiology; Genetics; Phenomics; Phenotype; Phenotypes; Harmonization; Common data elements; Controlled vocabulary; Documentation; Data collection; Data Aggregation; Information dissemination; Program evaluation; Study Design; Clinical Research; Genetic Association Studies; Human Genome; Humans; Precision medicine; Cardiovascular disease; Hematologic disease; Lung diseases; Sleep-wake disorders; Biomathematics; Sampling (Statistics); National Heart, Lung, and Blood Institute (U.S.); United States; Good Health and Well Being; AcademicSubjects/MED00860; Matemàtiques i estadística::Matemàtica aplicada a les ciències [Àrees temàtiques de la UPC]; Matemàtiques i estadística::Estadística aplicada::Estadística biosanitària [Àrees temàtiques de la UPC]; 92 Biology and other natural sciences::92B Mathematical biology in general [Classificació AMS]; 62 Statistics::62D05 Sampling theory, sample surveys [Classificació AMS]; Biomatemàtica; Mostreig (Estadística)
Source:	UPCommons. Portal del coneixement obert de la UPC, Universitat Politècnica de Catalunya (UPC); American Journal of Epidemiology, vol. 190, iss. 10
Description:	Genotype-phenotype association studies often combine phenotype data from multiple studies to increase statistical power. Harmonization of the data usually requires substantial effort due to heterogeneity in phenotype definitions, study design, data collection procedures, and data-set organization. Here we describe a centralized system for phenotype harmonization that includes input from phenotype domain and study experts, quality control, documentation, reproducible results, and data-sharing mechanisms. This system was developed for the National Heart, Lung, and Blood Institute’s Trans-Omics for Precision Medicine (TOPMed) program, which is generating genomic and other -omics data for more than 80 studies with extensive phenotype data. To date, 63 phenotypes have been harmonized across thousands of participants (recruited in 1948–2012) from up to 17 studies per phenotype. We also discuss challenges in this undertaking and how they were addressed. The harmonized phenotype data and associated documentation have been submitted to National Institutes of Health data repositories for controlled access by the scientific community. We also provide materials to facilitate future harmonization efforts by the community, which include 1) the software code used to generate the 63 harmonized phenotypes, enabling others to reproduce, modify, or extend these harmonizations to additional studies, and 2) the results of labeling thousands of phenotype variables with controlled vocabulary terms.
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::0380088474cfe1cc5218a3fad8da984a ; http://hdl.handle.net/2117/359840