Patterns of Unwanted Biological and Technical Expression Variation Among 49 Human Tissues.

Authors: Nieuwenhuis TO; Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland; McKusick-Nathans Institute, Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland., Giles HH; McKusick-Nathans Institute, Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland., Arking JVA; Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland., Patil AH; Lieber Institute for Brain Development, Baltimore, Maryland., Shi W; McKusick-Nathans Institute, Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland., McCall MN; Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, New York; Department of Biomedical Genetics, University of Rochester Medical Center, Rochester, New York., Halushka MK; Pathology and Laboratory Medicine Institute, Cleveland Clinic, Cleveland, Ohio. Electronic address: halushm@ccf.org.
Language: English
Source: Laboratory investigation; a journal of technical methods and pathology [Lab Invest] 2024 Jun; Vol. 104 (6), pp. 102069. Date of Electronic Publication: 2024 Apr 24.
DOI: 10.1016/j.labinv.2024.102069
Abstract: Tissue gene expression studies are impacted by biological and technical sources of variation, which can be broadly classified into wanted and unwanted variation. The latter, if not addressed, results in misleading biological conclusions. Methods have been proposed to reduce unwanted variation, such as normalization and batch correction. A more accurate understanding of all causes of variation could significantly improve the ability of these methods to remove unwanted variation while retaining variation corresponding to the biological question of interest. We used 17,282 samples from 49 human tissues in the Genotype-Tissue Expression data set (v8) to investigate patterns and causes of expression variation. Transcript expression was transformed to z-scores, and only the most variable 2% of transcripts were evaluated and clustered based on coexpression patterns. Clustered gene sets were assigned to different biological or technical causes based on histologic appearances and metadata elements. We identified 522 variable transcript clusters (median: 11 per tissue) among the samples. Of these, 63% were confidently explained, 16% were likely explained, 7% were low confidence explanations, and 14% had no clear cause. Histologic analysis annotated 46 clusters. Other common causes of variability included sex, sequencing contamination, immunoglobulin diversity, and compositional tissue differences. Less common biological causes included death interval (Hardy score), disease status, and age. Technical causes included blood draw timing and harvesting differences. Many of the causes of variation in bulk tissue expression were identifiable in the Tabula Sapiens data set of single-cell expression. This is among the largest explorations of the underlying sources of tissue expression variation. It uncovered expected and unexpected causes of variable gene expression and demonstrated the utility of matched histologic specimens. It further demonstrated the value of acquiring meaningful tissue harvesting metadata elements to use for improved normalization, batch correction, and analysis of both bulk and single-cell RNA-seq data.
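Illustrative note: the abstract's core steps (z-score transformation, restriction to the most variable 2% of transcripts, clustering by coexpression) can be sketched roughly as below. This is a minimal sketch under stated assumptions and not the authors' pipeline: the function name `cluster_variable_transcripts`, the use of Pearson correlation, average-linkage hierarchical clustering, the `corr_cutoff` threshold, and ranking variability on the input values before z-scoring are all choices made here for illustration and are not stated in the abstract.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_variable_transcripts(expr, top_fraction=0.02, corr_cutoff=0.5):
    """Group the most variable transcripts of one tissue by coexpression.

    expr: 2-D array, rows = transcripts, columns = samples.
    Returns (kept row indices, cluster label for each kept row).
    All parameter names and defaults are hypothetical.
    """
    expr = np.asarray(expr, dtype=float)

    # Assumption: variability is ranked on the input values, because
    # z-scored rows all have unit variance and cannot be ranked afterwards.
    variances = expr.var(axis=1)
    n_keep = max(2, int(round(top_fraction * expr.shape[0])))
    keep = np.sort(np.argsort(variances)[::-1][:n_keep])
    sub = expr[keep]

    # z-score each retained transcript across samples.
    z = (sub - sub.mean(axis=1, keepdims=True)) / sub.std(axis=1, keepdims=True)

    # Coexpression as Pearson correlation, converted to a distance in [0, 2].
    dist = np.clip(1.0 - np.corrcoef(z), 0.0, 2.0)

    # SciPy's linkage expects the condensed upper triangle in row-major order.
    condensed = dist[np.triu_indices(n_keep, k=1)]
    tree = linkage(condensed, method="average")

    # Cut the tree so clusters merge only below distance 1 - corr_cutoff.
    labels = fcluster(tree, t=1.0 - corr_cutoff, criterion="distance")
    return keep, labels
```

Applied per tissue, each resulting label would correspond to one candidate "variable transcript cluster" that could then be compared against histology or metadata such as sex, age, or Hardy score.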
(Copyright © 2024 The Authors. Published by Elsevier Inc. All rights reserved.)
Database: MEDLINE