Description: |
Tumors are complex masses composed of malignant and non-malignant cells. Variation in tumor purity (the malignant cell fraction) can both confound integrative analysis and enable studies of tumor heterogeneity. Here, we developed PUREE, which uses a weakly supervised learning approach to infer tumor purity from a tumor gene expression profile. PUREE was trained on gene expression data and genomic consensus purity estimates from approximately 8000 solid tumor samples. Using a linear model based on 170 input genes, PUREE predicted purity with high accuracy across distinct solid tumor types and generalized to tumor samples from unseen tumor types. PUREE's input gene features were further validated using single-cell RNA-seq data from distinct tumor types. In a comprehensive benchmark, PUREE outperformed all existing transcriptome-based purity estimation approaches. We also show that the accuracy of a pan-cancer model is comparable to that of models optimized for individual tumor types, highlighting compositional properties of the tumor microenvironment that are conserved across tumor types. Overall, PUREE is a highly accurate and versatile method for estimating tumor purity and interrogating tumor heterogeneity from bulk tumor gene expression data. Citation Format: Egor Revkov, Ken W.-K. Sung, Anders J. Skanderup. Accurate pan-cancer tumor purity estimation from gene expression data [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 1941. |