A cross-organism framework for supervised enhancer prediction with epigenetic pattern recognition and targeted validation

Authors:	Ingrid Plajzer-Frick, Anne N. Harrington, Chengfei Yan, Iros Barozzi, Len A. Pennacchio, Joel Rozowsky, Yoko Fukuda-Yuzawa, Diane E. Dickel, Kevin Y. Yip, Axel Visel, Elizabeth Lee, Richard E. Sutton, Momoe Kato, Jennifer A. Akiyama, Quan Pham, Tyler H. Garvin, Mark Gerstein, Veena Afzal, Brandon J. Mannion, Catherine S. Pickle, Mengting Gu, Landon L. Chan, Emrah Gumusgoz, Anurag Sethi, Koon-Kiu Yan
Publication year:	2018
Subject:	0303 health sciences; Computer science; Human Genome; Promoter; Computational biology; 03 medical and health sciences; 0302 clinical medicine; Pattern recognition (psychology); Genetics; Epigenetics; Generic health relevance; Enhancer; 030217 neurology & neurosurgery; 030304 developmental biology; Biotechnology
Description:	Author(s): Sethi, Anurag; Gu, Mengting; Gumusgoz, Emrah; Chan, Landon; Yan, Koon-Kiu; Rozowsky, Joel; Barozzi, Iros; Afzal, Veena; Akiyama, Jennifer; Plajzer-Frick, Ingrid; Yan, Chengfei; Pickle, Catherine; Kato, Momoe; Garvin, Tyler; Pham, Quan; Harrington, Anne; Mannion, Brandon; Lee, Elizabeth; Fukuda-Yuzawa, Yoko; Visel, Axel; Dickel, Diane; Yip, Kevin; Sutton, Richard; Pennacchio, Len; Gerstein, Mark | Abstract: Enhancers are important noncoding elements, but they have been traditionally hard to characterize experimentally. Only a few mammalian enhancers have been validated, making it difficult to train statistical models for their identification properly. Instead, postulated patterns of genomic features have been used heuristically for identification. The development of massively parallel assays allows for the characterization of large numbers of enhancers for the first time. Here, we developed a framework that uses Drosophila STARR-seq data to create shape-matching filters based on enhancer-associated meta-profiles of epigenetic features. We combined these features with supervised machine learning algorithms (e.g., support vector machines) to predict enhancers. We demonstrated that our model could be applied to predict enhancers in mammalian species (i.e., mouse and human). We comprehensively validated the predictions using a combination of in vivo and in vitro approaches, involving transgenic assays in mouse and transduction-based reporter assays in human cell lines. Overall, the validations involved 153 enhancers in 6 mouse tissues and 4 human cell lines. The results confirmed that our model can accurately predict enhancers in different species without re-parameterization. Finally, we examined the transcription-factor binding patterns at predicted enhancers and promoters in human cell lines. We demonstrated that these patterns enable the construction of a secondary model effectively discriminating between enhancers and promoters.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::bfb2912ca109fa381dfaefd7237bde97 https://escholarship.org/uc/item/08b2r0px
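
Illustrative note: the abstract describes shape-matching filters built from enhancer-associated meta-profiles of epigenetic features. The sketch below is one possible reading of that idea, not the authors' implementation. It assumes that a meta-profile is the position-wise average of equal-length signal windows centred on known enhancers, and that "shape matching" can be approximated by a sliding Pearson correlation between that template and a signal track. The function names (`meta_profile`, `matched_filter_scores`) and the toy numbers are hypothetical. The supervised step (e.g., a support vector machine trained on such scores) is not shown.

```python
from math import sqrt


def meta_profile(profiles):
    """Position-wise average of equal-length signal windows (assumed definition)."""
    n = len(profiles)
    width = len(profiles[0])
    return [sum(p[i] for p in profiles) / n for i in range(width)]


def _centered(values):
    """Subtract the mean so the score reflects shape rather than baseline level."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def matched_filter_scores(signal, template):
    """Pearson correlation between the template and each window of the signal."""
    t = _centered(template)
    t_norm = sqrt(sum(v * v for v in t))
    width = len(template)
    scores = []
    for start in range(len(signal) - width + 1):
        w = _centered(signal[start:start + width])
        w_norm = sqrt(sum(v * v for v in w))
        if t_norm == 0 or w_norm == 0:
            scores.append(0.0)  # flat window or flat template: no shape to match
        else:
            dot = sum(a * b for a, b in zip(w, t))
            scores.append(dot / (w_norm * t_norm))
    return scores


# Hypothetical toy data: two training windows with a double peak and a central dip.
training_windows = [
    [0, 2, 5, 1, 5, 2, 0],
    [0, 3, 6, 2, 6, 3, 0],
]
template = meta_profile(training_windows)

# Hypothetical signal track containing one region with the same shape.
track = [0, 0, 1, 0, 0, 2, 5, 1, 5, 2, 0, 0, 1, 0]
scores = matched_filter_scores(track, template)
best = max(range(len(scores)), key=scores.__getitem__)
print("best offset:", best, "score:", round(scores[best], 3))
```

In a setting like the one the abstract outlines, one such score per epigenetic feature could serve as an input feature to a supervised classifier. The choice of correlation as the matching score is an assumption made here for simplicity.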