The Experimentally Obtained Functional Impact Assessments of 5' Splice Site GT>GC Variants Differ Markedly from Those Predicted

Authors: Emmanuelle Masson, David N. Cooper, Matthew J. Hayden, Zhuan Liao, Jin-Huan Lin, Jian-Min Chen, Claude Férec
Contributors: Génétique, génomique fonctionnelle et biotechnologies (UMR 1078) (GGB), Institut Brestois Santé Agro Matière (IBSAM), Université de Brest (UBO)-Université de Brest (UBO)-EFS-Institut National de la Santé et de la Recherche Médicale (INSERM), Changhai Hospital, Shanghai Second Military Medical University, Shanghai Institute of Pancreatic Diseases [Shanghai, China], Centre Hospitalier Régional Universitaire de Brest (CHRU Brest), Cardiff University, PODEUR, Sophie
Language: English
Publication year: 2020
Subject:
Source: Current Genomics
Current Genomics, Bentham Science Publishers, 2020, 21 (1), pp.56-66. ⟨10.2174/1389202921666200210141701⟩
ISSN: 1389-2029
Description: Introduction: 5' splice site GT>GC or +2T>C variants have been frequently reported to cause human genetic disease and are routinely scored as pathogenic splicing mutations. However, we have recently demonstrated that such variants in human disease genes may not invariably be pathogenic. Moreover, we found that no splicing prediction tools appear to be capable of reliably distinguishing those +2T>C variants that generate wild-type transcripts from those that do not. Methodology: Herein, we evaluated the performance of a novel deep learning-based tool, SpliceAI, in the context of three datasets of +2T>C variants, all of which had been characterized functionally in terms of their impact on pre-mRNA splicing. The first two datasets refer to our recently described “in vivo” dataset of 45 known disease-causing +2T>C variants and the “in vitro” dataset of 103 +2T>C substitutions subjected to a full-length gene splicing assay. The third dataset comprised 12 BRCA1 +2T>C variants that were recently analyzed by saturation genome editing. Results: Comparison of the SpliceAI-predicted and experimentally obtained functional impact assessments of these variants (and smaller datasets of +2T>A and +2T>G variants) revealed that although SpliceAI performed rather better than other prediction tools, it was still far from perfect. A key issue was that the impact of those +2T>C (and +2T>A) variants that generated wild-type transcripts represents a quantitative change that can vary from barely detectable to an almost full expression of wild-type transcripts, with wild-type transcripts often co-existing with aberrantly spliced transcripts. Conclusion: Our findings highlight the challenges that we still face in attempting to accurately identify splice-altering variants.
Database: OpenAIRE