Dataset of bulged G-quadruplex forming sequences in the human genome.

Authors: Papp C; Department of Urology, Department of Biochemistry and Molecular Biology, SUNY Upstate Medical University, Syracuse, NY 13210, USA., Jenjaroenpun P; Division of Bioinformatics and Data Management for Research, Research Group and Research Network Division, Research Department, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand.; Bioinformatics Institute, A*STAR Biomedical Institutes, Singapore, Singapore., Mukundan VT; School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore., Phan AT; School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore.; NTU Institute of Structural Biology, Nanyang Technological University, Singapore 636921, Singapore., Kuznetsov VA; Department of Urology, Department of Biochemistry and Molecular Biology, SUNY Upstate Medical University, Syracuse, NY 13210, USA.; Bioinformatics Institute, A*STAR Biomedical Institutes, Singapore, Singapore.
Language: English
Source: Data in brief [Data Brief] 2023 Sep 06; Vol. 50, pp. 109550. Date of Electronic Publication: 2023 Sep 06 (Print Publication: 2023).
DOI: 10.1016/j.dib.2023.109550
Abstract: When several continuous guanine runs lie close together in a nucleic acid sequence, a secondary structure called a G-quadruplex (G4) can form. Such structures in the genome could serve as structural and functional regulators in gene expression, DNA-protein binding, epigenetic modification, and genotoxic stress. Several types of G4-forming DNA sequences exist, including bulged G4-forming sequences (G4-BS). Such bulges arise from non-guanine bases at specific locations within the G-runs of G4-forming sequences. At present, search algorithms do not identify stable G4-BS conformations, which makes genome-wide studies of G4-like structures difficult. The data provided here relate to the article "Stable bulged G-quadruplexes in the human genome: Identification, experimental validation and functionalization", published in Nucleic Acids Research [doi.org/10.1093/nar/gkad252]. Based on our in vitro studies and on analysis of G4-seq and G4 CUT&Tag data, we specified and validated three pG4-BS models. This article presents a large 'raw' (unfiltered) dataset comprising three subfamilies of pG4-BS. For each pG4-BS, we provide strand-specific genomic boundaries. Data on pG4-BS might be useful in elucidating their structural, functional, and evolutionary roles. Furthermore, they may provide insight into the pathobiology of G4-like structures and their potential therapeutic applications.
(Published by Elsevier Inc.)
Database: MEDLINE