LCD-Composer: an intuitive, composition-centric method enabling the identification and detailed functional mapping of low-complexity domains.

Authors: Cascarina SM, King DC, Osborne Nishimura E, Ross ED (all: Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, CO 80523, USA).
Language: English
Source: NAR genomics and bioinformatics [NAR Genom Bioinform] 2021 May 26; Vol. 3 (2), pp. lqab048. Date of Electronic Publication: 2021 May 26 (Print Publication: 2021).
DOI: 10.1093/nargab/lqab048
Abstract: Low-complexity domains (LCDs) in proteins are regions predominantly composed of a small subset of the possible amino acids. LCDs are involved in a variety of normal and pathological processes across all domains of life. Existing methods define LCDs using information-theoretical complexity thresholds, sequence alignment with repetitive regions, or statistical overrepresentation of amino acids relative to whole-proteome frequencies. While these methods have proven valuable, they all quantify amino acid composition only indirectly, even though composition is the fundamental and biologically relevant feature related to protein sequence complexity. Here, we present a new computational tool, LCD-Composer, that directly identifies LCDs based on amino acid composition and linear amino acid dispersion. Using LCD-Composer's default parameters, we identified simple LCDs across all organisms available through UniProt and provide the resulting data in an accessible form as a resource. Furthermore, we describe large-scale differences between organisms from different domains of life and explore organisms with extreme LCD content for different LCD classes. Finally, we illustrate the versatility and specificity achievable with LCD-Composer by identifying diverse classes of LCDs using both simple and multifaceted composition criteria. We demonstrate that the ability to dissect LCDs based on these multifaceted criteria enhances the functional mapping and classification of LCDs.
(© The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.)
Database: MEDLINE
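
Note: The abstract describes identifying LCDs directly from amino acid composition. As a rough illustration of that idea only, the following Python sketch scans a protein sequence with a sliding window and reports regions where a chosen set of residues reaches a minimum fraction of the window. It is not LCD-Composer's implementation: the function name `find_composition_windows`, the window size, and the threshold are assumptions chosen for illustration, and the linear dispersion criterion mentioned in the abstract is omitted entirely.

```python
def find_composition_windows(seq, residues, window=20, min_fraction=0.4):
    """Return merged half-open (start, end) spans where `residues` make up
    at least `min_fraction` of a sliding window of length `window`.

    Illustrative sketch only: parameter names and defaults are assumptions,
    and no dispersion criterion is applied.
    """
    targets = set(residues)
    spans = []
    if len(seq) < window:
        return spans  # sequence too short to hold a single window

    # Count target residues in the first window, then update incrementally.
    count = sum(1 for aa in seq[:window] if aa in targets)
    for start in range(len(seq) - window + 1):
        if start > 0:
            count -= seq[start - 1] in targets           # residue leaving the window
            count += seq[start + window - 1] in targets  # residue entering the window
        if count / window >= min_fraction:
            end = start + window
            if spans and start <= spans[-1][1]:
                # Overlapping or touching the previous hit: extend it.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans


if __name__ == "__main__":
    # Toy sequence with a 10-residue glutamine run in the middle.
    toy = "ACDEFGHIKL" + "Q" * 10 + "MNPRSTVWYA"
    print(find_composition_windows(toy, "Q", window=10, min_fraction=0.8))
```

Passing several residues (for example `"QN"`) counts them jointly, which loosely mirrors the "multifaceted composition criteria" mentioned in the abstract, though the tool's actual criteria may be defined differently.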