Description: |
Protein sequences can be broadly categorized into two classes: those that adopt stable secondary structure and fold into a domain (i.e., globular proteins), and those that do not. This latter class of sequences is conformationally heterogeneous and is described as intrinsically disordered. Bioinformatics enables sub-classification of globular sequences by domain type, an approach that has revolutionized how we understand and predict protein functionality. Conversely, it is unknown whether sequences of disordered protein regions are subject to broadly generalizable organizational principles that would enable their sub-classification. Here we report the development of a statistical approach that quantifies linear variance in amino acid composition across a sequence. Using multiple examples, we provide evidence that intrinsically disordered regions, but not globular domains, are organized into statistically non-random modules of specific compositional bias. Modularity is observed for both low- and high-complexity sequences and, in certain cases, we find that modules are organized in repetitive patterns. These data demonstrate that disordered sequences are non-randomly organized and motivate future experiments to comprehensively classify module types and to determine whether modules, like the domains of globular proteins, represent functionally separable units. |