Popis: |
BACKGROUND. The CASP experiment ("Critical Assessment of Structure Prediction") is intended to track advances in the ability of research groups to predict the structure of an unknown protein from its sequence. The target sequences to be folded are chosen anew for each round. Folding a CASP target is a hard problem, and the structures of CASP targets differ in some way from the overall pool of known protein structures. The purpose of this study was to detect and quantify the difference between CASP targets and typical structures from the Protein Data Bank (PDB). METHODS. The averaged local complexity of a protein fold was measured in units of entropy using several metrics, each of which reduces a fragment of a fold to a binary distribution. The complexity was measured for targets from previous rounds of CASP. A subset of PDB structures was prepared, and the averaged complexity of these PDB structures was estimated. The metrics were chosen to simulate some of the approaches used to predict structures in the CASP competition. A modified complexity, based on averaged distributions of fold fragments in common PDB structures, was also measured. RESULTS. A difference in CASP targets was detected by a metric that hashes the distances between closely located residues. A modified version of this metric, which emulates wide-range distance maps, was shown to be the most easily adjusted to exploit the difference between CASP targets and typical PDB structures. This means that methods trained on PDB templates with similar metrics will guess the template structures in a new round of CASP more successfully, with a widening gap relative to their ability to predict neutrally selected protein structures. Consequently, software that relies on inter-residue distances and performs well in CASP will perform poorly in general-purpose structure prediction. |