Popis: |
The EVA structural descriptor, which is based on calculated fundamental molecular vibrational frequencies, has proved effective for both QSAR and database similarity calculations. The descriptor is sensitive to 3D structure but, unlike field-based 3D-QSAR methods, does not require structural superposition. The original technique uses a standardisation method in which the frequencies are projected onto a linear scale and smeared out with uniform Gaussians of fixed standard deviation (sigma). The smearing function allows nearby frequencies to overlap, and hence yields a descriptor of fixed dimension regardless of the number and precise values of the frequencies. We propose that optimal localised values of sigma exist in different spectral regions. With uniform Gaussians, the overlap of frequencies may, in some parts of the spectrum, be too small to pick up relationships that exist, while in others it may mix information so heavily that significant correlations are obscured by noise. A genetic algorithm is used to search for optimal localised sigma values, with cross-validated PLS regression scores as the fitness function. The resulting models were then validated against a previously unseen test set of compounds and by data scrambling. The performance of EVA_GA is compared with that of EVA and of analogous CoMFA studies; for CoMFA, a brief evaluation is made of the effect of grid resolution on the stability of the PLS scores, particularly in relation to test set predictions.
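The smearing step that EVA_GA modifies can be illustrated with a minimal sketch. It assumes a 0 to 4000 cm^-1 scale sampled every 5 cm^-1, Gaussians normalised to unit area, and a piecewise-constant scheme in which each frequency takes the sigma of the spectral region it falls in. The function name `eva_descriptor`, the `boundaries` parameter, and the region scheme are hypothetical illustrations, not the published EVA_GA encoding. The genetic algorithm and the cross-validated PLS fitness evaluation are not shown.

```python
import math
from bisect import bisect_right


def eva_descriptor(frequencies, sigmas, boundaries, upper=4000.0, step=5.0):
    """EVA-style descriptor with region-dependent Gaussian widths (sketch).

    frequencies: vibrational frequencies (cm^-1) of one molecule.
    boundaries:  sorted region edges (hypothetical scheme); frequency f lies
                 in region bisect_right(boundaries, f).
    sigmas:      one Gaussian standard deviation per region, so
                 len(sigmas) == len(boundaries) + 1. A single region with no
                 boundaries reproduces the uniform-sigma case.
    Returns a fixed-length list sampled at step, 2*step, ..., upper.
    """
    if len(sigmas) != len(boundaries) + 1:
        raise ValueError("need exactly one sigma per spectral region")
    n_points = int(upper // step)
    grid = [step * (k + 1) for k in range(n_points)]
    descriptor = [0.0] * n_points
    for f in frequencies:
        # Pick the localised sigma for the region containing this frequency.
        s = sigmas[bisect_right(boundaries, f)]
        norm = 1.0 / (s * math.sqrt(2.0 * math.pi))  # unit-area Gaussian
        for k, x in enumerate(grid):
            # Overlapping contributions from nearby frequencies simply add.
            descriptor[k] += norm * math.exp(-((x - f) ** 2) / (2.0 * s * s))
    return descriptor
```

For example, `eva_descriptor([1000.0, 1650.0, 2950.0], [10.0], [])` and `eva_descriptor([500.0, 3300.0], [10.0], [])` both return 800 values, which is why molecules with different numbers of vibrational modes can share one descriptor matrix. In a GA search of the kind described above, the sigma list would plausibly form the chromosome, with each candidate scored by the cross-validated PLS fit of the descriptors it produces.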