Application of k-means clustering, linear discriminant analysis and multivariate linear regression for the development of a predictive QSAR model on 5-lipoxygenase inhibitors
Author: | Matías F. Andrada, Mario R. Estrada, Esteban G. Vega-Hissi, Juan Cruz Martinez |
---|---|
Year of publication: | 2015 |
Subject: | 5-LIPOXYGENASE INHIBITORS; General linear model; Quantitative structure–activity relationship; QSAR; Other Chemical Sciences; Process Chemistry and Technology; Chemical Sciences; k-means clustering; Linear discriminant analysis; Computer Science Applications; Analytical Chemistry; LINEAR DISCRIMINANT ANALYSIS; Discriminant function analysis; Bayesian multivariate linear regression; Statistics; K-MEANS CLUSTERING; Cluster analysis; MULTIVARIATE LINEAR REGRESSION; NATURAL AND EXACT SCIENCES; Spectroscopy; Software; Selection (genetic algorithm); Mathematics |
Source: | Chemometrics and Intelligent Laboratory Systems. 143:122-129 |
ISSN: | 0169-7439 |
DOI: | 10.1016/j.chemolab.2015.03.001 |
Description: | In this work, we developed a quantitative structure–activity relationship (QSAR) model for a family of 5-lipoxygenase (5-LOX) inhibitors, using k-means clustering and linear discriminant analysis (LDA) to select the training and test sets and multivariate linear regression (MLR) to select the independent variables. With the k-means clustering method, the total set of compounds (58 derivatives of 5-Benzylidene-2-phenylthiazolinones) was divided into two clusters according to a simple discriminant function. We found that the piID molecular descriptor (conventional bond order ID number) correctly discriminates 100% of the compounds in each cluster. Thirty different models, divided into three series, were analyzed, and the series with representative training and test sets (series 3) gave the most predictive models. The statistical parameters of the best model are Rtrain = 0.811 and Rtest = 0.801. We found that a rational selection of the training and test sets yields the most predictive models, and that random selection is sometimes unsuitable, especially when the total set of compounds can be classified into different clusters according to structural features. Affiliation: Andrada, Matias Fernando. Universidad Nacional de San Luis. Facultad de Química, Bioquímica y Farmacia; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Affiliation: Vega Hissi, Esteban Gabriel. Universidad Nacional de San Luis. Facultad de Química, Bioquímica y Farmacia; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Affiliation: Estrada, Mario Rinaldo. Universidad Nacional de San Luis. Facultad de Química, Bioquímica y Farmacia; Argentina. Affiliation: Garro Martinez, Juan Ceferino. Universidad Nacional de San Luis. Facultad de Química, Bioquímica y Farmacia; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina |
Database: | OpenAIRE |
External link: |
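
The set-selection idea summarized in the description (cluster the compounds first, then draw training and test compounds from every cluster) can be illustrated with a minimal sketch. This is not the authors' code or data: the descriptor values are invented, the function names `kmeans_1d` and `stratified_split` are hypothetical, and a simple midpoint threshold between the two centroids stands in for the LDA discriminant function. The MLR step is not shown.

```python
import random


def kmeans_1d(values, k=2, iters=100):
    """Lloyd's algorithm on a single descriptor; returns (labels, centroids)."""
    ordered = sorted(values)
    # Deterministic start: pick centroids spread evenly across the sorted values.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Draw the test set from every cluster so both sets represent all clusters."""
    rng = random.Random(seed)
    train, test = [], []
    for c in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


# Invented values of a single descriptor (playing the role of piID) for 12 compounds.
descriptor = [10.1, 10.4, 9.8, 10.9, 10.2, 9.9, 14.8, 15.3, 15.0, 14.6, 15.5, 15.1]

labels, centroids = kmeans_1d(descriptor)

# Assumption: a midpoint threshold as a stand-in for the one-descriptor discriminant.
threshold = sum(centroids) / 2
predicted = [0 if v < threshold else 1 for v in descriptor]

train_idx, test_idx = stratified_split(labels)
```

With a purely random split, a small test set can end up drawn mostly from one cluster; splitting within each cluster avoids that, which is the point the description makes about rational versus random selection.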