Current Developments in the Analysis of Proteomic Data: Artificial Neural Network Data Mining Techniques for the Identification of Proteomic Biomarkers Related to Breast Cancer

Authors: Ian O. Ellis, Robert C. Rees, Lee Lancashire, Shahid Mian, Graham Ball
Year of publication: 2005
Subject:
Source: Current Proteomics. 2:15-29
ISSN: 1570-1646
Description: Artificial Neural Network (ANN) techniques are becoming increasingly popular in many areas of the biological sciences for the analysis of complex data. Careful selection of key parameters when developing ANN models and algorithms is extremely important in order to create generalised models with real-world applicability. This study applies these approaches to the analysis of proteomic data generated using Surface Enhanced Laser Desorption/Ionisation mass spectrometry profiling of cell lines from patients with breast cancer. Examples of these approaches include constrained architecture, Correlated Activity Pruning (CAPing), appropriate training termination methods and other, more advanced methodologies such as parameterisation by weightings analysis and stepwise additive approaches. These approaches, when applied to breast cancer cell lines from actual patients, resulted in the identification of 8 protein/peptide molecular ions which were capable of classifying samples into their respective groups with an accuracy of 94.8% and an area under the curve value of 0.993 when examined with a receiver operating characteristic curve. Several ions which appear to show significant up- or down-regulation with regard to treatment regimen have also been identified. These results indicate that, when coupled with other powerful techniques, the development of these novel methodologies and algorithms using ANNs allows for the development of effective data mining tools for analysing complex, non-linear, noisy data.
This paper will consider current methodologies for the analysis of proteomic data using Artificial Neural Network (ANN) based methodologies, their advantages, disadvantages and limitations, and will then describe an application of novel methodologies developed using actual patient data. ANN techniques have been widely applied to many areas of the physical sciences for the analysis of complex systems. As such, extensive knowledge exists on the application and limitations of these methods. Similarly, methodologies exist to overcome many of these limitations and enhance the predictive capabilities and real-world applicability of developed models. This study applies these approaches to the analysis of proteomic data generated using Surface Enhanced Laser Desorption/Ionisation (SELDI) mass spectrometry (MS) profiling, with the aim of identifying candidate biomarkers indicative of treatment regimen for chemosensitive (MCF-7 and T47-D) breast cancer cell lines and of developing ANN algorithms that correctly assign samples to their appropriate class of either control or drug treated. Examples of these approaches and important parameters which need to be considered when developing ANN models will be discussed, followed by methodologies employed in order to create generalised models with real-world applicability.
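The description reports classifier performance as an accuracy and as an area under the receiver operating characteristic (ROC) curve. As a minimal sketch of how those two figures are conventionally computed for a two-class problem, the Python below uses the rank-based (Mann-Whitney) definition of the area under the curve. The function names, the 0.5 decision threshold, the coding of classes as 0 (control) and 1 (drug treated), and the toy scores are all assumptions made for illustration; none of them come from the paper, and this is not the authors' implementation.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of samples assigned to the correct class.

    A score at or above the (assumed) threshold is read as class 1.
    """
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the probability that a randomly chosen class-1 sample
    scores higher than a randomly chosen class-0 sample; ties count
    as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 0 = control, 1 = drug treated (assumed coding).
labels = [0, 0, 0, 1, 1, 1]
scores = [0.10, 0.40, 0.60, 0.35, 0.80, 0.90]

print(f"accuracy = {accuracy(labels, scores):.3f}")
print(f"ROC AUC  = {roc_auc(labels, scores):.3f}")
```

Accuracy depends on the chosen threshold, whereas the area under the curve summarises ranking quality across all thresholds, which is why the two values can differ considerably for the same model.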
Database: OpenAIRE