Training Neural Networks for Protein Secondary Structure Prediction: The Effects of Imbalanced Data Set
Author: | Jefferson Luiz Brum Marques, Viviane Palodeto, Hernán Terenzi |
---|---|
Year of publication: | 2009 |
Subject: | |
Source: | Emerging Intelligent Computing Technology and Applications. With Aspects of Artificial Intelligence; ISBN: 9783642040191; ICIC (2) |
DOI: | 10.1007/978-3-642-04020-7_28 |
Description: | Protein secondary structure prediction (PSSP) is one of the main tasks in computational biology. Over the last few decades, much effort has gone into solving this problem with various approaches, mainly artificial neural networks (ANNs). To predict protein secondary structure, ANNs are generally trained on the CB513 data set. Like protein structure databases, this data set is imbalanced, which can produce a low error rate for the majority class and an undesirably high error rate for the minority class. In this paper, we evaluate the effects of an imbalanced data set on the training and learning of neural networks applied to protein secondary structure prediction. To this end, we applied resampling methods to tackle the class imbalance problem. Results show that imbalanced data sets decrease helix prediction rates. However, the class distribution of the protein data set does not significantly affect the global accuracy (Q3). |
Database: | OpenAIRE |
External link: |
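
The abstract's central observation, that per-class (helix) prediction rates can drop while global Q3 stays largely unchanged, follows from how these metrics are defined. The sketch below is illustrative only and is not the authors' code. It computes Q3 as the fraction of residues whose 3-state label (H = helix, E = strand, C = coil) is predicted correctly, computes a per-state rate as correct predictions of a state divided by the observed residues of that state, and shows random oversampling as one common resampling method. The record does not name the specific resampling methods the paper used, so random oversampling is an assumed stand-in, and all function names here are hypothetical.

```python
import random
from collections import Counter

# Assumed 3-state alphabet: H = helix, E = strand, C = coil.
STATES = ("H", "E", "C")


def q3(true_ss: str, pred_ss: str) -> float:
    """Q3: fraction of residues whose 3-state label is predicted correctly."""
    if len(true_ss) != len(pred_ss) or not true_ss:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_ss, pred_ss))
    return correct / len(true_ss)


def per_class_rate(true_ss: str, pred_ss: str) -> dict:
    """Per-state rate: correct predictions of a state / observed residues of that state.

    Returns NaN for a state that never occurs in true_ss.
    """
    rates = {}
    for s in STATES:
        # Predictions made at positions whose true label is state s.
        preds_at_s = [p for t, p in zip(true_ss, pred_ss) if t == s]
        if preds_at_s:
            rates[s] = sum(p == s for p in preds_at_s) / len(preds_at_s)
        else:
            rates[s] = float("nan")
    return rates


def random_oversample(samples, labels, seed=0):
    """Hypothetical helper: duplicate randomly chosen minority-class samples
    until every class has as many samples as the majority class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    by_class = {c: [x for x, y in zip(samples, labels) if y == c] for c in counts}
    out_x, out_y = list(samples), list(labels)
    for c, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - counts[c])]
        out_x.extend(extra)
        out_y.extend([c] * len(extra))
    return out_x, out_y


if __name__ == "__main__":
    true_ss = "HHHHCCCCCE"
    pred_ss = "HHCCCCCCCE"
    print(q3(true_ss, pred_ss))              # 0.8
    print(per_class_rate(true_ss, pred_ss))  # {'H': 0.5, 'E': 1.0, 'C': 1.0}
```

In this toy example, Q3 is 0.8 even though only half of the helix residues are recovered. A single global score can therefore hide a poor rate on one class, which is why the abstract reports helix prediction rates separately from Q3. Oversampling changes only the class proportions seen during training. It does not add new information, and that trade-off is what the paper evaluates.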