Optimization of Spaced K-mer Frequency Feature Extraction using Genetic Algorithms for Metagenome Fragment Classification
Authors: | Arini Aha Pekuwali, Wisnu Ananta Kusuma, Agus Buono |
---|---|
Year of publication: | 2018 |
Subject: | Information Systems and Management; General Computer Science; business.industry; Feature extraction; Pattern recognition; TK5101-6720; Information technology; T58.5-58.64; Measure (mathematics); metagenome; Dimension (vector space); Fragment (logic); Chromosome (genetic algorithm); naïve Bayesian classifier; Feature (computer vision); k-mer; spaced k-mers; Genetic algorithm; Telecommunication; Artificial intelligence; Electrical and Electronic Engineering; business; k-mers; Mathematics |
Source: | Journal of ICT Research and Applications, Vol 12, Iss 2 (2018) |
ISSN: | 2338-5499 2337-5787 |
DOI: | 10.5614/itbj.ict.res.appl.2018.12.2.2 |
Description: | K-mer frequencies are commonly used to extract features from metagenome fragments. However, researchers have found that their use remains inefficient. In this research, a genetic algorithm was employed to find optimally spaced k-mers. These were obtained by generating the possible combinations of match positions and don't care positions (written as *). This approach was adopted from the concept of spaced seeds in PatternHunter. The use of spaced k-mers can reduce the dimension of the k-mer frequency feature vector. To measure the accuracy of the proposed method, a naive Bayesian classifier (NBC) was used. The results showed that the chromosome 111111110001, representing spaced k-mer model [111 1111 10001], was the best chromosome, with a higher fitness (85.42) than that of the k-mer frequency feature. Moreover, the proposed approach also reduced the feature extraction time. |
Database: | OpenAIRE |
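
The spaced k-mer idea in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each chromosome bit maps to one position of a sliding window, with 1 meaning a match position and 0 a don't care position, and that windows containing characters other than A, C, G, T are skipped. The function names `spaced_kmer_counts` and `feature_vector` are hypothetical.

```python
from collections import Counter
from itertools import product


def spaced_kmer_counts(sequence, mask):
    """Count spaced k-mers in a DNA sequence.

    Assumption: mask is a string of '1' (match) and '0' (don't care),
    one character per window position.
    """
    keep = [i for i, bit in enumerate(mask) if bit == "1"]
    span = len(mask)
    seq = sequence.upper()
    counts = Counter()
    for start in range(len(seq) - span + 1):
        window = seq[start:start + span]
        # Keep only the bases at match positions.
        key = "".join(window[i] for i in keep)
        # Assumption: skip windows with ambiguous bases such as N.
        if set(key) <= set("ACGT"):
            counts[key] += 1
    return counts


def feature_vector(sequence, mask):
    """Return counts ordered over all 4**w patterns, w = number of 1s."""
    weight = mask.count("1")
    counts = spaced_kmer_counts(sequence, mask)
    return [counts["".join(p)] for p in product("ACGT", repeat=weight)]
```

Under these assumptions, the mask 111111110001 has nine match positions, so its feature vector has 4^9 = 262,144 entries, whereas a contiguous k-mer spanning the same 12 positions would need 4^12 = 16,777,216. A genetic algorithm would search over such masks, scoring each one by classifier accuracy.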