Abstract: |
Genome editing is a novel technique for precisely manipulating genomic nucleotides in various organisms. The type II CRISPR/Cas9 system, which is part of the adaptive immune system of the bacterium S. pyogenes, has dominated this generation of genome engineering because it is straightforward to program and use. In a CRISPR/Cas system, the short gRNA sequence (20 bp) controls the quality (accuracy and precision) of DNA cleavage. Although various machine learning classification algorithms have already been developed to evaluate gRNA efficiency and to predict off-targets, there is a discrepancy between predictions and experimentally observed results. A comprehensive analysis is required to identify a reliable CRISPR/Cas prediction algorithm. In this study, we aim to identify an efficient classifier for evaluating CRISPR gRNA efficiency by exploring various classification algorithms on experimentally verified CRISPR datasets. We also compared their performance using the machine learning software WEKA. Using 10-fold cross-validation on a CRISPR dataset with 5310 instances and 9 attributes, we assessed the performance of 10 different machine learning algorithms by comparing their execution speed, completion time, precision, correctly classified and misclassified instances, kappa statistic (K), mean absolute error, root mean square error, and true values of the confusion matrix. Our analysis suggests that tree-based classification algorithms have better potential to predict sgRNA efficiency in the CRISPR genome editing system. In this research, we elaborate on the application of artificial intelligence to categorize and assess the features of gRNA in order to predict its efficacy and precision. [ABSTRACT FROM AUTHOR] |