Usilng Machine Learning Technliques to Identify Botnet Traffic

Autor: D. Lapsley, W.T. Strayer, Carolos Livadas, R. Walsh
Rok vydání: 2006
Předmět:
Zdroj: LCN
DOI: 10.1109/lcn.2006.322210
Popis: To date, techniques to counter cyber-attacks have predominantly been reactive; they focus on monitoring network traffic, detecting anomalies and cyber-attack traffic patterns, and, a posteriori, combating the cyber-attacks and mitigating their effects. Contrary to such approaches, we advocate proactively detecting and identifying botnets prior to their being used as part of a cyber-attack (Strayer et al., 2006). In this paper, we present our work on using machine learning-based classification techniques to identify the command and control (C2) traffic of IRC-based botnets - compromised hosts that are collectively commanded using Internet relay chat (IRC). We split this task into two stages: (I) distinguishing between IRC and non-IRC traffic, and (II) distinguishing between botnet and real IRC traffic. For stage I, we compare the performance of J48, naive Bayes, and Bayesian network classifiers, identify the features that achieve good overall classification accuracy, and determine the classification sensitivity to the training set size. While sensitive to the training data and the attributes used to characterize communication flows, machine learning-based classifiers show promise in identifying IRC traffic. Using classification in stage II is trickier, since accurately labeling IRC traffic as botnet and non-botnet is challenging. We are currently exploring labeling flows as suspicious and non-suspicious based on telltales of hosts being compromised
Databáze: OpenAIRE