A statistical analysis of SAMPARK dataset for peer-to-peer traffic and selfish-peer identification.

Authors: Ansari, Md. Sarfaraj Alam, Pal, Kunwar, Govil, Prajjval, Govil, Mahesh Chandra, Awasthi, Lalit Kumar
Subject:
Source: Multimedia Tools & Applications; Mar2023, Vol. 82 Issue 6, p8507-8535, 29p
Abstract: The popularity of peer-to-peer (P2P) networks can be attributed to their inherent advantages, such as resource utilization, scalability, and better response. At the same time, modern networks have become highly complex and need better approaches for traffic management and monitoring. The use of machine learning (ML) techniques is inevitable due to their inherent advantages. An ML-based model needs a reliable dataset for training and testing. This paper addresses the unavailability of a comprehensive labelled dataset that would enable researchers to evaluate their ML-based solutions. The proposed SAMPARK dataset is constructed by capturing traces while running various P2P and non-P2P applications in real time. The generated dataset consists of normal traffic patterns and 24 attributes that comprise basic, flow-based, and packet-based general features. The major contribution of this work is an exclusive dataset for addressing important issues in P2P networks, such as selfish peers and flash crowds, since no existing dataset has been constructed explicitly for these problems. The validity of the SAMPARK dataset is established through statistical analysis of probability distribution and feature correlation. The statistical evaluation shows that the dataset exhibits non-linearity and non-normality. The correlation rates among features without labels and with labels are determined using Pearson's Correlation Coefficient (PCC) and Gain Ratio (GR), and the acceptable rates are 84% and 68%, respectively. The effectiveness of the dataset is demonstrated by applying machine learning methods. The dataset is labelled using a port-based technique, and performance is determined by calculating Accuracy and False Alarm Rate (FAR) for the various ML models developed to identify P2P traffic and selfish peers. A comparative analysis with the UNIBS dataset is also carried out. The highest accuracy, achieved with the RF technique on the SAMPARK dataset, is 99.13%, which is better than that obtained on the UNIBS dataset. The experimental results also demonstrate the usefulness and efficacy of the proposed SAMPARK dataset for various analyses of P2P networks. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
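
Note on the evaluation metrics: the abstract names Accuracy and False Alarm Rate (FAR) but does not state how they are computed. The sketch below is only an illustration under assumed definitions: accuracy as (TP + TN) / total, and FAR as the false positive rate FP / (FP + TN). Some papers instead define FAR as the average of the false positive and false negative rates, so the authors' formula may differ. The function names and the confusion-matrix counts are hypothetical and are not results from the paper.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all flows classified correctly (assumed definition)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def false_alarm_rate(fp: int, tn: int) -> float:
    """Assumed definition: share of negative flows wrongly flagged, FP / (FP + TN)."""
    negatives = fp + tn
    if negatives == 0:
        raise ValueError("no negative samples")
    return fp / negatives


# Made-up counts for illustration only (not taken from SAMPARK or UNIBS).
tp, tn, fp, fn = 90, 95, 5, 10
acc = accuracy(tp, tn, fp, fn)
far = false_alarm_rate(fp, tn)
print(f"accuracy={acc:.3f} FAR={far:.3f}")
```

With these made-up counts, 185 of 200 flows are classified correctly and 5 of 100 negative flows raise a false alarm.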