Abstract:
In malicious network traffic identification, the numbers of malicious and normal traffic samples are imbalanced, which leads to poor generalization and low recognition accuracy in the trained machine learning models. To address this problem, this paper proposes a classification method, based on network traffic visualization, that balances the minority classes using a conditional Wasserstein generative adversarial network with a gradient penalty term (CWGAN-GP). The method first applies network traffic visualization: the raw packet capture (PCAP) data are segmented with the flow as the unit, and each flow is filled and mapped into a gray-scale image. CWGAN-GP is then applied to balance the dataset. Finally, on the public datasets USTC-TFC2016 and CICIDS2017, a convolutional neural network (CNN) model is used to classify and test both the unbalanced and the balanced datasets. The experimental results show that balancing with CWGAN-GP outperforms random oversampling, SMOTE, GAN, and WGAN balancing on all three metrics: Precision, Recall, and F1.
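The segment, fill, and map steps of the visualization can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes packets have already been parsed from the PCAP into (5-tuple, raw bytes) pairs, and it treats a flow as all packets sharing one unidirectional 5-tuple. The default 28x28 image size (784 bytes) is also an assumption, since the abstract does not give the image dimensions. The names `split_into_flows` and `flow_to_gray_image` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical packet key: (src_ip, dst_ip, src_port, dst_port, protocol).
# Parsing the PCAP file into (key, bytes) pairs is assumed to happen elsewhere.
FiveTuple = Tuple[str, str, int, int, str]


def split_into_flows(packets: List[Tuple[FiveTuple, bytes]]) -> Dict[FiveTuple, bytes]:
    """Segment: concatenate, in arrival order, the bytes of all packets
    that share the same 5-tuple (assumed definition of a flow)."""
    flows: Dict[FiveTuple, bytearray] = {}
    for key, data in packets:
        flows.setdefault(key, bytearray()).extend(data)
    return {key: bytes(buf) for key, buf in flows.items()}


def flow_to_gray_image(flow: bytes, side: int = 28) -> List[List[int]]:
    """Fill and map: truncate or zero-pad the flow to side*side bytes, then
    reshape it row by row into a side x side grid of gray levels (0-255)."""
    size = side * side
    buf = flow[:size].ljust(size, b"\x00")  # truncate long flows, pad short ones
    return [list(buf[row * side:(row + 1) * side]) for row in range(side)]


if __name__ == "__main__":
    key = ("10.0.0.1", "10.0.0.2", 1234, 80, "TCP")
    packets = [(key, b"\x01\x02"), (("10.0.0.3", "10.0.0.2", 5555, 443, "TCP"), b"\xff"), (key, b"\x03")]
    flows = split_into_flows(packets)
    print(flow_to_gray_image(flows[key], side=4)[0])  # first row of a 4x4 image
```

Because every flow is truncated or zero-padded to the same length, each sample becomes an image of identical shape, which is what both a CNN classifier and a GAN-based generator expect as input.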