FloCon 2020
Tuesday, January 7 • 3:00pm - 3:30pm
Alchemy: Stochastic Data Augmentation for Malicious Network Traffic Detection

Malware and botnets are used for many types of cybercrime, such as data exfiltration, distributed denial-of-service (DDoS) attacks, and, more recently, holding data for ransom. Existing signature-based network security techniques are designed to detect predefined, rule-based traffic patterns. However, because malware and botnets evolve continuously, these techniques struggle to keep up with the growing variety and volume of threats. Machine learning has become a promising alternative approach to network security. Many previous studies aggregate traffic data into groups, or bags, by host or by flow, and then generate features from each group to train detection models.
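
To make the grouping step concrete, here is a minimal sketch of aggregating flow records into per-host bags and computing one feature vector per bag. The record layout (source, destination, destination port, bytes) and the four features are hypothetical choices for illustration, not the features used in the talk.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical NetFlow-like record: (source host, destination host, destination port, bytes).
Flow = tuple[str, str, int, int]

def group_by_host(flows: list[Flow]) -> dict[str, list[Flow]]:
    """Aggregate flow records into bags keyed by source host."""
    bags = defaultdict(list)
    for flow in flows:
        bags[flow[0]].append(flow)
    return dict(bags)

def bag_features(bag: list[Flow]) -> list[float]:
    """Compute an illustrative feature vector for one bag."""
    return [
        len(bag),                    # number of flows
        len({f[1] for f in bag}),    # distinct destination hosts
        len({f[2] for f in bag}),    # distinct destination ports
        mean(f[3] for f in bag),     # mean bytes per flow
    ]

flows = [
    ("10.0.0.5", "203.0.113.7", 443, 1200),
    ("10.0.0.5", "198.51.100.2", 80, 300),
    ("10.0.0.9", "203.0.113.7", 443, 5000),
]
bags = group_by_host(flows)
features = {host: bag_features(bag) for host, bag in bags.items()}
```

Each host then contributes exactly one training vector, which is why a rare malicious host yields very few positive examples.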

However, two problems degrade detection performance. First, training data are scarce because new types of malicious traffic are rare. Second, when only a limited amount of traffic is observed, the features computed from this incomplete data vary considerably. Existing solutions augment the training data to make detection models more robust to these problems. Unfortunately, the synthetic feature vectors they produce may not represent the nature of the traffic well, because most of these solutions generate them only from existing feature vectors, without considering the real distribution of the raw traffic.
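
The abstract does not name the conventional methods, so as one well-known example of feature-space augmentation, the sketch below uses SMOTE-style interpolation: each synthetic vector lies on the segment between two existing vectors. It is a simplified, hypothetical stand-in (real SMOTE chooses the partner among the k nearest neighbours), meant only to show that such vectors are derived purely from other vectors and never from raw traffic.

```python
import random

def interpolate_samples(vectors, n_new, rng):
    """SMOTE-style oversampling in feature space (simplified).

    Picks two existing vectors at random and places a synthetic vector at a
    random point on the segment between them. The partner is any other
    vector rather than a k-nearest neighbour, for brevity.
    """
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(vectors, 2)
        gap = rng.random()  # in [0, 1)
        synthetic.append([x + gap * (y - x) for x, y in zip(a, b)])
    return synthetic

rng = random.Random(0)
# Hypothetical feature vectors of three malicious bags.
malicious = [[2, 2, 2, 750.0], [10, 8, 3, 420.0], [5, 1, 1, 9000.0]]
new_vectors = interpolate_samples(malicious, n_new=5, rng=rng)
```

Because every coordinate is a blend of existing coordinates, the synthetic vectors cannot express traffic behaviour that the original vectors do not already contain.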

In this talk, we introduce Alchemy, a stochastic method that regenerates a set of feature vectors by randomly resampling the raw traffic data of each bag into several subsets. Alchemy both enlarges the training set and represents the raw traffic robustly, correcting for variations in feature vectors regardless of the type of traffic data or classifier. We evaluated Alchemy with real-world traffic data consisting of network flows, passive DNS records, and HTTP logs, and showed that it improves the detection performance of various classifiers more than conventional methods do on all three types of traffic data.
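
The core idea of resampling raw traffic per bag can be sketched as follows. The abstract does not specify the subset size, the number of subsets, whether sampling is with or without replacement, or whether the full-bag vector is kept, so the parameters `n_subsets` and `fraction`, sampling without replacement, and keeping the original vector are all assumptions of this hypothetical sketch, as are the record layout and features.

```python
import random
from statistics import mean

def bag_features(bag):
    """Illustrative features over hypothetical (destination, port, bytes) records."""
    return [
        len(bag),                    # number of records
        len({r[0] for r in bag}),    # distinct destinations
        len({r[1] for r in bag}),    # distinct ports
        mean(r[2] for r in bag),     # mean bytes
    ]

def resample_bag(bag, n_subsets, fraction, rng):
    """Regenerate feature vectors from one bag by drawing random subsets
    of its raw records (without replacement) and featurizing each subset."""
    k = max(1, round(fraction * len(bag)))
    return [bag_features(rng.sample(bag, k)) for _ in range(n_subsets)]

def augment(bags, labels, n_subsets, fraction, seed=0):
    """Build an augmented training set: the full-bag vector plus one vector
    per random subset, all inheriting the bag's label."""
    rng = random.Random(seed)
    X, y = [], []
    for bag, label in zip(bags, labels):
        vectors = [bag_features(bag)] + resample_bag(bag, n_subsets, fraction, rng)
        X.extend(vectors)
        y.extend([label] * len(vectors))
    return X, y

bags = [
    [("203.0.113.7", 443, 1200), ("198.51.100.2", 80, 300),
     ("203.0.113.7", 443, 800), ("192.0.2.1", 53, 90)],
    [("203.0.113.9", 8080, 5000), ("203.0.113.9", 8080, 5200)],
]
X, y = augment(bags, labels=[1, 0], n_subsets=3, fraction=0.5)
```

Each subset vector mimics what a detector would compute if only part of a host's traffic had been observed, so the model is trained on the same kind of variation it meets in practice. Since the output is just a feature matrix and labels, it can be fed to any classifier.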

Attendees Will Learn:
Applying machine learning to network traffic analysis is a promising approach to enhancing cybersecurity. In this talk, attendees will gain basic knowledge of how to build machine learning-based detection models with different types of network traffic data (e.g., NetFlow, passive DNS records, and HTTP proxy logs) and features. Attendees will also learn how to build more accurate models with less labeled traffic data, a common need at universities and enterprises that lack enough positive (malicious) training samples. The method can be used as an add-on to existing detection models and can help operators quickly build their initial machine learning models.

Bo Hu

Senior Research Engineer, NTT
Bo Hu received an M.S. in wireless network engineering from Osaka University in 2010 and joined NTT the same year. His research focuses mainly on network security, machine learning, graph mining, and inter-cloud technology. He has developed a machine learning pipeline for...

Tuesday January 7, 2020 3:00pm - 3:30pm EST
Regency Ballroom, Hyatt Regency Savannah, 2 W. Bay Street, Savannah, GA 31401