FloCon 2020
Thursday, January 9 • 8:30am - 9:00am
Look Ma, No Malware!

In many cybersecurity scenarios, we want to associate traffic from malicious actors, even when we don't possess the malware itself. This lets us understand the overall threat landscape for a particular type of attack and may eventually lead to attribution. This talk uses one instance of this problem, DNS-based DDoS attacks, as a case study to show how unsupervised learning, together with a few specific methodologies, can help address this threat intelligence problem.

Over the last few years, we have studied a type of DNS DDoS attack that first appeared at scale in 2014. Known as Slow Drip or Random Qname attacks, they were especially disruptive in 2014-2015, particularly to the Internet's middle infrastructure. Little malware was ever recovered, and none of it explained the breadth and magnitude of the attacks. These attacks continue today, but in the largest known study of them, we found that the threat landscape has changed significantly in the last few years. Through a combination of text and time series features, we are able to characterize the dominant malware and demonstrate that the number of global-scale attack systems is relatively small. These results are based on eight months of large-scale global passive DNS (pDNS) analysis.

While the results are useful to organizations that need to understand global DNS-based DDoS threat actors, the methodologies apply more broadly. We consider the case where a reasonably large amount of unlabeled data accumulates over time; this might be the case for certain domain generation algorithms (DGAs), for example, or for DNS tunneling. In our case, the data comes from a strong statistical classifier, but it could also come from weaker classifiers.

The observable metadata, in our case the DNS queries, are our source for understanding the underlying malware. We use traditional Exploratory Data Analysis (EDA) and feature engineering to build intuition about how different malware may manifest in our data. The divergence of character distributions between different attacks proves enlightening, but comparing attacks against one another this way won't scale over time as a production system requires. Identifying archetypal distributions from an initial large sample lets us overcome this hurdle and create a distance measure that can be combined with other features to cluster attacks and, by extension, the attack generators.
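The archetype idea in the paragraph above can be sketched in Python. This is a minimal illustration under stated assumptions, not the method used in the study. The helper names (`char_distribution`, `js_distance`, `archetype_features`), the choice to look only at the leftmost label of each query name, and the two toy archetypes are all hypothetical. Each attack's character distribution is compared with a small fixed set of archetype distributions using the Jensen-Shannon distance. Because it is computed with base-2 logarithms, the distance falls between 0 and 1. The resulting vector of distances could then serve as one feature alongside others.

```python
import math
from collections import Counter


def char_distribution(qnames):
    """Character frequency distribution of the leftmost label of each query name.

    Hypothetical assumption: the randomized part of a Random Qname query is its
    leftmost label, e.g. "3f9a0c" in "3f9a0c.example.com".
    """
    counts = Counter()
    for qname in qnames:
        label = qname.split(".", 1)[0].lower()
        counts.update(label)  # counts each character in the label
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: n / total for ch, n in counts.items()}


def js_distance(p, q):
    """Jensen-Shannon distance (base 2, so in [0, 1]) between {symbol: prob} dicts."""
    symbols = set(p) | set(q)
    m = {s: 0.5 * (p.get(s, 0.0) + q.get(s, 0.0)) for s in symbols}

    def kl(a):
        # KL(a || m); symbols with zero probability in a contribute nothing.
        return sum(a[s] * math.log2(a[s] / m[s]) for s in a if a[s] > 0)

    divergence = 0.5 * kl(p) + 0.5 * kl(q)
    return math.sqrt(max(divergence, 0.0))  # guard against tiny negative rounding


def archetype_features(qnames, archetypes):
    """Distance from one attack's character distribution to each fixed archetype."""
    dist = char_distribution(qnames)
    return [js_distance(dist, a) for a in archetypes]


# Hypothetical archetypes: uniform over hex digits, and uniform over a-z.
hex_archetype = {c: 1 / 16 for c in "0123456789abcdef"}
alpha_archetype = {c: 1 / 26 for c in "abcdefghijklmnopqrstuvwxyz"}
attack = ["3f9a0c.example.com", "b71e24.example.com", "0d5c8e.example.com"]
features = archetype_features(attack, [hex_archetype, alpha_archetype])
# The hex-like attack should sit closer to the hex archetype than to the alphabetic one.
```

With this approach, each new attack is compared only with a fixed set of k archetypes, not with every other attack. That costs k distance computations per attack, and the reference points stay the same from one period to the next.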

Using archetypes in unsupervised learning lets us compare data over time against fixed reference points, reliably and in a way that scales. We also need to be concerned about model drift, where the underlying threat changes; in our study, we assessed this by applying the unsupervised model to data collected six months later.
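One way to picture a drift check is to assign attacks from a later period to their nearest archetype and measure how many fit none of them well. The sketch below is hypothetical. The names `nearest_archetype` and `drift_report`, and the 0.5 distance threshold, are illustrative assumptions, not code or values from the study.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance (base 2, range [0, 1]) between {symbol: prob} dicts."""
    symbols = set(p) | set(q)
    m = {s: 0.5 * (p.get(s, 0.0) + q.get(s, 0.0)) for s in symbols}

    def kl(a):
        return sum(a[s] * math.log2(a[s] / m[s]) for s in a if a[s] > 0)

    return math.sqrt(max(0.5 * kl(p) + 0.5 * kl(q), 0.0))


def nearest_archetype(dist, archetypes):
    """Return (index, distance) of the archetype closest to one attack's distribution."""
    distances = [js_distance(dist, a) for a in archetypes]
    best = min(range(len(distances)), key=lambda i: distances[i])
    return best, distances[best]


def drift_report(later_dists, archetypes, threshold=0.5):
    """Assign later attacks to archetypes and report the share left unexplained.

    threshold is a hypothetical cutoff: an attack whose nearest archetype is
    farther away than this counts as not explained by the existing archetypes.
    """
    assignments = [nearest_archetype(d, archetypes) for d in later_dists]
    if not assignments:
        return assignments, 0.0
    unexplained = sum(1 for _, d in assignments if d > threshold)
    return assignments, unexplained / len(assignments)


# Hypothetical example: one later attack matches an archetype exactly, the other
# uses a character that no archetype covers.
hex_archetype = {c: 1 / 16 for c in "0123456789abcdef"}
alpha_archetype = {c: 1 / 26 for c in "abcdefghijklmnopqrstuvwxyz"}
later = [dict(hex_archetype), {"-": 1.0}]
assignments, unexplained_share = drift_report(later, [hex_archetype, alpha_archetype])
```

Tracking the unexplained share from one period to the next is one simple signal that new archetypes, or a refit model, may be needed.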

What Will Attendees Learn?
  • Exposure to the Slow Drip attack: its mechanisms, history, and presence on the network
  • Understanding of the evolution of this attack to where it is today
  • A case study applying unsupervised learning to a very large-scale cyber problem for threat intelligence
  • The use of character distribution divergence (Jensen-Shannon distance) for clustering data
  • The use of archetypes for unsupervised, and then supervised, learning over time
  • Inspiration to understand cyber attacks even when no malware sample is available

Renee Burton

Sr. Staff Threat Researcher, Infoblox
Dr. Burton is the Sr. Staff Threat Researcher for the Cyber Intelligence Unit of Infoblox, a leading DDI company. She straddles the boundary between the organization's threat analysts and data scientists, focusing on the design of analytics that support threat intelligence and discovery...

Thursday, January 9, 2020, 8:30am - 9:00am EST
Regency Ballroom, Hyatt Regency Savannah, 2 W. Bay Street, Savannah, GA 31401