PhD Thesis Defense: Alireza Bahramali, Encrypted Network Traffic Analysis
Speaker
Abstract
Traffic analysis aims to extract sensitive information from network traffic patterns, particularly in scenarios where the traffic is encrypted. It has been used to compromise anonymity in anonymous communication systems through various types of attacks, notably website fingerprinting (WF) and flow correlation. In this thesis, we explore two approaches to performing traffic analysis on encrypted network traffic: a model-based approach and a data-driven approach.
The model-based approach focuses on establishing statistical models of traffic characteristics, which are then used to design effective traffic analysis algorithms. We perform model-based traffic analysis on popular Secure Instant Messaging (SIM) applications. Despite using advanced encryption algorithms, such services do not employ sophisticated obfuscation algorithms, so traffic analysis attacks can infer sensitive information from their traffic patterns. In more dynamic and complex systems such as Tor, however, general-purpose statistical algorithms cannot capture the nature of the noise. Hence, we study a second, data-driven approach, in which we use Deep Neural Networks (DNNs) to design a flow correlation system called DeepCorr that learns a correlation function tailored to Tor's ecosystem.
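To make the contrast concrete, the following is a minimal sketch of the kind of general-purpose statistical flow correlation that the data-driven approach is meant to improve upon. It is not the thesis's method and not DeepCorr: it simply bins packet timestamps into fixed windows and computes a Pearson correlation between two flows. The function names, the window size, and the example timestamps are all assumptions made for illustration.

```python
import math


def bin_counts(timestamps, window, duration):
    """Count packets per fixed-size time window (illustrative helper)."""
    n_bins = int(math.ceil(duration / window))
    counts = [0] * n_bins
    for t in timestamps:
        if 0 <= t < duration:
            # Clamp the index to guard against floating-point edge cases.
            counts[min(int(t // window), n_bins - 1)] += 1
    return counts


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlate_flows(ts_a, ts_b, window=1.0, duration=10.0):
    """Score how similar two flows' packet-count patterns are (range -1 to 1)."""
    return pearson(bin_counts(ts_a, window, duration),
                   bin_counts(ts_b, window, duration))


# Hypothetical packet timestamps (seconds) observed at two vantage points.
entry_flow = [0.1, 0.2, 1.5, 3.1, 3.2, 3.3, 7.0]
exit_flow = [t + 0.05 for t in entry_flow]        # same flow, small fixed delay
unrelated_flow = [4.1, 4.2, 5.5, 8.1, 8.2, 9.3]   # a different flow

print(correlate_flows(entry_flow, exit_flow))
print(correlate_flows(entry_flow, unrelated_flow))
```

A fixed statistic like this works when the delay between the two observation points is small and stable. Once delays and jitter vary unpredictably, as the abstract notes for Tor, packets drift across window boundaries and the score degrades, which is the motivation for learning the correlation function instead.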
To mitigate the aforementioned traffic analysis attacks, we investigate different countermeasure techniques. First, we apply obfuscation-based algorithms against both the model-based and data-driven approaches. However, conventional obfuscation techniques, such as delaying packets, are not effective against DNN-based traffic analysis attacks. Therefore, we propose a second defense mechanism based on adversarial examples, which are known to degrade the performance of DNNs.
Although data-driven traffic analysis algorithms show promising performance, they often rely on impractical assumptions that make them infeasible in real-world settings. In this thesis, we identify the root cause of such impracticality issues: the attacker's lack of a longitudinal perspective into network traffic during the training of WF classifiers. We show that these issues can be alleviated by augmenting the network traces used to train WF classifiers. Specifically, we introduce NetAugment, an augmentation technique tailored to the specifications of Tor traces. We instantiate NetAugment through semi-supervised and self-supervised learning techniques. Our extensive open-world and closed-world experiments demonstrate that, under realistic evaluation settings, our WF attacks provide superior performance compared to the state of the art; this is due to their use of augmented network traces for training, which allows them to learn the features of target traffic in unobserved settings (e.g., unknown bandwidth, Tor circuits, etc.).
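As a rough illustration of what augmenting a network trace can look like, the sketch below rescales and jitters the inter-packet gaps of a trace while keeping packet directions intact, loosely mimicking the same page load under a different bandwidth. This is a generic example under stated assumptions, not NetAugment itself: the abstract does not specify NetAugment's operations, and the trace representation, function name, and parameter values here are hypothetical.

```python
import random


def augment_trace(trace, rng, scale_range=(0.5, 2.0), jitter=0.005):
    """Return a timing-perturbed copy of a trace (hypothetical augmentation).

    trace: list of (timestamp_seconds, direction) pairs, direction is +1 or -1.
    rng:   a random.Random instance, passed in so results are reproducible.
    """
    scale = rng.uniform(*scale_range)  # one global slowdown/speedup factor
    augmented = []
    prev_t = None
    t_out = 0.0
    for t, direction in trace:
        if prev_t is not None:
            # Scale the original gap, add small jitter, never go back in time.
            gap = (t - prev_t) * scale + rng.uniform(-jitter, jitter)
            t_out += max(0.0, gap)
        prev_t = t
        augmented.append((t_out, direction))
    return augmented


# Hypothetical trace: +1 = outgoing packet, -1 = incoming packet.
original = [(0.0, 1), (0.1, -1), (0.25, -1), (0.9, 1)]
rng = random.Random(0)
variants = [augment_trace(original, rng) for _ in range(3)]
for variant in variants:
    print(variant)
```

Each variant preserves the packet order and directions of the original trace but presents different timing, so a classifier trained on many such variants sees conditions beyond those captured in the collected data.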
Advisor