It has been proven that we are fucked. V2Ray network traffic can be identified by a plain vanilla deep neural network trained on one month of home network traffic data. However, some people still deny or even belittle my findings, even after I released everything.
Someone on Reddit claimed that I may have leaked the V2Ray server information into the training data. So I moved my V2Ray server to a different cloud service provider, with a different OS flavor, a different UUID in the VMess protocol, and a different IP and port. The previously trained model, without learning anything from the new test data, still achieves a >99% true positive rate and a <1% false positive rate in classification.
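For readers unfamiliar with the two metrics, here is a minimal sketch of how they are computed from ground-truth labels and predictions. The function name `rates` and the toy labels are made up for illustration; they are not taken from my released code or data.

```python
def rates(y_true, y_pred):
    """Return (true positive rate, false positive rate).

    Labels are binary: 1 = V2Ray flow, 0 = any other flow.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # V2Ray, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # V2Ray, missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # other, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # other, passed
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Toy example with invented labels, only to show the call.
toy_true = [1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 0, 1]
toy_tpr, toy_fpr = rates(toy_true, toy_pred)
```

The true positive rate is the share of real V2Ray flows that get flagged, and the false positive rate is the share of non-V2Ray flows that get flagged by mistake.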
The V2Ray team suspects that the classifier may misclassify any "random" TCP payload as the VMess protocol over TCP. One of the V2Ray moderators suggested in his comment:
replicate some VMess encrypted data and re-fill them with random bytes (keep the length), mark them as non-V2Ray data, and see if there is a performance drop.
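For concreteness, the proposed control experiment could be sketched roughly as follows. This is only an illustration under my own assumptions: `random_refill`, `build_control_set`, `flagged_fraction`, and the `predict` callable are hypothetical names, not part of V2Ray or of my released code.

```python
import os


def random_refill(payloads):
    """Replace each payload with uniformly random bytes of the same length."""
    return [os.urandom(len(p)) for p in payloads]


def build_control_set(vmess_payloads):
    """Pair each refilled payload with label 0 (non-V2Ray)."""
    return [(p, 0) for p in random_refill(vmess_payloads)]


def flagged_fraction(predict, control_set):
    """Fraction of control payloads that a classifier flags as V2Ray.

    `predict` is a hypothetical callable: bytes -> 0 or 1.
    """
    if not control_set:
        return 0.0
    hits = sum(1 for payload, _label in control_set if predict(payload) == 1)
    return hits / len(control_set)
```

Note that the refilled payloads keep only the length of the originals; every byte is drawn from a uniform distribution rather than from any real protocol.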
Every TCP payload in an IMAP, SMTP, TLS, or SSH application is a random byte stream drawn from that protocol’s own distribution. If you handcraft packets that don’t exist in the real world, the classifier may well give you an unexpected answer. But that doesn’t mean the classifier can’t identify the authentic VMess protocol in the real world.
Let me give you an example. Say we have an image classifier that can tell a human being from a bear in the real world, with the same classification power as the V2Ray traffic classifier. To prove the classifier sucks, you handcraft a monster with the Emperor’s head and Winnie the Pooh’s body. The image classifier may tell you this fucker has a 50% chance of being a human and a 50% chance of being a bear. Can you conclude that the image classifier cannot recognize a decent human being image or a cute cartoon bear cub image from the real world? No.
Every stream of packets is random. The only difference is that each one is drawn from its own protocol’s distribution. My recommendation is that no VMess-over-TCP setting should be used.
My next plan is to figure out the answer to the million-dollar question: how to evade the deep neural network classifier.
Here are two different directions:
- Masquerade V2Ray traffic as TLS traffic.
- Apply the adversarial sample technique.
For the 1st bullet point, I’m collecting V2Ray traffic data in which VMess traffic is encrypted by the TLS transport layer. Under this setting, V2Ray traffic may masquerade as TLS traffic. It will take me another month to complete data collection. But from my source code analysis of V2Ray, I don’t have high confidence that the classifier would fail to identify V2Ray in TLS. V2Ray borrows its TLS implementation from Golang’s crypto module. While the majority of web servers in the world still use TLS 1.2, Golang has already upgraded to TLS 1.3. In Wireshark’s packet analysis, this stands out quite clearly among legitimate TLS traffic.
For the 2nd bullet point, I have done some interesting research. I’m going to publish it in my next blog post.
Stay tuned.
Thank you for this informative post. Concerning the solutions: the first one may work, but it requires a valid certificate. Besides, excessive data flow to a single destination server is likely to trigger a reaction from an IDS. The second method might work, but a machine learning model is indeed a black box, and one can hardly inspect the features it has learned. There are indeed ways to challenge its robustness; it is, however, a labor-intensive job.
I find the adversarial example approach interesting because it is quite a challenge to add the adversarial noise back under several domain-specific constraints: how to encode/decode the noise between the two parties, and how to keep the adversarial noise effective when it can be applied only to the TCP payload, not to the IP or TCP headers. Anyway, if you have a better idea, I’m all ears.
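To make the payload-only constraint concrete, here is a toy sketch of a masked fast gradient sign step. It rests on strong assumptions: the classifier is replaced by a tiny logistic model with made-up weights (the real one is a deep neural network), bytes are scaled to the range [0, 1], and the encode/decode problem between the two parties is not addressed at all. All names below are hypothetical.

```python
import math


def predict(w, b, x):
    """Probability that x is V2Ray under a toy logistic model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    """Return -1, 0, or 1."""
    return (v > 0) - (v < 0)


def masked_fgsm(w, b, x, mask, eps):
    """One gradient-sign step that touches only positions where mask is 1.

    For true label 1 (V2Ray), the cross-entropy gradient with respect
    to x is (p - 1) * w. Stepping along its sign raises the loss, which
    lowers the V2Ray score. Header positions (mask 0) stay unchanged.
    """
    p = predict(w, b, x)
    grad = [(p - 1.0) * wi for wi in w]
    adv = []
    for xi, gi, mi in zip(x, grad, mask):
        step = eps * sign(gi) if mi else 0.0
        adv.append(min(1.0, max(0.0, xi + step)))  # keep a valid byte range
    return adv


# Toy input: the first two features stand in for header bytes,
# the last two for payload bytes.
toy_w, toy_b = [2.0, -1.0, 3.0, 1.5], -1.0
toy_x = [0.5, 0.5, 0.5, 0.5]
toy_mask = [0, 0, 1, 1]
toy_adv = masked_fgsm(toy_w, toy_b, toy_x, toy_mask, eps=0.2)
```

In this toy setting the score drops while the header features stay untouched. Whether anything similar holds against a deep network on real traffic is exactly the open question.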