James Stanley - Deanonymising a Tor hidden service with traffic correlation

We've all heard that Tor hidden services can be traced to an IP address by an attacker who has sufficient visibility into global internet traffic, by correlating traffic sent to/from the hidden service with traffic sent to/from real IP addresses. But how easy is it in practice? How much data do you need? And how good do you need to be at statistics? I decided to find out.

Overview

In order to carry out the attack, you need to have a target hidden service, a set of suspected host machines (in the limit case this can be "every machine on the internet"), and access to some information about the networking each of those machines is doing.

This can range from "very easy" if you're the government and you only suspect a small number of machines, to "impossible" if you're just an ordinary person and you have no suspects.

Since we're not the government, we'll have to cheat. I run a hidden service interface to this blog at jesblogfnk2boep4.onion, for which I can obviously observe the networking. I also have several other machines dotted around the internet, for which I can also observe the networking.

(I've put all of my files, including data, in a github repo in case you want to play with it yourself).

Logging traffic counts

ifconfig gives us helpful statistics on networking activity:

$ ifconfig -a
eth0: flags=4163  mtu 1500
        inet w.x.y.z  netmask 255.255.255.0  broadcast w.x.y.255
        inet6 fe80::aaaa:bbbb:cccc:dddd  prefixlen 64  scopeid 0x20
        ether aa:bb:cc:dd:ee:ff  txqueuelen 1000  (Ethernet)
        RX packets 232310804  bytes 57245707805 (53.3 GiB)
        RX errors 0  dropped 4404528  overruns 0  frame 0
        TX packets 245582565  bytes 93237821276 (86.8 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

I wrote log-tx-packets to log packet counts. It loops forever, sleeping until the top of the next second, then shelling out to ifconfig to log the number of "TX packets" since the last sample. I chose "TX packets" instead of "RX packets" because the size of an HTTP response is larger than the size of an HTTP request, so is likely to result in a larger signal. "TX bytes" might work even better than "TX packets", I haven't tried it. This script is to be run in screen on each of the suspect machines, with stdout going to a file. It's also important that you synchronise the clocks with ntpdate or similar before starting. If they're off by even a few hundred ms it might throw things off.

Creating traffic

So now we have second-precision logs of traffic counts available on each of our suspect machines, we need to start making the hidden service produce some traffic that we know about. This is easily done with torsocks and curl:

$ time torsocks curl -s http://jesblogfnk2boep4.onion/ > /dev/null

real 0m1.547s user 0m0.160s sys 0m0.433s

I ran this a bunch of times, and it completed in 1.5 to 1.9 seconds each time. This means if we want to partition our time such that 50% of timeslices have a request being handled, and 50% of timeslices have no request being handled, our timeslices probably want to be 2 seconds. We could go back and modify log-tx-packets to reduce the precision to 2 seconds instead of 1 second, but I just accounted for it later.

I next wrote fetch-site to make automated requests to the hidden service. It loops forever, sleeping until the top of the next second, then shelling out to torsocks curl to make a request to the hidden service, but only on seconds that are divisible by 4. Since we're using 2-second timeslices, and want to send requests in 50% of timeslices, we only want to send a request every 4th second.

So with log-tx-packets already running in screen, I started fetch-site running on my laptop to start trying to put some signal into the packet count logs.

Processing the data

Now we are getting a count of the packets transmitted each second by each machine, but this signal is way too noisy to make any sense of by hand. But we can take advantage of the fact that making a request every 4th second should produce a 0.25 Hz signal on top of the ordinary networking traffic.

Consider a list of uniformly-chosen random numbers. If you keep an "accumulator" value, and add to the accumulator all of the numbers at even-numbered indexes in the list, and subtract from the accumulator all of the numbers at odd-numbered indexes in the list, you would expect the accumulator to hover around 0.

163-481+400-496+940-117+198-640+152-174 = -55

And if we kept going further, it would still hover around 0.

However, in a list where the numbers at even-numbered indexes are slightly higher, you would expect the accumulator to grow continuously over time, because the values that are being added can be expected to be higher than the values that are being subtracted.

263-481+500-496+1040-117+298-640+252-174 = 445

(That's the same list, but with 100 added on to each of the added numbers, while the subtracted numbers remain the same). If we kept going further, it would keep growing larger.

Our log files of packet counts can be considered this way. If we add on the packet counts during even-numbered timeslices (i.e. 2-second periods during which we were doing a request), and subtract packet counts during odd-numbered timeslices, then we can expect the accumulator of the machine that is hosting our hidden service to get large (since it does more networking while we're making requests), while the others hover around 0.

In practice this scheme is thrown off by outlier timeslices during which unusually-large amounts of networking are done, so let's also throw away any timeslices that are more extreme than the 95th percentile. We can also normalise the accumulator by the mean packet size of the machine we're looking at, so that an accumulator of small numbers that is constantly growing isn't dwarfed by an accumulator of large numbers hovering around 0.

So I wrote score-log to read in a packet count log and compute the accumulator value as a score for how well the packet counts correlate with the hidden service requests.

I also wrote a shell script to scp the log files, and one to run score-log on each one.

After the first 5 minutes, we have:

$ ./sync-logs && ./score-logs
scaryspice: -3.33789188491807
sportyspice: 17.5645714285714
babyspice: -0.645836662937511
gingerspice: 4.9340804218853
poshspice: -10.125

sportyspice has a larger accumulator than the others, but poshspice's accumulator has a similar magnitude, so we're probably still not picking up a signal above the noise.

After 30 minutes:

$ ./sync-logs && ./score-logs
scaryspice: 7.66967566341574
sportyspice: 351.34283015898
babyspice: 13.447107176731
gingerspice: 5.10374373477445
poshspice: -105.243956043956

sportyspice still has a larger accumulator than the others, but poshspice still seems quite strongly negatively-correlated.

I left it running overnight and after about 13 hours we have:

$ ./sync-logs && ./score-logs
scaryspice: -53.2196414933282
sportyspice: 12387.0290996054
babyspice: 53.8562369567745
gingerspice: 201.901095205812
poshspice: -398.654982817869

And, indeed, sportyspice corresponds to the machine that is hosting jesblogfnk2boep4.onion! That was easier than I expected, it only took about 100 lines of code, almost no knowledge of statistics, and produced conclusive results in a matter of hours. It is admittedly pretty ghetto, but that just means a non-ghetto version would be even more effective.

poshspice has a surprisingly large and consistent negative correlation. Possibly it is doing some networking of its own that has a period of 4 seconds, I've not looked into this.

Improvements

Probably the "correct" way to detect a 0.25 Hz signal is by taking a Fast Fourier Transform and looking for a peak at 0.25 Hz. This is probably more reliable than my method.

We'd probably want a probability and a p-value, rather than a unit-less "score", if we wanted to be a bit more scientific. More stats knowledge required for that.

Instead of choosing to send a predictable 0.25 Hz signal, we could choose at random, at each timeslice, whether or not to fetch the site. We just need to log which timeslices we were fetching, and use this information when computing the scores. This would make it easier to differentiate our fetching from any other process that might have predictable timing, and might get rid of examples like the phantom correlation that poshspice exhibited, although it would make it incompatible with the FFT approach.

Instead of using short timeslices and only sending 1 request, we could use larger timeslices (say, a minute, or even an hour) and make as many requests as we can get away with during that time. This would produce a much larger signal, as the machine would be forced to do a lot more networking, at greater risk of our correlation attack being detected. This would also make it comparatively less important that the traffic logs are labelled with accurate timestamps.

You could send almost-arbitrarily large amounts of data by sending a POST request with a very large body. If you did this, and looked at "RX packets" instead of "TX packets", this might get a better signal-to-noise ratio, since most clients are just sending simple GET requests.

Instead of making actual HTTP requests, we could probably just make a connection to the Tor instance hosting the hidden service, but without going far enough in the request to cause it to propagate up to the application layer. This would greatly improve our chances of avoiding detection as it would not create evidence in the web server log file. I don't actually know how to do this, or whether it is possible in the general case.

Practical considerations

If the machine hosting the hidden service is very busy, you'll need to do more requests in order to pick out your signal above the noise of its usual networking. This makes the attack take longer.

In practice the set of suspect machines is likely to be larger than 5, which again means you either need to create a stronger signal, or have the attack take longer.

An ISP or hosting company could easily use this attack while monitoring the networking of a customer in order to determine whether their customer is likely to be hosting a given hidden service.

It wouldn't only be the hosting machine that has traffic that is correlated with the hidden service: the Tor relays on the route would also be correlated with it, so a real-world attacker might need to take that into account, depending on how large the set of suspect machines is.

If you have enough control over the networking of a suspect machine, you could try switching it off for a few seconds and seeing if that makes the hidden service unreachable. If the hidden service reliably goes down at times the suspect's networking is switched off, that's a good indication that the suspect is hosting the hidden service. This is probably both easier to carry out and more damning than the correlation attack, but more disruptive and has a higher chance of being discovered. Even if you don't have enough control over the networking of a suspect machine to switch its networking off yourself, you might find some joy in sending an abuse report to the ISP and waiting for them to switch the networking off for you.