[guardian-dev] 81% of Tor users can be de-anonymised by analysing router information, research indicates

Michael Rogers michael at briarproject.org
Sun Nov 23 12:26:50 EST 2014


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 22/11/14 00:27, str4d wrote:
>> I'm not suggesting that running over Tor or I2P would make any 
>> system *less* effective. What I'm saying is that if we assume an
>>  adversary who can break Tor or I2P's anonymity through traffic 
>> confirmation, then we either need to make Tor or I2P stronger, or
>>  build a separate system that's stronger on its own. If we build
>> a separate system then it won't provide a large anonymity set
>> until it becomes popular, so we could face a chicken-and-egg
>> problem.
> 
> That makes sense. The chicken-and-egg problem is lessened if we 
> restrict the adversary's abilities to breaking Tor or I2P through 
> traffic confirmation on a targeted scale, rather than being able
> to completely break the anonymity of all users all the time - then
> there is a real benefit of having the separate stronger system
> running over Tor or I2P. Whether this is a realistic restriction,
> however...

This restriction seems realistic to me. We might also consider an
adversary who can see some subset of internet traffic and thus carry
out traffic confirmation attacks against some subset of users (e.g. an
ISP). We know these adversaries exist, regardless of whether the
global adversary also exists.

In each case, would it make sense for a new high-latency anonymity
system to use Tor or I2P for its connections rather than plain TCP?

For the global adversary, no. Tor and I2P are transparent to that
adversary.

For the targeted adversary, yes. If the high-latency system has any
connections between targeted and untargeted nodes, using Tor or I2P
will prevent the adversary from identifying, and therefore targeting,
the untargeted nodes, so the adversary's view of the high-latency
system will remain partial.

For the subset adversary, maybe. If the high-latency system has any
connections between inside and outside nodes, using Tor or I2P will
prevent the adversary from identifying the outside nodes. But whether
identifying the outside nodes without being able to target them is
useful depends on how the adversary's trying to attack the
high-latency system.

So overall it looks like it makes sense to use Tor or I2P instead of
plain TCP, even if you're aiming to resist stronger adversaries than
Tor and I2P can resist on their own.

>> By the way, I've been meaning to ask you about I2P-Bote's 
>> architecture for a while; maybe this is a good opportunity. Is
>> the DHT where the messages are stored specific to I2P-Bote, or is
>> it part of I2P?
> 
> The DHT is a Kademlia DHT specific to I2P-Bote, with a few 
> modifications from standard Kademlia. See section 2 of the
> technical documentation [0] for details. I2P's netDb DHT is only
> for storing network information; applications are expected to
> handle their own data requirements.
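
For anyone following along who hasn't met Kademlia: node IDs and
content keys live in the same ID space, and "distance" is the XOR of
two IDs read as an integer; a value is stored on the nodes closest to
its key. A toy illustration (my own sketch, not I2P-Bote's code):

```python
# Kademlia's distance metric: XOR the two IDs and interpret the result
# as an unsigned integer. Illustrative only; not I2P-Bote's actual code.

def xor_distance(a: bytes, b: bytes) -> int:
    assert len(a) == len(b)
    return int.from_bytes(bytes(x ^ y for x, y in zip(a, b)), "big")

# A value is stored on the node(s) whose IDs are closest to its key.
key = bytes.fromhex("a3")
nodes = [bytes.fromhex(h) for h in ("a0", "ff", "a2")]
closest = min(nodes, key=lambda n: xor_distance(key, n))  # 0xa2
```

The point of the XOR metric is that it lets a node locate the closest
peers to any key in a logarithmic number of lookup steps.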

Thanks! I couldn't find the tech docs before; I'll give them a read.

>>> If high-latency tunnels would actually be useful, we can 
>>> implement them and get most of the network supporting delays 
>>> relatively quickly (we usually have 80% of the network on the 
>>> latest release within six weeks).
> 
>> Wow, that would be amazing!
> 
> Re-reading my message, I want to clarify that my last sentence
> needed an additional comma. I was saying that once implemented,
> getting the network to support the changes would be relatively
> quick. Actually implementing delays is a much trickier kettle of
> fish, and the subject of the rest of this message :)

Ah, OK. :-)

>> The longer the delays, the bigger the storage requirements. At
>> some point you have to think about writing the data to disk until
>> it's time to forward it.
> 
> This applies equally to any system with a delay between receiving
> and sending data - I2P/Tor with delays, I2P-Bote, Freenet,
> Tahoe-LAFS...

Yes, absolutely. My point was just that Tor doesn't currently use much
disk space or disk throughput, so adding a disk-based data cache to
Tor would be a big change for relay operators.

> The storage requirement issue raises another question: what
> incentive is there for other routers to store delayed packets?
> There is no guarantee that a participating router is going to honor
> your request. The answer is clearer for a specific app like
> I2P-Bote or Freenet than it is for a generic network transport like
> I2P or Tor. As the required delay increases, so does the incentive
> required. IIRC, a previous study indicates that a 10 minute delay is
> the minimum that would make any difference [1], and that can
> quickly become non-trivial.

I agree it's important to think about the resources we're asking
people to contribute, but I prefer not to frame the issue in terms of
incentives, because in the past that led me down a game theory rabbit
hole and it took me years to escape. :-)

People contribute resources to Tor, I2P and Freenet for a wide range
of reasons apart from improving their own anonymity.

> Let's say we turned on a 20 minute delay for all 5000 I2PSnark 
> (torrent) users each with 50 KBps of traffic. That's 60 MB of data
> for each of the 5000 to be buffered somewhere. If you keep it to
> Snark users, that's 60 MB each. If you spread it across all I2P
> routers, maybe 6 MB each. More likely, the data will be spread
> across the fast routers used by the Snark users in their tunnels;
> say there are ~1000 of them (approx number of I2P FFs), that is 300
> MB each. Then throw in the fact that this data is constantly
> churning, with complete turnover every 20 minutes. Things get
> sticky, and routers need a good reason for the additional memory
> and disk load.

I wouldn't expect people to run BitTorrent-like workloads over a
high-latency system - I'm thinking of email-like workloads. But yeah,
we should definitely consider the disk space and disk throughput
requirements, and we can't drop new requirements on relay operators
without warning.
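
For what it's worth, the arithmetic above is easy to re-derive; here's
a quick sketch using str4d's figures (the user count, per-user rate
and router count are estimates from the paragraph above, not
measurements):

```python
# Back-of-the-envelope buffering cost for a 20 minute delay.
# All input figures are estimates from the discussion, not measurements.

DELAY_S = 20 * 60            # 20 minute delay, in seconds
RATE_BPS = 50 * 1000         # 50 KBps per user
USERS = 5000                 # estimated I2PSnark users
FAST_ROUTERS = 1000          # approx. number of fast (floodfill) routers

per_user = DELAY_S * RATE_BPS              # bytes in flight per user
total = per_user * USERS                   # bytes buffered network-wide
per_fast_router = total // FAST_ROUTERS    # if only fast routers buffer

print(per_user // 10**6, "MB per user")           # 60 MB per user
print(per_fast_router // 10**6, "MB per router")  # 300 MB per fast router
```

And all of it turns over every 20 minutes, so the disk throughput
matters as much as the capacity.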

> For I2P and Tor, the incentive is of course cover traffic. Defined 
> more carefully, the incentive a router has for delaying traffic is 
> that it can use that traffic to smooth out its own bandwidth
> curve, hiding its own patterns. This is at odds with having
> user-defined delays on traffic, but that is not a bad thing IMHO.
> There is no point in having a deterministic delay because the
> traffic confirmation attack can easily account for it; and if the
> delay is random, there is no need for the user to specify it. At
> most, the user could provide an indication of how long they would
> ideally like the traffic to be delayed, but the router doing the
> delaying would have the assumed right to send the data whenever it
> desired / required, which might be immediately.

That sounds good. We could also have user-specified minimum and
maximum delays, allowing the relay some flexibility, a bit like a
stop-and-go mix:

http://freehaven.net/anonbib/#stop-and-go

I have a slight preference for user-specified delays over random
delays because they allow the endpoints to choose new delay
distributions without upgrading the relays (end-to-end principle). But
maybe there's a delay distribution that's provably optimal or something.
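
To make the min/max idea concrete, here's a toy sketch: the sender
states bounds and the relay draws a delay inside them. The uniform
draw is a placeholder, not a claim about the right distribution:

```python
import random

def pick_delay(min_delay_s: float, max_delay_s: float) -> float:
    """Relay picks a delay within the sender's stated bounds.

    The uniform draw is a placeholder: a real design would choose the
    distribution for its anonymity properties (e.g. the exponential
    delays used by stop-and-go mixes), which is exactly the open
    question above.
    """
    if min_delay_s > max_delay_s:
        raise ValueError("minimum delay exceeds maximum")
    return random.uniform(min_delay_s, max_delay_s)

# Sender asks for a delay between 10 and 30 minutes; the relay is free
# to forward anywhere in that window.
delay = pick_delay(10 * 60, 30 * 60)
assert 600 <= delay <= 1800
```

The nice property is that the endpoints control the distribution's
shape (via the bounds) while the relay keeps the final say.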

> Other issues to consider: - The effect of low volumes of
> high-latency traffic on the delaying router's incentive, and how
> this fits in with dummy traffic [2] - Mixing strategies, e.g. [3]

Yeah, I need to go back to the mix literature - it's possible that
mixing provides stronger anonymity than independently delaying
packets. George Danezis is the person to talk to about this.

>> I imagine that would be a big architectural change, but I've
>> never looked at the I2P code - what do you reckon?
> 
> If delays existed in isolation of other network effects,
> implementing them would be trivial: modify the hop processor or a
> related handler [4] to store packets that have a delay, and have a
> job that re-inserts them into the outbound message processor once
> the delay has elapsed.
> 
> An actual implementation would need to play nice with other parts
> of I2P. Over the years we've added more strict expiration
> enforcement to prevent loops and DDoSing, and these would need to
> be modified to handle the delays. We also have session tags that
> enable use of faster crypto once a session is established (AES
> instead of ElGamal) [5], and because these expire we would need to
> force the use of the slower and more expensive crypto. Time-wise
> it's not a problem (the packet is meant to be delayed anyway), but
> the increased crypto processing load may have effects on router
> performance (we've had similar issues before).
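
To make sure I understand the store-and-reinsert idea, here's a toy
sketch (all names are hypothetical - this is nothing like the actual
I2P code, and it ignores the expiration and session-tag issues you
mention):

```python
import heapq
import time

class DelayedForwarder:
    """Toy version of 'store packets that have a delay, and re-insert
    them into the outbound processor once the delay has elapsed'.
    Names are illustrative; this is not I2P's actual hop processor."""

    def __init__(self, send):
        self._send = send      # callable that forwards a packet
        self._heap = []        # min-heap of (release_time, seq, packet)
        self._seq = 0          # tie-breaker for equal release times

    def submit(self, packet, delay_s=0.0, now=None):
        now = time.monotonic() if now is None else now
        if delay_s <= 0:
            self._send(packet)  # no delay: forward immediately
        else:
            heapq.heappush(self._heap, (now + delay_s, self._seq, packet))
            self._seq += 1

    def tick(self, now=None):
        """Periodic job: flush any packets whose delay has elapsed."""
        now = time.monotonic() if now is None else now
        while self._heap and self._heap[0][0] <= now:
            _, _, packet = heapq.heappop(self._heap)
            self._send(packet)

# Tiny usage example: time is passed in explicitly so the behaviour
# is easy to follow.
out = []
fwd = DelayedForwarder(out.append)
fwd.submit("b", now=0)              # no delay: forwarded at once
fwd.submit("a", delay_s=5, now=0)   # held until t=5
fwd.tick(now=1)                     # too early, nothing released
fwd.tick(now=6)                     # "a" is released
assert out == ["b", "a"]
```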

I guess there's an architectural question here: should the system deal
with streams of packets, as in Tor and I2P, or independent packets, as
in mix networks? Not something I expect to answer at this stage, but
something to bear in mind as we explore possible designs.

Cheers,
Michael
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iQEcBAEBCAAGBQJUchjaAAoJEBEET9GfxSfM7nwH/iiZN3RcPtbOLymCMaTMbNW9
dbPjXUZc9E4JhU7URHeiJeOPabC0jCwzsgPIk27eZCP+VmRJexwH6kad/VDAy2+V
UxLH9+HOWzKTlEo4ba3mZdkunegK/GCCYM6LUfu+K9qrjnJc1kKCt5k6mXtLB0RU
GW2VWi6VBzrBrrDgO1bCROzEAG2z2hKB39t2PJBFwDY/6w1+gVc+aPJZ9zT2NRi4
pipFDrzEL8pIJg62KiHpAtV7XlofACqo0jKzcrwx/sYHrfJd/6sCo52HeUornLTy
yneWaoD6G1WEFJHRPZMGQHRJhxZAlEuT1SgKVZYCqf5BW8PJ0bk/znre7jwiLbM=
=PGvN
-----END PGP SIGNATURE-----

