[guardian-dev] 81% of Tor users can be de-anonymised by analysing router information, research indicates

str4d str4d at i2pmail.org
Fri Nov 21 19:27:51 EST 2014

Michael Rogers wrote:
> On 20/11/14 20:28, str4d wrote:
>>> If we simply use Tor as a low-latency transport for 
>>> asynchronous messaging then we're limited to Tor's threat 
>>> model, i.e. we can't prevent traffic confirmation attacks. If 
>>> we revive one of the remailers or build a new system then
>>> we're limited to a small number of users, i.e. a small
>>> anonymity set. So ideally we'd find some way of adding
>>> high-latency mix-like features to Tor.
>> I2P-Bote (high-latency distributed encrypted email) [0] falls 
>> into the "new system" category (even though it has existed for 5 
>> years). But I am intrigued by this thought. I2P-Bote was
>> designed to use I2P to hide the fact that someone was using
>> I2P-Bote (up to the limit of traffic confirmation), but it also
>> includes its own high-latency relaying system for sending mail.
>> Are you suggesting that its efficacy is reduced by it running
>> over I2P (compared to not running over an anonymizing network at
>> all)?
> I'm not suggesting that running over Tor or I2P would make any 
> system *less* effective. What I'm saying is that if we assume an 
> adversary who can break Tor or I2P's anonymity through traffic 
> confirmation, then we either need to make Tor or I2P stronger, or 
> build a separate system that's stronger on its own. If we build a 
> separate system then it won't provide a large anonymity set until 
> it becomes popular, so we could face a chicken-and-egg problem.

That makes sense. The chicken-and-egg problem is lessened if we
restrict the adversary's abilities to breaking Tor or I2P through
traffic confirmation on a targeted scale, rather than being able to
completely break the anonymity of all users all the time - then there
is a real benefit of having the separate stronger system running over
Tor or I2P. Whether this is a realistic restriction, however...

> By the way, I've been meaning to ask you about I2P-Bote's 
> architecture for a while; maybe this is a good opportunity. Is the 
> DHT where the messages are stored specific to I2P-Bote, or is it 
> part of I2P?

The DHT is a Kademlia DHT specific to I2P-Bote, with a few
modifications from standard Kademlia. See section 2 of the technical
documentation [0] for details. I2P's netDb DHT is only for storing
network information; applications are expected to handle their own
data requirements.

>> The I2P tunnel specification has included support for 
>> high-latency tunnels since it was designed. None of it is 
>> implemented yet because no one had a use for it, but the 
>> underlying tunnel protocol has defined message bits for 
>> supporting user-defined delays both along a tunnel [1] and at
>> the end of it [2].
>> If high-latency tunnels would actually be useful, we can 
>> implement them and get most of the network supporting delays 
>> relatively quickly (we usually have 80% of the network on the 
>> latest release within six weeks).
> Wow, that would be amazing!

Re-reading my message, I want to clarify that my last sentence needed
an additional comma. I was saying that once implemented, getting the
network to support the changes would be relatively quick. Actually
implementing delays is a much trickier kettle of fish, and the subject
of the rest of this message :)

Another point to clarify: the previously-linked specs for user-defined
delays along a tunnel and at the end of it do not allow for hop-by-hop
delays. They only cater for delays imposed at the end of an outbound
tunnel (halfway along the end-to-end communication path), or in the
processing of a Garlic Clove (at the destination, which can be paired
with other routing options for higher-level exotic systems like
cooperating Destinations operating a store-and-forward system). Adding
a hop-by-hop delay flag would certainly be possible; (un)fortunately
it is not the Hard Part(TM) of the problem.

> I had a conversation with Eleanor Saitta this morning about 
> creating a new type of Tor cell with a user-defined delay, and she 
> pointed out that it would increase the amount of traffic "in 
> flight" across the Tor network at any moment, so relays would need 
> more memory to support the same throughput. I guess the same 
> applies to I2P.
> The longer the delays, the bigger the storage requirements. At some
> point you have to think about writing the data to disk until it's
> time to forward it.

This applies equally to any system with a delay between receiving and
sending data - I2P/Tor with delays, I2P-Bote, Freenet, Tahoe-LAFS...

The storage requirement issue raises another question: what incentive
is there for other routers to store delayed packets? There is no
guarantee that a participating router is going to honor your request.
The answer is clearer for a specific app like I2P-Bote or Freenet than
it is for a generic network transport like I2P or Tor. As the required
delay increases, so does the incentive required. IIRC, previous studies
indicate that a 10 minute delay is the minimum that would make any
difference [1], and that can quickly become non-trivial.

Let's say we turned on a 20 minute delay for all 5000 I2PSnark
(torrent) users, each with 50 KBps of traffic. That's 60 MB of data for
each of the 5000 (300 GB in total) to be buffered somewhere. If you
keep it to Snark users, that's 60 MB each. If you spread it across all
I2P routers, maybe 6 MB each. More likely, the data will be spread
across the fast routers used by the Snark users in their tunnels; say
there are ~1000 of them (approx number of I2P FFs), that is 300 MB
each. Then throw in the fact that this data is constantly churning,
with complete turnover every 20 minutes. Things get sticky, and routers
need a good reason for the additional memory and disk load.
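
A rough check of the arithmetic above, as a minimal Python sketch. It
assumes 1 KB = 1000 bytes and 1 MB = 1000 KB. The 50,000 figure for
"all I2P routers" is not stated anywhere above; it is only back-derived
from the 6 MB estimate. buffered_bytes and per_router_mb are
illustrative names.

```python
KB = 1000          # assumption: decimal units
MB = 1000 * KB


def buffered_bytes(rate_bytes_per_s: int, delay_s: int) -> int:
    """Data in flight for one sender: everything sent during the delay window."""
    return rate_bytes_per_s * delay_s


def per_router_mb(total_bytes: int, routers: int) -> float:
    """Average buffer load in MB if the total is spread evenly over `routers`."""
    return total_bytes / routers / MB


users = 5000
rate = 50 * KB            # 50 KBps per user
delay = 20 * 60           # 20 minutes, in seconds

per_user = buffered_bytes(rate, delay)   # bytes buffered per user
total = per_user * users                 # bytes buffered network-wide

print(per_user / MB)                 # per Snark user
print(per_router_mb(total, 50_000))  # assumed count implied by "6 MB each"
print(per_router_mb(total, 1_000))   # ~1000 fast routers
```

Because the whole buffer turns over once per delay window, each router
also has to write and read that amount every 20 minutes, not just
hold it.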

For I2P and Tor, the incentive is of course cover traffic. Defined
more carefully, the incentive a router has for delaying traffic is
that it can use that traffic to smooth out its own bandwidth curve,
hiding its own patterns. This is at odds with having user-defined
delays on traffic, but that is not a bad thing IMHO. There is no point
in having a deterministic delay because the traffic confirmation
attack can easily account for it; and if the delay is random, there is
no need for the user to specify it. At most, the user could provide an
indication of how long they would ideally like the traffic to be
delayed, but the router doing the delaying would have the assumed
right to send the data whenever it desired / required, which might be

Other issues to consider:
- The effect of low volumes of high-latency traffic on the delaying
  router's incentive, and how this fits in with dummy traffic [2]
- Mixing strategies, e.g. [3]

> I imagine that would be a big architectural change, but I've never 
> looked at the I2P code - what do you reckon?

If delays existed in isolation from other network effects, implementing
them would be trivial: modify the hop processor or a related handler
[4] to store packets that have a delay, and have a job that re-inserts
them into the outbound message processor once the delay has elapsed.
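
The "trivial" version described above can be sketched roughly as
follows. This is an illustrative Python sketch, not I2P code:
DelayedForwarder, accept, and run_due are hypothetical names, the
forward callback stands in for the outbound message processor, and the
sketch ignores expiration enforcement, session tags, and spilling to
disk.

```python
import heapq
import itertools
from typing import Callable, List, Tuple


class DelayedForwarder:
    """Hold delayed messages and release them once their delay has elapsed."""

    def __init__(self, forward: Callable[[bytes], None]) -> None:
        self._forward = forward  # stand-in for the outbound message processor
        self._heap: List[Tuple[float, int, bytes]] = []
        self._seq = itertools.count()  # tie-breaker: FIFO for equal release times

    def accept(self, message: bytes, delay_s: float, now: float) -> None:
        """Forward immediately if there is no delay, otherwise store the message."""
        if delay_s <= 0:
            self._forward(message)
            return
        heapq.heappush(self._heap, (now + delay_s, next(self._seq), message))

    def run_due(self, now: float) -> int:
        """The periodic job: re-insert every message whose release time has passed."""
        released = 0
        while self._heap and self._heap[0][0] <= now:
            _, _, message = heapq.heappop(self._heap)
            self._forward(message)
            released += 1
        return released

    def pending(self) -> int:
        """Number of messages still being held."""
        return len(self._heap)


sent: List[bytes] = []
fwd = DelayedForwarder(sent.append)
fwd.accept(b"no-delay", 0, now=0.0)
fwd.accept(b"ten-min", 600, now=0.0)
fwd.accept(b"five-min", 300, now=0.0)
fwd.run_due(now=300.0)   # releases only the five-minute message
```

Passing the clock in as `now` keeps the sketch deterministic; a real
router would use its own clock and job queue.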

An actual implementation would need to play nice with other parts of
I2P. Over the years we've added stricter expiration enforcement to
prevent loops and DDoSing, and these would need to be modified to
handle the delays. We also have session tags that enable use of faster
crypto once a session is established (AES instead of ElGamal) [5], and
because these expire we would need to force the use of the slower and
more expensive crypto. Time-wise it's not a problem (the packet is
meant to be delayed anyway), but the increased crypto processing load
may have effects on router performance (we've had similar issues before).


[0] https://github.com/i2p/i2p.i2p-bote/blob/master/doc/techdoc.txt
[1] I can't recall which paper this was; it was one of the Tor studies.
[2] https://www.petsymposium.org/2014/papers/Oya.pdf
[3] http://freehaven.net/doc/alpha-mixing/alpha-mixing.pdf
[5] https://geti2p.net/en/docs/how/elgamal-aes

> Cheers, Michael

