[guardian-dev] differential privacy for traffic and map routing?

Nathan of Guardian nathan at guardianproject.info
Mon Jul 10 18:02:54 EDT 2023


This is a great topic, and it definitely touches on some of the de-anonymization vectors we are considering in our work on Clean Insights (https://cleaninsights.org/, with "under construction" docs at https://docs.cleaninsights.org/). I have added Benjamin, John, and Iain, who contribute to the project in different ways, to this thread (I'm not sure whether they are on guardian-dev).

One of our core ideas with Clean Insights is to send measurements in batches, reducing all timestamps to a single timestamp within the defined measurement window. We also do local averaging and aggregation/tallying into a single value when possible, depending on what is being measured. In addition, we submit all measurements through a “Measurement Proxy”, which ensures that details about the submitting device are cleaned before they reach any specific back-end like Matomo. The original concept doc is here: https://gitlab.com/cleaninsights/clean-insights-design/-/blob/master/docs/CleanInsights-ClientSpec-v0.0.1.md and it is still relevant, though the team has since done the hard work of actually shipping code (SDKs, proxy, etc.).
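
To make the batching idea concrete, here is a rough Python sketch of collapsing individual events into one record per measurement window. This is not the Clean Insights SDK API; the names `Event` and `aggregate`, the one-hour window, and the choice of count plus mean are all assumptions made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical local measurement; not a Clean Insights type."""
    name: str
    timestamp: int  # Unix time in seconds
    value: float


def aggregate(events, window_seconds=3600):
    """Collapse events into one record per (name, window).

    Each event's timestamp is reduced to the start of its window, so
    the exact time of the event never leaves the device.
    """
    buckets = defaultdict(list)
    for event in events:
        window_start = event.timestamp - (event.timestamp % window_seconds)
        buckets[(event.name, window_start)].append(event.value)
    return [
        {
            "name": name,
            "window_start": start,
            "count": len(values),
            "mean": sum(values) / len(values),
        }
        for (name, start), values in sorted(buckets.items())
    ]
```

Only the aggregated records would then be handed to whatever submits the batch; the raw events can be discarded locally once the window closes.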

Your idea of storing measurements and then sending them in random order over a period of time, potentially through different Tor circuits, is also interesting, and it is potentially an enhancement that we could implement in Clean Insights as well.
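
As a rough sketch of what that scheduling could look like (nothing like this exists in Clean Insights today as far as this thread says; `schedule_reports` and its parameters are hypothetical), stored reports can be shuffled and assigned independent random send times within the period. Actually sending each report over its own Tor circuit is left out here and would need a separate mechanism.

```python
import random


def schedule_reports(reports, period_start, period_seconds, rng=None):
    """Assign each stored report a random send time within the period.

    Returns a list of (send_time, report) pairs sorted by send time.
    The reports are shuffled first, so the send order is unrelated to
    the order in which they were recorded.
    """
    # SystemRandom draws from the OS entropy source; a seeded
    # random.Random can be passed in for reproducible tests.
    rng = rng or random.SystemRandom()
    shuffled = list(reports)
    rng.shuffle(shuffled)
    times = sorted(
        period_start + rng.uniform(0, period_seconds) for _ in shuffled
    )
    return list(zip(times, shuffled))
```

For a one-week spread, `period_seconds` would be 7 * 24 * 3600, and each pair in the returned plan would be sent on its own at the given time.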

I’ll let the others chime in if we have any more related ideas or concepts to share.

> On Jul 1, 2023, at 10:39 AM, Greg Troxel <gdt at lexort.com> wrote:
> 
> Sorry if this is too far OT; just tell me and I'll endeavor to remember.
> 
> When a bunch of users want to report a metric, then they can add a
> random value that's big enough to obscure it and report, and the only
> thing that is revealed is that the user is using the app.  That assumes
> that app users report the metric without depending on some user
> behavior.
> 
> Currently, various map programs report traffic data, which is useful for
> helping others.  But it's very concerning privacy wise.  One could
> report over tor, and for things that are infrequent that many people are
> expected to pass, that might be enough (police here, crash there).
> 
> In a discussion about improving open-source routing with OSM, it
> occurred to me that it would be useful to have data about "the speed on
> this road is usually X', vs X posted limit", and "when turning from A to
> B, the time taken is Y seconds longer than would be computed by
> traveling at A's normal speed to the turn point and instantaneously
> turning to go along B and B's normal speed".  Similarly for stop signs
> and traffic lights.
> 
> Different users have different mobility patterns and thus will report
> different things.  This will lead to identifying them, even if the times
> are blinded by differential privacy.  What's needed is to dissociate the
> reports from each other.
> 
> Perhaps, an approach for typical traffic is to save the reports over a
> week, and then in each new week, spread them randomly over that week,
> with a fresh tor circuit for each.
> 
> For crashes/etc., and for speeds well below normal, perhaps a single
> live report over tor is ok, for each user, once a week, perhaps with a
> geofence to exclude.
> 
> Thoughts on how to do this better are welcome.  I know what I said is
> very half baked.
> 
> Greg

