[guardian-dev] getting useful tracking data from our F-Droid repo without leaking privacy

Tom Ritter tom at ritter.vg
Wed May 3 11:28:49 EDT 2017

FWIW, the Tor Project's log sanitization code is here:
https://gitweb.torproject.org/webstats.git/tree/src/sanitize.py and the
sanitized results are published publicly.

It rounds timestamps to the day.


On 3 May 2017 at 07:31, Hans-Christoph Steiner
<hans at guardianproject.info> wrote:
> Hey Nathan,
> I'd like to try to enable some privacy-preserving tracking on the
> Guardian Project F-Droid repo.  Looks like logging was turned off August
> 19, 2014.  What do you think about turning on the Apache logging again,
> but only keeping the logs for one day?
> For the F-Droid repo, there would be a cron'ed script that would extract
> just what was downloaded and when.  I'm tempted to also convert the
> IP address to a country, and store that.
> Rounding off the time to the day seems like a nice balance of useful
> info without giving away too much.  That eliminates the rich time-of-day
> metadata, e.g. night, morning, lunchtime, etc.  But maybe rounding to
> the week would be better since that would also eliminate info about the
> weekly cycle (e.g. downloads on Friday evening are not likely to come
> from an orthodox Muslim, or Saturday for an orthodox Jew; then there are
> holidays, etc.).
> .hc
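
For illustration, the cron'ed script described above could look roughly
like the sketch below. This is not the Tor Project's sanitize.py and not
anything deployed on the Guardian Project server; it assumes Apache's
common/combined log format, and the names sanitize, round_to_day,
round_to_week and lookup_country are made up for this example. The
country lookup is left as a caller-supplied function because it needs a
GeoIP database that is not specified in this thread.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Assumed layout (Apache common/combined log format):
#   host ident user [time] "METHOD path protocol" status size ...
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def round_to_day(ts):
    """Drop the time of day, keep only the calendar date."""
    return ts.date().isoformat()


def round_to_week(ts):
    """Drop the day of the week too: return the Monday of that week."""
    d = ts.date()
    return (d - timedelta(days=d.weekday())).isoformat()


def sanitize(lines, bucket=round_to_day, lookup_country=None):
    """Count successful GETs per (time bucket, country, path).

    The IP address is only ever passed to lookup_country (a hypothetical
    callable mapping an IP string to a country code) and is never stored.
    Dates use the UTC offset written in the log line itself.
    """
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines that do not look like access-log entries
        if m.group("method") != "GET" or m.group("status") != "200":
            continue  # only count completed downloads
        ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        country = lookup_country(m.group("ip")) if lookup_country else "??"
        counts[(bucket(ts), country, m.group("path"))] += 1
    return counts
```

Passing bucket=round_to_week instead of the default gives the coarser
weekly rounding discussed above. Once the counts are written out, the
raw Apache log can be deleted, which fits the "keep logs for one day"
idea.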
