[guardian-dev] IOCipher Status Update
david at olivercoady.com
Tue Jan 22 11:00:35 EST 2013
Stephen, fantastic sleuthing and work to fix. Guys - I'm anxious to hear
the results using our tests.
Stephen, great work by your team!
David M. Oliver | david at olivercoady.com | http://olivercoady.com |
http://dmo.tel | @davidmoliver | +1 970 368 2366
On Tue, Jan 22, 2013 at 10:51 AM, Stephen Lombardo
<sjlombardo at zetetic.net> wrote:
> Hello Everyone,
> We've made some solid progress with locking and stabilization of libsqlfs
> under multithreaded use.
> We started by reproducing the problems with the library under load. This
> involved firing up a fuse mount and running three concurrent fsx processes
> against the same sqlfs file system. This test quickly produced file
> corruption, usually within 30 seconds. From there, enhanced tracing showed
> that the underlying cause had to do with the database being locked at
> various points during execution. Even though the library was opening an
> exclusive transaction, there were still numerous opportunities for the
> transaction to be blocked by a reader, or multiple writers to block each
> other. Similar behavior could also be seen with deferred transactions (i.e.
> standard begin).
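For readers who want to see this kind of contention in isolation, here is a minimal sketch using Python's standard sqlite3 module as a stand-in for the C API that libsqlfs calls. The table name `files` and the function names are hypothetical, and the busy timeout is set to zero so the lock conflicts show up immediately instead of after a wait. It illustrates the deferred-transaction case only and is not the fsx test described above.

```python
import os
import sqlite3
import tempfile


def open_db(path):
    # timeout=0 disables busy waiting so lock conflicts surface immediately;
    # isolation_level=None stops the module from issuing its own BEGIN.
    return sqlite3.connect(path, timeout=0, isolation_level=None)


def demo_deferred_contention():
    path = os.path.join(tempfile.mkdtemp(), "deferred.db")
    a = open_db(path)
    b = open_db(path)
    # Hypothetical table, not the libsqlfs schema.
    a.execute("CREATE TABLE files (name TEXT PRIMARY KEY, size INTEGER)")
    a.execute("INSERT INTO files VALUES ('/a', 1)")

    # Both connections start deferred transactions and read, so each holds
    # a SHARED lock for the life of its transaction.
    for conn in (a, b):
        conn.execute("BEGIN")
        conn.execute("SELECT size FROM files").fetchall()

    # a upgrades to a RESERVED lock on its first write.
    a.execute("UPDATE files SET size = 2 WHERE name = '/a'")

    try:
        b.execute("UPDATE files SET size = 3 WHERE name = '/a'")
        b_write_failed = False
    except sqlite3.OperationalError:
        b_write_failed = True  # b cannot get RESERVED while a holds it

    try:
        a.execute("COMMIT")
        a_commit_failed = False
    except sqlite3.OperationalError:
        a_commit_failed = True  # a cannot get EXCLUSIVE while b holds SHARED

    # The standoff only ends when one side gives up its transaction.
    if b.in_transaction:
        b.execute("ROLLBACK")
    if a.in_transaction:
        a.execute("COMMIT")
    final_size = a.execute("SELECT size FROM files").fetchall()[0][0]
    a.close()
    b.close()
    return b_write_failed, a_commit_failed, final_size
```

Neither side can make progress until one of them rolls back, which is why waiting longer does not help in this pattern.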
> This led us to make several changes that collectively improve the stability
> of the library.
> First, we changed the transaction command in begin_transaction to use
> "begin immediate". This seeks an immediate reserved lock on the database,
> but does not exclusively lock it. This reduces unresolvable contention for
> write locks that would normally occur with deferred transactions, and is
> less restrictive than an exclusive lock, since it will continue to allow
> shared locks for reading.
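A small sketch of the locking behavior described here, again using Python's standard sqlite3 module in place of the C API. The table and function names are hypothetical, the database uses the default rollback journal, and the busy timeout is zero so a refused lock is reported at once.

```python
import os
import sqlite3
import tempfile


def open_db(path):
    # isolation_level=None: the module issues no implicit BEGIN, so the
    # transaction statements below are exactly what SQLite sees.
    return sqlite3.connect(path, timeout=0, isolation_level=None)


def demo_begin_immediate():
    path = os.path.join(tempfile.mkdtemp(), "immediate.db")
    writer = open_db(path)
    other = open_db(path)
    # Hypothetical table, not the libsqlfs schema.
    writer.execute("CREATE TABLE files (name TEXT PRIMARY KEY, size INTEGER)")
    writer.execute("INSERT INTO files VALUES ('/a', 1)")

    # The RESERVED lock is taken here, before any data is touched.
    writer.execute("BEGIN IMMEDIATE")
    writer.execute("UPDATE files SET size = 2 WHERE name = '/a'")

    # Readers can still take SHARED locks; they see the last committed data.
    rows_seen = other.execute("SELECT size FROM files").fetchall()

    # A second writer is refused at BEGIN rather than partway through.
    try:
        other.execute("BEGIN IMMEDIATE")
        other.execute("ROLLBACK")
        second_writer_blocked = False
    except sqlite3.OperationalError:
        second_writer_blocked = True

    writer.execute("COMMIT")
    committed = other.execute("SELECT size FROM files").fetchall()
    writer.close()
    other.close()
    return rows_seen, second_writer_blocked, committed
```

Because the competing writer is turned away before it holds any lock of its own, it can simply retry, which is what makes a busy timeout effective in combination with this change.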
> It is extremely important that we prevent write operations from failing to
> execute due to busy timeouts, even if another process/thread has the
> database locked. Even using WAL, it is still possible for a command to be
> blocked during attempted concurrent write operations. This causes the write
> operation to fail, leading to corruption. While libsqlfs has some "delay()"
> code that provides rudimentary busy handling, it is only in use for a small
> number of operations, leaving other critical calls unprotected. Therefore,
> our second change was to register SQLite's internal busy handler with a
> relatively high timeout (currently 10 seconds, but open for discussion) via
> sqlite3_busy_timeout. This provides protection for all operations in
> libsqlfs, reducing the likelihood that a write operation would fail
> outright, though it may be delayed.
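To illustrate the effect of a busy timeout, the sketch below uses Python's standard sqlite3 module and `PRAGMA busy_timeout`, which SQLite documents as the SQL-level counterpart of sqlite3_busy_timeout. The `log` table, the function names, and the 0.3 second hold time are hypothetical choices for the demonstration, not values from libsqlfs.

```python
import os
import sqlite3
import tempfile
import threading
import time


def hold_write_lock(path, locked, hold_seconds):
    # Hypothetical competing writer: takes the RESERVED lock, keeps it
    # for a short while, then commits.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("BEGIN IMMEDIATE")
    conn.execute("INSERT INTO log VALUES ('holder')")
    locked.set()
    time.sleep(hold_seconds)
    conn.execute("COMMIT")
    conn.close()


def write_with_timeout(path, timeout_ms, hold_seconds=0.3):
    setup = sqlite3.connect(path, isolation_level=None)
    setup.execute("CREATE TABLE IF NOT EXISTS log (who TEXT)")
    setup.close()

    locked = threading.Event()
    holder = threading.Thread(
        target=hold_write_lock, args=(path, locked, hold_seconds)
    )
    holder.start()
    locked.wait(5)  # proceed once the other writer holds the lock

    conn = sqlite3.connect(path, timeout=0, isolation_level=None)
    # Register SQLite's built-in busy handler with the given timeout.
    conn.execute(f"PRAGMA busy_timeout = {int(timeout_ms)}").fetchall()
    try:
        conn.execute("BEGIN IMMEDIATE")
        conn.execute("INSERT INTO log VALUES ('waiter')")
        conn.execute("COMMIT")
        succeeded = True
    except sqlite3.OperationalError:
        succeeded = False  # "database is locked"
    finally:
        conn.close()
        holder.join()
    return succeeded
```

With `timeout_ms=0` the second writer fails as soon as it meets the lock; with `timeout_ms=10000` (the 10 seconds mentioned above) it waits for the first writer to commit and then proceeds.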
> Finally, we enabled WAL mode to speed up write operations and further
> improve concurrency between readers and writers. Note that under WAL, NORMAL
> synchronous mode only fsync()s on checkpoint operations, so it may be possible
> to enable it with lower overhead than under the standard journal mode (we
> didn't change this yet).
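A minimal sketch of the WAL settings discussed here, using Python's standard sqlite3 module rather than the C API. The `blocks` table and the helper names are hypothetical, and NORMAL synchronous is used only to show how it would be set, since the message says it has not been changed in libsqlfs.

```python
import os
import sqlite3
import tempfile

SYNC_LEVELS = {"OFF": 0, "NORMAL": 1, "FULL": 2}


def open_wal(path, synchronous="FULL", busy_ms=10000):
    if synchronous not in SYNC_LEVELS:
        raise ValueError(f"unsupported synchronous level: {synchronous}")
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(f"PRAGMA busy_timeout = {int(busy_ms)}").fetchall()
    # The pragma returns the journal mode actually in effect.
    mode = conn.execute("PRAGMA journal_mode = WAL").fetchall()[0][0]
    conn.execute(f"PRAGMA synchronous = {synchronous}")
    return conn, mode


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM blocks").fetchall()[0][0]


def demo_wal(synchronous="NORMAL"):
    path = os.path.join(tempfile.mkdtemp(), "wal.db")
    writer, mode = open_wal(path, synchronous)
    reader, _ = open_wal(path, synchronous)
    # Hypothetical table, not the libsqlfs schema.
    writer.execute("CREATE TABLE blocks (n INTEGER)")
    writer.execute("INSERT INTO blocks VALUES (1)")

    reader.execute("BEGIN")
    before = count_rows(reader)  # the reader pins a snapshot here

    # The writer commits even though the reader's transaction is open.
    writer.execute("INSERT INTO blocks VALUES (2)")

    during = count_rows(reader)  # still the old snapshot
    reader.execute("COMMIT")
    after = count_rows(reader)   # a new read sees the committed row
    sync_level = writer.execute("PRAGMA synchronous").fetchall()[0][0]
    writer.close()
    reader.close()
    return mode, before, during, after, sync_level
```

Under the default rollback journal the writer's commit would have to wait for the open reader; in WAL mode the two proceed independently, and the reader simply keeps its snapshot until its transaction ends.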
> With these changes in place, three concurrent fsx processes running in
> parallel on a single fuse mount produced no errors in a 24-hour test run. The
> tests also showed improved performance on reads and writes. In light of these
> results, we'd like to get your feedback on these changes, and request that
> you run your own tests in the multi-threaded Android application to see if
> they resolve the problems that were reported.
> All the changes are available here:
> Please let us know if you have any questions. Thanks!
> On 2013-01-18, Hans-Christoph Steiner wrote:
> > On 01/18/2013 09:24 AM, David Oliver wrote:
> > > Thanks Stephen for this. It seems like we're closing in on completion
> > > with the final step being Stephen's team looking at the lock issue and
> > > advising Guardian as to fixes/changes/etc that we (Guardian) would
> > >
> > > Stephen - can you provide a timeframe for your locking review?