[guardian-dev] IOCipher Status Update
sjlombardo at zetetic.net
Tue Jan 22 19:33:35 EST 2013
On 2013-01-22, Abel Luck wrote:
> 1) Can you share your method for the "enhanced tracing" you used to
> troubleshoot the corrupted fsx runs? Breakpoints? printf debugging? strace?
> Knowing how to reproduce this would be quite useful.
To move quickly and keep things simple, we just used modified trace logging. As you know, the sqlfs codebase redefines the INDEX macro throughout sqlfs.c. We added a parameter to the show_msg() function that takes the INDEX and logs it along with the message. After a test run, this let us see when each error occurred, including all SQLite errors, and correlate it back to the section of code that produced it.
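A minimal sketch of this kind of index-tagged tracing follows. It assumes a hypothetical helper named format_traced_msg() and a show_msg() macro that takes the section's INDEX as its first argument. This is not the actual sqlfs.c code: the real show_msg() signature is not reproduced here, and INDEX 7 is an arbitrary example value.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prefixes a log message with the INDEX of the code
 * section that produced it, e.g. "[INDEX 7] sqlite3_step returned 5".
 * Returns the length written (or that would have been written), following
 * snprintf conventions, or a negative value on error. */
static int format_traced_msg(char *buf, size_t size, int index,
                             const char *fmt, ...)
{
    int prefix = snprintf(buf, size, "[INDEX %d] ", index);
    if (prefix < 0 || (size_t)prefix >= size)
        return prefix; /* error, or buffer too small for the prefix */

    va_list ap;
    va_start(ap, fmt);
    int body = vsnprintf(buf + prefix, size - (size_t)prefix, fmt, ap);
    va_end(ap);
    return body < 0 ? body : prefix + body;
}

/* Hypothetical show_msg() variant that takes the section INDEX explicitly
 * and writes to stderr; messages longer than the buffer are truncated. */
#define show_msg(index, ...)                                        \
    do {                                                            \
        char show_msg_buf_[512];                                    \
        if (format_traced_msg(show_msg_buf_, sizeof show_msg_buf_,  \
                              (index), __VA_ARGS__) >= 0)           \
            fputs(show_msg_buf_, stderr);                           \
    } while (0)

/* Each section of the file redefines INDEX, so every call site in that
 * section logs the same identifier. */
#define INDEX 7
static void example_section(int rc)
{
    if (rc != 0)
        show_msg(INDEX, "sqlite3_step returned %d\n", rc);
}
#undef INDEX
```

Grepping the captured log for a given INDEX value then points straight at the section of the file where the error was raised.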
> 2) What is the argument for raising or lowering the sqlite3_busy_timeout
> from 10 seconds?
In this context, 10 seconds would be the longest time a sqlite3 call would block before returning an error. There is no impact on single-connection applications or idle databases, since the busy handler would never be invoked.
The default busy handler initially sleeps and retries quickly, then progressively backs off. A longer busy timeout gives competing connections more time to obtain locks. It's difficult to give concrete numbers here, but the timeout should scale roughly with the load and the number of concurrent handles using the database. For instance, if you turned the busy timeout down to 1 or 2 seconds, you would likely see corruption when running 3 or 4 concurrent fsx tests.
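To make the back-off concrete, here is a sketch that models the default busy handler's schedule. The delay table is modeled on sqliteDefaultBusyCallback() in SQLite's main.c for builds with usleep(), as we understand it; treat it as an approximation and check the SQLite version actually in use. busy_delay_ms() and count_busy_retries() are illustrative names, not SQLite API.

```c
#include <assert.h>

/* Approximate model of SQLite's default busy-handler back-off. Returns how
 * many milliseconds to sleep before retry number `count` (0-based), or 0
 * to give up, at which point the call would fail with SQLITE_BUSY. */
static int busy_delay_ms(int count, int timeout_ms)
{
    static const int delays[] = { 1, 2, 5, 10, 15, 20, 25, 25, 25, 50, 50, 100 };
    /* totals[i] is the sum of delays[0..i-1], i.e. time already slept. */
    static const int totals[] = { 0, 1, 3, 8, 18, 33, 53, 78, 103, 128, 178, 228 };
    const int ndelay = (int)(sizeof delays / sizeof delays[0]);
    int delay, prior;

    if (count < ndelay) {
        delay = delays[count];
        prior = totals[count];
    } else {
        /* Once the table runs out, keep sleeping 100 ms per retry. */
        delay = delays[ndelay - 1];
        prior = totals[ndelay - 1] + delay * (count - (ndelay - 1));
    }
    if (prior + delay > timeout_ms) {
        delay = timeout_ms - prior; /* trim the last sleep to the timeout */
        if (delay <= 0)
            return 0;
    }
    return delay;
}

/* Counts how many sleep-and-retry rounds fit in timeout_ms and stores the
 * total time spent sleeping in *waited_ms. */
static int count_busy_retries(int timeout_ms, int *waited_ms)
{
    int count = 0, waited = 0, delay;
    while ((delay = busy_delay_ms(count, timeout_ms)) > 0) {
        waited += delay;
        count++;
    }
    *waited_ms = waited;
    return count;
}
```

Under this model, a 10-second timeout allows 109 sleep-and-retry rounds before the call gives up, while 1 second allows only 19. That gap is consistent with shorter timeouts failing once several fsx processes contend for the same database.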
Since we would almost never want a write to fail because the database is locked, it's probably a good idea to keep this high. That way, even if a very large number of threads compete for access with continuous reads and writes, we can still avoid locking errors.
Since we can set the timeout when the database is opened, it might also be desirable to make it a user-defined value with a sensible default instead of hard-coding it. This would provide more flexibility to accommodate specific circumstances.
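One way the timeout could be made configurable is sketched below, under the assumption that the setting arrives as a string (for example from a config file or an environment variable). parse_busy_timeout() and SQLFS_DEFAULT_BUSY_TIMEOUT_MS are hypothetical names; only sqlite3_open() and sqlite3_busy_timeout(), mentioned in the closing comment, are real SQLite API.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical name for the 10-second value sqlfs currently hard-codes. */
#define SQLFS_DEFAULT_BUSY_TIMEOUT_MS 10000

/* Hypothetical helper: turns a user-supplied setting into a busy timeout
 * in milliseconds. NULL, empty, malformed, non-positive or out-of-range
 * values fall back to the default, so a bad setting never disables the
 * busy handler. */
static int parse_busy_timeout(const char *value)
{
    if (value == NULL || *value == '\0')
        return SQLFS_DEFAULT_BUSY_TIMEOUT_MS;

    char *end;
    errno = 0;
    long ms = strtol(value, &end, 10);
    if (errno != 0 || *end != '\0' || ms <= 0 || ms > INT_MAX)
        return SQLFS_DEFAULT_BUSY_TIMEOUT_MS;
    return (int)ms;
}

/* At open time the result would be applied with the real SQLite API:
 *     sqlite3_open(path, &db);
 *     sqlite3_busy_timeout(db, parse_busy_timeout(setting));
 */
```

Falling back to the default instead of accepting 0 matters here: calling sqlite3_busy_timeout() with a value of zero or less turns the busy handler off entirely, which would bring back exactly the locking errors the timeout is meant to prevent.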
> Finally, as a review note, I think we should go ahead and purge delay()
> from the codebase. Thoughts?
I agree. The implementation of delay() is simplistic, inflexible, and isn't called in all of the important places. With the busy timeout in place, delay() can be removed so everything uses consistent logic.