I spent most of my work day today looking into concurrency problems. They are among the hardest problems to solve in computer science. Hopefully I’ve got a lid on this one this time. Here are some of the things I did today:
Couchbase Lite, as one of its options, uses a SQLite database file as its persistent store. SQLite comes with its own concurrency control to prevent data corruption, but it cannot prevent data that is incorrect at the application level, such as data that doesn’t make sense to the application. For this reason, one Couchbase Lite style database connection to a SQLite database file consists of the following:
- A read connection that will see the current finalized state of the database file
- A write connection that will see the intermediate state of the file, as well as make changes
The read connection is freely shared between threads without restriction because there is no danger of reading incomplete data (SQLite itself guarantees this by using transactions). The write connection is limited to a private thread that operates like a serial dispatch queue (for those familiar with libdispatch). All write operations get added to a queue, and the private thread consumes the operations on that queue in order. The thread that queued an operation then waits for that operation to finish.
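To make the write side concrete, here is a minimal Python sketch of a serial queue backed by one private thread. It is not Couchbase Lite code: the `SerialWriteQueue` class and its `run` method are names I made up for illustration, and the "write" is just a list append.

```python
import queue
import threading
from concurrent.futures import Future


class SerialWriteQueue:
    """Hypothetical stand-in: runs jobs one at a time, in order, on a private thread."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def _drain(self):
        # The private thread consumes jobs in the order they were queued.
        while True:
            job, done = self._jobs.get()
            try:
                done.set_result(job())
            except Exception as exc:
                done.set_exception(exc)

    def run(self, job):
        """Queue the job, then block the calling thread until it has finished."""
        done = Future()
        self._jobs.put((job, done))
        return done.result()


writes = SerialWriteQueue()
log = []
callers = [
    threading.Thread(target=writes.run, args=(lambda i=i: log.append(i),))
    for i in range(5)
]
for t in callers:
    t.start()
for t in callers:
    t.join()
```

Every append happens on the private thread, never on a caller's thread, so the writes cannot overlap even though five threads asked for them at once.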
The problem in the logic was this. When running a transaction, the operations submitted to the write queue got broken into three parts: starting the transaction, running the logic, and finishing the transaction. This broke isolation between threads because another thread could insert its own operations between any two of the three. So today I refactored that so that a transaction is no longer three separate jobs, but one job that contains all three steps.
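A rough sketch of the "one job instead of three" idea, in Python. A single-worker `ThreadPoolExecutor` stands in for the private write thread, and `run_in_transaction` is a hypothetical helper, not the real API. The begin and commit steps are just log entries so the ordering is easy to inspect.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Assumption: a one-worker pool plays the role of the private write thread.
write_queue = ThreadPoolExecutor(max_workers=1)
log = []


def run_in_transaction(name, body):
    """Hypothetical helper: submit begin, body, and commit as ONE queued job."""

    def whole_transaction():
        log.append(("begin", name))
        body()
        log.append(("commit", name))

    # One submission, so no other thread's job can land between the steps.
    write_queue.submit(whole_transaction).result()


def make_body(name):
    return lambda: log.append(("write", name))


callers = [
    threading.Thread(target=run_in_transaction, args=(n, make_body(n)))
    for n in ("A", "B", "C")
]
for t in callers:
    t.start()
for t in callers:
    t.join()
```

Whatever order the three callers arrive in, each transaction's entries sit next to each other in `log`. Submitting begin, body, and commit as three separate jobs would give no such guarantee.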
Naturally, this seems like asking for deadlocks, because now a thread could be waiting for a job whose logic makes another database call that queues an operation and waits (meaning that an operation in the queue would be waiting for an operation further down the queue). However, there is a trick for knowing when to execute things out of order to avoid this. If a queued job tries to queue another job, the enqueue will come from the private thread that is running the write operations. This thread is private and no outside code has access to it, so I can assume that the only scenario in which a write operation gets queued from the private thread is when a write operation has tried to queue another write operation. In this situation, I set up the scheduler to not queue the operation, but simply run it in place (out of order). Even if that operation queues yet ANOTHER write operation, it is still on the same thread and will hit the same shortcut, so the entire write operation tree is executed without queueing.
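The shortcut can be sketched in a few lines of Python. Again this is an illustration under my own assumptions, not the actual scheduler: `ReentrantWriteQueue` is a made-up name, and a single-worker `ThreadPoolExecutor` plays the private thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ReentrantWriteQueue:
    """Hypothetical sketch: queue jobs, unless the caller is already the write thread."""

    def __init__(self):
        # Assumption: a one-worker pool stands in for the private write thread.
        self._pool = ThreadPoolExecutor(max_workers=1)
        # Ask the worker for its own id so later calls can recognise it.
        self._worker_id = self._pool.submit(threading.get_ident).result()

    def run(self, job):
        if threading.get_ident() == self._worker_id:
            # Already on the write thread: queueing and waiting would deadlock,
            # so run the job in place instead.
            return job()
        return self._pool.submit(job).result()


writes = ReentrantWriteQueue()
log = []


def outer():
    log.append("outer start")
    writes.run(middle)  # nested write: runs in place
    log.append("outer end")


def middle():
    log.append("middle start")
    writes.run(lambda: log.append("inner"))  # nested again: same shortcut
    log.append("middle end")


writes.run(outer)
```

The nested calls run depth-first on the write thread, so `log` ends up as outer start, middle start, inner, middle end, outer end. Without the identity check, the first nested `run` would wait forever on a job that sits behind its own caller.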
There is one drawback. If a write operation starts a Task that itself attempts write operations, and then calls Wait() on that Task before the current operation exits, a deadlock will happen: the Task runs on a different thread, so its writes get queued behind the very operation that is waiting for it. So I guess please don’t do that inside a call to RunInTransaction()?
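Here is a Python sketch of that failure mode, using a plain thread in place of a Task and a timed join in place of Wait() so the demonstration terminates instead of hanging. `ReentrantWriteQueue` is a hypothetical class with a thread-identity shortcut, not the real scheduler.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ReentrantWriteQueue:
    """Hypothetical sketch of a write queue with the run-in-place shortcut."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._worker_id = self._pool.submit(threading.get_ident).result()

    def run(self, job):
        if threading.get_ident() == self._worker_id:
            return job()  # nested call on the write thread: run in place
        return self._pool.submit(job).result()


writes = ReentrantWriteQueue()
observed = {}


def write_op():
    # The helper is NOT the write thread, so its write is queued behind write_op.
    helper = threading.Thread(
        target=writes.run,
        args=(lambda: observed.setdefault("nested_ran", True),),
        daemon=True,
    )
    helper.start()
    # An untimed join here would block forever; the timeout lets us observe it.
    helper.join(timeout=0.2)
    observed["helper_stuck"] = helper.is_alive()
    return helper


helper = writes.run(write_op)
# Once write_op has returned, the queued write finally gets its turn.
helper.join(timeout=5)
```

Inside `write_op` the helper is still blocked when the timed join gives up, and it only completes after `write_op` returns. With a real untimed wait, neither side would ever make progress.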