SRCU state double scan

March 6, 2020

The SRCU flavor of RCU uses per-cpu counters to detect that every CPU has passed through a quiescent state for a particular SRCU lock instance (srcu_struct).

There's are total of 4 counters per-cpu. One pair for locks, and another for unlocks. You can think of the SRCU instance to be split into 2 parts. The readers sample srcu_idx and decided which part to use. Each part corresponds to one pair of lock and unlock counters. A reader increments a part's lock counter during locking and likewise for unlock.

During an update, the updater flips srcu_idx (thus attempting to force new readers to use the other part) and waits for the lock/unlock counters on the previous value of srcu_idx to match. Once the sum of the lock counters of all CPUs match that of unlock, the system knows all pre-existing read-side critical sections have completed.

Things are not that simple, however. It is possible that a reader samples the srcu_idx, but before it can increment the lock counter corresponding to it, it undergoes a long delay. We thus we end up in a situation where there are readers in both srcu_idx = 0 and srcu_idx = 1.

To prevent such a situation, a writer has to wait for readers corresponding to both srcu_idx = 0 and srcu_idx = 1 to complete. This depicted with 'A MUST' in the below pseudo-code:

        reader 1        writer                        reader 2
        -------------------------------------------------------
        // read_lock
        // enter
        Read: idx = 0;
        <long delay>    // write_lock
                        // enter
                        wait_for lock[1]==unlock[1]
                        idx = 1; /* flip */
                        wait_for lock[0]==unlock[0]
                        done.
                                                      Read: idx = 1;
        lock[0]++;
                                                      lock[1]++;
                        // write_lock
                        // return
        // read_lock
        // return
        /**** NOW BOTH lock[0] and lock[1] are non-zero!! ****/
                        // write_lock
                        // enter
                        wait_for lock[0]==unlock[0] <- A MUST!
                        idx = 0; /* flip */
                        wait_for lock[1]==unlock[1] <- A MUST!

NOTE: QRCU has a similar issue. However it overcomes such a race in the reader by retrying the sampling of its 'srcu_idx' equivalent.

Q: If you have to wait for readers of both srcu_idx = 0, and 1, then why not just have a single counter and do away with the “flipping” logic? Ans: Because of updater forward progress. If we had a single counter, then it is possible that new readers would constantly increment the lock counter, thus updaters would be waiting all the time. By using the 'flip' logic, we are able to drain pre-existing readers using the inactive part of srcu_idx to be drained in a bounded time. The number of readers of a 'flipped' part would only monotonically decrease since new readers go to its counterpart.