GUS (Global Unbounded Sequences)

April 24, 2020

GUS is a memory reclaim algorithm used in FreeBSD, similar to RCU. It is borrows concepts from Epoch and Parsec. A video of a presentation describing the integration of GUS with UMA (FreeBSD's slab implementation) is here: https://www.youtube.com/watch?v=ZXUIFj4nRjk

The best description of GUS is in the FreeBSD code itself. It is based on the concept of global write clock, with readers catching up to writers.

Effectively, I see GUS as an implementation of light traveling from distant stars. When a photon leaves a star, it is no longer needed by the star and is ready to be reclaimed. However, on earth we can't see the photon yet, we can only see what we've been shown so far, and in a way, if we've not seen something because enough “time” has not passed, then we may not reclaim it yet. If we've not seen something, we will see it at some point in the future. Till then we need to sit tight.

Roughly, an implementation has 2+N counters (with N CPUs): 1. Global write sequence. 2. Global read sequence. 3. Per-cpu read sequence (read from #1 when a reader starts)

On freeing, the object is tagged with the write sequence. Only once global read sequence has caught up with global write sequence, the object is freed. Until then, the free'ing is deferred. The poll() operation updates #2 by referring to #3 of all CPUs. Whatever was tagged between the old read sequence and new read sequence can be freed. This is similar to synchronize_rcu() in the Linux kernel which waits for all readers to have finished observing the object being reclaimed.

Note the scalability drawbacks of this reclaim scheme:

Expensive poll operation if you have 1000s of CPUs. (Note: Parsec uses a tree-based mechanism to improve the situation which GUS could consider)
Heavy-weight memory barriers are needed (SRCU has a similar drawback) to ensure ordering properties of reader sections with respect to poll() operation.
There can be a delay between reading the global write-sequence number and writing it into the per-cpu read-sequence number. This can cause the per-cpu read-sequence to advance past the global write-sequence. Special handling is needed.

One advantage of the scheme could be implementation simplicity.

RCU (not SRCU or Userspace RCU) doesn't suffer from these drawbacks. Reader-sections in Linux kernel RCU are extremely scalable and lightweight.