Stupid RCU Tricks: Detecting Pointer Leaks
Explicitly scoped synchronization mechanisms, including locking, are susceptible to leaks, that is, use of protected values past the end of the synchronization scope. RCU is no exception, as illustrated by the following code fragment:
rcu_read_lock();
p = rcu_dereference(gp);
do_something_with(p);
rcu_read_unlock();
do_something_more(*p);
The structure referenced by gp might be freed immediately after the rcu_read_unlock(), in which case the call to do_something_more() is a use-after-free (UAF) bug.
One way to avoid this sort of bug is to use RAII (resource acquisition is initialization) primitives, and to declare p within the resulting scope. For example, in the Linux kernel:
scoped_guard(rcu) {
	struct foo *p = rcu_dereference(gp);
	do_something_with(p);
}
do_something_more(*p);
Because p is confined to the scope of RCU protection, it cannot possibly leak. In fact, the use of p in do_something_more(*p) now gives a compiler error. Thus, these RAII primitives form an excellent suit of armor that prevents a large class of UAF bugs.
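For readers wondering how C can provide scope-based protection at all, here is a minimal userspace sketch built on the GCC/Clang cleanup attribute, which is the same compiler feature underlying the kernel's guard infrastructure. Everything prefixed with toy_ or TOY_ is hypothetical and exists only for illustration: the nesting counter merely stands in for RCU and provides no actual protection.

```c
#include <assert.h>

/* Toy stand-in for RCU reader nesting; NOT the kernel's implementation. */
static int toy_rcu_nesting;

static void toy_rcu_read_lock(void)   { toy_rcu_nesting++; }
static void toy_rcu_read_unlock(void) { toy_rcu_nesting--; }

/* Cleanup handler: runs automatically when the guard variable leaves scope. */
static void toy_rcu_guard_exit(int *unused)
{
	(void)unused;
	toy_rcu_read_unlock();
}

/* Hypothetical macro: enter a toy reader that ends at the enclosing brace. */
#define TOY_RCU_GUARD() \
	int toy_guard __attribute__((cleanup(toy_rcu_guard_exit), unused)) = \
		(toy_rcu_read_lock(), 0)

struct foo { int a; };
static struct foo toy_foo = { .a = 42 };
static struct foo *gp = &toy_foo;

/* Returns a copy of the value, never the pointer, so nothing can leak. */
int toy_read_a(void)
{
	int ret;

	{
		TOY_RCU_GUARD();
		struct foo *p = gp;	/* stands in for rcu_dereference(gp) */

		assert(toy_rcu_nesting == 1);	/* we are inside the reader */
		ret = p->a;
	}	/* toy_rcu_read_unlock() runs here, and p is now out of scope */
	return ret;
}
```

Any use of p after the inner closing brace fails to compile, which is exactly the property described above.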
Unfortunately, this suit of armor can become a straitjacket that prohibits techniques that are both legal and useful.
Legal Leaks I
Matters are further complicated by the fact that not all pointer leaks are bugs. For example:
rcu_read_lock();
p = rcu_dereference(gp);
do_something_with(p);
spin_lock(p->lock);
rcu_read_unlock();
do_something_more(*p);
spin_unlock(p->lock);
This pointer leak is perfectly fine because the ->lock protects the pointed-to object, and do_something_more() is under the protection of that ->lock. And this trick is not specific to locking: Reference counting may also be used to protect pointers leaked out of RCU read-side critical sections, a technique that is often used to handle long-term references. Plus another common technique is to use RCU to safely acquire either a lock or a reference, in both cases contained within the RCU-protected object. (Why not just directly acquire the lock or reference, without interference from RCU? Because without RCU's “interference”, the object might be freed in the midst of the attempted acquisition!)
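The reference-counting variant of this handoff might look something like the following userspace sketch, which uses C11 atomics. The toy_ names are hypothetical, and the empty toy_rcu_read_lock() and toy_rcu_read_unlock() stand-ins provide no protection whatsoever; they only mark where the real reader would begin and end. In the kernel, the role of toy_get_unless_zero() would be played by something like refcount_inc_not_zero() or kref_get_unless_zero().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Empty stand-ins so the sketch builds in userspace: NO real protection. */
static void toy_rcu_read_lock(void) { }
static void toy_rcu_read_unlock(void) { }

struct toy_obj {
	atomic_int refcnt;	/* zero means the object is on its way out */
	int data;
};

static struct toy_obj *_Atomic toy_gp;	/* stands in for the RCU-protected gp */

/* Take a reference unless the count has already dropped to zero. */
static bool toy_get_unless_zero(struct toy_obj *p)
{
	int old = atomic_load(&p->refcnt);

	while (old != 0) {
		/* On failure, old is updated to the current count. */
		if (atomic_compare_exchange_weak(&p->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if it was the last one. */
bool toy_put(struct toy_obj *p)
{
	return atomic_fetch_sub(&p->refcnt, 1) == 1;
}

/* Returns a referenced object that stays usable after the reader, or NULL. */
struct toy_obj *toy_lookup(void)
{
	struct toy_obj *p;

	toy_rcu_read_lock();
	p = atomic_load(&toy_gp);	/* stands in for rcu_dereference(gp) */
	if (p && !toy_get_unless_zero(p))
		p = NULL;		/* lost the race with the final put */
	toy_rcu_read_unlock();
	return p;			/* protection handed off to the reference */
}
```

The reader exists only to keep the object's memory from being freed while the acquisition is attempted. Once toy_get_unless_zero() succeeds, the reference rather than RCU keeps the object alive, so returning p past the end of the reader is a legal leak.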
All of this means that simple scripting will be hard-pressed to identify the truly buggy leaks.
Which is of course the purpose of the rcu_pointer_handoff() macro, an identity function whose purpose is to document that the protection for the pointer has passed from RCU to some other mechanism:
rcu_read_lock();
p = rcu_dereference(gp);
do_something_with(p);
spin_lock(p->lock);
p = rcu_pointer_handoff(p);
rcu_read_unlock();
do_something_more(*p);
spin_unlock(p->lock);
Unfortunately, rcu_pointer_handoff() is not used much, probably because there is not yet any RCU pointer-leak diagnostic tool that pays attention to it. It is nevertheless an excellent documentation tool.
Legal Leaks II
And it turns out that there are also perfectly legal non-handoff uses of leaked pointers, for example:
rcu_read_lock();
p = rcu_dereference(gp);
do_something_with(p);
rcu_read_unlock();
if (p)
	do_something_if_notnull();
This code pattern is useful when do_something_if_notnull() might sleep, which is illegal within Linux-kernel RCU read-side critical sections.
In short, even though a pointer to an RCU-protected object has been leaked from its RCU reader, it may still be used in the same ways as a pointer obtained from rcu_access_pointer().
Why Are RCU Leaks Special?
This same pointer-leak problem exists with locking, so it is only fair to ask why we should worry so much about RCU pointer leaks. One answer is that RCU's grace-period delays make it much harder for tools such as KASAN to catch pointer leaks. So what can be done?
Locating Leaks
One approach is to build the kernel with the debug-only CONFIG_RCU_STRICT_GRACE_PERIOD=y Kconfig option, which reduces grace-period delays at the expense of exorbitant CPU consumption, limited scalability, and many IPIs. This gives KASAN a better (but not perfect) chance of detecting pointer leaks.
Another approach is to build with both CONFIG_SMP=n and CONFIG_PREEMPT_COUNT=y, and then modify RCU to make rcu_read_unlock() invoke all callbacks when preemption is enabled, again giving KASAN an assist. However, not all architectures support CONFIG_SMP=n (looking at you, arm64!) and this approach will fail to find pointer-leak scenarios requiring multiple CPUs to expose the leak, so I don't expect to implement the RCU portion of this technique any time soon.
Yet another approach would be to create an alternative RCU implementation with even more aggressive grace-period-latency-reduction properties than those of CONFIG_RCU_STRICT_GRACE_PERIOD=y. If a clear need for improvement appears, this is a possible way forward.
Lurking Leaks
But even a fixed version of the first example can run into trouble:
rcu_read_lock();
p = rcu_dereference(gp);
do_something_with(p);
do_something_more(*p);
rcu_read_unlock();
To see this, suppose that either do_something_with(p) or do_something_more(*p) executes rcu_read_unlock() followed by rcu_read_lock(). The code looks fine at first glance, but is still broken. And you have to review both do_something_with(p) and do_something_more(*p), along with all the functions that they call, both directly and indirectly, to spot the bug.
Polling for Lurking Leaks
What can be done about this?
Lorenzo Stoakes pointed out that this bug could be spotted by somehow capturing the grace-period state just after the rcu_read_lock() and then somehow checking it just before the rcu_read_unlock(). And Vlastimil Babka pointed out that a call to get_state_synchronize_rcu_full() would produce state that could be checked with WARN_ON_ONCE(poll_state_synchronize_rcu_full()).
This, too, is imperfect. After all, the broken RCU reader does not guarantee that a full grace period will complete before the call to poll_state_synchronize_rcu_full(). But all of the aforementioned tricks to speed up RCU grace periods also apply here.
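To make both the check and its limitation concrete, here is a drastically simplified single-threaded model. Every toy_ name is hypothetical, and the model is nothing like the real grace-period machinery: a "grace period" here is just a counter that can advance only when no reader is running, toy_get_state() and toy_poll_state() loosely imitate the snapshot-and-poll pairing, and a boolean return value stands in for WARN_ON_ONCE().

```c
#include <assert.h>
#include <stdbool.h>

static int toy_nesting;			/* reader nesting on our lone "CPU" */
static unsigned long toy_gp_completed;	/* grace periods completed so far */

static void toy_rcu_read_lock(void) { toy_nesting++; }

static void toy_rcu_read_unlock(void)
{
	assert(toy_nesting > 0);	/* unbalanced unlock would be a bug */
	toy_nesting--;
}

/* An updater tries to complete a grace period; a running reader blocks it. */
static bool toy_try_grace_period(void)
{
	if (toy_nesting > 0)
		return false;
	toy_gp_completed++;
	return true;
}

/* Snapshot: the count that must be reached for a full GP to have elapsed. */
static unsigned long toy_get_state(void)
{
	return toy_gp_completed + 1;
}

/* True if a full grace period has elapsed since the snapshot was taken. */
static bool toy_poll_state(unsigned long snap)
{
	return toy_gp_completed >= snap;
}

/* Buggy helper that briefly exits the reader, as in the scenario above. */
static void toy_leaky_helper(bool updater_runs)
{
	toy_rcu_read_unlock();
	if (updater_runs)
		toy_try_grace_period();	/* the object could be freed here */
	toy_rcu_read_lock();
}

/* Returns true if the check would have issued its warning. */
bool toy_reader(bool leaky, bool updater_runs)
{
	unsigned long snap;
	bool warned;

	toy_rcu_read_lock();
	snap = toy_get_state();
	if (leaky)
		toy_leaky_helper(updater_runs);
	else if (updater_runs)
		toy_try_grace_period();	/* blocked by this reader */
	warned = toy_poll_state(snap);	/* WARN_ON_ONCE() in the kernel */
	toy_rcu_read_unlock();
	return warned;
}
```

In this model, toy_reader(true, true) reports the problem because a grace period slipped into the gap, while toy_reader(true, false) contains the very same bug but stays silent because no grace period happened to elapse. A correct reader, toy_reader(false, true), never warns.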
In addition, consider the following possibility:
start_something(); // Invokes rcu_read_lock().
get_state_synchronize_rcu_full(&rgos);
p = rcu_dereference(gp);
do_something_with(p);
do_something_more(*p);
WARN_ON_ONCE(poll_state_synchronize_rcu_full(&rgos));
end_something(); // Invokes rcu_read_unlock().
In this case, the polling APIs do not cover the full extent of the RCU read-side critical section, which started somewhere in start_something() or some function that it called, and which ended somewhere in end_something() or again some function that it called. Of course, it is possible to push the calls to the two polling APIs down into start_something() and end_something(), respectively, but this requires passing a pointer to rgos to those two functions. Which might be more natural for SRCU read-side critical sections, in which the return value from srcu_read_lock() must be passed to srcu_read_unlock(), which in turn suggests creating a structure containing both this return value and the rgos structure.
On the other hand, our pointer p was fetched after start_something() returned and was last used before end_something() was called, so maybe in the common case we don't care about the possibility of the reader being interrupted in either of those functions.
That said, more care is required if start_something() returns a pointer to an RCU-protected object, or if that same object is passed to end_something()!
Final Words
I also hope that better tooling becomes available, and who knows? Perhaps LLMs will help locate pointers that are improperly leaked from RCU read-side critical sections. And perhaps the Rust type system will also make its contribution to the solution of this problem.
I have one final question for you, given that I have been worried about RCU pointer leaks for many years, but I have not heard of many actually happening. Is this because Linux-kernel developers are admirably careful with their RCU read-side critical sections? Or is it because RCU grace periods are normally long enough that these developers are getting away with egregious RCU pointer-leak bugs? ;-)