What on Earth Does Pointer Provenance Have to do With RCU?

TL;DR: Unless you are doing very strange things with RCU, not much!!!

So why has the guy most responsible for Linux-kernel RCU spent so much time over the past five years working on the provenance-related lifetime-end pointer-zap issue within the C++ Standards Committee?

But first...

What is Pointer Provenance?

Back in the old days, provenance was for objets d'art and the like, and we did not need it for our pointers, no sirree!!! Pointers had bits, those bits formed memory addresses, and as often as not we didn't even need to worry about these addresses being translated. But life is more complicated now. On the other hand, computing life is also much bigger, faster, more reliable, and (usually) more productive, so be extremely careful what you wish for from the Good Old Days!

These days, pointers have provenance as well as addresses, and this has consequences. The C++ Standard (recent draft) states that when an object's storage duration ends, any pointers to that object become invalid. For its part, the C Standard states that when an object's storage duration ends, any pointers to that object become indeterminate. In both standards, the wording is more precise, but this will serve for our purposes. (For the remainder of this document, we will follow C++ and say “invalid”, which is shorter than “indeterminate”. We will balance this out by using C-language example code.)

Neither standard places any constraints on what a compiler can do with an invalid pointer value, even if all you are doing is loading or storing that value.

Those of us who cut our teeth on assembly language might quite reasonably ask why anyone would even think to make pointers so invalid that you cannot even load or store them. Let's start by looking at pointer comparisons using this code fragment:

p = kmalloc(...);
might_kfree(p);         // Pointer might become invalid (AKA "zapped")
q = kmalloc(...);       // Assume that the addresses of p and q are equal.
if (p == q)             // Compiler can optimize as "if (false)"!!!
    do_something();

Both p and q contain addresses, but the compiler also keeps track of the fact that their values were obtained from different invocations of kmalloc(). This information forms part of each pointer's provenance. This means that p and q have different provenance, which in turn means that the compiler does not need to generate any code for the p == q comparison. The two pointers' provenance differs, so the result cannot be anything other than false.

And this is one motivation for pointer provenance and invalidity: The results of operations on invalid pointers are not guaranteed, which provides additional opportunities for optimization. This example perhaps seems a bit silly, but modern compilers can use pointer provenance and invalidity to carry out serious points-to and aliasing analysis.
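If you would like to experiment, here is a minimal userspace analogue of the above fragment, assuming malloc() and free() in place of kmalloc() and kfree(). Whether any given compiler actually folds the comparison depends on that compiler and its version; the point is only that the standards permit it, precisely because p is used after its object's storage duration has ended:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *p = malloc(sizeof(*p));
    int *q;

    free(p);                // p is now invalid (AKA "zapped").
    q = malloc(sizeof(*q)); // Might well reuse p's old address...

    // ...but p's provenance is dead, so the compiler is permitted to
    // treat this comparison as always false, even when the two
    // addresses are bit-for-bit identical.
    if (p == q)
        printf("Compiler says the pointers are equal.\n");
    else
        printf("Compiler says the pointers differ.\n");

    free(q);
    return 0;
}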

Yes, you can have hardware provenance. Examples include ARM MTE, the CHERI research prototype, and the venerable IBM System i. Conventional systems provide pointer provenance of a sort via their page tables, which is used by a variety of memory-allocation debugging tools, one example being the efence library.

Of course, using invalid (AKA “dangling”) pointers is known to be a bad idea. So why are we even talking about it???

Why Would Anyone Use Invalid/Dangling Pointers?

Please allow me to introduce you to the famous and frequently re-invented LIFO Push algorithm. You can find this in many places, but let's focus on the Linux kernel's llist_add_batch() and llist_del_all() functions. The former atomically pushes a list of elements on a linked-list stack, and the latter just as atomically removes the entire contents of the stack:

static inline bool llist_add_batch(struct llist_node *new_first,
                                   struct llist_node *new_last,
                                   struct llist_head *head)
{
    struct llist_node *first = READ_ONCE(head->first);

    do {
        new_last->next = first;
    } while (!try_cmpxchg(&head->first, &first, new_first));

    return !first;
}

static inline struct llist_node *llist_del_all(struct llist_head *head)
{
    return xchg(&head->first, NULL);
}

As lockless concurrent algorithms go, this one is pretty straightforward. The llist_add_batch() function reads the list header, fills in the ->next pointer, then does a compare-and-exchange operation to point the list header at the new first element. The llist_del_all() function is even simpler, doing a single atomic exchange operation to NULL out the list header and returning the elements that were previously on the list. This algorithm also has excellent forward-progress properties: the llist_add_batch() function is lock-free and the llist_del_all() function is wait-free.
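As a usage sketch, a caller might embed an llist_node in its own structure and drain the stack with llist_for_each_entry_safe(). The struct my_work type and the my_push()/my_drain() helpers below are purely illustrative:

#include <linux/llist.h>
#include <linux/slab.h>

struct my_work {                        // Hypothetical work item.
    struct llist_node llnode;
    int payload;
};

static LLIST_HEAD(my_stack);

static void my_push(struct my_work *w)  // Push a one-element "batch".
{
    llist_add_batch(&w->llnode, &w->llnode, &my_stack);
}

static void my_drain(void)              // Atomically grab and process everything.
{
    struct llist_node *first = llist_del_all(&my_stack);
    struct my_work *w, *tmp;

    llist_for_each_entry_safe(w, tmp, first, llnode) {
        // ... consume w->payload ...
        kfree(w);
    }
}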

So what is not to like?

In assembly language, or with a simple compiler, not much. But to see the pointer-provenance issue with more heavily optimized languages, consider the following sequence of events:

  1. CPU 0 allocates an llist_node B and passes it via both the new_first and new_last parameters of llist_add_batch().
  2. CPU 0 picks up the head->first pointer and places it in the first local variable, then assigns it to new_last->next. This new_last->next pointer now references llist_node A.
  3. CPU 1 invokes llist_del_all(), which returns a list containing llist_node A. The caller of llist_del_all() processes A and passes it to kfree().
  4. CPU 0's new_last->next pointer is now invalid due to llist_node A having been freed. But CPU 0 does not know this.
  5. CPU 1 allocates an llist_node C that happens to have the same address as the old llist_node A. It passes C via both the new_first and new_last parameters of llist_add_batch(), which runs to completion. The head pointer now points to llist_node C, which happens to have the same address as the now storage-duration-ended llist_node A.
  6. CPU 0 finally gets around to executing its try_cmpxchg(), which, given the code generated by typical C compilers, will succeed. The llist now contains an llist_node B that contains an invalid pointer to dead llist_node A, but whose pointer address happens to reference the shiny new llist_node C. (We term this invalid pointer a “zombie pointer” because it has in some assembly-language sense come back from the dead.)
  7. Some CPU invokes llist_del_all() and gets back an llist containing an invalid pointer.

One could argue that the Linux-kernel implementation of LIFO Push is simply buggy and should be fixed. Except that there is no reasonable way to fix it. Which of course raises the question...

What Are Unreasonable Fixes?

We can protect pointers from invalidity by storing them as integers (see the sketch following this list), but:

  1. Suppose someone has an element that they are passing to a library function. They should not be required to convert all their ->next pointers to integer just because the library's developers decide to switch to the LIFO Push algorithm for some obscure internal operation.
  2. In addition, switching to integer defeats type-checking, because integers are integers no matter what type of pointer they came from.
  3. We could restore some type-checking capability by wrapping the integer into a differently named struct for each pointer type. Except that this requires a struct with some particular name to be treated as compatible with pointers of some type corresponding to that name, a notion that current language standards do not support.
  4. In C++, we could use template metaprogramming to wrap an integer into a class that converts automatically to and from compatibly typed pointers. But there would then be windows of time in which there was a real pointer, and at that time there would still be the possibility of pointer invalidity.
  5. All of the above hack-arounds put additional obstacles in the way of developers of concurrent software.
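To make the first two items concrete, here is a rough userspace C11 sketch of LIFO Push with the links stored as integers, using default (sequentially consistent) atomics for simplicity. The zf_ prefix and type names are illustrative only; note the casts at every boundary, and that nothing stops a caller from casting back to the wrong pointer type:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct zf_node {                // Hypothetical node: link stored as an integer.
    uintptr_t next;             // Really a struct zf_node *, honest!
};

struct zf_head {
    _Atomic uintptr_t first;    // Ditto.
};

static bool zf_push(struct zf_node *node, struct zf_head *head)
{
    uintptr_t first = atomic_load(&head->first);

    do {
        node->next = first;     // Integer copy, so no pointer to zap.
    } while (!atomic_compare_exchange_weak(&head->first, &first,
                                           (uintptr_t)node));
    return !first;
}

static struct zf_node *zf_del_all(struct zf_head *head)
{
    // Convert back to a pointer only at the last possible moment.
    return (struct zf_node *)atomic_exchange(&head->first, (uintptr_t)0);
}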

However, it is fair to ask...

Why Do We Care About Strange New Algorithms???

Let's take a look at the history, courtesy of Maged Michael's diligent software archaeology.

In 1986, R. K. Treiber presented an assembly language implementation of the LIFO Push algorithm in technical report RJ 5118 entitled “Systems Programming: Coping with Parallelism” while at the IBM Almaden Research Center.

In 1975, an assembly language implementation of this same algorithm (with pop() instead of popall(), but with the same ABA properties) was presented in the IBM System 370 Principles of Operation as a method for managing a concurrent freelist.

US Patent 3,886,525 was filed in June 1973, just a few months before I wrote my first line of code, and contains a prior-art reference to the LIFO Push algorithm (again with pop() instead of popall()) as follows: “Conditional swapping of a single address is sufficient to program a last-in, first-out single-user-at-a-time sequencing mechanism.” (If you were to ask a patent attorney, you would likely be told that this 50-year-old patent has long since expired. Which should be no surprise, given that it is even older than Dennis Ritchie's setuid Patent 4,135,240.)

All three of these references describe LIFO push as if it was straightforward and well known.

So we don’t know who first invented LIFO Push or when they invented it, but it was well known in 1973. Which is well over a decade before C was first standardized, more than two decades before C++ was first standardized, and even longer before work was started on Rust.

And its combination of (relative) simplicity and excellent forward-progress properties just might be why this algorithm was anonymously invented so long ago and why it is so persistently and repeatedly reinvented. This frequent reinvention puts paid to any notion that LIFO Push is strange.

So sorry, but LIFO Push is neither new nor strange.


More information:

  1. https://isocpp.org/files/papers/P2434R4.html
  2. https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p2414r9.pdf
  3. https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3347r3.pdf
  4. https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3790r0.pdf

What About memory_order::consume?

There is a proposal making its way through the C++ Standards Committee to defang and deprecate memory_order::consume, and a similar proposal is likely to make its way through the C Standards Committee. This is somewhat annoying from a Linux-kernel-RCU perspective, because there was some reason to hope for language-level support for the address dependencies headed by calls to rcu_dereference().

For one thing, the dependency ordering provided by rcu_dereference() constrains compiler-based speculation. To see this, suppose that one thread does this:

p = kmalloc(sizeof(*p), GFP_KERNEL);
p->a = 42;
rcu_assign_pointer(gp, p);

And another thread does this:

p = rcu_dereference(gp);
do_something_with(p->a);

At the source-code level, there is clearly no data race. But suppose the compiler uses profile-directed optimization, and learns that the value returned by rcu_dereference() is almost always 0x12345678. Such a compiler might be tempted to emit code to cause the hardware to concurrently execute the rcu_dereference() while also loading 0x12345678->a. If the rcu_dereference() returned the expected value of 0x12345678, the compiler could use the value loaded from 0x12345678->a; otherwise, it could load p->a.
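For illustration only, the generated code might in effect behave as if the reader had been rewritten as follows, with struct foo, its ->a field, and the reader() wrapper being assumptions of this sketch, and with the transformation itself being one that, as the following discussion shows, must be forbidden:

struct foo { int a; };
struct foo __rcu *gp;

void reader(void)
{
    struct foo *guess = (struct foo *)0x12345678;   // Profile-predicted value.
    int r1 = guess->a;                    // Speculative load, which the hardware...
    struct foo *p = rcu_dereference(gp);  // ...may execute concurrently with this one.

    do_something_with(p == guess ? r1 : p->a);
}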

The problem is that the two threads might execute concurrently as follows:

p = kmalloc(sizeof(*p), GFP_KERNEL);     // Thread 1

// These two execute concurrently:
r1 = 0x12345678->a;                      // Thread 2's speculative load
p->a = 0xdead1eafbadfab1e;               // Thread 1's store to p->a

rcu_assign_pointer(gp, p);               // Thread 1
p = rcu_dereference(gp);                 // Thread 2: returns 0x12345678!!!

Because the value of p is 0x12345678, it appears that the speculation has succeeded. But the second thread's load into r1 ran concurrently with the first thread's store into p->a, which might result in user-visible torn loads and stores, or just plain pre-initialization garbage.

This sort of software speculation is therefore forbidden.

Yes, hardware can get away with this sort of thing because it tracks cache state. If a compiler wishes to generate code that executes speculatively, it must use something like hardware transactional memory, which typically has overhead that overwhelms any possible benefit.

Code Standards

The Documentation/RCU/rcu_dereference.rst file presents the Linux-kernel's code standards for the address dependencies headed by members of the rcu_dereference() API family. A summary of the most widely applicable of these standards is as follows:

  1. An address dependency must be headed by an appropriate member of the rcu_dereference() API family. The variables holding the return value from a member of this API family are said to be carrying a dependency.
  2. In the special case where data is added and never removed, READ_ONCE() can be substituted for one of the rcu_dereference() APIs.
  3. Address dependencies are carried by pointers only, and specifically not by integers. (With the exception that integer operations may be used to set, clear, and XOR bits in the pointers, which requires those pointers to be translated to integers, have their bits manipulated, and then translated immediately back to pointers.)
  4. Operations that cancel out all the bits in the original pointer break the address dependency.
  5. Comparing a dependency-carrying pointer to the address of a statically allocated variable can break the dependency chain. (Though there are special rules that allow such comparisons to be carried out safely in some equally special cases.)
  6. Special operations on hardware instruction caches may be required when using pointers to JITed functions.

The Documentation/RCU/rcu_dereference.rst file provides much more detail, which means that scanning the above list is not a substitute for reading the full file.
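To make items 3 through 5 concrete, here is a hedged sketch; struct foo, gp, default_foo, and the presumed low-order flag bit are all assumptions of this example rather than anything taken from that file:

struct foo {
    int a;
};
struct foo __rcu *gp;
struct foo default_foo;

void reader(void)
{
    struct foo *p;

    rcu_read_lock();
    p = rcu_dereference(gp);      // Heads the address dependency (item 1).

    // Item 3: a brief round trip through an integer to clear a presumed
    // low-order flag bit, converting back to a pointer immediately.
    p = (struct foo *)((uintptr_t)p & ~(uintptr_t)0x1);

    // Item 4: don't do this -- cancelling all the original bits
    // breaks the address dependency.
    // p = (struct foo *)((uintptr_t)p & 0);

    // Item 5: be careful here -- this comparison can break the
    // dependency chain except in the documented special cases.
    // if (p == &default_foo)
    //     return;

    do_something_with(p->a);      // Ordered after the rcu_dereference().
    rcu_read_unlock();
}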

Enforcing Code Standards

My fond hope in the past was that compilers would provide facilities for disabling the optimizations that make these code standards necessary, but that effort seems likely to require greater life expectancy than I can bring to bear. That said, I definitely encourage others to try their hands at this.

But in the meantime, we need a way to enforce these code standards.

One approach is obviously code review, but it would be good to have automated help for this.

And Paul Heidekreuger presented a prototype tool at the 2022 Linux Plumbers Conference. This tool located several violations of the rule against comparing dependency-carrying pointers against the addresses of statically allocated variables.

Which suggests that continued work on such tooling could be quite productive.

Summary

So memory_order::consume is likely to go away, as is its counterpart in the C standard. This is not an immediate problem because all known implementations simply map memory_order::consume to memory_order::acquire, with those who care using other means to head address dependencies. (In the case of the Linux kernel, volatile loads. In the case of quite a bit of userspace software, memory_order_relaxed.)
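For concreteness, here is a rough sketch of those two approaches; the my_ names are illustrative, and the kernel's real rcu_dereference() of course adds sparse annotations and debugging checks on top of its volatile load:

// Linux kernel style: head the address dependency with a volatile load.
#define my_rcu_dereference(p) READ_ONCE(p)

// Userspace C11 style: head it with a relaxed atomic load (the pointer being
// declared _Atomic), relying on the implementation rather than the standard
// to preserve the address dependency.
#define my_rcu_dereference_user(p) \
    atomic_load_explicit(&(p), memory_order_relaxed)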

However, this does leave those who care with the issue of checking code using things like rcu_dereference(), given that the language standards are unlikely to provide any help any time soon.

Continued work on tooling that checks the handling of dependency chains in the object code therefore seems like an eminently reasonable step forward.