ARM32 Page Tables

As I continue to describe in different postings how the ARM32 start-up sequence works, it becomes necessary to explain in depth the basic kernel concepts around page tables and how they are implemented on ARM32 platforms.

To understand the paging setup, we need to repeat and extend some Linux paging lingo. Good background reading is Mel Gorman’s description of the Linux page tables in his book “Understanding the Linux Virtual Memory Manager”. This book was published in 2004 and is based on Mel’s PhD thesis from 2003. Some stuff has happened in the 16 years since then, but the basics still hold. It is also necessary to understand the newer additions to the page tables, such as the five levels of page tables currently used in the Linux kernel.

First a primer: the ARM32 architecture with a classic MMU has 2 levels of page tables and the more recent LPAE (Large Physical Address Extension) MMU has 3 levels of page tables.

Only some ARMv7 implementations have LPAE, and it is only conditionally enabled: machines that have it can also use the classic MMU if they want, since they support both. It is not enabled by default in the multi_v7 configuration: your machine has to explicitly turn it on at compile time. The layouts are so different that a single kernel image can never support both the classic and the LPAE MMU.

Early implementations of ARMv7-A such as Cortex-A8 and Cortex-A9 do not support LPAE; rather, it was introduced during the lifetime of this architecture. Since this is a compile-time setting, the default configuration for ARMv7 cannot enable it, or the older implementations would break. ARMv8 implementations all have LPAE enabled by default.

But let’s first discuss how Linux sees the world of paging.

The Linux Kernel View of Page Tables

The generic paging management mechanism in the Linux kernel sees the world like this:

Paging directories: Abstract page tables in the Linux kernel proceed from the PGD (page global directory) through the P4D (fourth level directory), PUD (page upper directory), PMD (page middle directory) and down to the actual page table entries (PTEs) that map individual pages of memory from virtual to physical address space.

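Abbreviations:

PGD: page global directory
P4D: fourth level directory
PUD: page upper directory
PMD: page middle directory
PTE: page table entry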

First and foremost, notice that all of these except the PTE are named something-directory. This is because they all contain several pointers down to objects of the next level, like any directory. So each of the entities PGD, P4D, etc. is an array of pointers. The PTE, despite its singular name, also contains several pointers.

The Linux kernel will act as if 5 levels of page tables exist. This is of course grossly over-engineered for ARM32 which has 2 or 3 levels of page tables, but we need to cater for the rest of the world. One size fits all. In practice, the code is organized such that these page tables “fold” and we mostly skip over the intermediate translation steps when possible.

The other thing you need to know about the page table hierarchy is that on the lowest level the PTEs contain a number of pointers, which are 1-to-1 translation chunks. On ARM32 we always have 512 pointers per PTE, and each pointer translates one 4KB (0x1000) page of memory.

Page Table Entry: A PTE on ARM32 contains 512 pointers, i.e. translations between virtual and physical pages. From the generic kernel virtual memory management point of view it is not important how this translation actually happens in hardware; all it needs to know is that a page of size 0x1000 (4KB) is translated by a single PTE pointer entry. This way a PTE is a directory, just like everything else in the page hierarchy. In the example, the first pointer in the PTE maps virtual address 0xC0000000 to physical address 0x10000000.
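To make the 512-pointer PTE concrete, here is a toy model in C. It is a sketch under simplifying assumptions: each slot holds only a page-aligned physical address with no permission or status bits, and the sketch_* names are hypothetical, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants: 4KB pages, 512 slots per PTE table. */
#define SKETCH_PAGE_SHIFT   12
#define SKETCH_PAGE_SIZE    (1u << SKETCH_PAGE_SHIFT)   /* 0x1000 */
#define SKETCH_PTRS_PER_PTE 512u

/* Which of the 512 slots in the PTE table translates this virtual address. */
static unsigned int sketch_pte_index(uint32_t va)
{
        return (va >> SKETCH_PAGE_SHIFT) & (SKETCH_PTRS_PER_PTE - 1);
}

/*
 * Translate va using a PTE table whose slots hold the physical base
 * address of each 4KB page (toy model: no permission bits).
 */
static uint32_t sketch_translate(const uint32_t pte[SKETCH_PTRS_PER_PTE],
                                 uint32_t va)
{
        uint32_t page_phys = pte[sketch_pte_index(va)];

        assert((page_phys & (SKETCH_PAGE_SIZE - 1)) == 0); /* page aligned */
        return page_phys | (va & (SKETCH_PAGE_SIZE - 1));
}
```

For example, assuming the table covers the 2MB starting at 0xC0000000 and slot i holds 0x10000000 + i * 0x1000, translating 0xC0001234 uses slot 1 and yields 0x10001234.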

On the PMD level the classic ARM32 MMU has one pointer per PMD and 2048 pointers per PGD. The P4D and PUD levels are “folded”, i.e. unused. Having one pointer per PMD seems vaguely pointless: we have level-1 in the 2048 pointers in the PGD and level-2 in the 512 pointers in the PTE. You may rightfully ask what the point is of having a “three level” hierarchy with the middle directory having one pointer per PMD. This is a way of fitting the Linux idea about the page hierarchy to the actual ARM32 architecture, and will be explained shortly.

Classic ARM MMU page table: The classic ARM32 paging setup in Linux folds P4D and PUD, providing 2048 pointers per PGD, 1 pointer per PMD and 512 pointers per PTE. It uses 3 levels of the hierarchy while ARM32 hardware only has two. This is, however, a good fit, as we shall see. To the left the object relations, to the right an illustration of the tables.
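The classic split can be sketched in C, showing how one virtual address is seen both by Linux (2048 PGD entries, 512-entry PTEs) and by the hardware (4096 level-1 descriptors, 256-entry level-2 tables). The shifts are derived from the sizes given above (2MB per PGD entry, 1MB per level-1 descriptor, 4KB pages); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct classic_split {
        unsigned int pgd;     /* Linux PGD index: 2048 entries of 2MB     */
        unsigned int l1;      /* hardware level-1 index: 4096 x 1MB       */
        unsigned int half;    /* which of the two level-1 descriptors     */
        unsigned int pte;     /* Linux PTE index: 512 x 4KB = 2MB         */
        unsigned int l2;      /* hardware level-2 index: 256 x 4KB = 1MB  */
        unsigned int offset;  /* byte offset inside the 4KB page          */
};

static struct classic_split classic_split_va(uint32_t va)
{
        struct classic_split s;

        s.pgd = va >> 21;
        s.l1 = va >> 20;
        s.half = (va >> 20) & 1;
        s.pte = (va >> 12) & 0x1ff;
        s.l2 = (va >> 12) & 0xff;
        s.offset = va & 0xfff;
        /* Each Linux PGD entry is exactly one pair of hardware descriptors. */
        assert(s.pgd < 2048 && s.l1 == 2 * s.pgd + s.half);
        return s;
}
```

For 0xC0123456 this gives PGD index 0x600, hardware level-1 index 0xC01 (the second descriptor of the pair), Linux PTE index 0x123 and hardware level-2 index 0x23: the Linux index is the hardware index offset by 256 for the second megabyte.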

On LPAE the story is simpler: each PGD has 4 pointers covering 1GB each (for a total of 4GB of memory) and 512 pointers per PMD dividing each 1GB into 2MB chunks, then 512 pointers per PTE dividing the 2MB chunks into 4KB chunks (pages). The math should match up. The LPAE MMU can of course put more pointers into the PGD to cover up to 1TB of memory. Currently no ARM32 architectures need this so we just hammer it down to 4GB maximum for kernelspace. 4GB of kernelspace memory should be enough for everyone. Userspace is another story.

LPAE MMU page table: The LPAE page table also folds P4D and PUD but has something meaningful on the PMD level: 4 PGD entries covering 1 GB each cover the whole 32bit address space, then there are 512 pointers per PMD and 512 pointers per PTE. Each PTE entry maps 4KB of virtual memory to physical memory. If we filled the PGD with all 512 available entries, it would span 512 GB.
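The LPAE arithmetic can be checked mechanically: a 32-bit virtual address splits into 2 bits of PGD index, 9 bits of PMD index, 9 bits of PTE index and a 12-bit page offset. This is a sketch derived from the sizes stated above; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* LPAE split of a 32-bit virtual address: 2 + 9 + 9 + 12 bits. */
struct lpae_split {
        unsigned int pgd;     /* 0..3:   1GB per entry */
        unsigned int pmd;     /* 0..511: 2MB per entry */
        unsigned int pte;     /* 0..511: 4KB per entry */
        unsigned int offset;  /* byte offset in the 4KB page */
};

static struct lpae_split lpae_split_va(uint32_t va)
{
        struct lpae_split s = {
                .pgd = va >> 30,
                .pmd = (va >> 21) & 0x1ff,
                .pte = (va >> 12) & 0x1ff,
                .offset = va & 0xfff,
        };

        assert(s.pgd < 4);  /* 4 PGD entries cover all of 32-bit space */
        return s;
}

/* Bytes covered by n PGD entries: n x 512 PMD x 512 PTE x 4KB pages. */
static uint64_t lpae_span(uint64_t pgd_entries)
{
        return pgd_entries * 512u * 512u * 4096u;
}
```

lpae_span(4) comes out at exactly 4GB, and lpae_span(512), a completely filled PGD, at 512GB.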

So when we say that the classic ARM32 MMU has 2 levels of page tables, we are presenting this to Linux in a peculiar way as “3 levels” where the middle one is a single pointer, whereas the LPAE MMU actually has 3 levels. We are sorry for the confusion.

Obtaining Pointers to Directory Entries

A pointer to an entry in a directory is obtained with special inline accessors that take a virtual address as a parameter, for example pmd_off(), which traverse the whole hierarchy to get to the offset of the right element:

static inline pmd_t *pmd_off(struct mm_struct *mm, unsigned long va)
{
        return pmd_offset(pud_offset(p4d_offset(pgd_offset(mm, va), va), va), va);
}

On ARM32 the P4D and PUD parts of this ladder resolve to nothing: they get “folded” and optimized out at compile time. The struct mm_struct *mm argument is the actual memory manager context (for the kernel this is init_mm), which stores the pointer to the memory where the page global directory (PGD) actually resides. That will be the symbol swapper_pg_dir if we are running kernel code. This is how it actually happens in include/linux/pgtable.h:

#define pgd_offset(mm, address)         pgd_offset_pgd((mm)->pgd, (address))

(...)

static inline pgd_t *pgd_offset_pgd(pgd_t *pgd, unsigned long address)
{
        return (pgd + pgd_index(address));
};

In kernelspace mode the mm argument is init_mm, and its ->pgd pointer points to swapper_pg_dir, the root page table of the kernel memory space, located at the virtual address where the physical page global directory is mapped. All index pointer accessors operate on this principle.
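To see why the folded levels cost nothing, here is a self-contained mock of the pmd_off() ladder for the classic ARM32 layout. Everything prefixed mock_ is hypothetical and only imitates the kernel's structure: as far as I know, the generic folding helpers for p4d and pud simply reinterpret the pointer they are given, and on the classic MMU the PMD pointer is the first of the two level-1 descriptors in a PGD entry.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_PGDIR_SHIFT   21      /* each PGD entry covers 2MB */
#define MOCK_PTRS_PER_PGD  2048

/* Hypothetical stand-ins for pgd_t, p4d_t, pud_t and pmd_t. */
typedef struct { uint32_t pgd[2]; } mock_pgd_t;  /* pair of level-1 descriptors */
typedef struct { mock_pgd_t pgd; } mock_p4d_t;   /* folded: wraps a PGD entry */
typedef struct { mock_p4d_t p4d; } mock_pud_t;   /* folded: wraps a P4D entry */
typedef struct { uint32_t pmd; } mock_pmd_t;     /* one level-1 descriptor */

struct mock_mm { mock_pgd_t *pgd; };             /* stands in for mm_struct */

static mock_pgd_t *mock_pgd_offset(struct mock_mm *mm, uint32_t va)
{
        assert((va >> MOCK_PGDIR_SHIFT) < MOCK_PTRS_PER_PGD);
        return mm->pgd + (va >> MOCK_PGDIR_SHIFT);
}

/* Folded levels: no table lookup, just reinterpret the same pointer. */
static mock_p4d_t *mock_p4d_offset(mock_pgd_t *pgd, uint32_t va)
{
        (void)va;
        return (mock_p4d_t *)pgd;
}

static mock_pud_t *mock_pud_offset(mock_p4d_t *p4d, uint32_t va)
{
        (void)va;
        return (mock_pud_t *)p4d;
}

/* Classic ARM32: the PMD pointer is the first of the two descriptors. */
static mock_pmd_t *mock_pmd_offset(mock_pud_t *pud, uint32_t va)
{
        (void)va;
        return (mock_pmd_t *)pud;
}

static mock_pmd_t *mock_pmd_off(struct mock_mm *mm, uint32_t va)
{
        return mock_pmd_offset(mock_pud_offset(mock_p4d_offset(
                        mock_pgd_offset(mm, va), va), va), va);
}
```

With a 2048-entry table (0x4000 bytes, the classic swapper_pg_dir size), 0xC0000000 and 0xC0100000 both resolve to entry 0x600 since they fall in the same 2MB, while 0xC0200000 resolves to entry 0x601. Only the pgd_offset() addition does any real work; the rest are casts a compiler can discard.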

The ARM32 View of the Page Tables

The virtual address of the kernelspace page global directory is PAGE_OFFSET+0x3000 or PAGE_OFFSET+0x4000. PAGE_OFFSET depends on the kernel VMSPLIT but is typically 0xC0000000, with kernel memory at 0xC0000000..0xFFFFFFFF. On ARM32 the physical address of this table is set in TTBR0 (Translation Table Base Register). Linux uses the symbol swapper_pg_dir for this address. It is 0x5000 bytes in size for LPAE and 0x4000 bytes in size for the classic ARM MMU. If the kernel starts at the physical address 0xnnnn8000, swapper_pg_dir is at 0xnnnn8000-PG_DIR_SIZE, so at 0xnnnn3000..0xnnnn7FFF for LPAE and 0xnnnn4000..0xnnnn7FFF for any other ARM. The most common location is 0xC0004000..0xC0007FFF in virtual memory.
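The placement arithmetic can be checked with a few lines of C. This sketch assumes the common values named above (PAGE_OFFSET 0xC0000000, the kernel image starting 0x8000 into RAM) and hard-codes PG_DIR_SIZE; the SKETCH_* and function names are illustrative, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed typical values: VMSPLIT_3G and the image 0x8000 into RAM. */
#define SKETCH_PAGE_OFFSET  0xC0000000u
#define SKETCH_TEXT_OFFSET  0x00008000u

/* PG_DIR_SIZE as stated in the text: 0x4000 classic, 0x5000 LPAE. */
static uint32_t pg_dir_size(int lpae)
{
        return lpae ? 0x5000u : 0x4000u;
}

/* swapper_pg_dir sits directly below the start of the kernel image. */
static uint32_t swapper_pg_dir_va(int lpae)
{
        uint32_t kernel_start = SKETCH_PAGE_OFFSET + SKETCH_TEXT_OFFSET;

        assert(pg_dir_size(lpae) < SKETCH_TEXT_OFFSET);
        return kernel_start - pg_dir_size(lpae);
}
```

swapper_pg_dir_va(0) gives 0xC0004000 and swapper_pg_dir_va(1) gives 0xC0003000; in both cases the table ends at 0xC0007FFF, right below the kernel image.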

PTE page table entries and the associated types pte_t and pteval_t are of course purely a Linux concept. And that shows, because what ARM32 MMUs are thinking about is not PTEs but coarse 2- or 3-level page tables of 4KB pages. It would be neat if Linux were engineered in a more flexible way to allow a coarse page table to be ONE PTE, but that isn’t always the case; for old ARM32 systems it definitely is not.

“Coarse” in this context means we start at a level-1 descriptor and see that for this 1MB of virtual memory we are using some smaller 4K pages so we have to look further: go and walk the page table ladder using the modified virtual address I give you here. ARM MMUs also have a concept of “fine” pages of just 1K. Linux does not use these. Yet.

We are here dealing with PAGE_SIZE granularity, and the Linux memory manager expects a PTE to be one page big and full of pointers, so we will for example call arm_pte_alloc() to allocate a new level-2 (or level-3 on LPAE) page table of 0x1000 (4096) bytes. That much is simple.

The world is also reasonably simple on LPAE systems, because there the PTE is indeed one page, populated by 512 entries of 64 bits / 8 bytes each, meaning we have 512 * 8 = 4096 bytes: a perfect match. Also, these 64 bits fulfil the contract between Linux’ memory manager and the architecture of providing some MMU facilities such as a “dirty” bit. One PTE on an LPAE MMU is one page full of these coarse third level descriptors (remember that LPAE has three levels of page tables). This nice fit will also be the case on the ARM64/AArch64 platform. End of story. Good for them!

What is not so simple is how Linux utilizes these 4096 bytes on the classic old-school ARM MMU, which is described in include/asm/pgtable-2level.h.

What is done on the classic MMU is that we put the coarse page table index into the modified virtual address (MVA) field of the level-1 descriptor. This “coarse page table index” is in bits 10-31 of the level-1 descriptor (which is half a PGD entry, more about that later) and corresponds to the highest 22 bits of the physical address of the level-2 page table. What is peculiar about the coarse page table index is that it does not correspond to a page in memory. Instead it corresponds to a quarter of a page. This is logical because level-2 descriptors are 32 bits / 4 bytes and we need 256 of them to cover 1MB, which is the coarseness of the level-1 descriptor, so we need 256 * 4 = 1024 = 0x400 bytes. That’s one quarter of a 4KB page. So the coarse page table index points to a location in physical memory counted in 0x400-byte chunks. Which is mildly confusing.
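Here is a sketch of how a coarse level-1 descriptor is used to find the level-2 descriptor for a virtual address. It assumes the classic short-descriptor encoding in which bits [1:0] = 0b01 mark a pointer to a page table; the helper names are hypothetical, and domain and other attribute bits are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Bits [1:0] == 0b01: this level-1 descriptor points to a coarse table. */
static int l1_is_coarse(uint32_t l1_desc)
{
        return (l1_desc & 0x3u) == 0x1u;
}

/* Bits [31:10]: physical base of the 1KB (0x400) level-2 table. */
static uint32_t coarse_table_base(uint32_t l1_desc)
{
        return l1_desc & 0xFFFFFC00u;
}

/* Physical address of the level-2 descriptor that translates va. */
static uint32_t l2_desc_addr(uint32_t l1_desc, uint32_t va)
{
        uint32_t index = (va >> 12) & 0xFFu;   /* 256 entries x 4KB = 1MB */

        assert(l1_is_coarse(l1_desc));
        return coarse_table_base(l1_desc) + index * 4u;  /* 4 bytes each */
}
```

With a descriptor value of 0x10002C01, the level-2 table starts at physical 0x10002C00, the last quarter of the page at 0x10002000, which is exactly the quarter-page granularity described above. The descriptor for 0xC0123456 is then at 0x10002C8C.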

Further, we have the following problem: ARM’s classic MMU does not have all the “dirty” and “accessed”/“young” bits that the generic Linux virtual memory manager presupposes to exist in the architecture.

One could imagine telling the Linux virtual memory manager that we have only 256 pointers per PTE and putting some metadata in the remainder, but then we would end up using only half a page and wasting lots of memory on PTEs.

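The actual solution, as implemented on ARM32 (I think this was invented by Russell King), is to squeeze as much as possible into one page: each Linux PGD entry holds two consecutive level-1 descriptors covering 1MB each, and the two 256-entry level-2 tables they point to (1KB each) are placed back to back in the same page, giving 512 entries per PTE.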

We thus occupy half a page with these two sets of level-2 descriptors. To solve the mapping problem for the “dirty” and “accessed”/“young” bits, we use the half page that is left to hold some metadata about what Linux has requested, so we can do a bit of back-and-forth emulation of the features that the Linux virtual memory manager wants.
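The layout described above can be sketched as byte offsets within one 4KB PTE page. This is a hedged model: the placement (Linux-only entries in the lower 2KB, the two hardware tables in the upper 2KB) follows my reading of the layout comment in pgtable-2level.h, and the SKETCH_* constants and function names are hypothetical rather than the kernel's own.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed offsets inside one 4KB PTE page. */
#define SKETCH_HWTABLE_OFF  2048u  /* start of the two hardware tables   */
#define SKETCH_HWTABLE_SIZE 1024u  /* one hardware table: 256 x 4 bytes  */

/* Byte offset of the Linux (software) entry for va in the PTE page. */
static uint32_t linux_pte_off(uint32_t va)
{
        return ((va >> 12) & 0x1FFu) * 4u;
}

/* Byte offset of the hardware level-2 descriptor for va. */
static uint32_t hw_pte_off(uint32_t va)
{
        return SKETCH_HWTABLE_OFF + linux_pte_off(va);
}

/* The two level-1 descriptors of a PMD point at the two hardware tables. */
static void pmd_targets(uint32_t pte_page_phys, uint32_t out[2])
{
        assert((pte_page_phys & 0xFFFu) == 0);  /* must be a whole page */
        out[0] = pte_page_phys + SKETCH_HWTABLE_OFF;
        out[1] = pte_page_phys + SKETCH_HWTABLE_OFF + SKETCH_HWTABLE_SIZE;
}
```

For a PTE page at physical 0x10002000, the level-1 pair points at 0x10002800 and 0x10002C00, and the hardware descriptor for the last slot (index 511) sits at byte 4092, the very end of the page. Each Linux entry and its hardware twin are always exactly 2048 bytes apart, which keeps the bookkeeping cheap.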

We have effectively presented the kernel virtual memory manager with an MMU that has three levels of page tables: PGD (2048 pointers), PMD (one pointer) and PTE (512 pointers), while in fact we have two, and then behind the back of the generic virtual memory manager we do this optimization to adjust the map to reality.

This is not so bad: the virtual memory manager is well aware that not the whole world uses all five levels of page tables it supports, so it has been written to “fold” levels like our artificial PMD already. This will work just fine.

Alas, this is not a very simple solution: it is very efficient, but counterintuitive and complicated. Operating system development can be pretty hard in the details, so this is what we can expect.

Classic page table: The classic ARM32 MMU page table layout uses 32bit descriptors in two levels: two level-1 descriptors in the global page table (TTB, what Linux calls PGD, page global directory) are grouped into one PMD (page middle directory). There are 4096 level-1 descriptors grouped in pairs to form a PMD, so there are 2048 PMDs. Then 256 + 256 = 512 level-2 descriptors are grouped into one PTE (page table entry) with some metadata right above it. What Linux sees is a PMD managing 2MB and a PTE managing 512 4K pages, which is also 2MB. This way a PTE fills exactly one page of memory, which simplifies things. The level-2 tables holding the 4K descriptors are referred to as “coarse” page tables.

Level 1 and Level 2 classic descriptors: The rough format of the 32bit level-1 and level-2 page descriptors used in the classic ARM32 MMU, which is what most ARM32 Linux kernels use. A “coarse” level-1 descriptor points to a table of “small page” (4KB) descriptors. As can be seen, the memory area covered at each level is mapped by simply using the high bits of the physical address, as is common in MMUs. Some domain and access control bits are stuffed into the remaining bits; the bits explicitly set to 0 or 1 must be set like so.
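Completing the walk, a small-page level-2 descriptor supplies the top 20 bits of the physical address and the virtual address supplies the low 12. This sketch assumes the ARMv7 short-descriptor encoding (bit 1 set for a 4KB small page, bit 0 used as execute-never); access permission, domain and cacheability bits are not modeled, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 1 set marks a 4KB "small page" level-2 descriptor (bit 0 is XN). */
static int l2_is_small_page(uint32_t l2_desc)
{
        return (l2_desc & 0x2u) != 0;
}

/* Physical address = descriptor bits [31:12] | virtual address bits [11:0]. */
static uint32_t small_page_pa(uint32_t l2_desc, uint32_t va)
{
        assert(l2_is_small_page(l2_desc));
        return (l2_desc & 0xFFFFF000u) | (va & 0xFFFu);
}
```

Whatever attribute bits are set in the low 12 bits of the descriptor, they are masked off: a descriptor of 0x1000045E translates 0xC0000123 to 0x10000123.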

An LPAE page table set-up, on the other hand, is simpler and deeper than the classic MMU: the descriptors are 64bit wide and include all the bits that Linux needs. Four 64bit level-1 descriptors/PGD pointers, typically at 0xC0003000, cover the whole 4GB virtual address space (0x00000000-0xFFFFFFFF) and point to the level-2 descriptors/PMD pointers starting at 0xC0004000. These are 64bit but cover 2MB each, so one page of them covers the same amount of virtual memory space as one page of classic level-1 descriptors: instead of 1024 32bit “coarse” entries covering 1MB each, we have 512 64bit entries covering 2MB each. This makes 1 PGD pointer correspond exactly to 1 level-1 page descriptor, and 1 PMD pointer correspond exactly to 1 level-2 page descriptor.

The level-2 descriptor points to a table of level-3 descriptors, which are also 64bits wide and cover 4KB each. At the third level, 512 such descriptors cover 2MB, and their 512 * 8 = 4096 bytes fill out exactly one 4KB page of memory. Further, the descriptor layout in the level-3 descriptors fits hand-in-glove with Linux’ idea about what special bits it wants, like “dirty”, “young” and “accessed”. PGD, PMD and PTE have identical concepts in hardware, and this makes things much more intuitive. ARM64/AArch64 also uses this style of page descriptors.

LPAE page table: The LPAE page table layout is simpler and more intuitive than the classic MMU. It uses 64bit pointers on each of its three levels, where one descriptor corresponds to one PGD, PMD or PTE entry. We use 4 consecutive PGD pointers at 0xC0003000; the PMDs hold 512 64-bit pointers to PTEs in each of the 4 pages at 0xC0004000, 0xC0005000, 0xC0006000 and 0xC0007000; and one PTE is exactly one page (0x1000 bytes) consisting of exactly 512 64-bit translation pointers.
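Using the typical addresses from the text, the location of the PGD and PMD entries for any 32-bit virtual address can be computed directly. This is a sketch, assuming swapper_pg_dir sits at 0xC0003000 with the four PMD pages following it; other PAGE_OFFSET values would shift these addresses, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed typical layout: 4 PGD entries at 0xC0003000, then four
 * 4KB PMD tables at 0xC0004000..0xC0007FFF (one per gigabyte). */
#define SKETCH_LPAE_PGD 0xC0003000u
#define SKETCH_LPAE_PMD 0xC0004000u

/* Virtual address of the 8-byte PGD entry covering va. */
static uint32_t lpae_pgd_entry(uint32_t va)
{
        return SKETCH_LPAE_PGD + (va >> 30) * 8u;
}

/* Virtual address of the 8-byte PMD entry covering va. */
static uint32_t lpae_pmd_entry(uint32_t va)
{
        uint32_t table = SKETCH_LPAE_PMD + (va >> 30) * 0x1000u; /* page per GB */
        uint32_t index = (va >> 21) & 0x1FFu;                      /* 512 x 2MB  */

        assert(table <= 0xC0007000u);
        return table + index * 8u;
}
```

The kernel's own mappings at 0xC0000000 and up fall in the fourth gigabyte, so their PGD entry is at 0xC0003018 and their PMD entries fill the last page, 0xC0007000..0xC0007FF8.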