How the ARM32 Linux kernel decompresses
ARM traditionally uses compressed kernels. This is done for two major reasons:
- It saves space on the flash memory or other storage media holding the kernel, and memory is money. For example, on the Gemini platform that I work on, the
vmlinux uncompressed kernel is 11.8 MB while the compressed
zImage is a mere 4.8 MB, so we save more than 50%.
- It is faster to load because the time it takes for the decompression to run is shorter than the time that it takes to transfer an uncompressed image from the storage media, such as flash. For NAND flash controllers this can easily be the case.
This is intended as a comprehensive rundown of how the Linux kernel self-decompresses on ARM 32-bit legacy systems. All machines under
arch/arm/* use this method if they are booted using a compressed kernel, and most of them use compressed kernels.
The bootloader, whether RedBoot, U-Boot or EFI, places the kernel image somewhere in physical memory and executes it, passing some parameters in the lower registers.
Russell King defined the ABI for booting the Linux kernel from a bootloader in 2002 in the Booting ARM Linux document. The boot loader puts 0 into register
r0, an architecture ID into register
r1 and a pointer to the ATAGs in register
r2. The ATAGs would contain the location and size of the physical memory. The kernel would be placed somewhere in this memory. It can be executed from any address as long as the decompressed kernel fits. The boot loader then jumps to the kernel in supervisor mode, with all interrupts, MMUs and caches disabled.
On contemporary device tree kernels,
r2 is repurposed as a pointer to the device tree blob (DTB) in physical memory. (In this case
r1 is ignored.) A DTB can also be appended to the kernel image, and optionally amended using the ATAGs from
r2. We will discuss this more below.
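As a minimal sketch of how code receiving r2 could tell the two cases apart, the Python below inspects the first eight bytes of the blob. It is not the kernel's own code. It assumes a little-endian CPU and relies on two constants that are not given in the text above: the flattened device tree magic 0xD00DFEED (stored big-endian) and the ATAG_CORE tag value 0x54410001. The function name classify_r2_blob is made up for illustration.

```python
import struct

OF_DT_MAGIC = 0xD00DFEED  # flattened device tree magic, stored big-endian
ATAG_CORE = 0x54410001    # tag value of the first entry in an ATAG list


def classify_r2_blob(blob: bytes) -> str:
    """Guess what the boot loader passed in r2 from the first 8 bytes."""
    if len(blob) < 8:
        return "unknown"
    # A DTB starts with a big-endian magic word.
    (magic,) = struct.unpack_from(">I", blob, 0)
    if magic == OF_DT_MAGIC:
        return "dtb"
    # An ATAG list starts with a header: size in 32-bit words, then the tag,
    # both in CPU byte order (little-endian is assumed here).
    _size, tag = struct.unpack_from("<II", blob, 0)
    if tag == ATAG_CORE:
        return "atags"
    return "unknown"
```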
Decompression of the zImage
If the kernel is compressed, execution begins in
arch/arm/boot/compressed/head.S in the symbol
start: a little bit down the file. (This is not immediately evident.) It begins with 8 or 7
NOP instructions for legacy reasons. It jumps over some magic numbers and saves the pointer to the ATAGs. So now the kernel decompression code is executing from the physical address of the physical memory where it was loaded.
The decompression code then locates the start of physical memory. On most modern platforms this is done with the Kconfig-selected code
AUTO_ZRELADDR, which means a bitwise AND between the program counter and
0xf8000000. This means that the kernel readily assumes that it has been loaded and executed in the first part of the first block of physical memory.
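The masking step can be sketched in a few lines of Python. The mask value comes from the text above. The function name and the sample program counter values are illustrative only. Because the mask clears the low 27 bits, the guess is only right when the kernel was loaded within the first 128 MB of RAM.

```python
AUTO_ZRELADDR_MASK = 0xF8000000  # keeps the top 5 bits, i.e. 128 MB granularity


def guess_phys_start(pc: int) -> int:
    """Round the program counter down to a 128 MB boundary."""
    return pc & AUTO_ZRELADDR_MASK


# Loaded 3 MB into RAM that starts at 0x40000000: the guess is correct.
print(hex(guess_phys_start(0x40300000)))  # 0x40000000
# Loaded 128 MB or more into the same RAM: the guess is off by 128 MB.
print(hex(guess_phys_start(0x48000000)))  # 0x48000000
```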
There are patches being made that would instead attempt to get this information from the device tree.
TEXT_OFFSET is added to the pointer to the start of physical memory. As the name says, this is where the kernel
.text segment (as output from the compiler) should be located. The
.text segment contains the executable code so this is the actual starting address of the kernel after decompression. The
TEXT_OFFSET is usually
0x8000 so the kernel will be located
0x8000 bytes into the physical memory. The
0x8000 (32KB) offset is a convention, because usually there is some immobile architecture-specific data placed at
0x00000000 such as interrupt vectors, and many older systems place the ATAGs at
0x00000100. There must also be some free space below the kernel, because when the kernel finally boots, it will subtract the size of the initial kernel page table (0x5000 for LPAE) from this address and store that page table there.
For some specific platforms the
TEXT_OFFSET will be pushed to a higher address; notably, some Qualcomm platforms will push it to
0x00208000 because the first
0x00200000 (2 MB) of the physical memory is used for shared memory communication with the modem CPU.
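To make the address arithmetic concrete, here is a small Python sketch that computes where the decompressed kernel and the initial page table end up. The 0x8000 and 0x00208000 offsets and the 0x5000 LPAE size come from the text. The 0x4000 size for the non-LPAE page table is an assumption of this sketch, as are the function and constant names.

```python
TEXT_OFFSET_DEFAULT = 0x00008000
TEXT_OFFSET_QCOM = 0x00208000  # skips 2 MB of modem shared memory
PG_DIR_SIZE = 0x4000           # assumption: 16 KB page table without LPAE
PG_DIR_SIZE_LPAE = 0x5000


def kernel_layout(phys_start, text_offset=TEXT_OFFSET_DEFAULT, lpae=False):
    """Return (kernel_start, page_table_start) as physical addresses."""
    kernel_start = phys_start + text_offset
    pg_dir_size = PG_DIR_SIZE_LPAE if lpae else PG_DIR_SIZE
    return kernel_start, kernel_start - pg_dir_size


start, pgdir = kernel_layout(0x40000000, TEXT_OFFSET_QCOM)
print(hex(start), hex(pgdir))  # 0x40208000 0x40204000
```

On a conventional system with RAM at address zero, the same function gives a kernel at 0x8000 with the page table just below it, which is why the area under TEXT_OFFSET cannot be fully occupied by vectors and ATAGs.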
Next the decompression code sets up a page table, if it is possible to fit one over the whole uncompressed+compressed kernel image. The page table is not for virtual memory, but for enabling cache, which is then turned on. The decompression will for natural reasons be much faster if we can use cache.
Next the kernel sets up a local stack pointer and
malloc() area so we can handle subroutine calls and small memory allocation going forward, executing code written in C. This is set to point right after the end of the kernel image.
Compressed kernel in memory with an attached DTB.
Next we check for an appended DTB blob enabled by the
ARM_APPENDED_DTB symbol. This is a DTB that is added to the
zImage during build, often with the simple
cat foo.dtb >> zImage. The DTB is identified using a magic number.
If an appended DTB is found, and
CONFIG_ARM_ATAG_DTB_COMPAT is set, we first expand the DTB by 50% and call atags_to_fdt(), which will augment the DTB with information from the ATAGs, such as memory blocks and sizes.
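The appended-DTB check and the 50% expansion can be sketched as follows. This is an illustration in Python, not the assembly in head.S. It assumes the standard flattened device tree header, where a big-endian magic word 0xD00DFEED is followed by a big-endian total size. The 8-byte rounding in grown_dtb_size is also an assumption, and both function names are hypothetical.

```python
import struct

FDT_MAGIC = 0xD00DFEED  # first big-endian word of a flattened device tree


def appended_dtb_size(image: bytes, zimage_end: int):
    """Return the size of a DTB appended at zimage_end, or None if absent."""
    if zimage_end + 8 > len(image):
        return None
    magic, totalsize = struct.unpack_from(">II", image, zimage_end)
    if magic != FDT_MAGIC:
        return None
    return totalsize


def grown_dtb_size(totalsize: int) -> int:
    """Grow the DTB by 50% to leave room for properties taken from ATAGs."""
    grown = totalsize + totalsize // 2
    return (grown + 7) & ~7  # assumption: keep the size 8-byte aligned
```

For a 24 KB DTB (0x6000 bytes), grown_dtb_size returns 0x9000, so there are 12 KB of headroom for the memory information copied over from the ATAGs.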
Next, the DTB pointer (what was passed in as
r2 in the beginning) is overwritten with a pointer to the appended DTB. We also save the size of the DTB, and set the end of the kernel image after the DTB so the appended DTB (optionally modified with the ATAGs) is included in the total size of the compressed kernel. If an appended DTB was found, we also bump the stack and the
malloc() location so we don’t destroy the DTB.
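The effect of bumping the stack and malloc() area past the DTB can be pictured with this Python sketch. The stack and heap sizes are hypothetical values picked for illustration, and the layout is a simplification of what the assembly does: everything that would normally start right after the image is shifted up by the size of the DTB.

```python
STACK_SIZE = 0x1000    # hypothetical: 4 KB stack
MALLOC_SIZE = 0x10000  # hypothetical: 64 KB malloc() area


def scratch_layout(image_end: int, dtb_size: int = 0):
    """Return (stack_top, malloc_start, malloc_end) placed after image and DTB."""
    base = image_end + dtb_size    # skip over the appended DTB, if any
    stack_top = base + STACK_SIZE  # the stack grows down from here
    malloc_start = stack_top       # the heap follows the stack
    malloc_end = malloc_start + MALLOC_SIZE
    return stack_top, malloc_start, malloc_end
```

With dtb_size set to zero the stack begins right at the end of the image. With a DTB present, every returned address moves up by exactly dtb_size, so nothing the C code pushes or allocates can land inside the DTB.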
Notice: if a device tree pointer was passed in in
r2, and an appended DTB was also supplied, the appended DTB “wins” and is what the system will use. This can sometimes be used to override a default DTB passed by a boot loader.
Notice: if ATAGs were passed in in
r2, there certainly was no DTB passed in through that register. You almost always want the
CONFIG_ARM_ATAG_DTB_COMPAT symbol if you use an older boot loader that you do not want to replace, as the ATAGs properly define the memory on older platforms. It is possible to define the memory in the device tree, but more often than not, people skip this and rely on the boot loader to provide this, one way (the bootloader alters the DTB) or another (the ATAGs augment the appended DTB at boot).
The decompressed kernel may overlap the compressed kernel.
Next we check if we would overwrite the compressed kernel with the uncompressed kernel. That would be unfortunate. If this would happen, we check where in the memory the uncompressed kernel would end, and then we copy ourselves (the compressed kernel) past that location.
Then the code simply does a trick to jump back to the relocated address of a label called
restart: which is the start of the code to set up the stack pointer and
malloc() area, but now executing at the new physical address.
This means it will again set up the stack and
malloc() area and look for the appended DTB and everything will look like the kernel was loaded in this location to begin with. (With one difference though: we have already augmented the DTB with ATAGs, so that will not be done again.) This time the uncompressed kernel will not overwrite the compressed kernel.
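The overlap check and the choice of a new location can be summarised in a simplified Python sketch. The real check in head.S works on registers and accounts for more details, so treat this only as an illustration of the idea. The function name and the sample addresses in the usage lines are hypothetical.

```python
def relocation_target(image_start, image_end, kernel_start, kernel_size):
    """Return where the compressed image should run from.

    If decompressing to [kernel_start, kernel_start + kernel_size) would
    overwrite the compressed image, move the image to the end of that range.
    """
    kernel_end = kernel_start + kernel_size
    overlaps = image_start < kernel_end and kernel_start < image_end
    if not overlaps:
        return image_start  # safe to decompress in place
    return kernel_end       # copy ourselves past the decompressed kernel


# Hypothetical numbers: a 22 MB decompressed kernel at 0x40208000.
print(hex(relocation_target(0x40300000, 0x40E00000, 0x40208000, 0x01600000)))
# 0x41808000
```

After the copy, running the same check from the new address reports no overlap, which is why the second pass through the restart code falls through to the decompression.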
We move the compressed kernel down so the decompressed kernel can fit.
There is no check for whether the memory runs out, i.e. whether we would copy the kernel beyond the end of the physical memory. If this happens, the result is unpredictable. This can happen if the memory is 8MB or less; in these situations, do not use compressed kernels.
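The missing bounds check is easy to state, which makes it a useful calculation to do by hand before choosing a compressed kernel for a small system. The Python sketch below is not kernel code. The function name and the sizes in the example are hypothetical.

```python
MB = 1 << 20


def relocated_image_fits(kernel_start, kernel_size, image_size,
                         ram_start, ram_size):
    """Check that the compressed image, moved past the decompressed kernel,
    still ends inside physical memory."""
    new_image_start = kernel_start + kernel_size
    return new_image_start + image_size <= ram_start + ram_size


# A 6 MB kernel plus a 3 MB compressed image does not fit in 8 MB of RAM...
print(relocated_image_fits(0x8000, 6 * MB, 3 * MB, 0, 8 * MB))   # False
# ...but is comfortable in 32 MB.
print(relocated_image_fits(0x8000, 6 * MB, 3 * MB, 0, 32 * MB))  # True
```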
The compressed kernel is moved below the decompressed kernel.
Now we know that the kernel can be decompressed into memory at lower addresses than the compressed image and that they will not collide during decompression, and we execute at the label
We check if we are executing on the address the decompressor was linked to, and possibly alter some pointer tables. This is for the C runtime environment executing the decompressor.
We make sure that the caches are turned on. (It is not certain that there is space for a page table.)
We clear the BSS area (so all uninitialized variables will be 0), also for the C runtime environment.
Next we call the
decompress_kernel() symbol in
boot/compressed/misc.c which in turn calls
do_decompress() which calls
__decompress() which will perform the actual decompression.
This is implemented in C and the type of decompression is different depending on Kconfig options: the same decompressor as the compression selected when building the kernel will be linked into the image and executed from physical memory. All architectures share the same decompression library. The
__decompress() function called will depend on which of the decompressors in
lib/decompress_*.c was linked into the image. The selection of decompressor happens in
arch/arm/boot/compressed/decompress.c by simply including the whole decompressor into the file.
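The selection amounts to a one-of-several mapping from a Kconfig symbol to a source file. The Python table below sketches that mapping. The symbol and file names are given as I recall them from the kernel tree and should be checked against the version you build. The function is purely illustrative, since the real selection is done by the C preprocessor.

```python
# Kconfig compression choice -> decompressor source included into the image.
DECOMPRESSORS = {
    "CONFIG_KERNEL_GZIP": "lib/decompress_inflate.c",
    "CONFIG_KERNEL_LZO": "lib/decompress_unlzo.c",
    "CONFIG_KERNEL_LZMA": "lib/decompress_unlzma.c",
    "CONFIG_KERNEL_XZ": "lib/decompress_unxz.c",
    "CONFIG_KERNEL_LZ4": "lib/decompress_unlz4.c",
}


def selected_decompressor(config):
    """Return the decompressor source for a set of enabled Kconfig symbols."""
    chosen = [src for opt, src in DECOMPRESSORS.items() if opt in config]
    if len(chosen) != 1:
        raise ValueError("exactly one kernel compression option must be set")
    return chosen[0]
```

Because the compression method is a Kconfig choice, exactly one of these symbols is set in any given build, so exactly one __decompress() implementation ends up in the image.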
All the variables the decompressor needs about the location of the compressed kernel are set up in the registers before calling the decompressor.
After decompression, the decompressed kernel is at
TEXT_OFFSET and the appended DTB (if any) remains where the compressed kernel was.
After the decompression, we call
get_inflated_image_size() to get the size of the final, decompressed kernel. We then flush and turn off the caches again.
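One way a decompressed size can be known without decompressing again is to read it from a trailer on the compressed data. The gzip format stores the uncompressed length (modulo 2^32) as a little-endian 32-bit value in its last four bytes. The Python sketch below reads that field. Whether get_inflated_image_size() reads exactly this field is an assumption here, and the function name in the sketch is hypothetical.

```python
import gzip
import struct


def inflated_size_from_trailer(compressed: bytes) -> int:
    """Read the little-endian 32-bit size stored in the last four bytes."""
    (size,) = struct.unpack("<I", compressed[-4:])
    return size


payload = bytes(range(256)) * 100  # 25600 bytes standing in for a kernel
blob = gzip.compress(payload)
print(inflated_size_from_trailer(blob))  # 25600
```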
We then jump to the symbol
__enter_kernel which sets
r2 as the boot loader would have left them, unless we have an attached device tree blob, in which case
r2 now points to that DTB. We then set the program counter to the start of the kernel, which will be the start of physical memory plus
0x00008000 on a very conventional system, maybe
0x00208000 on some Qualcomm systems.
We are now at the same point as if we had loaded an uncompressed kernel, the vmlinux file, into memory at
TEXT_OFFSET, passing (typically) a device tree in
Kernel startup: executing vmlinux
The uncompressed kernel begins executing at the symbol
stext(), start of text segment. This code can be found in
This is a subject of another discussion. However notice that the code here does not look for an appended device tree! If an appended device tree should be used, you must use a compressed kernel. The same goes for augmenting any device tree with ATAGs. That must also use a compressed kernel image, because the code to do this is part of the assembly that bootstraps a compressed kernel.
Looking closer at a kernel uncompress
Let us look closer at a Qualcomm APQ8060 decompression.
First you need to enable
CONFIG_DEBUG_LL, which enables you to hammer out characters on the UART console without any intervention of any higher printing mechanisms. All it does is to provide a physical address to the UART and routines to poll for pushing out characters. It sets up
DEBUG_UART_PHYS so that the kernel knows where the physical UART I/O area is located. Make sure these definitions are correct.
Next, enable a Kconfig option called
CONFIG_DEBUG_UNCOMPRESS. All this does is print the short message “Uncompressing Linux…” before decompressing the kernel and “done, booting the kernel” after the decompression. It is a nice smoke test to show that the
CONFIG_DEBUG_LL is set up and
DEBUG_UART_PHYS is correct and decompression is working but not much more. This does not provide any low-level debug.
The actual kernel decompression can be debugged and inspected by enabling the
DEBUG define in
arch/arm/boot/compressed/head.S. This is most easily done by tagging on
-DDEBUG to the
AFLAGS (assembler flags) for
head.S in the
arch/arm/boot/compressed/Makefile like this:
AFLAGS_head.o += -DTEXT_OFFSET=$(TEXT_OFFSET) -DDEBUG
We then get this message when booting:
This means that as we were booting I loaded the kernel to
0x40300000 which would collide with the uncompressed kernel. Therefore the kernel was copied to
0x41801D00 which is where the uncompressed kernel will end. Adding some further debug prints we can see that an appended DTB is first found at
0x40DEBA68 and after moving the kernel down it is found at
0x422E56A8, which is where it remains when the kernel is booted.
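The four addresses quoted above are enough to work out how far the image moved. The Python sketch below does the arithmetic, assuming the appended DTB was moved by the same displacement as the rest of the copied region. The copy source address it derives is inferred from these numbers, not read from a log.

```python
LOADED_AT = 0x40300000   # where the boot loader placed the zImage
COPIED_TO = 0x41801D00   # start of the relocated copy
DTB_BEFORE = 0x40DEBA68  # appended DTB before the move
DTB_AFTER = 0x422E56A8   # appended DTB after the move


def relocation_summary():
    """Return (displacement, inferred copy source, bytes skipped at the start)."""
    delta = DTB_AFTER - DTB_BEFORE
    copy_source = COPIED_TO - delta
    skipped = copy_source - LOADED_AT
    return delta, copy_source, skipped


print([hex(v) for v in relocation_summary()])
# ['0x14f9c40', '0x403080c0', '0x80c0']
```

Under that assumption everything moved by 0x14F9C40 bytes (about 21 MB), and the copy started 0x80C0 bytes into the loaded image. That is consistent with the code before the restart label not being copied, since it has already run.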