Why does the Linux kernel build system use incremental linking or ar T thin archives?

Question

While studying the kernel build system, I noticed that before v4.19 the kernel used incremental linking (ld -r) and then moved to thin archives (ar T), as shown at: What is the difference between the following kernel Makefile terms: vmLinux, vmlinuz, vmlinux.bin, zimage & bzimage?

Then I tried to write a synthetic incremental linking benchmark to see whether the link speedup was considerable, at: https://stackoverflow.com/questions/29391965/what-is-partial-linking-in-gnu-linker/53959624#53959624 but in my benchmark it wasn't.

Therefore, my question is: why does the kernel use incremental linking or thin archives?

Is it to speed up the build or for some other reason?

Which commit introduced incremental linking? With that I would be able to figure out the rationale from git log. I found the one that moved to thin archives with git log --grep 'thin archive' (a5967db9af51a84f5e181600954714a9e4c69f1f), but could not easily grep the incremental linking one.

If it exists to speed up the build, is there a quick way to test the link with and without incremental linking to see the speedup?

score 2 · Answer 1 · answered Jan 17 '19 at 17:29

Reason for thin archives

I emailed Nicholas Piggin, one of the authors of the patches, and he explained that thin archives are not just there to reduce disk usage: they can also prevent a link failure.

The problem is that incrementally linked object files could get so large that the linker could not even insert trampolines, the generated code that has to go between the objects.

I haven't yet received a reply about the rationale for incremental linking.

This is his awesome reply:

It's a pretty long answer depending on how much you know. There are a few reasons. Stephen's primary motivation for the patch was to allow very large kernels to link successfully.

Some other benefits are:

It is a "nicer" way to store the intermediate build artifacts, you keep the output code in a single place and track them with references (thin archives) until it's all linked together. So there is less IO and disk space required, particularly with big builds and debug info.

For the average modern workstation just building to a small number of output directories, and Linux is not really a huge project, this will all be in cache and the time to incremental link files is very fast. So build speed benefit is usually pretty small for Linux.

It allows the linker to generate slightly better code. By rearranging files and locating linker stubs more optimally.

It tends to work much better with LTO builds, although there's not much support for LTO builds upstream yet.
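As a side illustration (added here, not part of the reply): a thin archive only records references to its member files instead of copying them, and it can be told apart from a regular archive by the 8-byte magic at the start of the file. The sketch below assumes the GNU ar magic strings `!<arch>\n` and `!<thin>\n`; the function name `archive_kind` is made up for this example.

```python
REGULAR_MAGIC = b"!<arch>\n"  # regular archive: member contents are copied in
THIN_MAGIC = b"!<thin>\n"     # thin archive: members are referenced by path


def archive_kind(path):
    """Classify a file as a thin archive, a regular archive, or neither."""
    with open(path, "rb") as f:
        magic = f.read(len(THIN_MAGIC))
    if magic == THIN_MAGIC:
        return "thin"
    if magic == REGULAR_MAGIC:
        return "regular"
    return "not an archive"
```

Run on an archive created with the ar T modifier mentioned in the question, this should report "thin"; on one created without it, "regular".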

But we'll get back to the primary motivation.

When you build a relocatable object file that hasn't been finally linked, you have a blob of code with a bunch of references to symbols for functions and variables that are defined elsewhere.

--- a.S ---
bl      myfunc
---

assembles into

a.o:     file format elf64-powerpcle
Disassembly of section .text:
0000000000000000 <.text>:
   0:   01 00 00 48     bl      0x0

So the code has a branch to NIA+0 (i.e., itself), which is not what we asked for. Dumping relocations shows the missing bit:

Disassembly of section .text:
0000000000000000 <.text>:
   0:   01 00 00 48     bl      0x0
                    0: R_PPC64_REL24        myfunc

The relocation is not in the .text section, it's not code, but it is some ELF metadata which says the instruction at this location has a 24-bit relative offset to a symbol called myfunc.

When you do a "final link" of objects together, the files are basically concatenated together, and these relocations are resolved by adjusting code and data to point to the correct locations.

Linking a.S with b.S that contains myfunc symbol gives this:

c:     file format elf64-powerpcle

Disassembly of section .text:

00000000100000d8 <_start>:
    100000d8:   05 00 00 48     bl      100000dc <myfunc>

00000000100000dc <myfunc>:
    100000dc:   01 00 63 38     addi    r3,r3,1
    100000e0:   20 00 80 4e     blr

Relocation metadata is stripped, branch points to correct offset.

So the linker actually adjusts instructions as it links. It goes one further than that, it generates instructions. If you have a big build and this branch cannot reach myfunc with a 24-bit offset, the linker will put a trampoline (aka stub aka PLT aka procedure linkage table) into the code which can be reached in 24-bits, then the trampoline uses a longer branch that can reach the target.
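To make the 24-bit limit concrete, here is a small Python sketch (an illustration added here, not part of the reply). It assumes the standard PowerPC I-form branch layout: primary opcode 18, a 24-bit word offset, then the AA and LK flag bits. The names `can_reach` and `encode_bl` are made up for this example.

```python
import struct

BRANCH_OPCODE = 18  # primary opcode of the PowerPC I-form branch
LI_BITS = 24        # width of the offset field, counted in 4-byte words

# The field is a signed word offset, so the byte reach is about +/-32 MiB.
MAX_FORWARD = (1 << (LI_BITS + 1)) - 4
MAX_BACKWARD = -(1 << (LI_BITS + 1))


def can_reach(offset):
    """True if a relative branch can encode this byte offset directly."""
    return offset % 4 == 0 and MAX_BACKWARD <= offset <= MAX_FORWARD


def encode_bl(offset):
    """Encode `bl` (relative branch, LK=1) as little-endian bytes."""
    if not can_reach(offset):
        raise ValueError("target out of range, a trampoline would be needed")
    li = (offset >> 2) & ((1 << LI_BITS) - 1)
    word = (BRANCH_OPCODE << 26) | (li << 2) | 1  # AA=0, LK=1
    return struct.pack("<I", word)


print(encode_bl(0).hex())   # unresolved branch in a.o
print(encode_bl(4).hex())   # resolved branch to myfunc, 4 bytes ahead
print(can_reach(64 << 20))  # a 64 MiB jump does not fit
```

The first two calls reproduce the 01 00 00 48 and 05 00 00 48 byte sequences from the dumps above. Any target 32 MiB or more away is rejected, which is the case where the linker has to route the call through a trampoline.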

The linker can't just put these trampolines anywhere in the code, because if you add something in the middle of code, that breaks relative references that go across the middle. The linker does not know about all references in a .o file, only the unresolved ones. So the linker must only put trampolines between .o files when it links them together, before it resolves their references.

The old incremental build approach just combines .o files into bigger .o files as you get closer to the root of the build directory. So you run into a problem when your .o files become so large that the branch can not reach outside its own .o file in order to reach a trampoline. There is no way to resolve this reference.

With thin archives, the final link is done on thousands of very small .o files. This gives the linker maximum flexibility to place these trampolines, which means you never encounter this limitation.
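The argument above can be sketched as a toy model in Python (an illustration added here, not part of the reply, and not how ld actually places stubs). It assumes trampolines can only sit in the gaps between input objects and that a branch reaches roughly 32 MiB in either direction; the object sizes and the function name are hypothetical.

```python
REACH = 1 << 25  # approximate reach of a 24-bit word offset, in bytes
MIB = 1 << 20
KIB = 1 << 10


def objects_with_stranded_branches(sizes, reach=REACH):
    """Return indexes of objects too large for every branch to reach a gap.

    Toy assumption: the worst-placed branch sits in the middle of an
    object, so the nearest gap is half the object's size away.
    """
    stranded = []
    for index, size in enumerate(sizes):
        if size // 2 > reach:
            stranded.append(index)
    return stranded


# One 80 MiB incrementally linked object vs. the same code as small objects.
one_big_object = [80 * MIB]
many_small_objects = [32 * KIB] * 2560

print(objects_with_stranded_branches(one_big_object))      # [0]
print(objects_with_stranded_branches(many_small_objects))  # []
```

In this model the single large object has code 40 MiB away from the nearest place a trampoline could go, while every small object is always next to a usable gap.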

score 1 · Answer 2 · answered Jan 07 '19 at 16:01

I don't have an answer to the "why?" part of your question, but incremental linking has been in use since at least Linux 0.97 (August 1, 1992):

OBJS= namei.o inode.o file.o dir.o misc.o fat.o

msdos.o: $(OBJS)
    $(LD) -r -o msdos.o $(OBJS)

or

OBJS= bitmap.o freelists.o truncate.o namei.o inode.o \
file.o dir.o symlink.o blkdev.o chrdev.o fifo.o

ext.o: $(OBJS)
    $(LD) -r -o ext.o $(OBJS)

https://github.com/mpe/linux-fullhistory/commit/e60feb868bfa9d248c71a1a3bdd8c2857f1d433d

However, these ancient commits don't mention the rationale, so you'd probably have to ask Linus why exactly it is done like this rather than linking everything in one go. I suspect it's mainly to keep the build system nicely modular, instead of having one giant link line listing all the objects.

You would have to present a really strong case if you wanted to change it now, as such a big structural change to the build system would not be done just for fun.

answered Jan 07 '19 at 16:01

TooTea

Cool, I didn't know about that fullhistory thing! I definitely don't want to waste time patching anything that works, just curious why it works like that to possibly reuse the technique in other projects :-) – Ciro Santilli OurBigBook.com Jan 07 '19 at 16:04
One theory is that it just would blow up the command line max length. – Ciro Santilli OurBigBook.com Jan 07 '19 at 16:08
Linker memory usage may also be (or have been) a factor, but I'm just guessing. – TooTea Jan 07 '19 at 16:11
Got a reply for the thin archives BTW: https://unix.stackexchange.com/a/495127/32558 – Ciro Santilli OurBigBook.com Jan 17 '19 at 17:30