Reason for thin archives
I emailed Nicholas Piggin, one of the authors of the patches, and he explained that thin archives do more than reduce disk usage: they can also prevent a link failure.
The problem is that incrementally linked object files can grow so large that a branch can no longer reach a trampoline, because the linker can only place that generated code between the objects.
I haven't yet received a reply about the rationale for the incremental builds.
This is his awesome reply:
It's a pretty long answer depending on how much you know. There are a
few reasons. Stephen's primary motivation for the patch was to allow
very large kernels to link successfully.
Some other benefits are:
- It is a "nicer" way to store the intermediate build artifacts, you
  keep the output code in a single place and track them with references
  (thin archives) until it's all linked together. So there is less IO
  and disk space required, particularly with big builds and debug info.

  For the average modern workstation just building to a small number of
  output directories, and Linux is not really a huge project, this will
  all be in cache and the time to incremental link files is very fast.
  So build speed benefit is usually pretty small for Linux.

- It allows the linker to generate slightly better code. By rearranging
  files and locating linker stubs more optimally.

- It tends to work much better with LTO builds, although there's not
  much support for LTO builds upstream yet.
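As an aside that is not part of the email: a thin archive in the GNU ar format can be told apart from a regular one by its first 8 bytes, since it stores references to its members instead of copies. A minimal Python sketch, where the function name archive_kind is made up for illustration:

```python
REGULAR_MAGIC = b"!<arch>\n"  # regular ar archive: members are copied in
THIN_MAGIC = b"!<thin>\n"     # GNU thin archive: members are referenced by path

def archive_kind(path):
    """Classify a file by the 8-byte magic at its start."""
    with open(path, "rb") as f:
        magic = f.read(8)
    if magic == THIN_MAGIC:
        return "thin"
    if magic == REGULAR_MAGIC:
        return "regular"
    return "not an archive"
```

Calling archive_kind on an archive produced by the kernel build should then report "thin", assuming the tree was built with thin archives enabled.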
But we'll get back to the primary motivation.
When you build a relocatable object file that hasn't been finally
linked, you have a blob of code with a bunch of references to symbols
for functions and variables that are defined elsewhere.
--- a.S ---
bl myfunc
---
assembles into
a.o: file format elf64-powerpcle
Disassembly of section .text:
0000000000000000 <.text>:
0: 01 00 00 48 bl 0x0
So the code has a branch to NIA+0 (i.e., itself), which is not what we
asked for. Dumping relocations shows the missing bit:
Disassembly of section .text:
0000000000000000 <.text>:
0: 01 00 00 48 bl 0x0
0: R_PPC64_REL24 myfunc
The relocation is not in the .text section, it's not code, but it is
some ELF metadata which says the instruction at this location has a
24-bit relative offset to a symbol called myfunc.
When you do a "final link" of objects together, the files are basically
concatenated together, and these relocations are resolved by adjusting
code and data to point to the correct locations.
Linking a.S with b.S that contains myfunc symbol gives this:
c: file format elf64-powerpcle
Disassembly of section .text:
00000000100000d8 <_start>:
100000d8: 05 00 00 48 bl 100000dc <myfunc>
00000000100000dc <myfunc>:
100000dc: 01 00 63 38 addi r3,r3,1
100000e0: 20 00 80 4e blr
Relocation metadata is stripped, branch points to correct offset.
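To make the adjustment concrete, here is a small Python sketch (not part of the email) of what resolving R_PPC64_REL24 amounts to for the two dumps above. It assumes a zero addend and only models the 24-bit offset field of the branch instruction; the name apply_rel24 is made up for illustration.

```python
# The branch offset lives in a 24-bit field that is shifted left by 2,
# because PowerPC instructions are 4-byte aligned.
LI_MASK = 0x03FFFFFC

def apply_rel24(insn, place, target):
    """Return insn with its relative branch offset set to target - place."""
    offset = target - place
    if offset % 4 != 0:
        raise ValueError("branch target must be 4-byte aligned")
    if not -(1 << 25) <= offset < (1 << 25):
        raise OverflowError("target out of range for a 24-bit relative branch")
    # Clear the old offset bits, keep opcode and flag bits, insert the new offset.
    return (insn & ~LI_MASK & 0xFFFFFFFF) | (offset & LI_MASK)

# "01 00 00 48" from a.o, read as a little-endian 32-bit word.
unlinked = int.from_bytes(bytes.fromhex("01000048"), "little")
linked = apply_rel24(unlinked, place=0x100000D8, target=0x100000DC)
print(linked.to_bytes(4, "little").hex(" "))  # 05 00 00 48
```

The printed bytes match the bl at 100000d8 in the linked output: the offset changed from 0 to 4, and nothing else in the instruction did.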
So the linker actually adjusts instructions as it links. It goes one
further than that, it generates instructions. If you have a big build
and this branch cannot reach myfunc with a 24-bit offset, the linker
will put a trampoline (aka stub aka PLT aka procedure linkage table)
into the code which can be reached in 24-bits, then the trampoline uses
a longer branch that can reach the target.
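The decision described here can be sketched in Python as follows (not part of the email). The reach of about 32 MiB in either direction follows from the 24-bit field holding a word offset; stub_addr is a hypothetical, caller-supplied location, whereas a real linker chooses stub locations itself.

```python
BRANCH_REACH = 1 << 25  # 24-bit word offset, so about +/- 32 MiB in bytes

def in_direct_range(place, target):
    """True if a 24-bit relative branch at place can reach target."""
    return -BRANCH_REACH <= target - place < BRANCH_REACH

def plan_branch(place, target, stub_addr):
    """Return the addresses the call visits, going via the stub if needed."""
    if in_direct_range(place, target):
        return [target]
    if in_direct_range(place, stub_addr):
        # Short branch to the stub, which then does the long branch.
        return [stub_addr, target]
    raise OverflowError("neither the target nor the stub is reachable")

MIB = 1024 * 1024
place = 0x10000000
print(plan_branch(place, place + 4, place + 16 * MIB) == [place + 4])
print(plan_branch(place, place + 64 * MIB, place + 16 * MIB)
      == [place + 16 * MIB, place + 64 * MIB])
```

Both comparisons print True: the near target is reached directly, and the far one goes through the stub.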
The linker can't just put these trampolines anywhere in the code,
because if you add something in the middle of code, that breaks relative
references that go across the middle. The linker does not know about
all references in a .o file, only the unresolved ones. So the linker
must only put trampolines between .o files when it links them together,
before it resolves their references.
The old incremental build approach just combines .o files into bigger
.o files as you get closer to the root of the build directory. So you
run into a problem when your .o files become so large that the branch
can not reach outside its own .o file in order to reach a trampoline.
There is no way to resolve this reference.
With thin archives, the final link is done on thousands of very small .o
files. This gives the linker maximum flexibility to place these
trampolines, which means you never encounter this limitation.
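The difference between the two approaches can be illustrated with a deliberately simplified Python model (not part of the email). It assumes stubs may only sit at the boundaries of the objects handed to the final link, ignores the size of the stubs themselves, and uses a reach of 32 MiB; the object sizes are invented.

```python
MIB = 1024 * 1024
REACH = 32 * MIB  # approximate reach of a 24-bit relative branch

def farthest_from_gap(object_size):
    """Worst-case distance from a branch to the nearest object boundary."""
    return object_size // 2

def every_branch_can_reach_a_stub(object_sizes):
    """True if no branch is farther than REACH from a possible stub site."""
    return all(farthest_from_gap(size) < REACH for size in object_sizes)

one_big_object = [100 * MIB]             # old style: everything merged into one .o
many_small_objects = [64 * 1024] * 1600  # thin archive: same 100 MiB in small .o files

print(every_branch_can_reach_a_stub(one_big_object))      # False
print(every_branch_can_reach_a_stub(many_small_objects))  # True
```

In the merged case a branch in the middle of the object is 50 MiB from the nearest place a stub could go, which is the unresolvable situation described above.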