
I love the way Linux & Co. let users install many packages from different repositories. AFAIK, they also come with source packages, so you can compile them yourself.

But why even bother to keep/offer pre-compiled packages, when you could just compile them yourself? What are the intentions behind keeping/offering them?

Is it possible to configure Linux to only download source packages and let the OS do the rest, just like a pre-compiled package installation?

Thank you for your answers.

Ben
  • I'm no expert in this field, but I would think that it is a service to us end users. It is much easier and faster to install pre-compiled and tested packages than to compile them yourself. – sudodus Jul 11 '22 at 11:55
  • You might be interested in something like [Gentoo](https://www.gentoo.org/). You can compile everything from source and design your own system and compilation options. Just be prepared to spend a lot of time on compilations to keep packages up to date. – doneal24 Jul 11 '22 at 11:57
  • [actual question on reddit](https://www.reddit.com/r/Gentoo/comments/66517y/libreoffice_has_been_compiling_for_6_hours/): _"Libreoffice has been compiling for 6 hours. Is there any way to speed it up? I need to write a paper in it due tomorrow."_ - I can see how that's not for everyone :P – marcelm Jul 12 '22 at 08:55
  • Besides the wasted time, think of all the MWh of extra energy that would be needed if everyone compiled all their software separately (and the effect on battery life for mobile devices). – Stéphane Chazelas Jul 12 '22 at 11:35
  • @StéphaneChazelas: Thanks for your concern about energy consumption. I feel less alone. – Eric Duminil Jul 14 '22 at 10:51

5 Answers


It’s a trade-off: distributions which provide pre-built packages spend the time building them once (in all the configurations they support), and their users can then install them without spending the time to build them. The users accept the distributions’ binaries as-is. If you consider the number of package installations for some of the larger distributions, the time saved by not requiring recompilation everywhere is considerable.

There are some distributions which ship source and the infrastructure required to build it, and rely on users to build everything locally; see for example Gentoo. This allows users to control exactly how their packages are built.
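As a sketch of what that control looks like in practice, Gentoo users set global build options in `/etc/portage/make.conf`; the values below are illustrative examples, not recommendations:

```shell
# /etc/portage/make.conf -- illustrative values only

# Compiler flags applied to every package built on this machine
COMMON_FLAGS="-O2 -march=native -pipe"
CFLAGS="${COMMON_FLAGS}"
CXXFLAGS="${COMMON_FLAGS}"

# Number of parallel compile jobs
MAKEOPTS="-j8"

# USE flags: enable or disable optional features distribution-wide
USE="X wayland -pulseaudio -systemd"
```

Every package subsequently built with `emerge` picks up these flags, which is precisely the kind of control binary distributions cannot offer.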

If you go down this path, even with the time savings you can get by simplifying package builds, you should be ready to spend a lot of time building packages. I don’t maintain the most complex packages in Debian, but one of my packages takes over two hours to build on 64-bit x86 builders, and over twelve hours on slower architectures!
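There is also a middle ground on Debian and derivatives: apt can fetch a source package and build it for you, without switching to a source-based distribution. A minimal sketch, using `hello` as the example package (requires `deb-src` entries in your apt sources and the `build-essential` package):

```shell
# Refresh package lists (deb-src lines must be enabled first)
sudo apt-get update

# Install the build dependencies declared by the source package
sudo apt-get build-dep hello

# Download the source and build binary packages from it (--compile, alias -b)
apt-get source --compile hello

# Install the resulting .deb
sudo dpkg -i hello_*.deb
```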

Stephen Kitt
  • Apart from a great answer, thanks for linking that build log. I just scrolled through all of it. Even that took quite a while. It boggles the mind. – Oliphaunt Jul 11 '22 at 21:12
  • @Oliphaunt And GCC is not even one of the particularly bad offenders. LLVM (which has essentially become mandatory for building a GPU accelerated graphical environment on Linux) takes about 10-20% longer in my experience, and most web browsers take more like 50-60% longer _at minimum_ (fun fact, even when throwing all of a Ryzen 9 3950X (32 threads of execution) and 64GB of DDR4-2666 RAM at it, webkit-gtk still takes more than 45 minutes to build). – Austin Hemmelgarn Jul 11 '22 at 23:41
  • The worst offenders are [Chromium](https://buildd.debian.org/status/fetch.php?pkg=chromium&arch=amd64&ver=103.0.5060.114-1&stamp=1657530943&raw=0) (over ten hours on x86) and [LibreOffice](https://buildd.debian.org/status/fetch.php?pkg=libreoffice&arch=amd64&ver=1%3A7.3.5%7Erc1-1&stamp=1656768667&raw=0) (over six hours on x86), far ahead of any compiler (LLVM 13 takes around three hours, just a few minutes more than GCC 12). – Stephen Kitt Jul 12 '22 at 09:35
  • Apart from the time spent, I don't even want to think about trying to build say Chromium with say only 4GB of memory (and if you have only a small SSD the extra disk space required might be a problem as well). – Voo Jul 12 '22 at 09:53
  • @Voo I still have memories of building X11 (or gcc) from source in the early '90s. *If* everything went well the process would take 1-2 days. Not a practice that I want to go back to. – doneal24 Jul 12 '22 at 16:58

---

  1. You imply that everyone has enough CPU, RAM, storage, time and knowledge to compile packages. That's not the case; in fact the opposite is true: most people don't want to wait hours for something to compile. Compiling Firefox on a Raspberry Pi may take several weeks. Is this OK? Nope.

  2. Linux distros build packages in a controlled, malware-free, properly functioning environment, which is not guaranteed on an end user's machine.

  3. All users of the distro end up running exactly the same code, which helps with bug reporting and debugging; that might not be true for users who compile software on their own.

  4. Many modern Linux distros support Secure Boot, which requires signing packages with keys that cannot be distributed to end users, because doing so would completely break the trust chain.

You're welcome to build everything yourself using Gentoo, LFS, or the AUR in Arch Linux. Software packages can actually be compiled from source on most distros, but those three were created with compilation in mind.
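As an illustration of the AUR workflow mentioned above, a package is built from its PKGBUILD recipe with `makepkg` (the `yay` package is just an example; `base-devel` and `git` must already be installed):

```shell
# Fetch the build recipe (PKGBUILD) for an AUR package
git clone https://aur.archlinux.org/yay.git
cd yay

# Review the PKGBUILD before building -- it runs arbitrary code as your user
less PKGBUILD

# Build the package from source and install it (-s resolves dependencies,
# -i installs the resulting package with pacman)
makepkg -si
```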

Artem S. Tashkinov
  • Arch distributes packages as binaries, unlike Gentoo and LFS. – Compizfox Jul 12 '22 at 11:05
  • @Compizfox Artem is probably referring to the AUR, which does indeed have source packages (or more like makefiles for packages) – Tamoghna Chowdhury Jul 12 '22 at 12:09
  • @Compizfox I've edited the answer to accommodate your notice. – Artem S. Tashkinov Jul 12 '22 at 12:12
  • @TamoghnaChowdhury True, but the AUR does not exclusively contain source packages. Many are binaries. More importantly, the AUR is not the main repository of Arch. It's a completely unofficial, user-maintained repository. – Compizfox Jul 12 '22 at 13:09
  • @Compizfox If you insist I'll remove Arch from the list. :-) – Artem S. Tashkinov Jul 12 '22 at 13:36
  • Regarding 2, I generally trust the distros I use and their package maintainers, but I would consider user-compiled packages to be strictly safer. If the user is infected, they're already infected. However, leaving the compilation to a distro's maintainers introduces the possibility of someone being a bad actor and including malware in the compiled packages. While I trust both source and pre-built packages to be generally safe, I wouldn't consider pre-built packages to be safer than source packages. The argument for choosing source packages is primarily safety. – JoL Jul 13 '22 at 00:08
  • To give an extreme example of your first point, the oldest computer I've got running Linux is a Pentium MMX. It took four months to compile GCC 6.4; extrapolating from the GCC release timeline and the compilation speed on newer computers, I'd expect GCC 13 (and possibly GCC 14) to be released before it finishes compiling GCC 12. – Mark Jul 13 '22 at 00:09
  • @JoL It's possible, but to date there have been zero such incidents. – Artem S. Tashkinov Jul 13 '22 at 08:59
  • @JoL I don't understand your point. Why do you think that having a source package would prevent a bad actor from manipulating the package? If you claim that a bad actor can craft a malicious binary package the same actor can just inject malicious code in the sources. In order to check if this is the case you'd have to have a different trustworthy way of obtaining the sources and compare them before compiling... but then why do you need the source package? What prevents this trustworthy source to be a bad actor too? – GACy20 Jul 13 '22 at 12:55
  • @GACy20 in some distributions, maintainers can (or could) upload binary packages alongside the source code, which means that a bad actor could upload a binary containing malicious code without that code appearing in the source code. – Stephen Kitt Jul 13 '22 at 13:25
  • @StephenKitt That doesn't matter from a security perspective. Posting the source together with a binary for security purposes is just a security theater anyway. If you cannot trust the source it doesn't matter if it's just a binary, a binary+sources or just sources. They can all be manipulated. In order to verify such manipulation you need a second trustworthy source in all cases, and at that point just drop the repository and use the source you trust. – GACy20 Jul 13 '22 at 13:37
  • @GACy20 yes, they can all be manipulated, but the visibility is different. Anyway I'm just explaining one scenario where JoL's comment is relevant. – Stephen Kitt Jul 13 '22 at 13:44
  • @StephenKitt I don't see how it is different. Do you expect an end user to download the linux kernel source and be able to find out that an extraneous driver was included? Madness. The only reasonable ways an end user can verify the sources are 1) compile them and check the produced binary with the downloaded one (which doesn't work in 99.9% of cases due to non-repeatable builds, moreover a malicious actor modifying the binary can presumably simply insert the respective malware in the source code) or 2) download a second set of sources and compare them and reject if they aren't identical. – GACy20 Jul 13 '22 at 13:54
  • @GACy20 it is different: in such a distribution, I can upload a source package which passes your second test, and a malicious binary which will obviously fail the first but be ignored because of the repeatability problem (which is improving constantly, for example many packages in Debian can be reproduced). – Stephen Kitt Jul 13 '22 at 14:01
  • @StephenKitt The point is that nobody actually does either of those checks, so in the end, regardless, the user doesn't know whether there is malicious code or a malicious binary, and trusts the distribution maintainers. – Esther Jul 13 '22 at 20:58
  • @GACy20 Source can also be manipulated, but malware in source is very much a lot more obvious than malware in binaries. I think your point is that most users can't review the source, and ok, for them it doesn't really make a difference (note: even here, source is not less secure, it just doesn't make a difference). However, for those that can do at least a cursory review, it does very much make a difference. In any case, my point was to refute point 2. There's no way a binary is going to be more secure than the source, from the user's perspective. At most, they're equal. – JoL Jul 14 '22 at 00:19
  • @ArtemS.Tashkinov "zero such incidents". How confident are you that you'd have heard about it? There's a lot of code out there, and incidents don't necessarily need to reach mainstream news. [Here's an example where a build server was compromised and injected code on built software.](https://www.linuxfoundation.org/blog/preventing-supply-chain-attacks-like-solarwinds/) – JoL Jul 14 '22 at 00:36
  • @Esther I do those checks. Not for every software I install, but for those that I have a reason to distrust more than normal, like most packages in the AUR. – JoL Jul 14 '22 at 00:41
  • @GACy20 Also, it isn't always the case that source can be manipulated. I don't know how things are with other distros. Maybe they redistribute the source and so you have to trust both the original source and the distro's redistribution. However, at least in Archlinux, the source isn't redistributed. The build files link to the original sources, and gets them when building. So, if you trust the original source and you don't want to have to trust anyone else, like the distro maintainers, you can get the build files, review their 50-ish lines of building code, and build from the originals. – JoL Jul 14 '22 at 01:09

---

> But why even bother to keep/offer pre-compiled packages, when you could just compile them yourself? What are the intentions behind keeping/offering them?

Simple economics. Compiling an entire distribution's worth of packages takes weeks, even on a large cluster, uses a lot of energy, and produces a lot of heat.

It simply makes sense to do this once rather than doing it over and over again for every single user.

It also massively increases the size and complexity (and thus the attack surface!) of a basic installation, since you have to include a compiler for every single programming language that any of your packages use as part of the base install.

Many, many years ago, I was really into Linux From Scratch, and I wrote a script which automated the entire installation process of a base LFS system. It ran for about two days, and bear in mind that a base LFS system is a really basic system: it includes the kernel, the libc, the shell, the bootloader, and some basic tools … and that is pretty much it. No graphical environment, no web browser, no email program, no office suite, no multimedia player, no Java environment, no Python / Ruby / PHP / Node.js / whatever your preferred programming language is, no games, no photo editor, no scientific tools, none of the stuff that actually makes a computer "useful".

And some of these are really big and take a really long time to compile. A single package may easily take weeks to compile, depending on the computer you are running on. (Imagine e.g. your router or your smartwatch.)
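For readers who have never built from source, the per-package work that such a script automates is, for most classic packages, the familiar three-step build. A generic sketch (the tarball name and prefix are placeholders):

```shell
# Classic source build, repeated for every package on the system
tar -xf package-1.0.tar.gz
cd package-1.0

# Probe the system and choose configuration options
./configure --prefix=/usr

# Compile, then run the test suite if one is provided
make
make check

# Install into the live system
sudo make install
```

Multiply the compile step by hundreds of packages, some enormous, and the two-day figure above stops being surprising.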

Jörg W Mittag
  • On x86, a full archive rebuild of Debian can be done in a couple of days on a single (large enough) system. – Stephen Kitt Jul 12 '22 at 09:17
  • +1 for mentioning the environmental impact. – Roger Lipscombe Jul 12 '22 at 12:16
  • @StephenKitt That's impressively fast. Out of curiosity, what is "large enough" in this context? A normal single socket server with 128GB or the likes? – Voo Jul 12 '22 at 15:06
  • @Voo something like that, yes; I don't have current figures handy, 128GiB might not be enough, but 256GiB should do it comfortably (with 40-60 cores). – Stephen Kitt Jul 12 '22 at 15:31
  • Computers have gotten faster since you did your LFS install. I recently re-compiled my Gentoo system after a CPU upgrade, which took about 14 hours. – Mark Jul 13 '22 at 00:12

---

Having been a Gentoo user ever since I moved from FreeBSD (pfff! was it ten years ago?), I could not agree more with Stephen's and Artem's answers, which I have indeed upvoted.

I have never been among those who believed that setting charming personal options for gcc (think `-funroll-loops -fomg-optimize`) would lead to significant performance gains, even though that has been one of the major reasons claimed by a vast majority of compiling addicts.

On my Core 2 Duo system, of course, compiling Chromium takes more than a full day. Simply ridiculous! Even more so if you want to keep pace with upstream's updates.

Ridiculous? Well, at the end of the day, not necessarily; it depends on what you really care about, since there are things you simply cannot achieve with pre-built packages:

  • Avoiding PulseAudio with Firefox, for example.
  • Preferring to rely on your system's shared libraries rather than on bundled local copies (harfbuzz, icu, apng, libevent, and codecs such as av1, vpx and webp, not to speak of proprietary codecs).
  • And ultimately, whenever you have good enough reasons to diverge from upstream's choices (your hardware becoming obsolete being just one example).
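The first bullet, for instance, is exactly what Gentoo's per-package USE flags are for. A hedged sketch (the flag names match the Firefox ebuild to the best of my knowledge; check `equery uses firefox` on your own system):

```shell
# Disable the pulseaudio feature for Firefox only
echo "www-client/firefox -pulseaudio" >> /etc/portage/package.use/firefox

# Prefer system-wide shared libraries over Mozilla's bundled copies,
# where the ebuild exposes a flag for it
echo "www-client/firefox system-harfbuzz system-icu" >> /etc/portage/package.use/firefox

# Rebuild Firefox with the new flags
emerge --ask www-client/firefox
```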

If none of those needs apply to you, then simply consider that this is obviously not an eco-friendly way to maintain your system; a couple of Gentoo lead devs have quit over exactly that.

MC68020
  • USE flags - the ability to tweak the compile settings of packages (not the compiler) - are indeed a very neat feature in Gentoo. – user253751 Jul 12 '22 at 09:58
  • It's been a while but I ran a production HPC cluster based on Gentoo. The major use cases were molecular modeling and sequence analysis. After building everything, including the kernel, with optimizations for the specific chipset in use, I saw about a 4% decrease in runtime. Maybe significant when a job runs for 4 weeks but otherwise not worth the effort. – doneal24 Jul 12 '22 at 17:04

---

I work in a closed environment. Some of our systems are deemed critical and locked down: compilers and debuggers are not allowed on the system, and packages are installed as-is from the vendor.
Compiling/Installing from source code would:

  1. Allow someone to change the source code, and thereby the functionality of the package, in a way that could be unknown to others.
  2. Increase the time required to build the baseline, promote it, and install it. In an RTOS environment, outages need to be minimized.
  3. Remove the guarantee that packages are compiled in the same (standard) way every time; we don't want to worry about variations in compilers, config files, etc.

In short, pre-compiled packages give us an easy, low-resource way to ensure that every computer has exactly the same package, installed the way the vendor intends it to be installed.
Scottie H
  • I don't see how this adds up beyond what has already been stated. Point 1 doesn't look convincing; you could just not give permission (either verbally to subordinates or via the OS) for people to alter a given set of files. Point 2 is covered by Stephen, and 3 by Artem. – Quasímodo Jul 13 '22 at 21:49
  • @Quasímodo You missed the first sentence: "I work in a closed environment". Part of that phrase means: "Things are strictly controlled." Most people are unaware of these environments. "So, why not compile anyway?" See the 3 points. BTW: point #1 is MANDATED in the system design documentation. It is technology, policy and procedure that make this happen. – Scottie H Jul 13 '22 at 22:49