Bootstrap Reform and aarch64-linux's GCC Upgrade #208412

@tpwrules

This is the research I have done on how best to upgrade aarch64-linux from GCC 9. I've collected everything here to make the problems clear, to give context to those who can help, and to help the community decide on the right path forward. Some sources are linked where appropriate, but I have many more available on request.

The Problem

Upgrading aarch64-linux past GCC 9 breaks large numbers of packages. As a result, there is a specific clause in all-packages.nix that keeps aarch64-linux at GCC 9, while every other platform uses GCC 11 (and soon 12).

GCC 9 is pretty old at this point, and important packages such as KDE/Plasma require later versions in order to use modern C++ language features. We cannot practicably ship NixOS 23.05 with GCC 9. It must be upgraded before then, with enough time left to test and fix up packages.

The two main breakages observed with newer compilers are linker errors (e.g. with pkgs.icu) and random aborts (e.g. in pkgs.expect during pkgs.dejagnu's test phase).

The Reason

The nixpkgs bootstrap sequence, which builds the latest GCC and stdenv from prebuilt seed binaries, is a bit sleazy: it compiles glibc with the old GCC, then builds the latest GCC and the other utilities against that glibc. As a result, the stdenv is not entirely compiled by the new GCC.

In addition, and more importantly, GCC's low-level runtime library, libgcc_s.so, is simply copied from the GCC used in the bootstrap (currently GCC 9 for aarch64-linux) into the glibc used by the stdenv (whose compiler would ordinarily be GCC 11). As a result, programs built with the newer GCC use the older GCC's library instead of the one their compiler expects.
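To make the mismatch concrete, here is a deliberately simplified model of the sequence just described. The `Stdenv` fields and the `current_bootstrap` and `problems` helpers are hypothetical and do not correspond to anything in nixpkgs; the model only records which compiler built glibc, whose libgcc_s.so sits next to it, and which compiler packages are then built with.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Stdenv:
    glibc_built_by: str  # compiler that compiled glibc
    libgcc_s_from: str   # GCC whose libgcc_s.so is installed alongside glibc
    cc: str              # compiler that packages are then built with


def current_bootstrap(seed_gcc: str, latest_gcc: str) -> Stdenv:
    # glibc is compiled by the seed GCC and the seed GCC's libgcc_s.so is
    # copied next to it; only afterwards is the latest GCC built.
    return Stdenv(glibc_built_by=seed_gcc, libgcc_s_from=seed_gcc, cc=latest_gcc)


def problems(env: Stdenv) -> list[str]:
    """List the ways the stdenv is not built consistently by its own compiler."""
    issues = []
    if env.glibc_built_by != env.cc:
        issues.append(f"glibc compiled by {env.glibc_built_by}, not {env.cc}")
    if env.libgcc_s_from != env.cc:
        issues.append(f"programs built by {env.cc} load libgcc_s.so from {env.libgcc_s_from}")
    return issues
```

Calling `problems(current_bootstrap("gcc-9", "gcc-11"))` reports both issues described above, while a stdenv whose three fields all name the same GCC reports none.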

GCC links this library in automatically (it can also be linked in manually with -lgcc_s) when a program needs certain helper routines, e.g. SIMD math routines or atomics. If the wrong version is picked up (e.g. a newer GCC expects functions that the older library lacks), linking fails and packages do not build. glibc also loads the library at run time in certain circumstances, and if that fails (e.g. because the library is not in the rpath), the program aborts, possibly with a message like libgcc_s.so.1 must be installed for pthread_exit to work.
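Both failure modes can be reduced to simple lookups. At link time, every libgcc_s symbol that objects from the newer GCC reference must be exported by the library actually in use; at run time, glibc has to find the library at all. In this sketch the symbol names are placeholders rather than the real exports of any libgcc_s.so version, and `find_library` is a simplified, hypothetical stand-in for the dynamic loader's real search order (which also involves the rpath, `LD_LIBRARY_PATH`, and the loader cache).

```python
from __future__ import annotations

import os


def missing_symbols(required: set[str], exported: set[str]) -> set[str]:
    """Symbols an object needs from libgcc_s.so that the library in use lacks.
    A non-empty result corresponds to an undefined-reference linker error."""
    return required - exported


def find_library(soname: str, search_dirs: list[str], existing: set[str]) -> str | None:
    """Return the path of `soname` in the first search directory that has it.

    `existing` stands in for the filesystem so the sketch has no side effects.
    None corresponds to the case where glibc cannot load the library and aborts.
    """
    for directory in search_dirs:
        candidate = os.path.join(directory, soname)
        if candidate in existing:
            return candidate
    return None
```

For example, `missing_symbols({"sym_old", "sym_new"}, {"sym_old"})` returns `{"sym_new"}`, mirroring a newer GCC referencing a function the older library does not provide, and `find_library("libgcc_s.so.1", ["/opt/example/lib"], set())` (a hypothetical directory) returns `None`, mirroring a library missing from the search path.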

This deficiency in the bootstrap happens to cause visible problems for aarch64-linux when moving from GCC 9 to 11, but copying libgcc_s.so around is unsafe and wrong for all architectures and GCC versions, and it needs to be fixed. However, the copy has the happy side effect that libgcc_s.so is always available at run time for glibc to use, and this property needs to be preserved somehow too.

Possible Solutions

1. Ignore the reason, upgrade the bootstrap

Pros: Possible right now, pretty certain to actually fix the problem

Cons: Commits us to upgrading the bootstrap every time libgcc_s.so breaks compatibility on any architecture, and does not fix the underlying cause. We would continue to hope that this never causes a subtle issue and always breaks visibly for most packages.

2. Remove the hack that copies the bootstrap libgcc_s.so around, add -lgcc_s to the wrapper

Pros: Tested and seems to work now, relatively certain to actually fix the problem

Cons: Could break in the future if, e.g., dejagnu is needed in the bootstrap sequence again; adds 7.1 megabytes of GCC's library output to everyone's runtime closure (though this is already the case for C++ programs); doesn't actually improve the bootstrap
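A minimal sketch of the wrapper change in option 2, written as a pure function over the compiler's argument list. It assumes `-lgcc_s` only needs adding when the driver will actually link, i.e. when none of `-c`, `-S` or `-E` is present; `wrap_args` is hypothetical, and nixpkgs' real cc-wrapper is a shell script that handles far more cases.

```python
from __future__ import annotations

# Flags that stop the GCC driver before the link step.
NON_LINKING_FLAGS = {"-c", "-S", "-E"}


def wrap_args(args: list[str]) -> list[str]:
    """Return the argument list the hypothetical wrapper would pass to gcc."""
    if NON_LINKING_FLAGS.intersection(args):
        return list(args)      # compile-only invocation: nothing to link
    if "-lgcc_s" in args:
        return list(args)      # already requested explicitly
    return [*args, "-lgcc_s"]  # always request libgcc_s when linking
```

Whether the linker then records libgcc_s.so as a runtime dependency also depends on flags such as `--as-needed`, which drops libraries that no object references; the sketch does not model that.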

It might also be possible to patch GCC to detect when glibc could need libgcc_s.so (i.e. if pthread support is enabled or exceptions are used?) and link it in only in that case, but that is risky because of the failure mode: a missed case shows up as a runtime abort rather than a build failure. Alternatively, libgcc_s.so could be split into a separate output to avoid the size penalty.
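The conditional variant could be sketched as a predicate like the one below. The trigger conditions are guesses that mirror the question mark in the text (thread flags, or exceptions being enabled), and `may_need_libgcc_s` is hypothetical. The sketch models GCC's rule that the last of `-fexceptions`/`-fno-exceptions` wins, and that C++ enables exceptions by default while C does not; nothing else about the driver is modeled.

```python
from __future__ import annotations

# Hypothetical list of flags treated as "pthread support is enabled".
THREAD_FLAGS = {"-pthread", "-lpthread"}


def exceptions_enabled(args: list[str], is_cxx: bool) -> bool:
    # C++ defaults to exceptions on, C to off; the last -f/-fno- flag wins.
    enabled = is_cxx
    for arg in args:
        if arg == "-fexceptions":
            enabled = True
        elif arg == "-fno-exceptions":
            enabled = False
    return enabled


def may_need_libgcc_s(args: list[str], is_cxx: bool) -> bool:
    """Hypothetical heuristic: might glibc load libgcc_s.so at run time?"""
    if THREAD_FLAGS.intersection(args):
        return True  # pthread_exit / pthread_cancel unwind through libgcc_s
    return exceptions_enabled(args, is_cxx)
```

This also illustrates the risk: since glibc 2.34 the pthread functions live in libc itself, so a C program can call pthread_exit without passing -pthread at all, and a predicate like this would not link libgcc_s.so for it.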

3. Add extra bootstrap stages to glue together a glibc that has the latest GCC's libgcc_s.so and a GCC that uses it

Pros: Should not add much overhead to the bootstrap process

Cons: Would likely require a lot of patchelfing; doesn't actually improve the bootstrap

4. Add extra bootstrap stages to recompile glibc with the latest GCC (and its libgcc_s.so), then possibly rebuild GCC against that glibc

Pros: Solves the issue properly, cleanest and most correct bootstrap approach

Cons: Bootstrap would be slower, which complicates life for people who work on the stdenv; complex to implement
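The reordered sequence of option 4 can be sketched as a list of steps replayed in order. This is a hypothetical model (the step names and `run_bootstrap` are made up, and cross-compilation is ignored): rebuilding glibc pairs it with the current compiler's libgcc_s.so, and building GCC makes the new GCC the current compiler.

```python
from __future__ import annotations


def run_bootstrap(seed_gcc: str, steps: list[str]) -> dict[str, str]:
    """Replay hypothetical bootstrap steps and record who built what.

    "glibc"     rebuild glibc with the current compiler and install that
                compiler's libgcc_s.so next to it
    "gcc:<ver>" build GCC <ver> with the current compiler against the
                current glibc; it becomes the current compiler
    """
    state = {"cc": seed_gcc, "glibc_built_by": "none", "libgcc_s_from": "none"}
    for step in steps:
        if step == "glibc":
            state["glibc_built_by"] = state["cc"]
            state["libgcc_s_from"] = state["cc"]
        elif step.startswith("gcc:"):
            state["cc"] = step[len("gcc:"):]
        else:
            raise ValueError(f"unknown step: {step!r}")
    return state


def consistent(state: dict[str, str]) -> bool:
    """True if glibc and its libgcc_s.so both come from the final compiler."""
    return state["glibc_built_by"] == state["libgcc_s_from"] == state["cc"]
```

With a GCC 9 seed, `["glibc", "gcc:11"]` roughly mirrors today's order and is inconsistent, while `["glibc", "gcc:11", "glibc"]` (optionally followed by another `"gcc:11"`, the "possibly" step above) ends with glibc and its libgcc_s.so both coming from GCC 11.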

It might be possible to reduce the overhead of this last solution, especially if we need to build another GCC, since GCC already builds itself several times. We might be able to build the first GCC just once and the second GCC fewer times, keeping the total number of builds under double. There is also rumored to be a combined mode that builds GCC and glibc together, which might be faster and could serve as a shortcut for the first GCC. Any such approach must also take care to keep cross-compilation working correctly.
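A back-of-the-envelope count makes the "under double" goal concrete. It assumes GCC's default native `make bootstrap`, which compiles the compiler three times (stage 1 with the existing compiler, each later stage with the compiler from the stage before, with the last two compared), and that configuring with `--disable-bootstrap` builds it once. It treats every pass as equal cost, which real builds do not, and the split between the two GCC builds is a hypothetical choice rather than a measured plan.

```python
from __future__ import annotations

# Assumed per-build pass counts (every pass treated as equal cost).
PASSES_WITH_BOOTSTRAP = 3     # default native `make bootstrap`: stage1..stage3
PASSES_WITHOUT_BOOTSTRAP = 1  # configured with --disable-bootstrap


def plan_passes(gcc_builds: list[bool]) -> int:
    """Total compiler passes for a plan; each entry says whether that GCC
    build uses the full three-stage bootstrap."""
    return sum(
        PASSES_WITH_BOOTSTRAP if bootstrapped else PASSES_WITHOUT_BOOTSTRAP
        for bootstrapped in gcc_builds
    )


today = plan_passes([True])                    # one bootstrapped GCC
naive_option_4 = plan_passes([True, True])     # two bootstrapped GCCs
trimmed_option_4 = plan_passes([False, True])  # quick first GCC, full second GCC
```

Under these assumptions today's sequence costs 3 passes, a naive option 4 doubles that to 6, and building the first GCC only once brings it down to 4.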

The Path Forward

We have the first solution essentially ready right now, so NixOS 23.05 need not be held up, but it is the worst option. The last solution is the correct one and needs to be done at some point for the benefit of nixpkgs as a whole, but it is also the most work and could cause problems for contributors if not done carefully.

cc: @K900, @trofi

Metadata

Assignees: No one assigned
Labels: 0.kind: bug (Something is broken); 6.topic: bootstrap (Bootstrapping, avoiding pre-built binaries. Often overlaps with cross-compilation.)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests