Use a minimal initrd to switch to the full initrd stored in /usr #3241
@chewi My idea is to load the "normal" initrd as loopback mount from
Build action triggered: https://github.com/flatcar/scripts/actions/runs/18366905408
chewi
left a comment
Although I shared the same concern about losing functionality that we would have to reimplement, I hadn't yet identified any such functionality, so I'm not quite ready to throw out my proposal to go straight from tiny initrd to real /usr. I'd really like to know what your specific concerns are.
This is an interesting approach in any case. My own alternative would have been to mount /usr as an overlay with the initrd, deleting all the duplicate files from the initrd, but I hadn't fully thought it through.
Regarding verity, I think it only needs to be set up once. I didn't enable verity in my own experiment, but /sysroot/usr was simply a bind mount of /usr. I think that would still work with verity applied.
...ntainer/src/third_party/coreos-overlay/sys-kernel/coreos-kernel/coreos-kernel-6.12.44.ebuild
My intention was to keep most things untouched so that we can focus on the bare task of jumping into the regular initrd and avoid any risk of reimplementing all needed initrd logic. Things that should run from the initrd are: Ignition stages, hostname setup with afterburn (and basic network setup for them while they prepare the final network setup for the real system), setup of the
Okay, but I wasn't proposing rewriting all that. Dracut puts those scripts into an initrd. I was just going to put them in /usr instead. It's more or less the same thing. It's the scripts that Dracut itself provides through its own modules that I was concerned about.
The question is how these things are started, because they run in a context with dependencies. Having only one set of systemd units for both the initrd and the final system doesn't work if we want to make use of systemd in the initrd - it would run all enabled units under
4306d75 to 0bfc20a
b250dfa to 647190c
647190c to 3561af4
fa3e742 to 5752055
...ntainer/src/third_party/coreos-overlay/sys-kernel/coreos-kernel/coreos-kernel-6.12.48.ebuild
5752055 to c9e86c1
c9e86c1 to 37b7d17
chewi
left a comment
Many lines are missing || die. More death threats please! 😁
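For readers less familiar with the convention: ebuild phases do not abort on a failed command by default, so each external command is normally followed by `|| die`. The sketch below illustrates the pattern outside Portage. The `die` function here is a stand-in for Portage's helper, and `stage_file` is a hypothetical step, not code from this PR.

```shell
# Stand-in for Portage's die helper so the sketch runs outside an ebuild.
# Inside an ebuild, die is already provided and must not be redefined.
die() {
    echo "ERROR: $*" >&2
    exit 1
}

# Hypothetical step: copy a file into a staging directory.
# Each command is checked, so a failure stops the build immediately
# instead of being silently ignored.
stage_file() {
    mkdir -p "$2" || die "cannot create $2"
    cp "$1" "$2/" || die "cannot copy $1 to $2"
}
```

Without the `|| die` suffixes, a failed `cp` would leave the staging directory incomplete and the phase would still report success.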
...ntainer/src/third_party/coreos-overlay/sys-kernel/coreos-kernel/coreos-kernel-6.12.49.ebuild
It's from Gentoo commit 573964683c6f490e3a1ff040ec21c9d3b8d8b154. Signed-off-by: Kai Lueke <kailuke@microsoft.com>
chewi
left a comment
Looking good! I'm tentatively approving this, just a couple of things to consider.
You can drop the sudo calls. RESTRICT="userpriv" means we're already running as root because Dracut needs it. If we didn't have that, sudo wouldn't work anyway.
I'm now somewhat confused about the compression. The kernel documentation says that you're only supposed to pass a single cpio to CONFIG_INITRAMFS_SOURCE. It also says the early cpio must not be compressed. We have been telling Dracut not to compress, so what we've been providing has been totally uncompressed. Copilot says that the kernel build isn't smart enough to only compress the main part, but it also says that the uncompressed early cpio rule only applies when you're passing the initramfs separately at boot time, not when it's built in. I suppose that must be true, since it appears that we've been compressing the whole thing via the kernel build. What you've proposed doesn't change that, but I thought it would be a good opportunity to write this down and check that we're all on the same page.
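One way to make the compression question concrete is to look at the first bytes of a cpio image before it is handed to the kernel build. The sketch below is only an illustration: `initramfs_head_type` is a hypothetical helper, and it recognises just two cases, the ASCII magic `070701` of an uncompressed newc archive and the gzip magic bytes. It says nothing about what the kernel build does with the image afterwards.

```shell
# Classify the leading bytes of an initramfs image file.
# Prints "cpio-newc" for an uncompressed newc archive, "gzip" for a gzip
# stream, and "unknown" for anything else (other compressors are not handled).
initramfs_head_type() {
    local hex
    # Read the first six bytes as lowercase hex without separators.
    hex=$(head -c 6 "$1" | od -An -tx1 | tr -d ' \n')
    case $hex in
        303730373031) echo "cpio-newc" ;;   # ASCII "070701"
        1f8b*)        echo "gzip" ;;        # gzip magic bytes
        *)            echo "unknown" ;;
    esac
}
```

Running it on the file passed as CONFIG_INITRAMFS_SOURCE would show whether the early cpio at the front is still uncompressed at that stage. For a built-in initramfs, the image would first have to be extracted from the kernel.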
Ok, dropped the sudo calls. Yes, good question - I assume that the kernel build system knows whether the first cpio can be compressed or not. We could check on real hardware whether the microcode update gets applied (with any Flatcar release, as we didn't change this).
Yes, probably best to check that the microcode actually works, not merely whether we've changed anything. The microcode was actually missing entirely until I fixed that a few months back! See #2837.
Yes, I think I tested it but that was with the truncation - I don't remember the details (Edit: Tested now and it doesn't seem to work either). After the changes with the new lsinitrd extraction I didn't test it again, and I only see now that it doesn't seem to work.
Confusing, it doesn't work on Alpha either, but on Stable I've seen it applied.
Microcode updating is also not working in Beta. Sounds like the behavior changed in the PR you linked. But Beta has your changes and it prints:
So I guess the current way of passing it in does indeed not work and we need to change this. But not in this PR.
I created a bug report for it: flatcar/Flatcar#1909
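The hardware check discussed in this thread could be scripted roughly as follows. This is a sketch under the assumption that the machine is x86 and exposes a `microcode` field in `/proc/cpuinfo`. `microcode_revision` is a hypothetical helper name, and the optional file argument exists only so the parsing can be tried on a saved copy.

```shell
# Print the microcode revision of the first CPU listed in a cpuinfo-style file.
# Defaults to /proc/cpuinfo when no file is given.
microcode_revision() {
    awk -F': *' '/^microcode[ \t]*:/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}
```

Comparing the value from a boot with the early cpio against one without it (or against the revision the firmware reports) would indicate whether the early update was actually applied.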
The growth of binaries over time and the inclusion of new features filled the available boot partition space, so that the kernel+initrd could barely fit twice anymore, as required for updates. We employed workarounds such as wrapper scripts for ignition, afterburn and other binaries so that they are loaded from /usr. However, this was still not enough and we would have to do the same for (network) kernel modules and firmware. To avoid making this ever more complex, we can use a dedicated initrd focused on loading the full initrd from /usr. This full initrd can then use dracut as before and even drop all the workarounds we accumulated.
Generate a minimal initrd to use instead of the full bootengine initrd. The bootengine initrd gets stored as squashfs on /usr. The minimal initrd still includes the early_cpio for amd64 microcode updates.
We have a fixed list of modules or module directories to include, focused only on loading /usr and any emergency console interaction. This also requires checking for module dependencies to copy over.
The busybox, veritysetup, and kmod binaries are needed and get their required libraries resolved and copied over. They are not static and use shared libraries, which should be ok for now. The resulting vmlinuz file is 27 MB for amd64, down from ~60 MB, so we have enough room to include more kernel modules and so on for the next years. Meanwhile we also grow the boot partition and wait for users to redeploy, until we can rely on a larger boot partition and eventually drop the minimal initrd again.
Pulls in flatcar/bootengine#110 for the minimal initrd script and flatcar/seismograph#12 for making the device mapper discovery for the "rootdev" command more reliable.
This also required a backport of a kernel patch from 2017 that exposes the PARTUUID in the /sys uevent file.
Co-authored-by: James Le Cuirot <jlecuirot@microsoft.com>
Signed-off-by: Kai Lueke <kailuke@microsoft.com>
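The module dependency step mentioned in the commit message can be pictured with a small sketch. It is not the script from flatcar/bootengine#110. `module_closure` is a hypothetical helper that assumes a `modules.dep`-style file as written by depmod, where each line has the form `path.ko: dep1.ko dep2.ko`.

```shell
# Print a module and everything it depends on, one path per line.
# $1: modules.dep-style file ("path.ko: dep1.ko dep2.ko" per line)
# $2: module path exactly as it appears before the colon
# Duplicates are possible in the output; pipe through sort -u to remove them.
module_closure() {
    local depfile=$1 module=$2 dep
    echo "$module"
    # Look up the direct dependencies, then recurse into each of them.
    for dep in $(awk -v m="$module:" '$1 == m { for (i = 2; i <= NF; i++) print $i }' "$depfile"); do
        module_closure "$depfile" "$dep"
    done
}
```

A caller would run this once per module on the fixed list and copy the deduplicated result into the minimal initrd.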
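To illustrate what the backported patch makes possible, the sketch below reads the partition UUID from a uevent file. It assumes the key is spelled `PARTUUID` and that the file is the one under `/sys/class/block/<device>/uevent`. `partuuid_from_uevent` is a hypothetical helper and not the seismograph implementation.

```shell
# Print the PARTUUID value from a block device uevent file.
# Prints nothing if the key is absent, for example on a kernel without the patch.
partuuid_from_uevent() {
    sed -n 's/^PARTUUID=//p' "$1"
}
```

For example, `partuuid_from_uevent /sys/class/block/sda3/uevent` would print the UUID of that partition on a patched kernel (the device name here is only a placeholder).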
How to use
Depends on flatcar/bootengine#110 and flatcar/seismograph#12
And flatcar/flatcar-build-scripts#174 for the image size report (but that only works when this is included in the first nightly)
Testing done
On all clouds (Equinix Metal arm64 was manually tested) - The build got gc'ed, a more limited new run is here
The bootengine.img initrd size/content reporting only works after the first nightly is built.
changelog/ directory (user-facing change, bug fix, security fix, update)
/boot and /usr size, packages, list files for any missing binaries, kernel modules, config files, etc.