Skip to content

draft: per-dso .so resolution cache on glibc#207893

Closed
pennae wants to merge 3 commits intoNixOS:stagingfrom
pennae:ldsocache
Closed

draft: per-dso .so resolution cache on glibc#207893
pennae wants to merge 3 commits intoNixOS:stagingfrom
pennae:ldsocache

Conversation

@pennae
Copy link
Copy Markdown
Contributor

@pennae pennae commented Dec 27, 2022

Description of changes

this is an alternative to #207061. to reduce stat storms and general slowness during application startup we save a resolution cache into each dso that maps DT_NEEDED entries to their corresponding path in the configured runpath. this should Just Work™.

we think this has many benefits over that, shrinkwrap, and other solutions:

  • unlike shrinkwrap, dependencies can still be replaced with LD_LIBRARY_PATH
  • unlike the guix approach it's in theory possible for a soname to resolve to two different paths in the same process (if that's possible at all in glibc, haven't actually checked. should be possible with multiple namespaces though)
  • no new files in outputs are needed that could conflict when merging
  • works when cross-compiling without qemu-user involvement

we've tested this on a cross-compiled armv7 system (in qemu-user, got bored of the board taking so long to load) running strace -cf /run/current-system/bin/switch-to-configuration test. unpatched nixos takes 44s to do this, with these patches that drops to 29s owing to almost 24000 failing openat and stat calls that are avoided by using the cache.

this is still very rough, full of debug code, tests are missing, and the patchelf code should be upstreamed (but is included here for ease of testing).

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.05 Release Notes (or backporting 22.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

@github-actions github-actions bot added the 6.topic: stdenv Standard environment label Dec 27, 2022
@pennae
Copy link
Copy Markdown
Contributor Author

pennae commented Dec 27, 2022

looks like the ofborg errors were due to patchelf relocating writable sections into non-writable segments. gonna take that down as another potential bug

@ofborg ofborg bot added the 10.rebuild-linux-stdenv This PR causes stdenv to rebuild on Linux and must target a staging branch. label Dec 27, 2022
@ofborg ofborg bot requested review from Ma27, alyssais and edolstra December 27, 2022 19:34
@ofborg ofborg bot added 10.rebuild-darwin: 101-500 This PR causes between 101 and 500 packages to rebuild on Darwin. 10.rebuild-linux: 501+ This PR causes many rebuilds on Linux and should normally target the staging branches. 10.rebuild-linux: 5001+ This PR causes many rebuilds on Linux and must target the staging branches. labels Dec 27, 2022
@nixos-discourse
Copy link
Copy Markdown

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/benchmarking-stdenv-mkderivation-vs-derivation-for-trivial-builds/24298/3

@pennae
Copy link
Copy Markdown
Contributor Author

pennae commented Dec 29, 2022

ran nixpkgs-review on the branch until nix got oomkilled. after 30k of 80k packages built and only 117 failures (most of which were dead links, broken FOD hashes, or requireFile instances) it looks like no errors are going to show up there.

@blaggacao
Copy link
Copy Markdown
Contributor

blaggacao commented Dec 29, 2022

First, let me say that this is all very interesting and I appreciate the things I'm learning here.

I've read through all the linked materials (incl. @fzakaria 's thoughts and prototypes) and wonder what's the fundamental critique of an approach that leverages nix' stable store locator properties to evict a system that originally was made to accommodate unstable locators with magic?

The above comparison to the other approaches doesn't seem to sufficiently convey that critique, in my view.

Especially, since these seem to approach the problem from first principles.

I'd like to better understand those considerations that lead to the critique and I'm unable to infer. Can you detail?

@pennae
Copy link
Copy Markdown
Contributor Author

pennae commented Dec 30, 2022

it's mostly about not behaving too differently from other systems that use glibc. approaches like shrinkwrap work perfectly fine (better than this, depending on your metric), but it doesn't allow replacing libraries using LD_LIBRARY_PATH for testing purposes or other reasons. that's probably a rather rare thing to do, but if we can avoid baking such a difference in behavior into the core of nixos it's probably worth doing.

the guix approach probably has namespacing issues. if you use dlmopen to create different namespaces those will all use the same ld.so.cache the initial executable used, which can conflict with what the runpaths of the libraries being loaded into the namespace link to (thus reintroducing unstable locators!). it also requires a file in the filesystem, which comes with location issues (as discussed there), which comes with possible path traversal attacks and difficulties for downstream derivations that use eg buildEnv or other merging systems. the merging problems could all be fixed, but avoiding the altogether would be quite nice.

@fzakaria
Copy link
Copy Markdown
Contributor

fzakaria commented Dec 30, 2022

Interesting approach - might be worth sharing the ideas with the GNU libc or musl mailing list.
Maybe there's an idea here worth expanding to the linking specification.

Can you show using readelf -x .secion-name to demonstrate the output.

here's a real question:
If we are willing to patch the dynamic loader (ldso) -- why even put all this information into the single executable and have to muck with patchelf?

This is the question I keep asking. We need to challenge the notion that we must even have a single binary.

Very cool PR -- been enjoying following along your debugging adventures :)

@pennae
Copy link
Copy Markdown
Contributor Author

pennae commented Dec 30, 2022

Can you show using readelf -x .secion-name to demonstrate the output.

readelf says:

Hex dump of section '.note.nixos.ldcache':
  0x0094a000 06000000 bb010000 b66ca863 4e69784f .........l.cNixO
  0x0094a010 53000000 6c696261 636c2e73 6f2e3100 S...libacl.so.1.
  0x0094a020 2f6e6978 2f73746f 72652f36 62383832 /nix/store/6b882
  0x0094a030 6a303163 6e327339 786a6673 78763434 j01cn2s9xjfsxv44
  0x0094a040 696d3470 6d346233 6a73722d 61636c2d im4pm4b3jsr-acl-
  0x0094a050 322e332e 312f6c69 622f6c69 6261636c 2.3.1/lib/libacl
  0x0094a060 2e736f2e 31006c69 62617474 722e736f .so.1.libattr.so
  0x0094a070 2e31002f 6e69782f 73746f72 652f7676 .1./nix/store/vv
  0x0094a080 30786e64 63306970 38336637 326e3068 0xndc0ip83f72n0h
  0x0094a090 7a30776c 63663367 386a6873 6a642d61 z0wlcf3g8jhsjd-a
  0x0094a0a0 7474722d 322e352e 312f6c69 622f6c69 ttr-2.5.1/lib/li
  0x0094a0b0 62617474 722e736f 2e31006c 6962632e battr.so.1.libc.
  0x0094a0c0 736f2e36 002f6e69 782f7374 6f72652f so.6./nix/store/
  0x0094a0d0 68736b37 317a3861 646d7667 796b6e37 hsk71z8admvgykn7
  0x0094a0e0 767a6a79 31316466 6e617239 66347231 vzjy11dfnar9f4r1
  0x0094a0f0 2d676c69 62632d32 2e33352d 3136332f -glibc-2.35-163/
  0x0094a100 6c69622f 6c696263 2e736f2e 36006c69 lib/libc.so.6.li
  0x0094a110 62637279 70746f2e 736f2e33 002f6e69 bcrypto.so.3./ni
  0x0094a120 782f7374 6f72652f 77736a70 3361326a x/store/wsjp3a2j
  0x0094a130 67623435 63703972 696e7031 7a387a6b gb45cp9rinp1z8zk
  0x0094a140 71776762 66666161 2d6f7065 6e73736c qwgbffaa-openssl
  0x0094a150 2d332e30 2e372f6c 69622f6c 69626372 -3.0.7/lib/libcr
  0x0094a160 7970746f 2e736f2e 33006c69 62676d70 ypto.so.3.libgmp
  0x0094a170 2e736f2e 3130002f 6e69782f 73746f72 .so.10./nix/stor
  0x0094a180 652f6e77 6c37707a 61666164 76616761 e/nwl7pzafadvaga
  0x0094a190 626b737a 36317267 33623363 7335386e bksz61rg3b3cs58n
  0x0094a1a0 39692d67 6d702d77 6974682d 6378782d 9i-gmp-with-cxx-
  0x0094a1b0 73746167 65342d36 2e322e31 2f6c6962 stage4-6.2.1/lib
  0x0094a1c0 2f6c6962 676d702e 736f2e31 30000000 /libgmp.so.10...

if this goes anywhere patchelf will get a nice formatter to read these.

If we are willing to patch the dynamic loader (ldso) -- why even put all this information into the single executable and have to muck with patchelf?

(it's not just in the executable, it's in every DSO, but more importantly) because nixos relocates and relinks executables sometimes, eg to build stage1. adding info like this outside of patchelf which runs all the time anyway would duplicate a lot of effort into ld.bfd, ld.gold, lld, mold, ... while patchelf is not a great place to have this, it looks like it's the least bad place.

@blaggacao
Copy link
Copy Markdown
Contributor

Ofc, Farid just wanted to avoid another shameless plug, so here it is for what he means: #207061 (comment)

😁

@pennae
Copy link
Copy Markdown
Contributor Author

pennae commented Dec 30, 2022

oh, one more thing. just to be clear about our intentions here.

this thing will remain a draft until there's an agreement that it's either a good idea worth pursuing further or it's killed off by someone. we don't have the energy to do all the touch-ups alluded to in all the todos, add tests etc etc and then throw it all away because no agreement whether it's wanted at all is forthcoming. would be nice to have this merged. great even, given the performance improvements it can bring. but we really can't put in all that work and then have it be lost to time

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should comment why this is required

Copy link
Copy Markdown
Contributor Author

@pennae pennae Dec 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not, that's just a debugging aid (like everything else except the two patches and hooking them into their respective derivations) (and the stdenv hook)

Comment on lines 9 to 23
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least a few comments would not hurt

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will try to add exhaustive documentation for the final patches. not going to do that just yet though, while it's still unclear whether there's any chance for this approach at all.

the basic idea is simple though: add an elf note to each DSO that contains a zero-terminated array of zero-terminated string pairs (soname, path), then check that note before doing filesystem lookups. most of the complexity is (of course) in patchelf, and in working around qemu bugs 😕

@nixos-discourse
Copy link
Copy Markdown

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/reducing-stat-calls-for-library-loading-during-application-startup/24358/1

@nixos-discourse
Copy link
Copy Markdown

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2022-01-03-nixpkgs-architecture-team-meeting-23/24404/1

@edolstra
Copy link
Copy Markdown
Member

Can you describe in a bit more detail what the resolution process at runtime looks like? In particular, what happens when LD_LIBRARY_PATH is set.

In principle this approach seems good to me, though I'm slightly worried about maintaining a non-trivial patch to glibc. I wonder if upstream glibc might be interested in this...

@pennae
Copy link
Copy Markdown
Contributor Author

pennae commented Jan 13, 2023

Can you describe in a bit more detail what the resolution process at runtime looks like? In particular, what happens when LD_LIBRARY_PATH is set.

the resolution cache added here is an additional source of soname -> path mappings that is visited between LD_LIBRARY_PATH and DT_RUNPATH. the patchelf changes look at DT_NEEDED and DT_RUNPATH to generate lookup results that are known to be positive and fixed (ie, not using glibc hwdeps) and bakes those results into an elf note. if a library were replaced through LD_LIBRARY_PATH then that replacement would be used, but changes to DT_RUNPATH would have to either rebuild or invalidate the resolution cache to remain valid. (the patchelf changes don't do this yet because it wasn't needed for testing)

edit the glibc default is to look first at LD_LIBRARY_PATH with its normal search algorithm (colon-separated list of dirs, hwcaps), then repeat the same on DT_RUNPATH if nothing was found. we just cache the results of the DT_RUNPATH lookup where we know that a specific path would be selected for a given needed soname (because hwcaps does not figure in the lookup but it is contained an runpath and thus would be found during the runpath lookup trial). this cache remains valid even when directories are appended to runpath or directories that don't contain any needed sonames are inserted anywhere, but it's better to just rebuild/nuke in all cases for consistency. stage1 currently collects libraries into an extra-utils drv without nuking the cache, leaving a few dead references around--but those don't hurt, since the failing lookups through the resolution cache aren't fatal. lookup proceeds with runpath and finds the moved libs just fine in that case.

though I'm slightly worried about maintaining a non-trivial patch to glibc.

yeah, that's always a concern. luckily this one can just be dropped without losing functionality, just some performance. and the lookups themselves are pretty easy, in a part of the algorithm that's not all that likely to change very much. (although hwcaps did happen. hopefully that's the last one.)

I wonder if upstream glibc might be interested in this...

that would be the ideal case! other systems (we know of guix or spack) could benefit from this too. even FHS distros could benefit, though not nearly as much (since the resolution cache only optimises lookups that are served from RUNPATH).

Copy link
Copy Markdown
Contributor Author

@pennae pennae Feb 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably bail here and not add a cache if the path contains $VARs, eg $ORIGIN. we could also resolve them, but that seems messy and error-prone, if it's not very common bailing out seems better. we could also add a "consult this rpath entry" instead, giving partial caching with no effect on search order. same if hwcaps directories are present.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not just ignore the variables and continue processing the other colon delimited paths?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, that would change the resolution order. imagine a runpath $ORIGIN:/some/dep, where both contain a libfoo.so and the program also needs a libfoo.so. we have to make a note of some kind that $ORIGIN must be visited, otherwise we'd not use the one from $ORIGIN, counter to the normal resolution order

Copy link
Copy Markdown

@hinshun hinshun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're been experimenting with this patch internally with positive results. Compared to the other solutions, this one has a few advantages:

  • Shrinkwrapping or setting absolute sonames causes issues in a local development workflow (outside of nix) where the same shared object from two different nix closures will both be loaded
  • LD_LIBRARY_PATH and LD_PRELOAD continue to function, this is very important for us in existing local / debugging workflows

Without the suggestions in this review, I'm seeing page/section alignment errors from ld.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
+ const auto nameSize = roundUp(strlen(ELF_NOTE_NIXOS) + 1, 4);
+ const auto nameSize = roundUp(strlen(ELF_NOTE_NIXOS) + 1, getPageSize());

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
+ const auto descSize = roundUp(noteSize, 4);
+ const auto descSize = roundUp(noteSize, getPageSize());

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
+ wri(notePhdr.p_align, 4);
+ wri(notePhdr.p_align, getPageSize());

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
+ wri(noteShdr.sh_addralign, 4);
+ wri(noteShdr.sh_addralign, sectionAlignment);

pennae added 3 commits March 5, 2023 05:35
$ORIGIN needs handling. we'll just bail on anything with a $ in it
@pennae
Copy link
Copy Markdown
Contributor Author

pennae commented Mar 5, 2023

rebased to current master, updated to not cache just a path for every needed soname but a search path (so we can properly handle $ORIGIN and other linker variables), tested with qtpass to make sure all of this works. it seems to. haven't seen any alignment errors from the linker, @hinshun how did you get these?

@infinisil infinisil added the 1.severity: significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc. label Apr 19, 2023
@nixos-discourse
Copy link
Copy Markdown

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/on-nixpkgs-and-the-ai-follow-up-to-2023-nix-developer-dialogues/37087/2

@wegank wegank added the 2.status: merge conflict This PR has merge conflicts with the target branch label Mar 20, 2024
@pennae pennae closed this by deleting the head repository Jul 1, 2024
@DavHau
Copy link
Copy Markdown
Member

DavHau commented May 3, 2025

In case someone wants to try this out, simply cherry-pick the latest commit from https://github.com/DavHau/nixpkgs/commits/stat-storm

I cleaned it up and applied the latest review suggestions.

@mweinelt
Copy link
Copy Markdown
Member

mweinelt commented May 3, 2025

@DavHau Could you make sure to properly credit @pennae as required by the MIT license?

@pennae
Copy link
Copy Markdown
Contributor Author

pennae commented May 3, 2025

In case someone wants to try this out, simply cherry-pick the latest commit from https://github.com/DavHau/nixpkgs/commits/stat-storm

I cleaned it up and applied the latest review suggestions.

thank you for not crediting anyone in your branch. it really makes us feel appreciated.

DavHau added a commit to DavHau/nixpkgs that referenced this pull request May 3, 2025
originally authored by @pennae via NixOS#207893

See original PR for more information
@DavHau
Copy link
Copy Markdown
Member

DavHau commented May 3, 2025

Please forgive me. Credited now via commit message.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

1.severity: significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc. 2.status: merge conflict This PR has merge conflicts with the target branch 6.topic: stdenv Standard environment 10.rebuild-darwin: 101-500 This PR causes between 101 and 500 packages to rebuild on Darwin. 10.rebuild-linux: 501+ This PR causes many rebuilds on Linux and should normally target the staging branches. 10.rebuild-linux: 5001+ This PR causes many rebuilds on Linux and must target the staging branches. 10.rebuild-linux-stdenv This PR causes stdenv to rebuild on Linux and must target a staging branch.

Projects

None yet

Development

Successfully merging this pull request may close these issues.