dlopen: add some ELF section magic for declaring dlopen() deps #17416
Here's a small experiment: let's embed dlopen() dependency data in ELF
binary sections. This is useful for packaging tools such as RPM/DEB to
synthesize "recommends" style dependencies from this information.
Example use:
objcopy libsystemd-shared-246.so /dev/null --dump-section=SYSTEMD_DLOPEN_DEP=/dev/stdout | \
tr -s '\000' | \
xargs -0 -r -n 1 echo "Recommends:"
(this generates output in RPM spec file style)
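A minimal sketch of how such a declaration could look in C. It assumes GCC/Clang's `section` and `used` attributes and a GNU-compatible linker (GNU ld, gold or lld), which synthesizes `__start_<name>`/`__stop_<name>` symbols for output sections whose names are valid C identifiers. The macro `DECLARE_DLOPEN_DEP` and the helper `list_dlopen_deps()` are hypothetical names, not the macros from this PR. Only the section name is taken from the `objcopy` command above.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical macro: emit a NUL-terminated SONAME string into a custom
 * ELF section. "used" keeps the otherwise unreferenced array from being
 * discarded by the compiler. */
#define DECLARE_DLOPEN_DEP(id, soname)                                  \
        static const char dlopen_dep_##id[]                             \
                __attribute__((section("SYSTEMD_DLOPEN_DEP"), used)) = soname

DECLARE_DLOPEN_DEP(idn2, "libidn2.so.0");
DECLARE_DLOPEN_DEP(pcre2, "libpcre2-8.so.0");

/* Section bounds, synthesized by the linker because the section name is a
 * valid C identifier. */
extern const char __start_SYSTEMD_DLOPEN_DEP[];
extern const char __stop_SYSTEMD_DLOPEN_DEP[];

/* Walk the section, print one "Recommends:" line per entry and return the
 * number of entries. NUL bytes added as alignment padding between entries
 * are skipped, which is also why the shell pipeline squeezes them with
 * tr -s '\000'. */
static size_t list_dlopen_deps(void) {
        size_t n = 0;
        const char *p = __start_SYSTEMD_DLOPEN_DEP;

        while (p < __stop_SYSTEMD_DLOPEN_DEP) {
                if (*p == '\0') {
                        p++;
                        continue;
                }
                printf("Recommends: %s\n", p);
                p += strlen(p) + 1;
                n++;
        }
        return n;
}
```

The order of the entries within the section is up to the compiler and linker, so consumers should treat the entries as an unordered set. Running the `objcopy` pipeline above against a binary built from this should print the same two SONAMEs.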

This was prompted by the discussion in #17297

Hm, this gives me the SONAMEs of the libraries, but not necessarily the library package names. So I'd need to do the lookup and matching myself, which might be a bit tricky. Afaics I'd basically have to reimplement what dpkg-shlibdeps is doing (at least to some extent): https://www.man7.org/linux/man-pages/man1/dpkg-shlibdeps.1.html
What would be trivial to implement on the packaging side is a dummy executable that is never installed and links against those dlopen()ed libraries. Then something like this should work:
A middle way could be to maintain the dlopen dependencies manually, but write an autopkgtest for each of those dependencies. Then CI could automatically catch it if the functionality provided by a dlopen()ed library breaks.

Ah, rpm is nicer there: the SONAMEs are mentioned literally in the package deps, which is pretty cool.

yeah pretty neat, we don't have such a mechanism in deb land unfortunately (at least not that I know of)

Can't you do:

Can we teach dpkg-shlibdeps to look for this ELF section? If we dropped the SYSTEMD_ prefix, it could become a generalized way to express dlopen recommends.

I really don't care how the section is named, happy to generalize it if that helps. (But knowing Debian's development speed, I have my doubts that, if we did it like this, it would actually land in Debian before the end of the decade ;-))

hmm @pmatilai, would RPM be interested in adding native support for such an ELF section, to synthesize "Recommends:" style deps from it? (Background: to minimize our dependency footprint in systemd in a container world, we started to load some "leaf" libraries we use with dlopen() instead of regular ELF shared library linking, so that we can gracefully degrade and handle their absence. We are now looking for a way for RPMs/DEBs to still carry dep info on those "weak" library deps, as "Recommends:" deps. Ideally we want the dep info to be denoted directly in the C sources, so that it is unlikely to get out of date. This PR implements that, under the assumption that packagers would write a script that extracts this info from the ELF binaries and turns it into RPM spec file deps. The question is whether RPM should just do that on its own, generically for all binaries. Would that be relevant to RPM?)

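For readers unfamiliar with the "leaf library" pattern described in the background paragraph, here is a minimal sketch. The library is opened lazily with dlopen(), so its absence disables a feature instead of preventing the binary from starting. The helper name `dlopen_optional()` is hypothetical, and returning `-EOPNOTSUPP` is just one plausible convention. This is not the systemd code.

```c
#include <dlfcn.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: open an optional library and resolve one symbol.
 * Returns 0 and fills *ret_handle / *ret_sym on success, or -EOPNOTSUPP
 * if the library or the symbol is missing, so the caller can fall back. */
static int dlopen_optional(const char *soname, const char *symbol,
                           void **ret_handle, void **ret_sym) {
        void *dl = dlopen(soname, RTLD_NOW);
        if (!dl) {
                fprintf(stderr, "%s not available, feature disabled: %s\n",
                        soname, dlerror());
                return -EOPNOTSUPP;
        }

        void *sym = dlsym(dl, symbol);
        if (!sym) {
                fprintf(stderr, "%s lacks %s, feature disabled\n", soname, symbol);
                dlclose(dl);
                return -EOPNOTSUPP;
        }

        *ret_handle = dl;
        *ret_sym = sym;
        return 0;
}
```

Because nothing ends up in `DT_NEEDED`, packaging tools cannot see this dependency at all, which is the gap the ELF section is meant to fill.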
Interesting point. I have no idea whether dlopen is used widely enough that our dpkg maintainer would consider something like this, or whether he sees it as too much of a niche use case. Not sure if he actively follows GitHub, but let's try: @guillemj ^

But you could run it on your local build machine to process the ELF data, and then generate some snippets to include in the Debian packaging metadata, which you commit to your git repo and which the build system then uses pre-generated. I.e. I think it should be fine to convert the ELF data to Debian packaging data locally on your private dev machine every now and then, as long as it is automatic and the result is added to the packaging repo. There's no need to rerun this on every build, I think. After all, this is not going to change with every patch, but at most with every release.

Hey,
Not only that, nothing guarantees that there's a single filename matching the SONAME. This is one of the reasons in Debian we do not provide SONAMEs as part of the dependency information.
While I do appreciate the effort to reduce dependency weight, I don't think the path chosen here is the way forward TBH. This has come up several times in the Debian context, see https://lists.debian.org/debian-mentors/2017/11/msg00196.html for a summary and a collection of links to some of those discussions.

That's a pretty simplistic selection of viewpoints. There are good reasons why people use dlopen() for things like this, and I think we have good reasons too, i.e. we want to minimize the footprint of minimal images and handle missing deps gracefully. One point those mails raise is that missing deps would then not be handled gracefully, because they are only noticed at runtime. But that is precisely why people want this: to make leaf dependencies no heavier than necessary, and to have graceful fallbacks in place. From those linked mails I see the following points made:
So I am deeply unimpressed, I must say...

I mean, dlopen()-style deps are a fact of life; this is nothing Debian is going to change and patch out everywhere. That's just unrealistic: you don't have the manpower to maintain such patches for the whole distro. I think it would be wise to make sure they can be handled gracefully, instead of refusing to entertain the idea outright.

(fun thing: the dlopen() deps we have that use symbol versioning only ever had a single version exposed. The people who are smart enough to use symbol versioning are apparently also the ones careful enough to never break compat in the first place ;-))

@poettering - the dlopen() case is something I've occasionally pondered too, and this seems like a nice simple way to do something about it. If generalized, yes, I think it would be relevant for rpm, and it should be easy to have the sections processed automatically.

I think that the fact that Debian uses package names is irrelevant. On the executable level, with normal linking, binaries list their dependencies on libraries by giving the library name, and the dynamic linker uses that to find the appropriate file in the file system. All this PR does is expose the same information for libraries which are opened using dlopen(). In both cases the library name must be converted into some form that is appropriate for the given package manager: rpm mostly uses the library name formatted in a special way, deb wants to figure out the package name and use that instead. But it must do that for the information provided by ldd too. I see a slightly different shortcoming in this PR: it only exposes the library name. This allows us to replicate the old-style unversioned dependencies on the library name alone. But we really want to express dependencies on the specific versioned symbols that are used. I.e. we want to go from this:
to
The generation goes like this: we use idn2_lookup_u8, idn2_strerror and idn2_to_unicode_8z8z from libidn2.so.0:
$ objdump -T /usr/lib64/libidn2.so.0|rg '\b(idn2_lookup_u8|idn2_strerror|idn2_to_unicode_8z8z)\b'|awk '{print $6}'|sort -u
IDN2_0.0.0
and massage that result into
To make this possible, in addition to the library name we need to expose the specific symbols we'll access. This should be enough information to postprocess it into proper Requires deps in modern rpm style. (Enough information is already exposed for the purposes of deb deps.) C.f.: https://github.com/rpm-software-management/rpm/blob/master/tools/elfdeps.c

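The "massage that result into" step can be sketched like this. The string format is modeled on what rpm's elfdeps generator (linked above) emits for ELF dependencies: `soname(VERSION)`, followed by a `(64bit)` marker on most 64-bit platforms, and the bare soname when there is neither a version nor a marker. The function `format_rpm_soname_dep()` is a hypothetical name. The version string is the `IDN2_0.0.0` value from the `objdump` output above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: build an rpm-style shared-library dependency string.
 *   soname + version, 64-bit   -> "libidn2.so.0(IDN2_0.0.0)(64bit)"
 *   soname, no version, 64-bit -> "libidn2.so.0()(64bit)"
 *   soname only, 32-bit        -> "libidn2.so.0"
 * Returns 0 on success, -1 if buf is too small. */
static int format_rpm_soname_dep(char *buf, size_t n, const char *soname,
                                 const char *version, bool is_64bit) {
        const char *marker = is_64bit ? "(64bit)" : "";
        int r;

        if (version || is_64bit)
                r = snprintf(buf, n, "%s(%s)%s", soname, version ? version : "", marker);
        else
                r = snprintf(buf, n, "%s", soname);

        return (r < 0 || (size_t) r >= n) ? -1 : 0;
}
```

Feeding in the versions actually used by the binary (rather than just the SONAME) is what would turn the unversioned entries into versioned rpm-style dependencies.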
Yeah. That is ironic. But I think we need to implement support for symbol versions anyway. It is just too pretty not to do it.

btw, the data the current patch exports is actually not a simple list of strings. For the libidn case we actually support two different so versions in parallel, since the APIs we care about are unaffected by the compat break. I currently expose this as a string
Either way, if symbol versions matter I think we can totally start using dlvsym() and then export more complex strings that include the symbol version. Maybe:
(Hmm, actually, it looks to me as if libidn fucked up their symbol versioning, and maybe we are actually better off not relying on it for that lib. I.e. they broke compat by adding a field to an open-coded struct, but didn't bump the symbol version for the functions that use that struct. In fact, I think they only ever used the very same symbol version, LIBIDN_1.0, for everything they ever exported.)
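Supporting "two different so versions in parallel" typically means trying a list of SONAMEs in order. A minimal sketch follows. `dlopen_first()` is a hypothetical helper, and the libidn SONAMEs are illustrative. If such a list were also exported into the ELF section, a packaging tool would have to treat its entries as alternatives, not as independent dependencies.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: try each SONAME in order and return the handle of
 * the first one that loads, or NULL if none does. On success the name that
 * worked is stored in *ret_soname (if ret_soname is non-NULL). */
static void *dlopen_first(const char *const *sonames, size_t n, int flags,
                          const char **ret_soname) {
        for (size_t i = 0; i < n; i++) {
                void *dl = dlopen(sonames[i], flags);
                if (dl) {
                        if (ret_soname)
                                *ret_soname = sonames[i];
                        return dl;
                }
        }
        return NULL;
}

/* Illustrative list: prefer the newer ABI, fall back to the older one.
 * Only safe when the APIs actually used are identical in both. */
static const char *const idn_sonames[] = { "libidn.so.12", "libidn.so.11" };
```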
This is a bit hackish but gives us a list of package names, which could be fed to

Would @smcv's suggestion (at https://lists.debian.org/debian-devel/2017/03/msg00164.html) be an option? Instead of dlopen()ing the libraries, we'd have plugins which link against those libraries (and get proper versioned dependencies this way), and libsystemd-shared would then dlopen() those plugins.

In-process plugins suck. We generally don't do that. We have generators, drop-in dirs and services; that's how we do extension in systemd: out-of-process. Also, I don't get it. Plugins in C are implemented via dlopen(), so what's the point? Because you are afraid of dlopen(), you then dlopen() a module that uses regular ELF .so deps for you? That's a pointless game, no? After all you still have dlopen(), just with one extra indirection, which certainly complicates things.

I wouldn't say it's pointless. On the contrary. This ensures that you get proper shlibs dependencies (with proper version information and everything) and existing tooling knows how to handle that.
But that's a workaround for what this PR tries to fix in a more explicit, simpler way. The workaround you propose comes at the price of more complex code, more work and hidden deps. Generally, if you only want to support one single plug-in, then don't do a plug-in interface. Plug-ins are supposed to provide a certain level of abstraction, but if you have nothing to abstract, don't do plugins; it's the wrong technology. And here we don't. We just want to use some dep if it's there and not if not. That's all.

Well, if that works, then you could generate a throw-away .so if compat with dh-shlibs is essential and toolchain maintainers are not sympathetic to the problem. I.e. a .so that links to the relevant symbols is built, dh-shlibs is called on it, and then the .so is dropped and doesn't show up in any package. But yikes, the extra hoops to jump through... I don't see why we should always route our calls on all systems through two levels of shared library linking when one level generally suffices, just for dh-shlibs compat.
I'd probably rework this then, so that we actually define the symbols to map and the so name in some static const struct array or so, which you just pass to dlsym_many_and_warn(). That way all duplication goes away, because the exported data and the data we pass to dlsym_many_and_warn() are really the same.

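One reading of this proposal: keep the SONAME and its symbol list in a single static table, so that the same data drives both the dlsym() calls and whatever gets exported for packaging. This is a hypothetical sketch. `struct dlopen_table`, `dlsym_table()` and the all-or-nothing behaviour are assumptions, not the signature or semantics of systemd's `dlsym_many_and_warn()`.

```c
#include <dlfcn.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* One symbol to resolve and the location of the function pointer to fill. */
struct dlsym_entry {
        const char *name;
        void **ptr;
};

/* Hypothetical descriptor: the SONAME and its symbols live together, so the
 * data exported for packaging and the data used at runtime cannot diverge. */
struct dlopen_table {
        const char *soname;
        const struct dlsym_entry *syms;
        size_t n_syms;
};

/* Resolve every symbol in the table from an already opened handle.
 * All-or-nothing: on the first missing symbol, reset the pointers filled
 * so far and return -ENXIO. */
static int dlsym_table(void *dl, const struct dlopen_table *t) {
        for (size_t i = 0; i < t->n_syms; i++) {
                void *p = dlsym(dl, t->syms[i].name);
                if (!p) {
                        fprintf(stderr, "%s: missing symbol %s\n",
                                t->soname, t->syms[i].name);
                        for (size_t j = 0; j < i; j++)
                                *t->syms[j].ptr = NULL;
                        return -ENXIO;
                }
                *t->syms[i].ptr = p;
        }
        return 0;
}
```

A build step could walk the same tables to emit the ELF section entries, which is what would remove the duplication mentioned above.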
I think sonames (

@pmatilai Probably it is possible to feed the dependency generator with non-stripped binaries, so that it can find

@poettering In the case of RPM, as far as I understand, it is not trivial or even possible to call both a custom dependency generation script and all the standard scripts, because

@mikhailnov - wrapping the dependency generators was always a nasty hack; it's just that for a long time it was the only way to affect dependency generation at all. You can still wrap individual generators, but you shouldn't, except as a last-gasp workaround for bugs, really. Anyway, AIUI what's being discussed (generic markup for dlopen() dependencies) should be handled by the upstream elfdeps generator.

On 30.11.2020 13:38, Panu Matilainen wrote:
> @mikhailnov - wrapping the dependency generators was always a nasty hack, only for a long time it was the only way to affect the dependency generation in any way. You can still wrap individual generators, but you shouldn't except as a last-gasp workaround for bugs, really.
> Anyway, AIUI what's being discussed (generic markup for dlopen() dependencies) should be handled by upstream elfdeps generator.
Why is this markup _generic_? It is systemd-specific... Why should package-specific hacks be supported in the upstream generator?
|
dlopen() is not exclusive to systemd, though; the problem of tracking optional/runtime dependencies automatically is a generic one.

This started off as a systemd-specific thing, but moved beyond that a long time ago (see comments around Oct 22).
> The name of the section inside ELF is not standardized.
To me, that's part of what this is all about, standardizing it.
> Maybe it is possible to feed an unstripped binary to the generator and make it find calls of dlopen.
Even if it were possible, you don't want automatic dependencies for every dlopen(), because we can't possibly know the semantics behind them. In some cases dlopen()s are used to implement alternative backends, in other cases (like here) they are for additional features. For alternatives you don't want to end up recommending all of them. So it needs to be manually tagged one way or the other.

>> The name of the section inside ELF is not standardized.
> To me, that's part of what this is all about, standardizing it.
Hm, what do you mean by standardizing? How do you see this process?
>> Maybe it is possible to feed an unstripped binary to the generator and make it find calls of dlopen.
> Even if it were possible, you don't want automatic dependencies for every dlopen() because we can't possibly know the semantics behind them. In some cases dlopen()'s are used to implement alternative backends, in other cases (like here) it's for additional features. For alternatives you don't want to end up recommending all of them. So it needs to be manually tagged one way or the other.
Agreed. In the systemd package, something like `%__DLOPENGENERATOR_recommends*` macros could be used.
Actually it would be nice to teach RPM to parse the tag name from a generator's stdout. For example, let ONE generator print to stdout:
Requires: libfoo.so.2()(64bit)
Recommends: libcrypto.so.12()(64bit)
and then write in systemd.spec: `%__elfdeps_generator_opts --dlopen=recommends`
P.S. I still do not like the whole idea of dlopen in systemd. In another place it would be OK.

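The point that dlopen() entries must be tagged manually, together with the idea of one generator emitting both `Requires:` and `Recommends:` lines, could be combined roughly as follows. The `tag:soname` entry format, the tag names and `entry_to_dep_line()` are all hypothetical. `Requires`, `Recommends` and `Suggests` are existing RPM dependency tags.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical tag vocabulary mapping a declared priority to an RPM tag. */
struct tag_map {
        const char *tag;
        const char *rpm_field;
};

static const struct tag_map tag_maps[] = {
        { "required",    "Requires"   },
        { "recommended", "Recommends" },
        { "suggested",   "Suggests"   },
};

/* Turn one "tag:soname" entry into an RPM-style dependency line.
 * Returns 0 on success, -1 if the entry is malformed, the tag is unknown,
 * or buf is too small. */
static int entry_to_dep_line(const char *entry, char *buf, size_t n) {
        const char *colon = strchr(entry, ':');
        if (!colon || colon == entry || colon[1] == '\0')
                return -1;

        size_t tag_len = (size_t) (colon - entry);
        for (size_t i = 0; i < sizeof(tag_maps) / sizeof(tag_maps[0]); i++) {
                if (strlen(tag_maps[i].tag) == tag_len &&
                    strncmp(tag_maps[i].tag, entry, tag_len) == 0) {
                        int r = snprintf(buf, n, "%s: %s", tag_maps[i].rpm_field, colon + 1);
                        return (r < 0 || (size_t) r >= n) ? -1 : 0;
                }
        }
        return -1;
}
```

Keeping the tag in the declaration itself means that alternative backends can be marked as "suggested" while true optional features are "recommended", without any per-package spec file macros.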
@mikhailnov the current patch is an idea, a draft only. If there's interest in making this stuff generic, we'd of course name the ELF section generically and drop the systemd-specific naming. I'd very much welcome that. The thing is simply that dlopen()-style library deps are not going to go away; people use them for good reasons. You can of course stick your head in the sand, ignore the reasons and always link explicitly, but that comes at a price: you will have a needlessly large set of deps for your minimal OS image. I think in an ideal world we'd have a native concept of "weak shared library deps" in ELF, i.e. where we can list .so objects that are loaded if they exist and not loaded if they don't. The symbols they provide would become weak for the consumer too, so that they resolve to NULL if the library is not installed. I don't see anyone working in that direction in ELF, though. I'd absolutely prefer this outcome over everything else, but in a real-world timeframe this doesn't look like something I want to wait for. As I understand it, this type of shared library dependency is pretty common on MacOS, so there would even be prior art for something like this. Until then, I think it would make sense to:
One thing I am very much against, though, is adding additional "plug-in" interfaces, indirection glue .so or so, whose only purpose is to work around limited packaging dep generators, at the price of slower runtime behaviour and more files to install.

If you're about to re-invent .dynsym, I suggest consulting with people who develop dynamic linkers (e.g. glibc ld.so); otherwise you're likely to end up with something suboptimal.

@ldv-alt "reinvent"? AFAIK .dynsym isn't capable of declaring "weak library dependencies", no?

@mbiebl’s solution seems to be the best one so far. The dummy executable will never actually be installed, but it keeps the packaging tooling happy.

@poettering note that native support in dpkg-gensym is not strictly required. It would be very welcome and desirable of course, so that it "just works" out of the box, but the Debian packaging machinery can be extended with discrete debhelper tools. There are a number of those available already, and creating one that looks at the new ELF section in all binaries and adds Recommends sounds very easy. It would not work automagically out of the box, but enabling it is as simple as adding a dependency and a
If the proposal is adopted by the RPM world but not by dpkg, then I'm willing to volunteer to build, upload and maintain such a tool in Debian.

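How small such a tool could be is easiest to see in code. Below is a minimal sketch that reads a 64-bit ELF file, finds the section by name through the section header string table, and prints one `Recommends:` line per NUL-separated entry. It assumes an ELFCLASS64 file with the host's byte order and uses only glibc's `<elf.h>` definitions. `dump_dlopen_deps()` is a hypothetical name. A real tool would rather use libelf, handle 32-bit and cross-endian files, and map SONAMEs to package names.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: print one "Recommends:" line for every NUL-separated
 * entry in section `secname` of the ELF64 file at `path`. Returns the number
 * of entries (0 if the section is absent) or -1 on error. */
static int dump_dlopen_deps(const char *path, const char *secname) {
        FILE *f = fopen(path, "rb");
        if (!f)
                return -1;

        /* Read the whole file; simple, but wasteful for very large binaries. */
        long len = -1;
        if (fseek(f, 0, SEEK_END) == 0)
                len = ftell(f);
        if (len < (long) sizeof(Elf64_Ehdr) || fseek(f, 0, SEEK_SET) != 0) {
                fclose(f);
                return -1;
        }
        size_t size = (size_t) len;
        unsigned char *data = malloc(size);
        if (!data || fread(data, 1, size, f) != size) {
                free(data);
                fclose(f);
                return -1;
        }
        fclose(f);

        int n = -1;
        const Elf64_Ehdr *eh = (const Elf64_Ehdr *) data;
        /* Only 64-bit ELF files with a sane section header table are handled. */
        if (memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0 ||
            eh->e_ident[EI_CLASS] != ELFCLASS64 ||
            eh->e_shentsize != sizeof(Elf64_Shdr) ||
            eh->e_shstrndx >= eh->e_shnum || eh->e_shoff > size ||
            (size - eh->e_shoff) / sizeof(Elf64_Shdr) < eh->e_shnum)
                goto out;

        const Elf64_Shdr *sh = (const Elf64_Shdr *) (data + eh->e_shoff);
        const Elf64_Shdr *shstr = &sh[eh->e_shstrndx];
        if (shstr->sh_offset > size || shstr->sh_size > size - shstr->sh_offset)
                goto out;
        const char *names = (const char *) (data + shstr->sh_offset);

        n = 0; /* no matching section simply means no declared deps */
        for (Elf64_Half i = 0; i < eh->e_shnum; i++) {
                /* Assumes the section name string table is NUL-terminated. */
                if (sh[i].sh_name >= shstr->sh_size ||
                    strcmp(names + sh[i].sh_name, secname) != 0)
                        continue;
                if (sh[i].sh_type == SHT_NOBITS || sh[i].sh_offset > size ||
                    sh[i].sh_size > size - sh[i].sh_offset) {
                        n = -1;
                        break;
                }
                const char *p = (const char *) (data + sh[i].sh_offset);
                const char *end = p + sh[i].sh_size;
                while (p < end) {
                        if (*p == '\0') { /* skip alignment padding */
                                p++;
                                continue;
                        }
                        const char *nul = memchr(p, '\0', (size_t) (end - p));
                        size_t l = nul ? (size_t) (nul - p) : (size_t) (end - p);
                        printf("Recommends: %.*s\n", (int) l, p);
                        p = nul ? nul + 1 : end;
                        n++;
                }
                break;
        }
out:
        free(data);
        return n;
}
```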
Or did you ask about "weak DT_NEEDED"?

@ldv-alt the goal here is to have an ELF binary "foo" that has weak dependency on some library "libbar.so", so that "foo" can be run with and without "libbar.so" being installed, and it would handle the absence gracefully. It's not sufficient to have weak symbols — I know we have those — but we want weak library deps. Other archs have them supposedly, in particular MacOS. On Linux we only have dlopen(), but that means the deps are not discoverable anymore for packaging tools, they are hidden in code. |
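To make the distinction concrete: ELF already has weak *undefined symbols*. Such a symbol resolves to NULL when nothing in the link provides it, so code can test the address before calling. What is missing is the library-level counterpart, because a `DT_NEEDED` entry is still mandatory. A minimal sketch; `hypothetical_optional_fn` is a made-up name that no library defines.

```c
#include <stdio.h>

/* Weak reference to a function that no object in this link defines.
 * The linker accepts the undefined weak reference and resolves it to
 * NULL instead of failing the link. */
extern int hypothetical_optional_fn(void) __attribute__((weak));

/* Use the optional function if present, otherwise fall back gracefully. */
static int call_if_available(void) {
        if (!hypothetical_optional_fn) {
                puts("optional function not available, using fallback");
                return -1;
        }
        return hypothetical_optional_fn();
}
```

If the function lived in a shared library, that library would still have to be listed in `DT_NEEDED` for it to be loaded at all. There is no standard ELF way to say "load this library if present".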
The rpm dependency generator works with the file list from the package spec. If a file is not installed, it is also not listed in the package spec, so it won't be processed by the dependency generator.

In the case of NixOS, we usually rely on hardcoded paths to various (hashed) locations in the nix store, and use a slightly longer
While this provided an opportunity for us to test whether the "optional" codepaths work (see #18078), it's not ideal. Rebuilds of more minimal versions of systemd can coexist in the nix-store, and dynamic probing at runtime is always error-prone (as seen above). There might be reasons for the original change in other distros (probably only scattered through comments in this issue tracker, but I'd love to get some pointers!), so I doubt you will go back to not doing
We'd ideally like to get
If I read it correctly, this PR will add all dlopen values in a compilation unit to some custom ELF section, so we can use that information in a Nix-specific fixup phase to add things back to the rpath? The current situation provides little insight, so we're left with either adding it in places we know about (and hoping it never moves around), or blindly adding the rpath to all ELFs. Neither is ideal, and some semi-standardized way to fix this would be appreciated.

So, I take it downstream distros aren't actually interested in having meta info about this stuff in ELF sections? I guess we can close this PR then, if there's no interest in making use of the info. As I understand it, the downstream distros where I assumed there might be interest in using this ELF metadata for auto-generating the right package deps would much rather have static deps back, and are not concerned by the ever-growing list of deps we add (or at least find dlopen() more concerning than ever-growing dependency lists). I'd be willing to merge a patch that, based on some ifdeffery, optionally makes the deps static again. But please one knob only, and use macros to map the sym_xyz names to xyz in that case, so that this doesn't leak everywhere.

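The "one knob, macros mapping `sym_xyz` to `xyz`" suggestion could look roughly like this. This is a hypothetical sketch: `STATIC_DEPS` is a made-up build option, and `strlen` from libc stands in for a real optional library so that the example runs anywhere. Call sites use `sym_strlen()` in both configurations and never see the `#ifdef`.

```c
#include <dlfcn.h>
#include <errno.h>
#include <string.h>

#ifdef STATIC_DEPS
/* Knob on: link normally and make the sym_ name an alias for the real one. */
#  define sym_strlen strlen
static int dlopen_example(void) {
        return 0; /* nothing to load, the linker already did it */
}
#else
/* Knob off: resolve the symbol at runtime through dlopen()/dlsym(). */
static size_t (*sym_strlen)(const char *);

static int dlopen_example(void) {
        if (sym_strlen)
                return 0;
        /* dlopen(NULL) returns the global scope; a real library would use its SONAME. */
        void *dl = dlopen(NULL, RTLD_NOW);
        if (!dl)
                return -EOPNOTSUPP;
        *(void **) &sym_strlen = dlsym(dl, "strlen");
        return sym_strlen ? 0 : -EOPNOTSUPP;
}
#endif

/* A call site that is identical under both configurations. */
static size_t name_length(const char *s) {
        if (dlopen_example() < 0)
                return 0;
        return sym_strlen(s);
}
```

Confining the `#ifdef` to one place per library keeps the knob from leaking into the code that actually uses the library.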
Pity, I would have quite liked to have support for OSX-like weak imports. Compiler suites already support it for that platform after all.

@bluca hmm, we could still merge this if a distro you are involved with has a prospect of making use of this?

Let me check with some Yocto folks, will get back

> As I understand the downstream distros where I assumed there might be
> interest in using this ELF metadata for auto-generating the right
> package deps would much rather have static deps back and are not
> concerned by ever growing list of deps we add. (Or at least find
> dlopen() more concerning than ever-growing dependency lists).
Thinking about it again after a while... Well, metainfo is better than nothing. It will be possible to generate a dependency from the soname only, without versioned ABI symbols, but we can live without that. If the soname of a dlopened library changes, we will be able to detect the need to rebuild systemd. But there will be no guarantee that the ABI of the dlopened libraries matches the code inside systemd. That is bad. If I remember correctly, Lennart wrote something about performing a build-time test of dlopening. That would probably be enough to solve the problem. But if we want to track the connection between systemd and the dlopened libraries, the only way to do it in the current ecosystem is to add them as hard dependencies, which would make dlopening pointless.
If a minimal systemd is needed, we may build it twice, first with full dependencies, then without them. We may make a minimalistic systemd for containers. It will require more effort to test package upgrades, and it probably will not save a lot of disk space? How many megabytes, uncompressed, could be saved?
So, rethinking this, I would still prefer static dependencies. The big dependency chain is a problem to my mind, but by far not a critical one.
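The build-time test of dlopening mentioned above could be as simple as a program, run from the test suite, that opens every declared SONAME and resolves every symbol with `RTLD_NOW`, failing loudly otherwise. A hypothetical sketch (`check_dlopen_deps()` and `struct dep_check` are made-up names). Note that it only catches missing libraries and missing symbols, not changed function signatures, so it narrows the ABI concern rather than eliminating it.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* One library and the symbols the program expects to find in it. */
struct dep_check {
        const char *soname;
        const char *const *symbols;
        size_t n_symbols;
};

/* Return the number of problems found; 0 means every declared library
 * loaded and every listed symbol resolved. */
static size_t check_dlopen_deps(const struct dep_check *deps, size_t n) {
        size_t failures = 0;

        for (size_t i = 0; i < n; i++) {
                void *dl = dlopen(deps[i].soname, RTLD_NOW);
                if (!dl) {
                        fprintf(stderr, "FAIL %s: %s\n", deps[i].soname, dlerror());
                        failures++;
                        continue;
                }
                for (size_t j = 0; j < deps[i].n_symbols; j++) {
                        if (!dlsym(dl, deps[i].symbols[j])) {
                                fprintf(stderr, "FAIL %s: missing %s\n",
                                        deps[i].soname, deps[i].symbols[j]);
                                failures++;
                        }
                }
                dlclose(dl);
        }
        return failures;
}
```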
|