remaining unused sub-dependencies
If I want to deploy a binary that is linked against libfreetype.so.6, the following happens:
freetype as well as its resolved dependencies libpng and zlib are copied, but then freetype and zlib are removed as blacklisted libraries, which means an unused libpng library remains.
I guess one solution to this would be to check if a library is blacklisted before it is copied?
We use ldd to resolve the dependencies, which provides one single-dimensional array with dependencies.
There used to be a recursive dependency resolution mechanism, but @haampie recently changed that for good reasons. I doubt the old mechanism avoided this situation, though.
IMO we should replace ldd as a resolving mechanism anyway, due to some security implications it has. You shouldn't run it on unknown binaries etc. It would be much safer to read the direct dependencies e.g., using objcopy, and then recurse manually. That'd prevent libpng from being shipped in this scenario as well.
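A minimal sketch of reading only the direct dependencies without ever running the binary. It assumes binutils' `readelf` is installed (`objdump -p` prints the same NEEDED entries); the function names `parse_needed` and `direct_dependencies` are made up for illustration.

```python
import os
import re
import subprocess

# Matches `readelf -d` lines such as:
#  0x0000000000000001 (NEEDED)   Shared library: [libc.so.6]
_NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[([^\]]+)\]")


def parse_needed(readelf_output):
    """Return the DT_NEEDED sonames found in `readelf -d` output, in order."""
    return _NEEDED_RE.findall(readelf_output)


def direct_dependencies(path):
    """Return the direct dependencies (sonames) of the ELF file at `path`.

    readelf only parses the file; unlike ldd, it does not load or run it.
    """
    result = subprocess.run(
        ["readelf", "-d", path],
        check=True,
        capture_output=True,
        text=True,
        # Force the C locale so the "Shared library:" label is not translated.
        env=dict(os.environ, LC_ALL="C"),
    )
    return parse_needed(result.stdout)
```

This only yields sonames, not paths, so the recursion still needs some way to locate each library on disk.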
Ah, this is a very good reason to replace ldd indeed. I might look into that.
@TheAssassin some time ago I implemented an ldd replacement based on LIEF. Sadly, I wasn't able to match ldd's performance: since ldd uses the system cache, it's much faster.
How do the two compare @azubieta? I don't think it can be that slow compared to all other IO that is being done bundling an executable, right?
@haampie I didn't perform any hard measurements; it was only my first impression.
But it seems that I forgot to push the code to GitHub. Now it's gone: https://github.com/azubieta/elf-tree-builder :sob::sob::sob::sob::sob::sob:
From the IRC discussion, a proposal to fix this issue is to let ldd resolve all dependencies, and then filter this list by recursively walking the tree via readelf or objdump. We prune branches of the tree whenever we hit a blacklisted shared lib.
The main reason to call ldd is that it's still best at handling RUNPATH, RPATH, LD_LIBRARY_PATH and their subtleties; readelf and objdump merely show the names of the libs but do not resolve the full path, so we can use the ldd output as a lookup table.
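Roughly, the proposal could look like the sketch below. It is only an illustration under some assumptions: `parse_ldd` expects the usual `name => /path (0x...)` lines, `needed_of` is a hypothetical callback that returns the NEEDED sonames of a file (e.g. via readelf or objdump), and the excludelist is matched by soname.

```python
import re

# Matches resolved ldd lines such as:
#   libz.so.1 => /usr/lib/libz.so.1 (0x00007f0000200000)
_LDD_RE = re.compile(r"^(\S+) => (\S+) \(0x[0-9a-fA-F]+\)$")


def parse_ldd(output):
    """Turn ldd output into a lookup table {soname: resolved path}."""
    lookup = {}
    for line in output.splitlines():
        match = _LDD_RE.match(line.strip())
        if match:  # skips the vdso, the interpreter and "not found" lines
            lookup[match.group(1)] = match.group(2)
    return lookup


def wanted_libraries(root_needed, lookup, needed_of, excluded):
    """Walk the tree depth-first and prune a branch at every excluded soname.

    root_needed: NEEDED sonames of the binary being deployed
    lookup:      table from parse_ldd()
    needed_of:   callable, path -> list of NEEDED sonames (hypothetical helper)
    excluded:    set of blacklisted sonames
    """
    result = []
    seen = set()
    stack = list(reversed(root_needed))
    while stack:
        soname = stack.pop()
        if soname in seen or soname in excluded:
            continue  # excluded: neither the lib nor its subtree is visited
        seen.add(soname)
        path = lookup.get(soname)
        if path is None:
            continue  # ldd did not resolve it, nothing to copy
        result.append(path)
        stack.extend(reversed(needed_of(path)))
    return result
```

A library that is reachable through both a blacklisted and a non-blacklisted branch is still returned, which is what we want: it is only dropped when every path to it goes through a blacklisted lib.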
There's one alternative: lddtree -a. lddtree builds a tree of dependencies and outputs it in a way that you could parse easily. It however skips duplicates by default. With -a you get those duplicates as well.
By parsing the output, we could generate a complete tree, put it in a tree data structure, then apply the excludelist to remove all blacklisted nodes together with their subtrees, and traverse what is left. That gives us a precise list of dependencies.
lddtree $(which bash):
bash => /bin/bash (interpreter => /lib64/ld-linux-x86-64.so.2)
libtinfo.so.5 => /lib/x86_64-linux-gnu/libtinfo.so.5
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2
ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
lddtree -a $(which bash):
bash => /bin/bash (interpreter => /lib64/ld-linux-x86-64.so.2)
libtinfo.so.5 => /lib/x86_64-linux-gnu/libtinfo.so.5
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2
ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2
I think it makes most sense to just ship lddtree and build a parser for its output. We could even adjust the output again to show which library is pulled in by which library. But that's optional.
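A possible shape for such a parser, as a sketch only. It assumes that `lddtree -a` encodes the depth of each entry as leading indentation (the indentation is not visible in the listings above) and that one level is 4 spaces; `indent_width` is a parameter in case that assumption is wrong. The function names and the sample paths in the usage note are made up.

```python
import re

_LINE_RE = re.compile(r"^(?P<indent> *)(?P<name>\S+) => (?P<path>\S+)")


def parse_lddtree(output, indent_width=4):
    """Parse indented `lddtree -a` output into a nested tree.

    Each node is a dict with the keys "name", "path" and "children".
    """
    root = None
    stack = []  # stack[d] is the most recent node seen at depth d
    for line in output.splitlines():
        match = _LINE_RE.match(line)
        if match is None:
            continue
        path = match.group("path")
        if line.rstrip().endswith("=> not found"):
            path = None  # unresolved library
        node = {"name": match.group("name"), "path": path, "children": []}
        depth = len(match.group("indent")) // indent_width
        if depth == 0 or not stack:
            root = node
            stack = [node]
            continue
        del stack[depth:]  # climb back up to the parent of this entry
        stack[-1]["children"].append(node)
        stack.append(node)
    return root


def bundled_paths(root, excluded):
    """Collect unique paths, dropping excluded nodes and their subtrees."""
    result = []

    def visit(node):
        for child in node["children"]:
            if child["name"] in excluded:
                continue  # prune this branch entirely
            if child["path"] is not None and child["path"] not in result:
                result.append(child["path"])
            visit(child)

    visit(root)
    return result
```

With a tree where `libfreetype.so.6` pulls in `libpng16.so.16` and `libz.so.1`, and the excludelist contains freetype and zlib, `bundled_paths` returns nothing from that branch, so libpng is no longer shipped.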
That looks nice. lddtree is a bash script with a single dependency, scanelf; both are licensed under GPL v2. I'm not entirely sure what the status of the package is; the copyright reads 2003-2012.
As long as it works (and it does, ELF hasn't changed really since... has it ever?), I don't see a problem.
We could also just learn from them how to resolve dependencies properly.
As a fallback we might even add an ldd cross-check to make sure everything's consistent, and print a warning in case there are inconsistencies or something like that.
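Such a cross-check could be as simple as a set comparison. A sketch, assuming both sides are lists of resolved paths and that the comparison happens before the excludelist is applied (ldd knows nothing about it); `cross_check` is a hypothetical name.

```python
def cross_check(resolved, ldd_paths):
    """Compare our resolver's paths with ldd's and return warning messages."""
    ours, theirs = set(resolved), set(ldd_paths)
    warnings = []
    for path in sorted(ours - theirs):
        warnings.append(f"resolved but not reported by ldd: {path}")
    for path in sorted(theirs - ours):
        warnings.append(f"reported by ldd but not resolved: {path}")
    return warnings


if __name__ == "__main__":
    for message in cross_check(["/usr/lib/libz.so.1"], ["/lib/libz.so.1"]):
        print("WARNING:", message)
```

Comparing plain strings is strict; two paths that point at the same file through a symlink would be reported as a mismatch unless they are normalized first.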
After thinking a bit about this, I started to like the idea of just rebuilding the mechanisms of lddtree.sh.
We once extracted the ELF-related code contained in linuxdeploy into a separate library. That code should be updated with the latest changes (if there are any); then this library would be the perfect place to implement the resolving mechanisms. We could even produce some sort of customized ldd replacement that works cross-architecture etc., which would help other projects that are looking for such a tool, and it'd close #8, which discusses the cross-architecture compilation problems.
Would be great to have, but hard to get right.
Looking at the source of lddtree.py I doubt they got the RPATH and RUNPATH right. I've seen at least a couple things that look questionable: they propagate RPATH only from the root (not from dependencies that specify rpath), they pass on RUNPATH recursively while it should only be used in direct dependencies, they discard RPATH when RUNPATH is set.
Of course they got a lot of things right as well and the package looks amazing, I'm just saying it's not easy to handle all the subtleties.
All of it is very well documented, though, so conformance testing should be possible. We just need a larger number of tests. Also, we can run some tests against ldd to find the more "obvious" issues.
"Overprovisioning" is also less of an issue for our specific use case, at least for the beginning. After all, I'd rather bundle too many than too few libraries.
I am not saying lddtree is perfect. I only ever used the bash script version, which works reliably enough for many applications.
I've set up a repo that exposes an issue with lddtree and serves as a nice test environment for us to test a custom ldd implementation:
https://github.com/haampie/lddtest
I'd say lddtree is a no-go if it fails to handle RPATH properly, especially since gcc-5 still defaults to RPATH; enabling RUNPATH requires a flag.
Dependency resolution is not a task of the ELF specification; it's the GNU dynamic linker (ld.so etc.) that defines the search order. The search algorithm is described in its man page. I consider that proper documentation.
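For reference, a simplified sketch of that search order as I read it in ld.so(8), for resolving the direct NEEDED entries of a single object. It deliberately leaves out a lot: `$ORIGIN` expansion, secure-execution mode, the `/etc/ld.so.cache` lookup, and any RPATH inherited from the objects that loaded this one. `candidate_dirs` is a made-up name, and the default directories may be `/lib64` and `/usr/lib64` on some 64-bit systems.

```python
DEFAULT_DIRS = ("/lib", "/usr/lib")  # assumption: differs on some 64-bit systems


def candidate_dirs(rpath, runpath, ld_library_path, default_dirs=DEFAULT_DIRS):
    """Return the directories to search, in order, for one object's NEEDED libs.

    rpath, runpath:   DT_RPATH / DT_RUNPATH entries of the object itself
    ld_library_path:  entries of the LD_LIBRARY_PATH environment variable
    """
    dirs = []
    # 1. DT_RPATH, but only if the object has no DT_RUNPATH.
    if rpath and not runpath:
        dirs.extend(rpath)
    # 2. LD_LIBRARY_PATH.
    dirs.extend(ld_library_path)
    # 3. DT_RUNPATH; applies to direct dependencies only, so it must not be
    #    passed on when resolving the children of those dependencies.
    dirs.extend(runpath)
    # 4. /etc/ld.so.cache would be consulted here (not a directory list).
    # 5. Default directories.
    dirs.extend(default_dirs)
    return dirs
```

Because the function takes only the object's own RPATH and RUNPATH, a recursive resolver would call it once per object with that object's entries instead of propagating the root's values downwards.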