Fix ldcache update when host and container distributions do not match#1444
Conversation
|
/cherry-pick release-1.18 |
9a32323 to
fb95f58
Compare
aa0c6fa to
a854d3d
Compare
cdesiniotis
left a comment
There was a problem hiding this comment.
I left a couple of comments / questions, but no blockers.
2794eaa to
0b28791
Compare
internal/ldconfig/ldconfig.go
Outdated
| func (l *Ldconfig) getSystemSerachPaths() []string { | ||
| func (l *Ldconfig) getSystemSearchPaths() []string { | ||
| if l.isDebianLikeContainer { | ||
| debianSystemSearchPaths() |
There was a problem hiding this comment.
I've been debugging issues on my homelab cluster with the latest CTK release, and I just spotted this code here. I think it's missing a return. I can send a PR for the existing code, or we can roll that fix into this PR.
There was a problem hiding this comment.
Thanks for the fix. In order to minimize backport churn, let's add the commit to this PR.
f432507 to
70cca03
Compare
|
I'm preparing a follow up / alternative to this patch to handle additional cases I've ran into recently. Should have it out as a branch in an hour or so. |
7f73f39 to
22d7b5d
Compare
d3dcb86 to
1e52761
Compare
1e52761 to
8e56cf7
Compare
Additional commits.
| // TODO: Add other architectures that have custom `add_system_dir` macros (e.g. riscv) | ||
| // TODO: Replace with executing the container's dynamlic linker with `--list-diagnostics`? |
There was a problem hiding this comment.
Once we merge this PR we should turn this into GitHub issues to track this TODO's and get them implemented
| // TODO: Add other architectures that have custom `add_system_dir` macros (e.g. riscv) | ||
| // TODO: Replace with executing the container's dynamlic linker with `--list-diagnostics`? |
There was a problem hiding this comment.
Once we merge this PR we should turn this into GitHub issues to track this TODO's and get them implemented
|
Thanks a lot for your contributions on this topic @jfroy ! |
internal/ldconfig/ldconfig.go
Outdated
| ldsoconfdFilenamePattern = "00-nvcr-*.conf" | ||
| // ldsoconfdSystemDirsFilenamePattern specifies the filename pattern for the drop-in conf file | ||
| // that includes the expected system directories for the container. | ||
| // This is chosen to have a high likelihood of being lexographically last in |
There was a problem hiding this comment.
"lexicographically" :)
We could also use the prefix "yolo-" 🤡
There was a problem hiding this comment.
I believe this was me. Updating this and references to Debian and non-Debian for clarity.
jgehrcke
left a comment
There was a problem hiding this comment.
Fascinating, thank you.
non-debian (including ubuntu)
From the perspective of this PR, Ubuntu is actually non-Debian?
No. Ubuntu is debian in this context. Let me update the description to be clearer. |
@elezar is that what you meant (I think this is quoting #1444 (comment))? @jgehrcke Ubuntu is "debian-like" because |
This change ensures that updating the ldcache in a container also includes the system search paths for the container ldconfig. In most cases, the hook will be executing a host ldconfig that may be configured widely differently from what the container image expects. The common case is Debian-like (e.g. Debian, Ubuntu) vs non-Debian-like (e.g. RHEL, Fedora). But there are also hosts that configure ldconfig to search in a glibc prefix (e.g. /usr/lib/glibc). To avoid all these cases, write the container's expected system search paths to a drop-in conf file that is likely to be last in lexicographic order. Entries in the top-level ld.so.conf file may be processed after this drop-in, but this hook does not modify the top-level file if it exists. Signed-off-by: Jean-Francois Roy <jeroy@nvidia.com> Signed-off-by: Evan Lezar <elezar@nvidia.com>
8e56cf7 to
6592021
Compare
Yes, the code also used |
|
🤖 Backport PR created for |
This change fixes the update of the ldcache when running a non-Debian-like (e.g. RHEL, Fedora) container on a Debian-like (e.g. Debian, Ubuntua) system. Here the ldcache in the container does not initially include system libraries from
/lib64or/usr/lib64since thisldconfigfrom the host that is executed in the container's namespace does not process those paths by default.Testing
The expected behaviour when using
runcdirectly:The current behaviour:
(note that
libc.so.6is not present in the ldcache since/lib64(linked to/usr/lib64) is not processed).The updated behaviour: