Synchronize the documentation with the website before releasing 0.0.99.6 #1549
The '-z now' flag, which is the opposite of '-z lazy', is unsupported as
an external linker flag [1], because of how the NVIDIA Container Toolkit
stack uses dlopen(3) to load libcuda.so.1 and libnvidia-ml.so.1 at
runtime [2,3].
The NVIDIA Container Toolkit stack doesn't use dlsym(3) to obtain the
address of a symbol at runtime before using it. Instead, it's built
against symbols that stay undefined at build-time and are declared by a
CUDA API definition embedded directly in the CGO code or by a copy of
nvml.h. It relies upon lazily deferring function call resolution until
dlopen(3) has loaded the shared libraries at runtime, instead of
resolving the calls when toolbox(1) is started.
This is unlike how Toolbx itself uses dlopen(3) and dlsym(3) to load
libsubid.so at runtime.
Compare the output of:
$ nm /path/to/toolbox | grep ' subid_init'
... with the output of:
$ nm /path/to/toolbox | grep ' nvmlGpuInstanceGetComputeInstanceProfileInfoV'
U nvmlGpuInstanceGetComputeInstanceProfileInfoV
$ nm /path/to/toolbox | grep ' nvmlDeviceGetAccountingPids'
U nvmlDeviceGetAccountingPids
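The comparison above can be scripted. The following is a minimal sketch and not part of Toolbx: `undefined_symbols` is a hypothetical helper that reads nm(1) output on standard input and prints only the names that nm(1) marks as 'U' (undefined). It assumes the usual nm(1) layout, where an undefined symbol has no address column.

```shell
# Hypothetical helper, not part of Toolbx: print the names of the symbols
# that nm(1) marks as undefined ('U'). Reads nm(1) output on stdin.
undefined_symbols() {
    # Undefined symbols have no address, so 'U' is the first field.
    awk '$1 == "U" { print $2 }'
}

# Example usage, with /path/to/toolbox as a placeholder path:
#   nm /path/to/toolbox | undefined_symbols | grep '^nvml'
```

Symbols that are looked up with dlsym(3), such as subid_init, should not show up in this list, because the binary never refers to them by name at link time.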
Using '-z now' as an external linker flag forces the dynamic linker to
resolve all symbols when toolbox(1) is started, and leads to:
$ toolbox
toolbox: symbol lookup error: toolbox: undefined symbol:
nvmlGpuInstanceGetComputeInstanceProfileInfoV
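Whether a given binary was linked with '-z now' can be checked from its dynamic section. The following is a minimal sketch, assuming readelf(1) from GNU binutils, which reports the flag as BIND_NOW under FLAGS or as NOW under FLAGS_1. `has_bind_now` is a hypothetical helper, not part of Toolbx, and it reads the output of 'readelf -d' on standard input.

```shell
# Hypothetical helper, not part of Toolbx: succeed if the output of
# 'readelf -d' on stdin shows that immediate binding was requested.
# Assumes the GNU readelf(1) spelling of the flags.
has_bind_now() {
    grep -Eq 'BIND_NOW|\(FLAGS_1\).*[[:space:]]NOW([[:space:]]|$)'
}

# Example usage, with /path/to/toolbox as a placeholder path:
#   readelf -d /path/to/toolbox | has_bind_now && echo "linked with -z now"
```

A binary for which this check succeeds would be expected to hit the symbol lookup error shown above as soon as it's started.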
With the recent expansion of the test suite, it's necessary to increase
the timeout for the Fedora nodes to prevent the CI from timing out.
Fallout from 6e848b2
[1] NVIDIA Container Toolkit commit 1407ace94ab7c150
    NVIDIA/nvidia-container-toolkit@1407ace94ab7c150
    NVIDIA/go-nvml#18
    NVIDIA/nvidia-container-toolkit#49
[2] https://github.com/NVIDIA/nvidia-container-toolkit/tree/main/internal/cuda
[3] https://github.com/NVIDIA/go-nvml/blob/main/README.md
    https://github.com/NVIDIA/go-nvml/tree/main/pkg/dl
    https://github.com/NVIDIA/go-nvml/tree/main/pkg/nvml
containers#1548
The previous commit explains how the NVIDIA Container Toolkit is
sensitive to some linker flags. Therefore, use the same linker flags
that are used by NVIDIA Container Toolkit to build the nvidia-cdi-hook,
nvidia-ctk, etc. binaries, because they use the same Go APIs that
toolbox(1) does [1]. It's better to use the same build configuration to
prevent subtle bugs from creeping in.
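One way to keep the build configuration from drifting is to check the external linker flags before building. The sketch below is an illustration only: `extldflags_allow_lazy_binding` is a hypothetical helper, and the actual flags used by NVIDIA Container Toolkit aren't reproduced here. It only rejects the '-z now' flag that the previous commit describes as unsupported, in the spellings assumed below.

```shell
# Hypothetical helper, not part of Toolbx: succeed only if the given
# external linker flags do not request immediate binding.
# Assumed spellings: '-z now', '-znow' and '-Wl,...,-z,now'.
extldflags_allow_lazy_binding() {
    case " $* " in
        *" -z now "*|*" -znow "*|*"-z,now"*) return 1 ;;
    esac
    return 0
}

# Example usage, with the flags taken from a placeholder variable:
#   extldflags_allow_lazy_binding $EXTLDFLAGS || echo "'-z now' is unsupported" >&2
```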
[1] NVIDIA Container Toolkit commit 772cf77dcc2347ce
    NVIDIA/nvidia-container-toolkit@772cf77dcc2347ce
    NVIDIA/nvidia-container-toolkit#333
containers#1548
Use 'software development' instead of just 'development' when introducing Toolbx. The additional context makes it more understandable to the reader. containers#1549
Only the images for currently maintained Fedoras (i.e., 39) were updated. containers#1549
bd29451 to 2eccaf8
Mention that Toolbx is meant for system administrators to troubleshoot the host operating system. The word 'debugging' is often used in the context of software development, and hence most readers might not interpret it as 'troubleshooting'. containers#1549
Using the word 'containerized' gives the false impression of heightened security, as if it's a mechanism to run untrusted software in a sandboxed environment without access to the user's private data (such as $HOME), hardware peripherals (such as cameras and microphones), etc. That's not what Toolbx is for. Toolbx aims to offer an interactive command line environment for development and for troubleshooting the host operating system, without having to install software on the host. That's all. It makes no promise about security beyond what's already available in the usual command line environment on the host that everybody is familiar with. containers#1020
Build failed. ✔️ unit-test SUCCESS in 5m 47s
Build failed. ✔️ unit-test SUCCESS in 5m 28s
Build failed. ✔️ unit-test SUCCESS in 5m 42s
Build failed. ✔️ unit-test SUCCESS in 5m 47s
Build failed. ✔️ unit-test SUCCESS in 5m 34s
We will need to do something to cut down the amount of time it takes to go through all the tests. However, let's not block this pull request for that.