
Synchronize the documentation with the website before releasing 0.0.99.6 #1549

Merged
debarshiray merged 7 commits into containers:main from
debarshiray:wip/rishi/README-doc-synchronize-with-website
Sep 27, 2024

Conversation

@debarshiray
Member

No description provided.

The '-z now' flag, which is the opposite of '-z lazy', is unsupported as
an external linker flag [1], because of how the NVIDIA Container Toolkit
stack uses dlopen(3) to load libcuda.so.1 and libnvidia-ml.so.1 at
runtime [2,3].

The NVIDIA Container Toolkit stack doesn't use dlsym(3) to obtain the
address of a symbol at runtime before using it.  It links against
undefined symbols at build-time available through a CUDA API definition
embedded directly in the CGO code or a copy of nvml.h.  It relies upon
lazily deferring function call resolution to the point when dlopen(3) is
able to load the shared libraries at runtime, instead of doing it when
toolbox(1) is started.

This is unlike how Toolbx itself uses dlopen(3) and dlsym(3) to load
libsubid.so at runtime.
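
For contrast, the explicit dlopen(3) and dlsym(3) pattern can be sketched in a few lines. This is only an illustration in Python, whose ctypes module wraps those calls on Linux; it is not how Toolbx or the NVIDIA Container Toolkit is implemented. The `lookup` helper and the missing symbol name are hypothetical, and `CDLL(None)` is assumed to behave like dlopen(NULL), which holds on Linux.

```python
import ctypes
import os


def lookup(handle, name):
    """Return the function for `name`, or None if the symbol cannot be found."""
    try:
        # Attribute access on a ctypes.CDLL performs the dlsym(3) lookup.
        return getattr(handle, name)
    except AttributeError:
        return None


# CDLL(None) corresponds to dlopen(NULL): a handle to the symbols that are
# already loaded into this process, which includes the C library.
handle = ctypes.CDLL(None)

# The address is obtained explicitly at runtime, so the caller can check
# for a missing symbol and degrade gracefully instead of crashing.
getpid = lookup(handle, "getpid")
missing = lookup(handle, "hypothetical_symbol_that_does_not_exist")

print("getpid resolved:", getpid is not None)
print("missing resolved:", missing is not None)
```

Because every address is requested by name at runtime, nothing is left as an undefined symbol for the dynamic linker to resolve, so this style is unaffected by '-z now'.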

Compare the output of:
  $ nm /path/to/toolbox | grep ' subid_init'

... with those from:
  $ nm /path/to/toolbox | grep ' nvmlGpuInstanceGetComputeInstanceProfileInfoV'
          U nvmlGpuInstanceGetComputeInstanceProfileInfoV
  $ nm /path/to/toolbox | grep ' nvmlDeviceGetAccountingPids'
          U nvmlDeviceGetAccountingPids
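
The comparison above can also be scripted. The following is a minimal sketch, assuming the default nm(1) output layout in which undefined symbols have no address column and carry the type letter 'U'. The `undefined_symbols` helper is hypothetical, and the `main.main` line in the sample is made up to show a defined symbol; only the two nvml names come from the output above.

```python
def undefined_symbols(nm_output):
    """Return the names that nm(1) marks as 'U' (undefined)."""
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Undefined symbols have two columns: "U name".
        # Defined symbols have three: "address type name".
        if len(fields) == 2 and fields[0] == "U":
            names.add(fields[1])
    return names


# Sample input: the two 'U' lines are taken from the text above, the
# 'T' line is an invented example of a defined symbol.
SAMPLE = """\
                 U nvmlDeviceGetAccountingPids
                 U nvmlGpuInstanceGetComputeInstanceProfileInfoV
00000000004a1b20 T main.main
"""

undefined = undefined_symbols(SAMPLE)
for name in sorted(undefined):
    print(name)
```

Any name reported here has to be resolved by the dynamic linker, either lazily on first call or, with '-z now', when the program starts.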

Using '-z now' as an external linker flag forces the dynamic linker to
resolve all symbols when toolbox(1) is started, and leads to:
  $ toolbox
  toolbox: symbol lookup error: toolbox: undefined symbol:
      nvmlGpuInstanceGetComputeInstanceProfileInfoV

With the recent expansion of the test suite, it's necessary to increase
the timeout for the Fedora nodes to prevent the CI from timing out.

Fallout from 6e848b2

[1] NVIDIA Container Toolkit commit 1407ace94ab7c150
    NVIDIA/nvidia-container-toolkit@1407ace94ab7c150
    NVIDIA/go-nvml#18
    NVIDIA/nvidia-container-toolkit#49

[2] https://github.com/NVIDIA/nvidia-container-toolkit/tree/main/internal/cuda

[3] https://github.com/NVIDIA/go-nvml/blob/main/README.md
    https://github.com/NVIDIA/go-nvml/tree/main/pkg/dl
    https://github.com/NVIDIA/go-nvml/tree/main/pkg/nvml

containers#1548
The previous commit explains how the NVIDIA Container Toolkit is
sensitive to some linker flags.  Therefore, use the same linker flags
that are used by NVIDIA Container Toolkit to build the nvidia-cdi-hook,
nvidia-ctk, etc. binaries, because they use the same Go APIs that
toolbox(1) does [1].  It's better to use the same build configuration to
prevent subtle bugs from creeping in.

[1] NVIDIA Container Toolkit commit 772cf77dcc2347ce
    NVIDIA/nvidia-container-toolkit@772cf77dcc2347ce
    NVIDIA/nvidia-container-toolkit#333

containers#1548
Use 'software development' instead of just 'development' when
introducing Toolbx.  The additional context makes it more understandable
to the reader.

containers#1549
Only the images for currently maintained Fedoras (i.e., 39) were updated.

containers#1549
@debarshiray debarshiray force-pushed the wip/rishi/README-doc-synchronize-with-website branch from bd29451 to 2eccaf8 Compare September 26, 2024 17:34
@debarshiray debarshiray changed the title Synchronize the documentation with the website to before releasing 0.0.99.6 [WIP] Synchronize the documentation with the website before releasing 0.0.99.6 Sep 26, 2024
debarshiray added a commit to debarshiray/toolbox that referenced this pull request Sep 26, 2024
debarshiray added a commit to debarshiray/toolbox that referenced this pull request Sep 26, 2024
Mention that Toolbx is meant for system administrators to troubleshoot
the host operating system.  The word 'debugging' is often used in the
context of software development, and hence most readers might not
interpret it as 'troubleshooting'.

containers#1549
Mention that Toolbx is meant for system administrators to troubleshoot
the host operating system.  The word 'debugging' is often used in the
context of software development, and hence most readers might not
interpret it as 'troubleshooting'.

containers#1549
Using the word 'containerized' gives the false impression of heightened
security.  As if it's a mechanism to run untrusted software in a
sandboxed environment without access to the user's private data (such as
$HOME), hardware peripherals (such as cameras and microphones), etc.
That's not what Toolbx is for.

Toolbx aims to offer an interactive command line environment for
development and troubleshooting the host operating system, without
having to install software on the host.  That's all.  It makes no
promise about security beyond what's already available in the usual
command line environment on the host that everybody is familiar with.

containers#1020
@softwarefactory-project-zuul

Build failed.
https://softwarefactory-project.io/zuul/t/local/buildset/74243c03f3164991b690bef50af8548f

✔️ unit-test SUCCESS in 5m 47s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 35s
✔️ unit-test-restricted SUCCESS in 5m 41s
system-test-fedora-rawhide TIMED_OUT in 3h 00m 22s
✔️ system-test-fedora-40 SUCCESS in 1h 57m 24s
✔️ system-test-fedora-39 SUCCESS in 1h 58m 19s

@softwarefactory-project-zuul

Build failed.
https://softwarefactory-project.io/zuul/t/local/buildset/d7958862761a4cfaa1768deba916934d

✔️ unit-test SUCCESS in 5m 28s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 47s
✔️ unit-test-restricted SUCCESS in 5m 45s
system-test-fedora-rawhide FAILURE in 6m 55s
✔️ system-test-fedora-40 SUCCESS in 2h 13m 09s
✔️ system-test-fedora-39 SUCCESS in 2h 18m 21s

@softwarefactory-project-zuul

Build failed.
https://softwarefactory-project.io/zuul/t/local/buildset/0ff16e2026d14984b8bfe7aa706aba89

✔️ unit-test SUCCESS in 5m 42s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 41s
✔️ unit-test-restricted SUCCESS in 5m 44s
system-test-fedora-rawhide TIMED_OUT in 3h 00m 24s
✔️ system-test-fedora-40 SUCCESS in 2h 00m 11s
✔️ system-test-fedora-39 SUCCESS in 2h 02m 54s

@softwarefactory-project-zuul

Build failed.
https://softwarefactory-project.io/zuul/t/local/buildset/40866a2886744507a2862113059b045e

✔️ unit-test SUCCESS in 5m 47s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 31s
✔️ unit-test-restricted SUCCESS in 5m 36s
system-test-fedora-rawhide TIMED_OUT in 3h 00m 18s
✔️ system-test-fedora-40 SUCCESS in 1h 54m 20s
✔️ system-test-fedora-39 SUCCESS in 1h 57m 59s

@softwarefactory-project-zuul

Build failed.
https://softwarefactory-project.io/zuul/t/local/buildset/e153027ab3714298985593211e7fde78

✔️ unit-test SUCCESS in 5m 34s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 36s
✔️ unit-test-restricted SUCCESS in 5m 44s
system-test-fedora-rawhide TIMED_OUT in 3h 00m 22s
✔️ system-test-fedora-40 SUCCESS in 1h 56m 04s
system-test-fedora-39 POST_FAILURE in 1h 59m 33s

@debarshiray debarshiray merged commit 861cf85 into containers:main Sep 27, 2024
@debarshiray debarshiray deleted the wip/rishi/README-doc-synchronize-with-website branch September 27, 2024 16:44
@debarshiray
Member Author

We will need to do something to cut down the amount of time it takes to go through all the tests. However, let's not block this pull request for that.

@debarshiray debarshiray changed the title [WIP] Synchronize the documentation with the website before releasing 0.0.99.6 Synchronize the documentation with the website before releasing 0.0.99.6 Oct 2, 2024