.zuul: Enable testing on Fedora 41 #1550

Merged
debarshiray merged 8 commits into containers:main from
debarshiray:wip/rishi/zuul-test-f41
Sep 27, 2024

@debarshiray
Member

No description provided.

The '-z now' flag, which is the opposite of '-z lazy', is unsupported as
an external linker flag [1], because of how the NVIDIA Container Toolkit
stack uses dlopen(3) to load libcuda.so.1 and libnvidia-ml.so.1 at
runtime [2,3].

The NVIDIA Container Toolkit stack doesn't use dlsym(3) to obtain the
address of a symbol at runtime before using it.  It links against
undefined symbols at build-time available through a CUDA API definition
embedded directly in the CGO code or a copy of nvml.h.  It relies upon
lazily deferring function call resolution to the point when dlopen(3) is
able to load the shared libraries at runtime, instead of doing it when
toolbox(1) is started.

This is unlike how Toolbx itself uses dlopen(3) and dlsym(3) to load
libsubid.so at runtime.

Compare the output of:
  $ nm /path/to/toolbox | grep ' subid_init'

... with those from:
  $ nm /path/to/toolbox | grep ' nvmlGpuInstanceGetComputeInstanceProfileInfoV'
          U nvmlGpuInstanceGetComputeInstanceProfileInfoV
  $ nm /path/to/toolbox | grep ' nvmlDeviceGetAccountingPids'
          U nvmlDeviceGetAccountingPids

Using '-z now' as an external linker flag forces the dynamic linker to
resolve all symbols when toolbox(1) is started, and leads to:
  $ toolbox
  toolbox: symbol lookup error: toolbox: undefined symbol:
      nvmlGpuInstanceGetComputeInstanceProfileInfoV

With the recent expansion of the test suite, it's necessary to increase
the timeout for the Fedora nodes to prevent the CI from timing out.

Fallout from 6e848b2

[1] NVIDIA Container Toolkit commit 1407ace94ab7c150
    NVIDIA/nvidia-container-toolkit@1407ace94ab7c150
    NVIDIA/go-nvml#18
    NVIDIA/nvidia-container-toolkit#49

[2] https://github.com/NVIDIA/nvidia-container-toolkit/tree/main/internal/cuda

[3] https://github.com/NVIDIA/go-nvml/blob/main/README.md
    https://github.com/NVIDIA/go-nvml/tree/main/pkg/dl
    https://github.com/NVIDIA/go-nvml/tree/main/pkg/nvml

containers#1548
The previous commit explains how the NVIDIA Container Toolkit is
sensitive to some linker flags.  Therefore, use the same linker flags
that are used by NVIDIA Container Toolkit to build the nvidia-cdi-hook,
nvidia-ctk, etc. binaries, because they use the same Go APIs that
toolbox(1) does [1].  It's better to use the same build configuration to
prevent subtle bugs from creeping in.

[1] NVIDIA Container Toolkit commit 772cf77dcc2347ce
    NVIDIA/nvidia-container-toolkit@772cf77dcc2347ce
    NVIDIA/nvidia-container-toolkit#333

containers#1548
Use 'software development' instead of just 'development' when
introducing Toolbx.  The additional context makes it more understandable
to the reader.

containers#1549
Only the images for currently maintained Fedoras (i.e., 39) were updated.

containers#1549
debarshiray added a commit to debarshiray/toolbox that referenced this pull request Sep 26, 2024
debarshiray force-pushed the wip/rishi/zuul-test-f41 branch from dd9176e to a1a3db0 on September 26, 2024 18:18
Mention that Toolbx is meant for system administrators to troubleshoot
the host operating system.  The word 'debugging' is often used in the
context of software development, and hence most readers might not
interpret it as 'troubleshooting'.

containers#1549
Using the word 'containerized' gives the false impression of heightened
security, as if it's a mechanism to run untrusted software in a
sandboxed environment without access to the user's private data (such as
$HOME), hardware peripherals (such as cameras and microphones), etc.
That's not what Toolbx is for.

Toolbx aims to offer an interactive command line environment for
software development and for troubleshooting the host operating system, without
having to install software on the host.  That's all.  It makes no
promise about security beyond what's already available in the usual
command line environment on the host that everybody is familiar with.

containers#1020
debarshiray force-pushed the wip/rishi/zuul-test-f41 branch from a1a3db0 to 679bf87 on September 26, 2024 19:20
@softwarefactory-project-zuul

Build failed.
https://softwarefactory-project.io/zuul/t/local/buildset/0341bd01ead748cc8136c87c1f843908

✔️ unit-test SUCCESS in 5m 37s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 42s
✔️ unit-test-restricted SUCCESS in 5m 37s
system-test-fedora-rawhide TIMED_OUT in 3h 00m 34s
✔️ system-test-fedora-41 SUCCESS in 1h 51m 30s
✔️ system-test-fedora-40 SUCCESS in 1h 54m 02s
✔️ system-test-fedora-39 SUCCESS in 1h 58m 04s

@debarshiray
Member Author

recheck

@softwarefactory-project-zuul

Build failed.
https://softwarefactory-project.io/zuul/t/local/buildset/9de859f19c224ea7a4e6a46fcd4ff99c

✔️ unit-test SUCCESS in 5m 36s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 52s
✔️ unit-test-restricted SUCCESS in 5m 35s
system-test-fedora-rawhide TIMED_OUT in 3h 00m 19s
✔️ system-test-fedora-41 SUCCESS in 1h 58m 46s
✔️ system-test-fedora-40 SUCCESS in 2h 13m 24s
✔️ system-test-fedora-39 SUCCESS in 1h 58m 10s

@debarshiray
Member Author

recheck

@softwarefactory-project-zuul

Build failed.
https://softwarefactory-project.io/zuul/t/local/buildset/d5c79e5ad8214c26828e6aea178047a0

✔️ unit-test SUCCESS in 5m 38s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 19s
✔️ unit-test-restricted SUCCESS in 5m 47s
system-test-fedora-rawhide TIMED_OUT in 3h 00m 21s
✔️ system-test-fedora-41 SUCCESS in 1h 56m 37s
✔️ system-test-fedora-40 SUCCESS in 1h 55m 53s
✔️ system-test-fedora-39 SUCCESS in 2h 06m 21s

@debarshiray
Member Author

recheck

@debarshiray
Member Author

We will need to reduce the time it takes to run all the tests. However, let's not block this pull request on that.

@softwarefactory-project-zuul

Build failed.
https://softwarefactory-project.io/zuul/t/local/buildset/d314774e9f284ce99932df24cdda3502

✔️ unit-test SUCCESS in 5m 43s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 12s
✔️ unit-test-restricted SUCCESS in 5m 29s
system-test-fedora-rawhide TIMED_OUT in 3h 00m 21s
✔️ system-test-fedora-41 SUCCESS in 1h 59m 19s
✔️ system-test-fedora-40 SUCCESS in 1h 56m 06s
✔️ system-test-fedora-39 SUCCESS in 1h 58m 43s

debarshiray merged commit 679bf87 into containers:main on Sep 27, 2024
debarshiray deleted the wip/rishi/zuul-test-f41 branch on September 27, 2024 21:59
debarshiray added a commit to debarshiray/toolbox that referenced this pull request Jan 26, 2026
debarshiray added a commit to debarshiray/toolbox that referenced this pull request Jan 26, 2026
debarshiray added a commit to debarshiray/toolbox that referenced this pull request Jan 27, 2026
debarshiray added a commit to debarshiray/toolbox that referenced this pull request Jan 27, 2026
debarshiray added a commit to debarshiray/toolbox that referenced this pull request Feb 2, 2026