cudatoolkit_11_7: init at 11.7.0 #179912

Merged
samuela merged 5 commits into NixOS:master from dguibert:dg/cudatoolkit_11_7_0
Aug 5, 2022

dguibert (Member) commented Jul 2, 2022

Description of changes
Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 22.11 Release Notes (or backporting 22.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

ofborg bot added the labels 10.rebuild-darwin: 0 and 10.rebuild-linux: 0 on Jul 2, 2022
dguibert added the labels 8.has: package (update) and 6.topic: cuda on Jul 2, 2022
ajs124 requested review from FRidh, SomeoneSerge and samuela, and removed the request for FRidh, on August 1, 2022 21:45
samuela changed the title from "update cudatoolkit to 11.7.0" to "cudatoolkit_11_7: init at 11.7.0" on Aug 2, 2022
samuela (Member) left a comment

Hi @dguibert, thanks for contributing this!

dguibert force-pushed the dg/cudatoolkit_11_7_0 branch from c2dd11b to a43d296 on August 2, 2022 14:20
ofborg bot added the labels 8.has: package (new), 10.rebuild-darwin: 11-100 and 10.rebuild-linux: 11-100, and removed the labels 10.rebuild-darwin: 0 and 10.rebuild-linux: 0, on Aug 2, 2022
samuela (Member) commented Aug 5, 2022

Result of nixpkgs-review pr 179912 run on x86_64-linux

2 packages marked as broken and skipped:
  • cudaPackages.nvidia_driver
  • truecrack-cuda
5 packages failed to build:
  • cudaPackages.cuda-samples
  • ethminer (ethminer-cuda)
  • mathematica-cuda
  • python310Packages.cupy
  • python39Packages.cupy
63 packages built:
  • colmapWithCuda
  • cudaPackages.cuda_cccl
  • cudaPackages.cuda_cudart
  • cudaPackages.cuda_cuobjdump
  • cudaPackages.cuda_cupti
  • cudaPackages.cuda_cuxxfilt
  • cudaPackages.cuda_demo_suite
  • cudaPackages.cuda_documentation
  • cudaPackages.cuda_gdb
  • cudaPackages.cuda_memcheck
  • cudaPackages.cuda_nsight
  • cudaPackages.cuda_nvcc
  • cudaPackages.cuda_nvdisasm
  • cudaPackages.cuda_nvml_dev
  • cudaPackages.cuda_nvprof
  • cudaPackages.cuda_nvprune
  • cudaPackages.cuda_nvrtc
  • cudaPackages.cuda_nvtx
  • cudaPackages.cuda_nvvp
  • cudaPackages.cuda_sanitizer_api
  • cudatoolkit (cudaPackages.cudatoolkit, cudatoolkit_11)
  • cudaPackages.cudnn (cudaPackages.cudnn_8_4_0)
  • cudaPackages.cudnn_8_3_2
  • cudaPackages.cutensor
  • cudaPackages.fabricmanager
  • cudaPackages.libcublas
  • cudaPackages.libcufft
  • cudaPackages.libcufile
  • cudaPackages.libcurand
  • cudaPackages.libcusolver
  • cudaPackages.libcusparse
  • cudaPackages.libnpp
  • cudaPackages.libnvidia_nscq
  • cudaPackages.libnvjpeg
  • cudaPackages.nccl
  • cudaPackages.nsight_compute
  • cudaPackages.nsight_systems
  • cudaPackages.nvidia_fs
  • forge
  • gpu-burn
  • gromacsCudaMpi
  • gwe
  • katagoWithCuda
  • librealsenseWithCuda
  • magma
  • nvtop
  • nvtop-nvidia
  • python310Packages.TheanoWithCuda
  • python310Packages.numbaWithCuda
  • python310Packages.pycuda
  • python310Packages.pynvml
  • python310Packages.pyrealsense2WithCuda
  • python310Packages.pytorchWithCuda
  • python310Packages.tensorflowWithCuda
  • python39Packages.TheanoWithCuda
  • python39Packages.numbaWithCuda
  • python39Packages.pycuda
  • python39Packages.pynvml
  • python39Packages.pyrealsense2WithCuda
  • python39Packages.pytorchWithCuda
  • python39Packages.tensorflowWithCuda
  • xgboostWithCuda
  • xpraWithNvenc

samuela (Member) commented Aug 5, 2022

  • cudaPackages.cuda-samples is known broken based on the comment in this PR
  • mathematica-cuda is an existing failure due to missing the installer download

That leaves ethminer-cuda and cupy, which appear to be broken by this 11.7 upgrade. @dguibert could you add the appropriate overrides to keep these two packages on 11.6 (or whatever the appropriate versions are)?
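
A pin of the kind requested here could be modelled roughly as follows. This is a self-contained toy sketch, not nixpkgs code: `mkPkg` and both package sets are hypothetical stand-ins, and the real change would presumably pass an older CUDA package set at the relevant `callPackage` site.

```nix
let
  # Hypothetical stand-ins for two CUDA package sets.
  cudaPackages_11_6 = { cudatoolkit = { version = "11.6"; }; };
  cudaPackages_11_7 = { cudatoolkit = { version = "11.7"; }; };

  # The default set that most packages pick up.
  cudaPackages = cudaPackages_11_7;

  # Hypothetical package function: records which CUDA version it was given.
  mkPkg = { name, cudaPackages }: {
    inherit name;
    cudaVersion = cudaPackages.cudatoolkit.version;
  };
in
{
  # Follows the default set.
  magma = mkPkg { name = "magma"; inherit cudaPackages; };

  # Pinned to the older set while it does not build against the default.
  cupy = mkPkg { name = "cupy"; cudaPackages = cudaPackages_11_6; };
}
```

Saved to a file, the expression can be inspected with `nix-instantiate --eval --strict`; only the pinned attribute reports the older CUDA version.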

samuela (Member) commented Aug 5, 2022

AFAICT cupy error looks like

cupy_backends/cuda/libs/cutensor.cpp: In function ‘uint64_t __pyx_f_13cupy_backends_4cuda_4libs_8cutensor_contractionGetWorkspaceSize(__pyx_obj_13cupy_backends_4cuda_4libs_8cutensor_Handle*, __pyx_obj_13cupy_backends_4cuda_4libs_8cutensor_ContractionDescriptor*, __pyx_obj_13cupy_backends_4cuda_4libs_8cutensor_ContractionFind*, int, int)’:
cupy_backends/cuda/libs/cutensor.cpp:5717:17: error: ‘CUTENSOR_VERSION’ was not declared in this scope; did you mean ‘CUTENSOR_OP_SIN’?
 5717 |   __pyx_t_1 = ((CUTENSOR_VERSION < 0x2904) != 0);
      |                 ^~~~~~~~~~~~~~~~
      |                 CUTENSOR_OP_SIN
cupy_backends/cuda/libs/cutensor.cpp: In function ‘uint64_t __pyx_f_13cupy_backends_4cuda_4libs_8cutensor_reductionGetWorkspaceSize(__pyx_obj_13cupy_backends_4cuda_4libs_8cutensor_Handle*, intptr_t, __pyx_obj_13cupy_backends_4cuda_4libs_8cutensor_TensorDescriptor*, intptr_t, intptr_t, __pyx_obj_13cupy_backends_4cuda_4libs_8cutensor_TensorDescriptor*, intptr_t, intptr_t, __pyx_obj_13cupy_backends_4cuda_4libs_8cutensor_TensorDescriptor*, intptr_t, int, int, int)’:
cupy_backends/cuda/libs/cutensor.cpp:6584:17: error: ‘CUTENSOR_VERSION’ was not declared in this scope; did you mean ‘CUTENSOR_OP_SIN’?
 6584 |   __pyx_t_1 = ((CUTENSOR_VERSION < 0x2904) != 0);
      |                 ^~~~~~~~~~~~~~~~
      |                 CUTENSOR_OP_SIN

SomeoneSerge (Contributor) commented

cupy has been broken on master for a while

samuela (Member) commented Aug 5, 2022

ah gotcha, looks like the same is true for ethminer... in that case I guess we're good to merge

samuela merged commit a53c277 into NixOS:master on Aug 5, 2022
samuela mentioned this pull request on Aug 5, 2022
zowoq (Contributor) commented Aug 5, 2022

https://gist.github.com/GrahamcOfBorg/45ac7f5bc9e02a74cb1e4264f365417f

Seems this PR broke eval on master.

winterqt (Member) commented Aug 5, 2022

11.7.0 would need to be added here, assuming it's compatible. (cc @aidalgol)

samuela (Member) commented Aug 5, 2022

@zowoq uh oh, sorry about that! I'll revert. Looks like we'll need to rebase onto latest master and try again

thanks for the heads up!

aidalgol (Contributor) commented Aug 5, 2022

> 11.7.0 would need to be added here, assuming it's compatible. (cc @aidalgol)

@samuela That version is not compatible. The list of supported versions in that file is exactly as listed on the NVIDIA download page. There is a newer version of TensorRT that supports CUDA 11.7, but it requires cuDNN 8.4, which is not yet in nixpkgs.

samuela (Member) commented Aug 5, 2022

Hmm interesting... so we should be marking TensorRT as broken in that case? at least we can't break evaluation haha

long term solution of course is to package cuDNN 8.4...

aidalgol (Contributor) commented Aug 5, 2022

> Hmm interesting... so we should be marking TensorRT as broken in that case? at least we can't break evaluation haha

That is already done automatically in the TensorRT derivations (see here).

samuela (Member) commented Aug 5, 2022

Mm, I see what you mean. It appears that merging this did break eval however. I haven't dug into all the details just yet, but the logs suggest it has something to do with TensorRT...

aidalgol (Contributor) commented Aug 5, 2022

It appears that the case where the CUDA version is not in tensorRTDefaultVersion is not handled, which is why adding CUDA 11.7 broke eval. I'm not sure how best to handle this case. Perhaps we could define a default version for CUDA 11.7 and let the logic in generic.nix mark it as broken, since 11.7 is not in the list of supported CUDA versions. That way it would at least evaluate.
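
The failure mode and the suggested workaround can be sketched in plain Nix. The table contents, the fallback value and the `supportedCudaVersions` list below are placeholders chosen for illustration, not the real contents of the TensorRT files in nixpkgs.

```nix
let
  cudaVersion = "11.7";

  # Placeholder table mapping a CUDA version to a default TensorRT version.
  tensorRTDefaultVersion = {
    "11.6" = "8.4.0";
  };

  # Placeholder list of CUDA versions the selected TensorRT release supports.
  supportedCudaVersions = [ "11.6" ];

  # A plain `tensorRTDefaultVersion.${cudaVersion}` aborts evaluation when
  # the key is missing. `or` supplies a fallback so evaluation continues.
  version = tensorRTDefaultVersion.${cudaVersion} or "8.4.0";
in
{
  inherit version;

  # Unsupported CUDA versions still evaluate, but are flagged as broken.
  broken = !(builtins.elem cudaVersion supportedCudaVersions);
}
```

With a lookup that always succeeds, an unsupported CUDA version yields a derivation flagged as broken instead of an evaluation error for the whole package set.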

samuela (Member) commented Aug 6, 2022

@zowoq What was the exact command that was failing? I just want to make sure that I'm testing correctly

winterqt (Member) commented Aug 6, 2022

See OfBorg's README for the command it runs.

dguibert (Member, Author) commented Aug 6, 2022

> long term solution of course is to package cuDNN 8.4...

cuDNN 8.4 was introduced by a43d296 within this PR.

dguibert (Member, Author) commented Aug 6, 2022

> Mm, I see what you mean. It appears that merging this did break eval however. I haven't dug into all the details just yet, but the logs suggest it has something to do with TensorRT...

Adding the line "11.7" = "8.4.0"; to tensorRTDefaultVersion should be enough.

nixos-discourse commented

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/how-to-install-a-specific-version-of-cuda-and-cudnn/21725/4
