
Initial support Blackwell GPU arch#26820

Merged
asmorkalov merged 2 commits into opencv:4.x from johnnynunez:patch-1 on Jan 25, 2025

Conversation

@johnnynunez
Contributor

@johnnynunez johnnynunez commented Jan 22, 2025

10.0: Blackwell B100/B200
12.0: Blackwell RTX 50 series

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

10.0 blackwell b100/b200
12.0 blackwell rtx50
@johnnynunez johnnynunez changed the title from "initial support blackwell" to "initial support blackwell codegen" on Jan 22, 2025
@asmorkalov asmorkalov added this to the 4.12.0 milestone Jan 22, 2025
@asmorkalov
Contributor

cc @cudawarped

@cudawarped
Contributor

@johnnynunez Which version of the CUDA toolkit are you using to compile for Blackwell? The latest version I have access to (12.6 Update 3) does not support compute capability 10.0. I assume Nvidia will release version 13.0 to coincide with the release of the first Blackwell cards.

Isn't Blackwell compute capability 10.0? Where does 12.0 come from?


@johnnynunez
Contributor Author

johnnynunez commented Jan 22, 2025

@johnnynunez Which version of the CUDA toolkit are you using to compile for Blackwell? The latest version I have access to (12.6 Update 3) does not support compute capability 10.0. I assume Nvidia will release version 13.0 to coincide with the release of the first Blackwell cards.

Isn't Blackwell compute capability 10.0? Where does 12.0 come from?

Hello,
B100 support is coming with CUDA 12.7.
RTX 50 support is coming with CUDA 12.8.

Also, Thor is compute capability 10.1.
I have an RTX 5090 card.

@cudawarped
Contributor

@johnnynunez So you haven't tested that this works on Blackwell 🤯? I would wait until a version of the CUDA toolkit is released that supports compute capability 10.0, to be 100% sure your changes don't break anything. I would also remove compute capability 12.0.

Note: if you add 10.1 for Thor you will also want to update this filter:

ocv_filter_available_architecture(${nvcc_executable} __cuda_arch_bin
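In spirit, that filter keeps only the requested architectures that the detected nvcc can actually emit code for. A rough Python sketch of the idea (function and variable names here are illustrative, not OpenCV's actual CMake code):

```python
# Illustrative sketch of what an architecture filter does conceptually:
# keep only the requested compute capabilities that the installed
# compiler actually supports. Names are hypothetical, not OpenCV's CMake.

def filter_available_architectures(requested, supported_by_nvcc):
    """Return the subset of requested arch strings nvcc can target."""
    supported = set(supported_by_nvcc)
    return [arch for arch in requested if arch in supported]

# Example: a toolchain that does not yet know about 10.0 or 12.0
requested = ["8.6", "9.0", "10.0", "12.0"]
supported = ["5.0", "6.1", "7.5", "8.6", "8.9", "9.0"]
print(filter_available_architectures(requested, supported))  # ['8.6', '9.0']
```

Without such a filter entry for a new capability, the build would silently drop (or fail on) the new architecture when configured with an older toolkit.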

@johnnynunez
Contributor Author

@johnnynunez So you haven't tested that this works on Blackwell 🤯? I would wait until a version of the CUDA toolkit is released that supports compute capability 10.0, to be 100% sure your changes don't break anything. I would also remove compute capability 12.0.

Note: if you add 10.1 for Thor you will also want to update this filter:

ocv_filter_available_architecture(${nvcc_executable} __cuda_arch_bin

But why remove 12.0?
My 5090 reports 12.0. I think that DIGITS will be 11.0.

@johnnynunez
Contributor Author

@johnnynunez So you haven't tested that this works on Blackwell 🤯? I would wait until a version of the CUDA toolkit is released that supports compute capability 10.0, to be 100% sure your changes don't break anything. I would also remove compute capability 12.0.
Note: if you add 10.1 for Thor you will also want to update this filter:

ocv_filter_available_architecture(${nvcc_executable} __cuda_arch_bin

But why remove 12.0? My 5090 reports 12.0. I think that DIGITS will be 11.0.

Well, today the NDA was lifted.

@johnnynunez
Contributor Author

@johnnynunez So you haven't tested that this works on Blackwell 🤯? I would wait until a version of the CUDA toolkit is released that supports compute capability 10.0, to be 100% sure your changes don't break anything. I would also remove compute capability 12.0.
Note: if you add 10.1 for Thor you will also want to update this filter:

ocv_filter_available_architecture(${nvcc_executable} __cuda_arch_bin

But why remove 12.0? My 5090 reports 12.0. I think that DIGITS will be 11.0.

I added support in PyTorch, XLA, etc.

@cudawarped
Contributor

cudawarped commented Jan 22, 2025

But why remove 12.0?
My 5090 reports 12.0. I think that DIGITS will be 11.0.

Can you provide me with a source to indicate that your 5090 will be compute capability 12.0?

I added support in PyTorch, XLA, etc.

PyTorch merged compute capabilities 10.0 and 12.0 without any build testing (I guess it's a Python-first library)? I can't see a reason for not waiting, especially in OpenCV, where you can manually select the compute capability using combinations of CUDA_ARCH_BIN and CUDA_ARCH_PTX.

@johnnynunez
Contributor Author

johnnynunez commented Jan 22, 2025

But why remove 12.0?
My 5090 reports 12.0. I think that DIGITS will be 11.0.

Can you provide me with a source to indicate that your 5090 will be compute capability 12.0?

I added support in PyTorch, XLA, etc.

PyTorch merged compute capabilities 10.0 and 12.0 without any build testing (I guess it's a Python-first library)? I can't see a reason for not waiting, especially in OpenCV, where you can manually select the compute capability using combinations of CUDA_ARCH_BIN and CUDA_ARCH_PTX.

[screenshot]

More references:
pytorch/pytorch#145270 (my PyTorch PR)
Dao-AILab/flash-attention#1436

@cudawarped
Contributor

@johnnynunez 🤯 Nvidia is jumping three compute capabilities in a single generation; that's going to be really confusing, considering they previously used one compute version per generation.

What is the output from

nvidia-smi --query-gpu=compute_cap --format=csv

If it says

compute_cap
12.0

Do you think it's possible that the driver is outputting the wrong info?

@asmorkalov Either way I would suggest it would be better to wait until more info is available before merging this PR.
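For reference, the `compute_cap` query above prints a CSV header followed by one row per GPU. A small hedged Python helper for turning that text into (major, minor) pairs (the sample string below is the output format reported in this thread, not live driver output):

```python
def parse_compute_caps(csv_text):
    """Parse `nvidia-smi --query-gpu=compute_cap --format=csv` output
    into a list of (major, minor) integer pairs, one per GPU."""
    lines = [ln.strip() for ln in csv_text.strip().splitlines()]
    caps = []
    for line in lines[1:]:          # skip the 'compute_cap' header row
        major, minor = line.split(".")
        caps.append((int(major), int(minor)))
    return caps

# Sample output shape, as reported in this thread for an RTX 5090:
sample = "compute_cap\n12.0\n"
print(parse_compute_caps(sample))   # [(12, 0)]
```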

@johnnynunez
Contributor Author

nvidia-smi --query-gpu=compute_cap --format=csv

It's okay; Flash Attention v4 is coming for Blackwell as well. They have 100 and 120.
I totally agree, haha, but Nvidia now has a lot of products:
upcoming ARM laptops, desktop GPUs, DIGITS, Jetson, and data centers.

@johnnynunez
Contributor Author

@johnnynunez 🤯 Nvidia is jumping three compute capabilities in a single generation; that's going to be really confusing, considering they previously used one compute version per generation.

What is the output from

nvidia-smi --query-gpu=compute_cap --format=csv

If it says

compute_cap
12.0

Do you think it's possible that the driver is outputting the wrong info?

@asmorkalov Either way I would suggest it would be better to wait until more info is available before merging this PR.

I'll share it in the next few hours, because I'm not at home. But we have the press-release driver 571.86.

@asmorkalov
Contributor

@cudawarped Thanks a lot for the analysis. I'll review the hardware specs and get back soon.

@asmorkalov asmorkalov self-requested a review January 22, 2025 09:56

@johnnynunez
Contributor Author

But why remove 12.0?
My 5090 reports 12.0. I think that DIGITS will be 11.0.

Can you provide me with a source to indicate that your 5090 will be compute capability 12.0?

I added support in PyTorch, XLA, etc.

PyTorch merged compute capabilities 10.0 and 12.0 without any build testing (I guess it's a Python-first library)? I can't see a reason for not waiting, especially in OpenCV, where you can manually select the compute capability using combinations of CUDA_ARCH_BIN and CUDA_ARCH_PTX.

[screenshot]

The new drivers are showing CUDA 12.8 and the same 12.0 compute capability.

@johnnynunez
Contributor Author

johnnynunez commented Jan 23, 2025

More references:
NVIDIA/cccl#3493

10.0: B100/B200
10.0a: ARM laptops, DIGITS?
10.1: Thor
10.1a: ARM laptops, DIGITS?
12.0: RTX 50
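Those capability numbers are what ultimately get turned into nvcc code-generation flags; a hedged sketch of the translation (the handling of the family-specific `a` suffix here follows the usual `sm_XXXa` naming and is an assumption, not taken from OpenCV's build files):

```python
def gencode_flag(cap):
    """Translate a compute capability string like '10.1' or '10.0a'
    into an nvcc -gencode flag emitting real binary (SASS) code."""
    sm = cap.replace(".", "")           # '10.1' -> '101', '10.0a' -> '100a'
    return f"-gencode=arch=compute_{sm},code=sm_{sm}"

for cap in ["10.0", "10.1", "12.0"]:
    print(gencode_flag(cap))
# e.g. the first line is: -gencode=arch=compute_100,code=sm_100
```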

@cudawarped
Contributor

@johnnynunez I still suggest we wait; I can't see any downside for OpenCV, because the compute capability can be manually selected.

I still wouldn't be surprised if the 12 is the version of the CUDA toolkit, returned because the real compute capability is not valid: the CUDA toolkit used by PyTorch, and the one used to compile nvidia-smi, pre-date compute capabilities >= 9.0.

@johnnynunez
Contributor Author

@johnnynunez I still suggest we wait; I can't see any downside for OpenCV, because the compute capability can be manually selected.

I still wouldn't be surprised if the 12 is the version of the CUDA toolkit, returned because the real compute capability is not valid: the CUDA toolkit used by PyTorch, and the one used to compile nvidia-smi, pre-date compute capabilities >= 9.0.

Yeah! Totally agree.

@johnnynunez
Contributor Author

CUDA 12.8 is out.
[screenshot]

@cudawarped
Contributor

@johnnynunez Compute capability 12.0 looks to be official. Consumer cards look to have fewer resident threads per SM.

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html?highlight=compute%2520capability#features-and-technical-specifications

@cudawarped
Contributor

@asmorkalov Builds on both Windows 11 and Ubuntu 22.04 (WSL) with CUDA Toolkit 12.8, using the default architecture selection (CUDA_ARCH_BIN and CUDA_ARCH_PTX not specified):

--   NVIDIA CUDA:                   YES (ver 12.8, CUFFT CUBLAS NVCUVID NVCUVENC)
--     NVIDIA GPU arch:             50 52 60 61 70 75 80 86 89 90 100 120
--     NVIDIA PTX archs:            120
--
--   cuDNN:                         YES (ver 9.7.0)

and -DCUDA_GENERATION=Blackwell


--   NVIDIA CUDA:                   YES (ver 12.8, CUFFT CUBLAS NVCUVID NVCUVENC)
--     NVIDIA GPU arch:             100 120
--     NVIDIA PTX archs:
--
--   cuDNN:                         YES (ver 9.7.0)

@johnnynunez
Contributor Author

johnnynunez commented Jan 24, 2025

@asmorkalov Builds on both Windows 11 and Ubuntu 22.04 (WSL) with CUDA Toolkit 12.8, using the default architecture selection (CUDA_ARCH_BIN and CUDA_ARCH_PTX not specified):

--   NVIDIA CUDA:                   YES (ver 12.8, CUFFT CUBLAS NVCUVID NVCUVENC)
--     NVIDIA GPU arch:             50 52 60 61 70 75 80 86 89 90 100 120
--     NVIDIA PTX archs:            120
--
--   cuDNN:                         YES (ver 9.7.0)

and -DCUDA_GENERATION=Blackwell


--   NVIDIA CUDA:                   YES (ver 12.8, CUFFT CUBLAS NVCUVID NVCUVENC)
--     NVIDIA GPU arch:             100 120
--     NVIDIA PTX archs:
--
--   cuDNN:                         YES (ver 9.7.0)

Yeah, I compiled PyTorch, xFormers, etc., and OpenCV with my RTX 5090, and it works. I just couldn't comment on anything because of the NDA, but it was lifted yesterday.

@johnnynunez
Contributor Author

@asmorkalov feel free to merge! thanks

@asmorkalov asmorkalov self-assigned this Jan 25, 2025
@asmorkalov asmorkalov changed the title from "initial support blackwell codegen" to "Initial support Blackwell GPU arch" on Jan 25, 2025
@asmorkalov asmorkalov merged commit 4b2a33a into opencv:4.x Jan 25, 2025
@asmorkalov asmorkalov mentioned this pull request Feb 19, 2025
NanQin555 pushed a commit to NanQin555/opencv that referenced this pull request Feb 24, 2025