This repository was archived by the owner on Aug 15, 2025. It is now read-only.

CUDA11 builders for Windows #463

Merged
malfet merged 32 commits into pytorch:master from zasdfgbnm:cuda110-win
Aug 6, 2020

Conversation

@zasdfgbnm
Contributor

cc: @peterjc123 #450 is deprecated in favor of this PR, and your comments on #450 will be resolved here. This is still WIP and incomplete, and I haven't had a chance to test it yet.

@zasdfgbnm
Contributor Author

@peterjc123 How do I debug these scripts? Do I just create azure pipelines on my fork with windows/azure-pipelines.yml and it will run everything?

@peterjc123
Contributor

We use CircleCI instead of Azure Pipelines now. I guess you'll need to modify https://github.com/pytorch/builder/blob/master/.circleci/regenerate.py. You can test that by removing the filter conditions in the builder jobs at PyTorch.

Contributor

@peterjc123 peterjc123 left a comment

LGTM, but several things are still missing, including:

  1. CircleCI configs in .circleci/regenerate.py
  2. Conda configs in conda/build_pytorch.sh & conda/pytorch-nightly/bld.bat
  3. MAGMA with CUDA 11 for Windows (I can do that for you if you want)

pushd %SRC_DIR%\..

if "%CUDA_VERSION%" == "102" call internal\driver_update.bat
if "%CUDA_VERSION%" == "110" call internal\driver_update.bat
Contributor

Have you checked that the driver is up to date?

Contributor Author

Thanks for reminding me, and you are right that it is not up to date. It requires >=451.22. I will update this.

Contributor Author

@zasdfgbnm zasdfgbnm Aug 3, 2020

We can use this link http://us.download.nvidia.com/tesla/451.82/451.82-tesla-desktop-winserver-2019-2016-international.exe, or you can upload it to S3 so I can use S3 :)
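
The driver requirement discussed in this thread amounts to a numeric version comparison. The following is a minimal sketch of that check, not code from the builder scripts: the helper names `parse_driver_version` and `driver_is_new_enough` are hypothetical, and the minimum of 451.22 is simply the value quoted in the comment above.

```python
# Hypothetical helpers for illustration; not part of pytorch/builder.
MIN_DRIVER_FOR_CUDA_110 = "451.22"  # minimum quoted in the review thread


def parse_driver_version(version):
    """Turn a dotted driver version such as '451.82' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_is_new_enough(installed, minimum=MIN_DRIVER_FOR_CUDA_110):
    """Compare versions numerically, component by component."""
    return parse_driver_version(installed) >= parse_driver_version(minimum)


# The 451.82 driver linked above satisfies the quoted minimum.
print(driver_is_new_enough("451.82"))
```

Comparing tuples of integers avoids the pitfalls of comparing version strings lexicographically.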

)

if not exist "%SRC_DIR%\temp_build\cudnn-11.0-windows-x64-v8.0.0.180.zip" (
curl -k -L https://developer.download.nvidia.com/compute/redist/cudnn/v8.0.0/cudnn-11.0-windows-x64-v8.0.0.180.zip --output "%SRC_DIR%\temp_build\cudnn-11.0-windows-x64-v8.0.0.180.zip"
Contributor

This link requires credentials, so we have to upload the file to S3.

Contributor Author

@zasdfgbnm zasdfgbnm Aug 3, 2020

I updated this link to use the latest CUDA 11 and cuDNN release. These links should work without login, but feel free to upload the files to S3 to stay consistent with the other links.

Edit: I lied, this link is not working. Please do upload it to S3. :)

@robert-baldwin

I'm just getting started with pytorch on Ubuntu on WSL2. Will this allow me to conda install pytorch torchvision cudatoolkit=11.0 -c pytorch?

@peterjc123
Contributor

> I'm just getting started with pytorch on Ubuntu on WSL2. Will this allow me to conda install pytorch torchvision cudatoolkit=11.0 -c pytorch?

Nope, I don't think there will be binaries for CUDA 11 soon, because the conda package cudatoolkit=11.0 is not released yet. But you should eventually be able to do that on Windows, whether or not you are using WSL.

@robert-baldwin

Understood, thank you. I have one follow-up question if you don't mind.

I tried cloning the pytorch repository and building from source from the master branch which successfully built and loaded in python, but torch.cuda.is_available() #=> False although prior to running python setup.py install I had conda install -c nvidia cudatoolkit which gave me version 11.0.171. Would this branch potentially fix the local build issue?

@robert-baldwin

robert-baldwin commented Jun 29, 2020

I appreciate the follow up @zasdfgbnm ! I am on the Windows Insider Preview Build 20152.1000 using Ubuntu 18.04 on WSL2 and have followed NVidia's CUDA Toolkit Documentation here: https://docs.nvidia.com/cuda/wsl-user-guide/index.html

I've verified that my GPU is reachable by watching Windows Task Manager while running scripts from the Cuda Samples in https://github.com/NVIDIA/cuda-samples within WSL2

One thing I've been uncertain about is whether installing CUDA Toolkit via apt as outlined in NVidia's guide somehow interferes w/ the version installed via conda. I suspect so, since I saw "cycles in the constraint graph" warnings when I had both installed, so I opted to run the pytorch source build w/ only the conda-installed version.

At this point I'm not quite sure where to look to discover where my issue may be.

@robert-baldwin

I'm also happy to open an issue or move this conversation to a more appropriate place to get support based on your guidance. I very much appreciate your time and am conscious about unintentionally monopolizing it 😄

@zasdfgbnm
Contributor Author

@robert-baldwin I actually don't know, because I haven't tried WSL yet. But if I were you, I would try

  • nightly build, and
  • compile from source with CUDA 10.2

@peterjc123
Contributor

> Understood, thank you. I have one follow-up question if you don't mind.
>
> I tried cloning the pytorch repository and building from source from the master branch which successfully built and loaded in python, but torch.cuda.is_available() #=> False although prior to running python setup.py install I had conda install -c nvidia cudatoolkit which gave me version 11.0.171. Would this branch potentially fix the local build issue?

No, the conda cudatoolkit package is only a runtime package; it does not include the development tools needed to build from source.
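
A source build with CUDA support needs the CUDA compiler (nvcc), so one quick sanity check before building is whether nvcc is visible at all. The sketch below is an illustration under stated assumptions: `find_nvcc` is a hypothetical helper that only inspects `CUDA_HOME`, `CUDA_PATH`, and `PATH`, and it does not reproduce the detection logic used by PyTorch's own build scripts.

```python
import os
import shutil


def find_nvcc(env=None):
    """Return the path to nvcc if one is visible, else None.

    Hypothetical helper: checks CUDA_HOME / CUDA_PATH first, then PATH.
    """
    env = os.environ if env is None else env
    exe = "nvcc.exe" if os.name == "nt" else "nvcc"
    for var in ("CUDA_HOME", "CUDA_PATH"):
        root = env.get(var)
        if root:
            candidate = os.path.join(root, "bin", exe)
            if os.path.isfile(candidate):
                return candidate
    # Fall back to searching the PATH value from the same environment.
    return shutil.which("nvcc", path=env.get("PATH", ""))


nvcc = find_nvcc()
print(nvcc if nvcc else "nvcc not found: a CUDA-enabled source build is unlikely")
```

If this prints that nvcc was not found, the build most likely proceeded without CUDA, which would be consistent with `torch.cuda.is_available()` returning False.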

@zasdfgbnm
Contributor Author

@peterjc123 #461 is merged now. Could you please use it to build magma for me? Thanks :)

@zasdfgbnm
Contributor Author

This PR is tested by pytorch/pytorch#42482

@peterjc123
Contributor

You could refer to the PR I mentioned below. Also magma is already built. See pytorch/pytorch#42420

@zasdfgbnm
Contributor Author

@peterjc123 WOW, amazing! I was not aware of that PR. Thanks for working on it!


if not exist "%SRC_DIR%\temp_build\cudnn-11.0-windows10-x64-v8.0.2.39.zip" (
curl -k -L https://developer.download.nvidia.com/compute/redist/cudnn/v8.0.2/cudnn-11.0-windows10-x64-v8.0.2.39.zip --output "%SRC_DIR%\temp_build\cudnn-11.0-windows10-x64-v8.0.2.39.zip"
curl -k -L https://ossci-windows.s3.amazonaws.com/cudnn-11.0-windows-x64-v8.0.2.39.zip --output "%SRC_DIR%\temp_build\cudnn-11.0-windows10-x64-v8.0.2.39.zip"
Contributor

typo here?

Contributor Author

Do you mean the windows vs windows10? Looks like it is the same here: https://github.com/pytorch/pytorch/pull/42420/files#diff-3fde3b44a644efc229244fe515ea7d39R9

Contributor

Ok. But the output name is windows10, right? It is confusing.

Contributor

@peterjc123 peterjc123 left a comment

The changes LGTM now, but would you mind opening a PR to test it, like what you did in pytorch/pytorch#42482?

@zasdfgbnm
Contributor Author

@peterjc123 I will reuse pytorch/pytorch#42482 by rerunning the build. This shouldn't be merged until pytorch/pytorch#42482 is green.

set MAGMA_VERSION=2.5.2
if "%CUDA_VERSION%" == "80" set MAGMA_VERSION=2.4.0
if "%CUDA_VERSION%" == "90" set MAGMA_VERSION=2.5.0
if "%CUDA_VERSION%" == "110" set MAGMA_VERSION=2.5.3
Contributor

I think you can just update everything to 2.5.3, leaving 92 and 100 at 2.5.2.

Contributor Author

fixed
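
The selection pattern in the hunk under review (a default version plus per-CUDA overrides) can be sketched as a table lookup. This is only an illustration: `magma_version_for` is a hypothetical name, and the values are the ones shown in the hunk before the review suggestion was applied. The final merged values do not appear in this thread.

```python
# Values copied from the hunk as shown in this thread (pre-review state).
DEFAULT_MAGMA_VERSION = "2.5.2"
MAGMA_OVERRIDES = {
    "80": "2.4.0",
    "90": "2.5.0",
    "110": "2.5.3",
}


def magma_version_for(cuda_version):
    """Return the override for this CUDA version, else the default."""
    return MAGMA_OVERRIDES.get(cuda_version, DEFAULT_MAGMA_VERSION)


print(magma_version_for("110"))
```

Keeping the overrides in one table makes a change like the one suggested in review a matter of editing data rather than adding another conditional.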

)

if not exist "%SRC_DIR%\temp_build\cudnn-11.0-windows10-x64-v8.0.2.39.zip" (
curl -k -L https://ossci-windows.s3.amazonaws.com/cudnn-11.0-windows10-x64-v8.0.2.39.zip --output "%SRC_DIR%\temp_build\cudnn-11.0-windows10-x64-v8.0.2.39.zip"
Contributor

So it should be https://ossci-windows.s3.amazonaws.com/cudnn-11.0-windows-x64-v8.0.2.39.zip.

Contributor Author

Yeah, I see. I just checked our cuDNN release page: for 10.2 there are both windows10 and windows packages, but for 11.0 there is only a windows package. Sorry for the confusion. I will change the link and output names to windows.
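
The windows vs. windows10 mismatch in this thread arose because the archive name was written out separately in the existence check, the URL, and the output path. One way to avoid that class of error is to derive all of them from a single name, as in the sketch below. The helper `cudnn_download` is hypothetical; the bucket URL and file-name pattern are taken from the lines under review.

```python
S3_BASE = "https://ossci-windows.s3.amazonaws.com"  # bucket URL from the diff


def cudnn_download(cuda, cudnn, platform="windows", src_dir="%SRC_DIR%"):
    """Derive the URL and output path from one archive name so they cannot diverge."""
    archive = f"cudnn-{cuda}-{platform}-x64-v{cudnn}.zip"
    url = f"{S3_BASE}/{archive}"
    output = f"{src_dir}\\temp_build\\{archive}"
    return url, output


url, output = cudnn_download("11.0", "8.0.2.39")
print(url)
print(output)
```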

@zasdfgbnm
Contributor Author

The build/install logic should be OK, but I see some compilation errors that I believe are fixed in your pytorch/pytorch#42420. I will wait for your PR to be merged and then rerun the test.

@zasdfgbnm
Contributor Author

@malfet This should be ready too. This is tested at pytorch/pytorch#42482, which is failing with the same cuSparse error.

@zasdfgbnm zasdfgbnm changed the title from "[WIP] CUDA11 builders for Windows" to "CUDA11 builders for Windows" Aug 5, 2020
Comment on lines +112 to +114
PY3.5_110:
  DESIRED_PYTHON: 3.5
  CUDA_VERSION: 110
Contributor

We no longer support Python 3.5, but I guess dropping it can be done in a separate PR.
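
Matrix entries shaped like the `PY3.5_110` entry above can be generated rather than hand-written, which makes dropping an unsupported Python version a one-line change. The sketch below is illustrative only: `build_matrix` is a hypothetical helper, and the version lists passed to it are assumptions, not the lists actually used by the repository.

```python
def build_matrix(python_versions, cuda_versions):
    """Build entries shaped like the PY3.5_110 entry in the diff."""
    matrix = {}
    for py in python_versions:
        for cuda in cuda_versions:
            matrix[f"PY{py}_{cuda}"] = {
                "DESIRED_PYTHON": py,
                "CUDA_VERSION": cuda,
            }
    return matrix


# Assumed version lists for illustration; 3.5 is simply left out.
matrix = build_matrix(["3.6", "3.7", "3.8"], ["110"])
for name, variables in matrix.items():
    print(name, variables)
```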

@malfet malfet merged commit 492acfa into pytorch:master Aug 6, 2020
@zasdfgbnm zasdfgbnm deleted the cuda110-win branch August 6, 2020 17:19

4 participants