
Automatically detect CUDA if it isn't set and default to upstream torch.cuda number of devices#6605

Merged
changm merged 3 commits into master from changm/gpu
Feb 27, 2024

Conversation

@changm
Collaborator

@changm changm commented Feb 23, 2024

No description provided.
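(For context: per the PR title, the change makes the PJRT runtime fall back to CUDA when `PJRT_DEVICE` is unset, defaulting the device count to upstream `torch.cuda.device_count()`. A minimal sketch of that selection logic is below; `select_default_device` is a hypothetical name for illustration, not the actual function in `torch_xla/runtime.py`, and the CUDA queries are passed in as parameters so the sketch stays self-contained.)

```python
# Hypothetical sketch of the auto-detection described by the PR title.
# The real change lives in torch_xla/runtime.py and would consult
# torch.cuda.is_available() / torch.cuda.device_count() directly.
def select_default_device(env, cuda_available, cuda_device_count):
    if 'PJRT_DEVICE' in env:
        return env  # an explicit setting always wins
    if cuda_available:
        env['PJRT_DEVICE'] = 'CUDA'
        # default to the upstream torch.cuda number of devices
        env.setdefault('GPU_NUM_DEVICES', str(cuda_device_count))
    else:
        env['PJRT_DEVICE'] = 'CPU'
    return env
```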

@changm changm requested a review from JackCaoG February 23, 2024 23:12
@changm changm self-assigned this Feb 23, 2024
@changm
Collaborator Author

changm commented Feb 23, 2024

I manually verified that this worked on a GPU VM.

Collaborator

@will-cromar will-cromar left a comment


Nice! LGTM as long as the test works in our CI.

Comment thread torch_xla/runtime.py Outdated
@JackCaoG
Collaborator

I think @ysiraichi updated our CI to compile pytorch with CUDA as well.

Comment thread torch_xla/runtime.py Outdated
Comment thread torch_xla/runtime.py Outdated
@changm changm changed the title from "Automatically detect CUDA if it isn't set and default to a single GPU device" to "Automatically detect CUDA if it isn't set and default to upstream torch.cuda number of devices" Feb 26, 2024
@changm
Collaborator Author

changm commented Feb 26, 2024

This was failing CI/CD because the test harness explicitly sets PJRT_DEVICE. Since this test only checks behavior when the environment variable isn't set, the test doesn't actually run.

Should I update CI/CD to not set GPU devices by default in a follow up PR? @will-cromar for insight? Thanks!

@changm changm requested a review from vanbasten23 February 26, 2024 21:08
Comment thread test/test_gpu_device_detection.py
@will-cromar
Collaborator

Should I update CI/CD to not set GPU devices by default in a follow up PR? @will-cromar for insight? Thanks!

When PJRT_DEVICE=CUDA is set, we either use GPU or crash if it's not available for some reason. For the vast majority of our tests, this is the correct behavior. We don't want them defaulting to CPU without us noticing, for example.

In your test, feel free to just unset PJRT_DEVICE in the script.
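(That suggestion can be as simple as popping the variable before initializing the runtime; a hedged sketch follows, where the helper name `unset_pjrt_device` is made up for illustration rather than taken from the test file.)

```python
import os

def unset_pjrt_device():
    # Drop any PJRT_DEVICE the CI harness exported so that the
    # auto-detection path under test actually runs. pop() with a
    # default is a no-op when the variable is already unset.
    os.environ.pop('PJRT_DEVICE', None)
```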

@changm changm merged commit cb4983e into master Feb 27, 2024
@changm changm deleted the changm/gpu branch February 27, 2024 18:25
amithrm pushed a commit to amithrm/xla that referenced this pull request Mar 1, 2024