Automatically infer the PyTorch index via --torch-backend=auto
#12070
Conversation
@charliermarsh: Adapted from a few different sources, namely conda. I hope that illustrates my point better: why you need a plugin interface, and why you don't want to be the person responsible for maintaining it 👍

```python
# Copyright (C) 2012 Anaconda, Inc
# SPDX-License-Identifier: BSD-3-Clause
"""Detect CUDA version."""

import ctypes
import functools
import itertools
import multiprocessing
import os
import platform
from contextlib import suppress
from dataclasses import dataclass
from typing import Optional


@dataclass()
class CudaVersion:
    version: str
    architectures: list[str]


def cuda_version() -> Optional[CudaVersion]:
    # Do not inherit file descriptors and handles from the parent process.
    # The `fork` start method should be considered unsafe as it can lead to
    # crashes of the subprocess. The `spawn` start method is preferred.
    context = multiprocessing.get_context("spawn")
    queue = context.SimpleQueue()
    # Spawn a subprocess to detect the CUDA version
    detector = context.Process(
        target=_cuda_detector_target,
        args=(queue,),
        name="CUDA driver version detector",
        daemon=True,
    )
    try:
        detector.start()
        detector.join(timeout=60.0)
    finally:
        # Always clean up the subprocess
        detector.kill()  # requires Python 3.7+
    if queue.empty():
        return None
    result = queue.get()
    if result:
        driver_version, architectures = result.split(";")
        result = CudaVersion(driver_version, architectures.split(","))
    return result


@functools.lru_cache(maxsize=None)
def cached_cuda_version():
    return cuda_version()


def _cuda_detector_target(queue):
    """
    Attempt to detect the version of CUDA present in the operating system in a
    subprocess.

    On Windows and Linux, the CUDA library is installed by the NVIDIA
    driver package, and is typically found in the standard library path,
    rather than with the CUDA SDK (which is optional for running CUDA apps).

    On macOS, the CUDA library is only installed with the CUDA SDK, and
    might not be in the library path.

    Returns: version string with the CUDA version first, then a set of unique
    SMs for the GPUs present in the system (e.g., '12.4;8.6,9.0'), or None if
    CUDA is not found. The result is put in the queue rather than returned.
    """
    # Platform-specific libcuda location
    system = platform.system()
    if system == "Darwin":
        lib_filenames = [
            "libcuda.1.dylib",  # check library path first
            "libcuda.dylib",
            "/usr/local/cuda/lib/libcuda.1.dylib",
            "/usr/local/cuda/lib/libcuda.dylib",
        ]
    elif system == "Linux":
        lib_filenames = [
            "libcuda.so",  # check library path first
            "/usr/lib64/nvidia/libcuda.so",  # RHEL/CentOS/Fedora
            "/usr/lib/x86_64-linux-gnu/libcuda.so",  # Ubuntu
            "/usr/lib/wsl/lib/libcuda.so",  # WSL
        ]
        # Also add libraries with version suffix `.1`
        lib_filenames = list(
            itertools.chain.from_iterable((f"{lib}.1", lib) for lib in lib_filenames)
        )
    elif system == "Windows":
        bits = platform.architecture()[0].replace("bit", "")  # e.g. "64" or "32"
        lib_filenames = [f"nvcuda{bits}.dll", "nvcuda.dll"]
    else:
        queue.put(None)  # CUDA not available for other operating systems
        return

    # Open library
    if system == "Windows":
        dll = ctypes.windll
    else:
        dll = ctypes.cdll
    for lib_filename in lib_filenames:
        with suppress(Exception):
            libcuda = dll.LoadLibrary(lib_filename)
            break
    else:
        queue.put(None)
        return

    # An empty `CUDA_VISIBLE_DEVICES` can cause `cuInit()` to return `CUDA_ERROR_NO_DEVICE`,
    # and an invalid `CUDA_VISIBLE_DEVICES` can cause it to return `CUDA_ERROR_INVALID_DEVICE`.
    # Unset this environment variable to avoid these errors.
    os.environ.pop("CUDA_VISIBLE_DEVICES", None)

    # Get CUDA version
    try:
        cuInit = libcuda.cuInit
        flags = ctypes.c_uint(0)
        ret = cuInit(flags)
        if ret != 0:
            queue.put(None)
            return

        cuDriverGetVersion = libcuda.cuDriverGetVersion
        version_int = ctypes.c_int(0)
        ret = cuDriverGetVersion(ctypes.byref(version_int))
        if ret != 0:
            queue.put(None)
            return

        # Convert the version integer to a version string
        value = version_int.value
        version_value = f"{value // 1000}.{(value % 1000) // 10}"

        count = ctypes.c_int(0)
        libcuda.cuDeviceGetCount(ctypes.pointer(count))

        architectures = set()
        for device in range(count.value):
            major = ctypes.c_int(0)
            minor = ctypes.c_int(0)
            libcuda.cuDeviceComputeCapability(
                ctypes.pointer(major),
                ctypes.pointer(minor),
                device,
            )
            architectures.add(f"{major.value}.{minor.value}")
        queue.put(f"{version_value};{','.join(architectures)}")
    except Exception:
        queue.put(None)
        return


if __name__ == "__main__":
    print(cuda_version())
```
crates/uv-torch/src/lib.rs
| | "torchserve" | ||
| | "torchtext" | ||
| | "torchvision" | ||
| | "pytorch-triton" |
Can we add this list to some documentation? Reading the high-level overview I didn't realize we were hardcoding a package list.
Can we generate this by querying the PyTorch indices to see what they have? (Maybe a manually-run script that queries them and updates this list, or an automatically-run integration test that makes sure this list is in sync with what's currently on their indices?)
Along those lines it would be helpful to have this list somewhere declarative. It might also be helpful to allow user-controlled overrides of this list if the set of packages changes.
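For illustration, a rough sketch of what such a manually-run script could look like; the index URL and the HTML-scraping approach here are assumptions for the example, not something the PR includes:

```python
# Hypothetical sketch: list the project names exposed by one PyTorch "simple"
# index so a hardcoded package list could be diffed against it.
import re
import urllib.request

INDEX_URL = "https://download.pytorch.org/whl/cu124/"  # one of several PyTorch indexes

with urllib.request.urlopen(INDEX_URL) as response:
    html = response.read().decode()

# PEP 503-style simple indexes list each project as an anchor tag.
projects = sorted(set(re.findall(r"<a[^>]*>([^<]+)</a>", html)))
print("\n".join(projects))
```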
Unfortunately I don't know that we can... We don't want all packages on these indexes, because they include things like jinja2. And in some cases, they include incomplete packages like markupsafe (where they only have a few wheels).
geofft
left a comment
I think this is a great idea.
Would it be worth naming this feature something like uv-specialized-index instead of uv-torch, with an eye to extending it to other libraries in the future? (jaxlib and tensorflow, for instance, have current/popular versions on PyPI, but I think they also have their own indexes.)
I had a similar thought; I think this is one of many cases, also considering when such indexes are mirrored or vendored internally. I was thinking about what the right naming would be. I know some avenues refer to this as a
Nevermind, didn't notice you were referring to CUDA.
💯 In my experience nvidia-smi can also take a long time depending on GPU load. There are also multiple locations depending on how it's installed (e.g. dkms) and the environment (Windows, macOS). For example, WSL 2 is even weirder due to the drivers being shared with the host. So nvidia-smi might be the most sure-fire, low-risk way (assuming no issues with the install).
Definitely agree with moving this out of the interpreter query (and possibly reading it from outside). I'm a little wary of trying to brand this as something more general than torch.
konstin
left a comment
Deferring to @geofft for the new detection logic.
```rust
Ok(None) => {
    debug!("Failed to parse CUDA driver version from `/proc/driver/nvidia/version`");
}
```
Should this case return an error instead of falling through to nvidia-smi?
I'm not confident enough in the format of this one... It seems like it varies across machines.
```rust
if output.status.success() {
    let driver_version = Version::from_str(&String::from_utf8(output.stdout)?)?;
    debug!("Detected CUDA driver version from `nvidia-smi`: {driver_version}");
    return Ok(Some(Self::Cuda { driver_version }));
}
```
`else { debug!("nvidia-smi returned error {output.status}: {output.stderr}") }` might be nice
coezbek
left a comment
--torch-backend=auto fails for multiple GPU systems which don't have /sys/module/nvidia/version or /proc/driver/nvidia/version (e.g. WSL)
```rust
    .output()
{
    if output.status.success() {
        let driver_version = Version::from_str(&String::from_utf8(output.stdout)?)?;
```
On a system with multiple GPUs this line will return multiple driver versions, e.g. on my system:
```console
$ nvidia-smi --query-gpu=driver_version --format=csv,noheader
572.60
572.60
```
This will make `uv pip install` with `--torch-backend=auto` fail with the following error:
```console
$ uv pip install -U "vllm[audio]" --torch-backend=auto
error: after parsing `572.60
`, found `572.60
`, which is not part of a valid version
```
nvidia-smi does not respect NVIDIA_VISIBLE_DEVICES, so there is no way from the outside to use --torch-backend=auto with two graphics cards at the moment.
The workaround is to run nvidia-smi, identify the CUDA version there, and run with --torch-backend=cuXXX as indicated by nvidia-smi.
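A minimal sketch of the idea behind such a fix (collapsing the per-GPU lines before parsing), written in Python purely to illustrate the shape of the change rather than the actual Rust code in the PR:

```python
# Sketch only: nvidia-smi prints one driver version line per GPU, so
# deduplicate the lines and parse a single value. Flags match the command above.
import subprocess

out = subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    capture_output=True,
    text=True,
    check=True,
).stdout
versions = {line.strip() for line in out.splitlines() if line.strip()}
# A real fix would likely also decide what to do if the GPUs report
# different driver versions; here we just take one of them.
driver_version = next(iter(versions))  # e.g. "572.60", even with multiple GPUs
print(driver_version)
```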
@charliermarsh I recommend using NVML for this. Example: this is using the Python bindings, but NVML is a C library you can dlopen directly. The Python example should tell you which functions to call for SM, UMD, and KMD ;)
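To make that suggestion concrete, here is a small sketch using the nvidia-ml-py (`pynvml`) bindings; it is only an illustration of the NVML calls involved, not code from this PR:

```python
# Sketch: query the driver (UMD/KMD) version, the max CUDA version the driver
# supports, and each GPU's compute capability (SM) via NVML.
import pynvml

pynvml.nvmlInit()
try:
    driver = pynvml.nvmlSystemGetDriverVersion()        # e.g. "572.60"
    cuda = pynvml.nvmlSystemGetCudaDriverVersion_v2()   # e.g. 12080 -> CUDA 12.8
    print(driver, f"{cuda // 1000}.{(cuda % 1000) // 10}")
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        major, minor = pynvml.nvmlDeviceGetCudaComputeCapability(handle)
        print(f"GPU {i}: SM {major}.{minor}")
finally:
    pynvml.nvmlShutdown()
```

The equivalent C entry points live in libnvidia-ml, which can be dlopen'd directly, as suggested above.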
Awesome, thanks @DEKHTIARJonathan. I filed an issue here: #14664
Tested the following command on a system with a GTX 1080 Ti and a CUDA 12.8 driver (what shows in nvidia-smi). Then ran this test script:

```python
import torch
tensor = torch.randn(3, 4, device='cuda')
print(tensor)
```

And got this:

I think just looking at the installed CUDA version/driver might not be enough; the CUDA compute capability supported by different torch versions is relevant as well. In this case, a GTX 1080 Ti has drivers installed that show CUDA 12.8 in nvidia-smi. Some relevant discussion I found:

I then also tested without `--torch-backend=auto`. This seems to install the
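As a side note on the compute-capability point, the mismatch can be checked from Python after installation with something like the following; these are torch APIs shown only to illustrate the SM check, not something uv does:

```python
# Sketch: compare the local GPU's compute capability against the SM targets
# the installed torch wheel was built for.
import torch

major, minor = torch.cuda.get_device_capability(0)  # e.g. (6, 1) for a GTX 1080 Ti
built_for = torch.cuda.get_arch_list()              # e.g. ['sm_70', 'sm_75', ...]
print(f"GPU is sm_{major}{minor}; this torch build targets {built_for}")
```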
Do you mind filing a separate issue? We tend to prefer that over commenting on closed issues or pull requests.
It's not entirely accurate. It's deprecated starting from CUDA 12.8, but not incompatible (otherwise you wouldn't have been able to install it). I don't think @charliermarsh can fix it without a massive headache of if/else conditions (it's on a per-library/package basis). @hamzaq2000 Just install
Certainly, sorry about that! #14742 Unfortunately I'm not knowledgeable enough to comment on the feasibility of this, so I won't. But I thought it was worth putting out there; hopefully it is fixable and
Summary

This is a prototype that I'm considering shipping under `--preview`, based on `light-the-torch`. `light-the-torch` patches pip to pull PyTorch packages from the PyTorch indexes automatically. In particular, `light-the-torch` will query the installed CUDA drivers to determine which indexes are compatible with your system.

This PR implements equivalent behavior under `--torch-backend auto`, though you can also set `--torch-backend cpu`, etc. for convenience. When enabled, the registry client will fetch from the appropriate PyTorch index when it sees a package from the PyTorch ecosystem (and ignore any other configured indexes, unless the package is explicitly pinned to a different index).

Right now, this is only implemented in the `uv pip` CLI, since it doesn't quite fit into the lockfile APIs, given that it relies on feature detection on the currently-running machine.

Test Plan

On macOS, you can test this with (e.g.):

On a GPU-enabled EC2 machine: