🐛 Describe the bug
| Device | Eager Mode | Inductor Mode |
| --- | --- | --- |
| CPU | ❌ Error (RuntimeError) | ⚠️ Silent Pass (Dangerous) |
| CUDA | ❌ Error (Device-side Assert) | ⚠️ Silent Pass (Dangerous) |
| MPS | ⚠️ Silent Pass (Dangerous) | ⚠️ Silent Pass (Dangerous) |
Reproduce script
import os
# os.environ["TORCH_LOGS"] = "output_code"
import torch
device = "cpu"
# device = "cuda"
# device = "mps"
def fn(x):
    indices = torch.tensor([-1], dtype=torch.int64).to(device)
    return torch.index_select(x, 0, indices)
x = torch.randn(5, 10).to(device)
print("--- Eager Run ---")
try:
    res = fn(x)
    res = res.cpu()
    print("Eager executed successfully")
except Exception as e:
    print(f"Eager Failed: {e}")
print("--- Inductor Run ---")
opt_fn = torch.compile(fn, backend="inductor", dynamic=True)
try:
    res = opt_fn(x)
    res = res.cpu()
    print("Inductor executed successfully")
except Exception as e:
    print(f"Inductor Failed: {e}")
On CPU
--- Eager Run ---
Eager Failed: index out of range in self
--- Inductor Run ---
Inductor executed successfully
On CUDA
--- Eager Run ---
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1478: indexSelectSmallIndex: block: [0,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1478: indexSelectSmallIndex: block: [0,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1478: indexSelectSmallIndex: block: [0,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1478: indexSelectSmallIndex: block: [0,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1478: indexSelectSmallIndex: block: [0,0,0], thread: [4,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1478: indexSelectSmallIndex: block: [0,0,0], thread: [5,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1478: indexSelectSmallIndex: block: [0,0,0], thread: [6,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1478: indexSelectSmallIndex: block: [0,0,0], thread: [7,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1478: indexSelectSmallIndex: block: [0,0,0], thread: [8,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:1478: indexSelectSmallIndex: block: [0,0,0], thread: [9,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Eager Failed: CUDA error: device-side assert triggered
Search for `cudaErrorAssert' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
single run
--- Inductor Run ---
Inductor executed successfully
On MPS
--- Eager Run ---
Eager executed successfully
--- Inductor Run ---
Inductor executed successfully
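Until the compiled path and MPS reject the index themselves, a possible stopgap is to validate indices explicitly before calling `torch.index_select`. The sketch below is only an illustration: `checked_index_select` is a hypothetical helper name, not a PyTorch API, and it assumes the check runs in eager code outside the compiled function. Converting the min/max to Python ints forces a device synchronization on CUDA and MPS.

```python
import torch


def checked_index_select(x, dim, indices):
    """Hypothetical helper: raise IndexError for out-of-range indices,
    then defer to torch.index_select."""
    size = x.size(dim)
    if indices.numel() > 0:
        # int(...) copies the scalar to the host, which synchronizes on GPU backends.
        lo = int(indices.min())
        hi = int(indices.max())
        if lo < 0 or hi >= size:
            raise IndexError(
                f"index out of range: expected values in [0, {size - 1}], "
                f"got min={lo}, max={hi}"
            )
    return torch.index_select(x, dim, indices)


x = torch.randn(5, 10)
try:
    checked_index_select(x, 0, torch.tensor([-1], dtype=torch.int64))
except IndexError as e:
    print(f"Rejected: {e}")
```

Because the check is ordinary eager code, it behaves the same on every device, at the cost of one extra reduction and a host round trip per call.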
Versions
PyTorch version: 2.10.0.dev20251205
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 26.1 (arm64)
GCC version: Could not collect
Clang version: 17.0.0 (clang-1700.4.4.1)
CMake version: Could not collect
Libc version: N/A
Python version: 3.12.12 (main, Oct 28 2025, 11:52:25) [Clang 20.1.4 ] (64-bit runtime)
Python platform: macOS-26.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Apple M4
Versions of relevant libraries:
[pip3] Could not collect
[conda] Could not collect
cc @albanD @chauhang @penguinwu @malfet