SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats - PyTorch compile fails with Python 3.12 #153737
Description
🐛 Describe the bug
torch.compile fails compiling a simple model in Python 3.12, with error torch._inductor.exc.InductorError: SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats.
The model runs fine without compilation, and it also compiles and runs correctly under Python 3.13.
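For reference, a minimal sketch of the kind of usage that fails (a hypothetical stand-in; the actual model is a Lightning-trained transformer and is not shown here, but any small CUDA module compiled with torch.compile should exercise the same Inductor/Triton path):

```python
# Hypothetical minimal repro sketch, not the reporter's actual model.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)).cuda()
compiled = torch.compile(model)

x = torch.randn(8, 16, device="cuda")
out = compiled(x)  # under Python 3.12 the Inductor/Triton compile step raises the SystemError; under 3.13 it succeeds
```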
This seems to be an issue with changes to Triton: others have reported the same error in similar use cases:
- Compiler error: "PY_SSIZE_T_CLEAN macro must be defined for '#' formats" triton-lang/triton#5529
- https://stackoverflow.com/questions/70705404/systemerror-py-ssize-t-clean-macro-must-be-defined-for-formats
Recent versions of Python have changed their C API behavior re PY_SSIZE_T_CLEAN:
Note: On Python 3.12 and older, the macro PY_SSIZE_T_CLEAN must be defined before including Python.h to use all # variants of formats (s#, y#, etc.) explained below. This is not necessary on Python 3.13 and later.
Error logs
W0516 11:23:24.382000 4559 torch/_inductor/utils.py:1250] [0/0_1] Not enough SMs to use max_autotune_gemm mode
Traceback (most recent call last):
File "/myproj/src/myproj/models/transformer.py", line 191, in <module>
trainer.fit(model, datamodule=sdm, ckpt_path=ckpt_path) #type:ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/lightning/pytorch/trainer/trainer.py", line 561, in fit
call._call_and_handle_interrupt(
File "/myproj/venv/lib/python3.12/site-packages/lightning/pytorch/trainer/call.py", line 48, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/lightning/pytorch/trainer/trainer.py", line 599, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/myproj/venv/lib/python3.12/site-packages/lightning/pytorch/trainer/trainer.py", line 1012, in _run
results = self._run_stage()
^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/lightning/pytorch/trainer/trainer.py", line 1054, in _run_stage
self._run_sanity_check()
File "/myproj/venv/lib/python3.12/site-packages/lightning/pytorch/trainer/trainer.py", line 1083, in _run_sanity_check
val_loop.run()
File "/myproj/venv/lib/python3.12/site-packages/lightning/pytorch/loops/utilities.py", line 179, in _decorator
return loop_run(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 145, in run
self._evaluation_step(batch, batch_idx, dataloader_idx, dataloader_iter)
File "/myproj/venv/lib/python3.12/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 437, in _evaluation_step
output = call._call_strategy_hook(trainer, hook_name, *step_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/lightning/pytorch/trainer/call.py", line 328, in _call_strategy_hook
output = fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/lightning/pytorch/strategies/strategy.py", line 412, in validation_step
return self.lightning_module.validation_step(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 663, in _fn
raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 655, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1432, in __call__
return self._torchdynamo_orig_callable(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1213, in __call__
result = self._inner_convert(
^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 598, in __call__
return _compile(
^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1059, in _compile
guarded_code = compile_inner(code, one_graph, hooks, transform)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_utils_internal.py", line 97, in wrapper_function
return function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 761, in compile_inner
return _compile_inner(code, one_graph, hooks, transform)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 797, in _compile_inner
out_code = transform_code_object(code, transform)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_dynamo/bytecode_transformation.py", line 1422, in transform_code_object
transformations(instructions, code_options)
File "/myproj/venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 257, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 715, in transform
tracer.run()
File "/myproj/venv/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 3500, in run
super().run()
File "/myproj/venv/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1337, in run
while self.step():
^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1246, in step
self.dispatch_table[inst.opcode](self, inst)
File "/myproj/venv/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 817, in wrapper
return handle_graph_break(self, inst, speculation.reason)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 867, in handle_graph_break
self.output.compile_subgraph(self, reason=reason)
File "/myproj/venv/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1179, in compile_subgraph
self.compile_and_call_fx_graph(
File "/myproj/venv/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1437, in compile_and_call_fx_graph
compiled_fn = self.call_user_compiler(gm)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1487, in call_user_compiler
return self._call_user_compiler(gm)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1519, in _call_user_compiler
compiled_fn = compiler_fn(gm, self.example_inputs())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 150, in __call__
compiled_gm = compiler_fn(gm, example_inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/__init__.py", line 2347, in __call__
return compile_fx(model_, inputs_, config_patches=self.config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2101, in compile_fx
raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2089, in compile_fx
return aot_autograd(
^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 101, in __call__
cg = aot_module_simplified(gm, example_inputs, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1160, in aot_module_simplified
compiled_fn = AOTAutogradCache.load(
^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/autograd_cache.py", line 775, in load
compiled_fn = dispatch_and_compile()
^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1145, in dispatch_and_compile
compiled_fn, _ = create_aot_dispatcher_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 570, in create_aot_dispatcher_function
return _create_aot_dispatcher_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 820, in _create_aot_dispatcher_function
compiled_fn, fw_metadata = compiler_fn(
^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 219, in aot_dispatch_base
compiled_fw = compiler(fw_module, updated_flat_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 479, in __call__
return self.compiler_fn(gm, example_inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1944, in fw_compiler_base
return inner_compile(
^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 628, in compile_fx_inner
return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_dynamo/repro/after_aot.py", line 124, in debug_wrapper
inner_compiled_fn = compiler_fn(gm, example_inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 760, in _compile_fx_inner
raise InductorError(e, currentframe()).with_traceback(
File "/myproj/venv/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 745, in _compile_fx_inner
mb_compiled_graph = fx_codegen_and_compile(
^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1295, in fx_codegen_and_compile
return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1197, in codegen_and_compile
compiled_fn = graph.compile_to_module().call
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2083, in compile_to_module
return self._compile_to_module()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2130, in _compile_to_module
mod = PyCodeCache.load_by_key_path(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_inductor/codecache.py", line 2747, in load_by_key_path
mod = _reload_python_module(key, path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_inductor/runtime/compile_tasks.py", line 36, in _reload_python_module
exec(code, mod.__dict__, mod.__dict__)
File "/tmp/torchinductor_tcr/4c/c4cwgdwpedzfrpfhwj3j6mwbmdjj6bnduqspnvyub76mmgmqcvez.py", line 1051, in <module>
async_compile.wait(globals())
File "/myproj/venv/lib/python3.12/site-packages/torch/_inductor/async_compile.py", line 424, in wait
self._wait_futures(scope)
File "/myproj/venv/lib/python3.12/site-packages/torch/_inductor/async_compile.py", line 445, in _wait_futures
scope[key] = result.result()
^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_inductor/codecache.py", line 3224, in result
return self.result_fn()
^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_inductor/async_compile.py", line 325, in get_result
kernel.precompile(
File "/myproj/venv/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 277, in precompile
self._make_launchers()
File "/myproj/venv/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 434, in _make_launchers
launchers.append(result.make_launcher())
^^^^^^^^^^^^^^^^^^^^^^
File "/myproj/venv/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 1091, in make_launcher
binary._init_handles()
File "/myproj/venv/lib/python3.12/site-packages/triton/compiler/compiler.py", line 408, in _init_handles
self.module, self.function, self.n_regs, self.n_spills = driver.active.utils.load_binary(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch._inductor.exc.InductorError: SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats
Versions
Python 3.12 FAILS
Collecting environment information...
PyTorch version: 2.7.0+cu128
Is debug build: False
CUDA used to build PyTorch: 12.8
ROCM used to build PyTorch: N/A
OS: Ubuntu 24.04 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.39
Python version: 3.12.3 (main, Feb 4 2025, 14:48:35) [GCC 13.3.0] (64-bit runtime)
Python platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.39
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA RTX 4000 Ada Generation Laptop GPU
Nvidia driver version: 538.92
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Vendor ID: GenuineIntel
Model name: 13th Gen Intel(R) Core(TM) i9-13980HX
CPU family: 6
Model: 183
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
Stepping: 1
BogoMIPS: 4838.41
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni umip waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize flush_l1d arch_capabilities
Virtualization: VT-x
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 768 KiB (16 instances)
L1i cache: 512 KiB (16 instances)
L2 cache: 32 MiB (16 instances)
L3 cache: 36 MiB (1 instance)
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Vulnerable: No microcode
Vulnerability Retbleed: Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI BHI_DIS_S
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==2.2.5
[pip3] nvidia-cublas-cu12==12.8.3.14
[pip3] nvidia-cuda-cupti-cu12==12.8.57
[pip3] nvidia-cuda-nvrtc-cu12==12.8.61
[pip3] nvidia-cuda-runtime-cu12==12.8.57
[pip3] nvidia-cudnn-cu12==9.7.1.26
[pip3] nvidia-cufft-cu12==11.3.3.41
[pip3] nvidia-curand-cu12==10.3.9.55
[pip3] nvidia-cusolver-cu12==11.7.2.55
[pip3] nvidia-cusparse-cu12==12.5.7.53
[pip3] nvidia-cusparselt-cu12==0.6.3
[pip3] nvidia-nccl-cu12==2.26.2
[pip3] nvidia-nvjitlink-cu12==12.8.61
[pip3] nvidia-nvtx-cu12==12.8.55
[pip3] pytorch-lightning==2.5.1.post0
[pip3] torch==2.7.0+cu128
[pip3] torchmetrics==1.7.1
[pip3] triton==3.3.0
[conda] No relevant packages
Python 3.13 WORKS FINE
Collecting environment information...
PyTorch version: 2.7.0+cu128
Is debug build: False
CUDA used to build PyTorch: 12.8
ROCM used to build PyTorch: N/A
OS: Ubuntu 24.04 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.39
Python version: 3.13.3 (main, Apr 9 2025, 08:55:03) [GCC 13.3.0] (64-bit runtime)
Python platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.39
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA RTX 4000 Ada Generation Laptop GPU
Nvidia driver version: 538.92
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Vendor ID: GenuineIntel
Model name: 13th Gen Intel(R) Core(TM) i9-13980HX
CPU family: 6
Model: 183
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
Stepping: 1
BogoMIPS: 4838.41
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni umip waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize flush_l1d arch_capabilities
Virtualization: VT-x
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 768 KiB (16 instances)
L1i cache: 512 KiB (16 instances)
L2 cache: 32 MiB (16 instances)
L3 cache: 36 MiB (1 instance)
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Vulnerable: No microcode
Vulnerability Retbleed: Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI BHI_DIS_S
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==2.2.5
[pip3] nvidia-cublas-cu12==12.8.3.14
[pip3] nvidia-cuda-cupti-cu12==12.8.57
[pip3] nvidia-cuda-nvrtc-cu12==12.8.61
[pip3] nvidia-cuda-runtime-cu12==12.8.57
[pip3] nvidia-cudnn-cu12==9.7.1.26
[pip3] nvidia-cufft-cu12==11.3.3.41
[pip3] nvidia-curand-cu12==10.3.9.55
[pip3] nvidia-cusolver-cu12==11.7.2.55
[pip3] nvidia-cusparse-cu12==12.5.7.53
[pip3] nvidia-cusparselt-cu12==0.6.3
[pip3] nvidia-nccl-cu12==2.26.2
[pip3] nvidia-nvjitlink-cu12==12.8.61
[pip3] nvidia-nvtx-cu12==12.8.55
[pip3] pytorch-lightning==2.5.1.post0
[pip3] torch==2.7.0+cu128
[pip3] torchmetrics==1.7.1
[pip3] triton==3.3.0
[conda] No relevant packages
cc @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @aakhundov @bertmaher @int3 @davidberard98 @nmacchioni @embg @peterbell10