[inductor] fix allocation with deterministic guard #174718
kshitij12345 wants to merge 21 commits into pytorch:main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/174718
Note: Links to docs will display an error until the docs builds have been completed.
❌ 8 New Failures, 4 Unrelated Failures as of commit 0437834 with merge base ee154ef.
NEW FAILURES - The following jobs have failed:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
BROKEN TRUNK - The following jobs failed but were present on the merge base: 👉 Rebase onto the `viable/strict` branch to avoid these failures.
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
torch/_inductor/codegen/wrapper.py (outdated)
@@ -3240,6 +3240,23 @@ def make_allocation(
        f"device='{device.type}', "
        f"name='{name}')"
    )
    # With torch.utils.deterministic.fill_uninitialized_memory, we fill the buffer with NaN or MAX_INT
    elif (
Could you add a condition `elif device.type in ("cpu", "cuda", "xpu", "mtia"):` here instead, so we fall back to the default `empty_strided`? Presumably that would get filled with NaN.
This works, thanks!

@kshitij12345 , I think the changes look good.
elif device.type == "cpu" and is_pinned and not is_deterministic:
    out = (
        f"{name} = empty_strided_cpu_pinned("
        f"{codegen_allocation_shape_tuple}, "
        f"{codegen_stride_tuple}, "
        f"{dtype})"
we shouldn't be skipping pinned.
We only skip the pin_memory fast path if deterministic mode is active (otherwise we still take the fast path). Without this, the following snippet fails.

import torch
from torch.testing._internal.common_utils import DeterministicGuard

def fn():
    return torch.empty(
        4, 4, device="cpu", dtype=torch.float32, pin_memory=True
    )

cfunc = torch.compile(fn, fullgraph=True)

with DeterministicGuard(True, fill_uninitialized_memory=True):
    eager = fn()
    compiled = cfunc()
    torch.testing.assert_close(eager, compiled, equal_nan=True)
@pytorchbot rebase -b viable/strict

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.

Successfully rebased and force-pushed from 6c679f2 to 26a6731.
Fixes pytorch#174386

Pull Request resolved: pytorch#174718
Approved by: https://github.com/eellison, https://github.com/mlazos
@pytorchbot revert -c "no signal"

❌ 🤖 pytorchbot command failed: Try

@pytorchbot revert -c "nosignal"

❌ 🤖 pytorchbot command failed: Try

@pytorchbot revert -m "inadvertently causes fill_ on inductor allocations, not just user empty_strided" -c nosignal

@pytorchbot successfully started a revert job. Check the current status here.
This reverts commit 6d12a21. Reverted #174718 on behalf of https://github.com/eellison due to inadvertently causes fill_ on inductor allocations, not just user empty_strided ([comment](#174718 (comment)))

@kshitij12345 your PR has been successfully reverted.

Sorry for the revert. This is inadvertently extending empty NaN filling to inductor internal allocations, not just user `empty_strided`.

Thanks for the repro, will have a look.
EDIT: Not sure about the correctness of approach 1, will check.

Claude suggested 2 approaches. I have used the first as it is more targeted (and similar to the current fix). The second approach seems more correct, but might require changes to the IR, which could have a larger blast radius. Wanted to know your thoughts. Thank you!

# Skipping deterministic fill for op output buffers
## Problem
When `torch.use_deterministic_algorithms(True)` is enabled, inductor fills every
allocated buffer with NaN/MAX_INT — including output buffers of operations like
`a + b` that are immediately overwritten by their kernel. This adds an unnecessary
memset kernel per compiled op.
## Approach 1: Tag at codegen time via scheduler node type
The `AllocateLine` in `wrapper.py` already has access to
`V.graph.scheduler.current_node`. We check whether the node is a
`NopKernelSchedulerNode` (which corresponds to `torch.empty` and friends) and
store `is_uninitialized` on the `AllocateLine`. This flag is threaded through
`make_buffer_allocation` → `make_allocation` to gate the deterministic fill.
The change is entirely in codegen — four files touched (`wrapper.py`,
`cpp_wrapper_cpu.py`, `cpp_wrapper_cpu_array_ref.py`, and the test file).
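A toy sketch of the gating described above (the class and helper names mimic the description; this is a hypothetical stand-in, not inductor's actual code):

```python
# Toy model of Approach 1: at codegen time, AllocateLine checks whether the
# current scheduler node is a nop kernel (torch.empty and friends). Only
# those buffers stay uninitialized and need the deterministic fill.

class NopKernelSchedulerNode:   # stand-in for inductor's class
    pass

class ComputeSchedulerNode:     # e.g. the node produced for `a + b`
    pass

def make_allocation(name, is_uninitialized, deterministic_fill):
    line = f"{name} = empty_strided(...)"
    # Op output buffers are fully overwritten by their kernel, so the
    # fill is skipped for them even in deterministic mode.
    if deterministic_fill and is_uninitialized:
        line += f"; {name}.fill_(float('nan'))"
    return line

def allocate_line_codegen(current_node, name, deterministic_fill=True):
    is_uninitialized = isinstance(current_node, NopKernelSchedulerNode)
    return make_allocation(name, is_uninitialized, deterministic_fill)

print(allocate_line_codegen(NopKernelSchedulerNode(), "buf0"))  # fill emitted
print(allocate_line_codegen(ComputeSchedulerNode(), "buf1"))    # no fill
```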
## Approach 2: Add `is_uninitialized` to IR `ComputedBuffer`
An alternative is to set `is_uninitialized` on the IR node itself (e.g.,
`ComputedBuffer` or `Buffer`). The flag would be set during lowering —
`torch.empty` would produce a buffer with `is_uninitialized=True`, while ops
like `add` would default to `False`. The flag then propagates through scheduling
and into codegen where `AllocateLine` reads it from the buffer node.
This pushes the semantic distinction earlier in the pipeline (lowering/IR rather
than codegen) and makes the property available to any pass that inspects the IR,
not just the allocation codegen path.
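Approach 2 could look roughly like this (a hypothetical toy IR, not inductor's real `Buffer`/`ComputedBuffer` classes):

```python
from dataclasses import dataclass

# Toy IR: the flag lives on the buffer node itself, set during lowering,
# so scheduling, codegen, and any other pass can read it later.
@dataclass
class Buffer:
    name: str
    is_uninitialized: bool = False

def lower_empty(name):
    # torch.empty produces a buffer whose uninitialized contents are
    # user-visible, so it must honor the deterministic fill.
    return Buffer(name, is_uninitialized=True)

def lower_add(name):
    # An op output is fully overwritten by its kernel; defaults to False.
    return Buffer(name)

def codegen_alloc(buf, deterministic_fill=True):
    line = f"{buf.name} = empty_strided(...)"
    if deterministic_fill and buf.is_uninitialized:
        line += f"; {buf.name}.fill_(float('nan'))"
    return line
```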
…regression

The previous default of True caused make_allocation callers that don't go through AllocateLine (e.g. memory_planning pool allocations) to incorrectly trigger the cpp_wrapper deterministic error. Only AllocateLine.codegen should opt in with True for NopKernelSchedulerNode buffers.
@pytorchbot rebase -b viable/strict

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.

Rebase failed due to Command Raised by https://github.com/pytorch/pytorch/actions/runs/23200777095
Fixes #174386
Co-authored with claude
cc @mruberry @kurtamohler @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo