[inductor] fix allocation with deterministic guard #174718

Open
kshitij12345 wants to merge 21 commits into pytorch:main from kshitij12345:inductor-empty-deterministic

Conversation

@kshitij12345 (Collaborator) commented Feb 10, 2026

pytorch-bot bot commented Feb 10, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/174718

Note: Links to docs will display an error until the docs builds have been completed.

❌ 8 New Failures, 4 Unrelated Failures

As of commit 0437834 with merge base ee154ef (image):

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were already failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot commented Feb 10, 2026

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@bdhirsh (Collaborator) commented Feb 10, 2026

cc @eellison / @mlazos ?

@bdhirsh bdhirsh requested review from eellison and mlazos February 10, 2026 23:20
@bdhirsh bdhirsh added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Feb 10, 2026
@@ -3240,6 +3240,23 @@ def make_allocation(
f"device='{device.type}', "
f"name='{name}')"
)
# With torch.utils.deterministic.fill_uninitialized_memory, we fill the buffer with NaN or MAX_INT
elif (
Contributor:
Could you add a condition `elif device.type in ("cpu", "cuda", "xpu", "mtia"):` here instead, so we fall back to the default `empty_strided`? Presumably that would get filled with NaN.

Collaborator Author:

This works, thanks!

@krastogi-in (Contributor):

@kshitij12345, I think the changes look good.

Comment on lines +3249 to 3254
elif device.type == "cpu" and is_pinned and not is_deterministic:
out = (
f"{name} = empty_strided_cpu_pinned("
f"{codegen_allocation_shape_tuple}, "
f"{codegen_stride_tuple}, "
f"{dtype})"
Contributor:

We shouldn't be skipping pinned allocations.

@kshitij12345 (Collaborator Author) commented Feb 11, 2026

We only skip the pin_memory fast path if deterministic mode is active (otherwise we still take the fast path). Without this, the following snippet fails.

import torch

from torch.testing._internal.common_utils import DeterministicGuard

def fn():
    return torch.empty(
        4, 4, device="cpu", dtype=torch.float32, pin_memory=True
    )

cfunc = torch.compile(fn, fullgraph=True)

with DeterministicGuard(True, fill_uninitialized_memory=True):
    eager = fn()
    compiled = cfunc()
    torch.testing.assert_close(eager, compiled, equal_nan=True)

@kshitij12345 (Collaborator Author):

@pytorchbot rebase -b viable/strict

@pytorchmergebot (Collaborator):

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot (Collaborator):

Successfully rebased inductor-empty-deterministic onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout inductor-empty-deterministic && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the inductor-empty-deterministic branch from 6c679f2 to 26a6731 Compare February 12, 2026 01:08
@eellison (Contributor):

@pytorchbot revert -c "no signal"

pytorch-bot bot commented Mar 12, 2026

❌ 🤖 pytorchbot command failed:

@pytorchbot revert: error: argument -c/--classification: invalid choice: 'no signal' (choose from 'nosignal', 'ignoredsignal', 'landrace', 'weird', 'ghfirst', 'autorevert')

usage: @pytorchbot revert -m MESSAGE -c
                          {nosignal,ignoredsignal,landrace,weird,ghfirst,autorevert}

Try @pytorchbot --help for more info.

@eellison (Contributor):

@pytorchbot revert -c "nosignal"

pytorch-bot bot commented Mar 12, 2026

❌ 🤖 pytorchbot command failed:

@pytorchbot revert: error: the following arguments are required: -m/--message

usage: @pytorchbot revert -m MESSAGE -c
                          {nosignal,ignoredsignal,landrace,weird,ghfirst,autorevert}

Try @pytorchbot --help for more info.

@eellison (Contributor):

@pytorchbot revert -m "inadvertently causes fill_ on inductor allocations, not just user empty_strided" -c nosignal

@pytorchmergebot (Collaborator):

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Mar 12, 2026
This reverts commit 6d12a21.

Reverted #174718 on behalf of https://github.com/eellison due to inadvertently causes fill_ on inductor allocations, not just user empty_strided ([comment](#174718 (comment)))
@pytorchmergebot (Collaborator):

@kshitij12345 your PR has been successfully reverted.

@pytorchmergebot pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Mar 12, 2026
@eellison (Contributor) commented Mar 12, 2026

Sorry for the revert. This PR inadvertently extends empty NaN-filling to inductor's internal allocations, not just user-facing empty kernels. See https://gist.github.com/eellison/8b5acce23382e7a3dec571302c7ee678. For `a + b`, eager does not fill the output buffer, whereas with this PR inductor would. I've also filed #177269.

@kshitij12345 (Collaborator Author):

Thanks for the repro, will have a look.

@pytorch-bot pytorch-bot bot added ciflow/torchtitan Run TorchTitan integration tests release notes: inductor (aoti) labels Mar 17, 2026
@kshitij12345 (Collaborator Author) commented Mar 17, 2026

EDIT: Not sure about the correctness of Approach 1; will check.

Claude suggested two approaches. I have used the first as it is more targeted (and similar to the current fix). The second approach seems more correct but might require changes to the IR, which could have a larger blast radius. I wanted to know your thoughts.

Thank you!

# Skipping deterministic fill for op output buffers

## Problem

When `torch.use_deterministic_algorithms(True)` is enabled, inductor fills every
allocated buffer with NaN/MAX_INT, including output buffers of operations like
`a + b` that are immediately overwritten by their kernel. This adds an unnecessary
memset kernel per compiled op.
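The desired gating can be reduced to a single predicate. The sketch below is torch-free and the function name `should_deterministic_fill` is a hypothetical illustration, not an inductor API: fill only buffers whose contents are genuinely uninitialized and user-observable.

```python
def should_deterministic_fill(is_deterministic: bool, is_uninitialized: bool) -> bool:
    """Fill a freshly allocated buffer with NaN/MAX_INT only when deterministic
    mode is active AND the buffer is genuinely uninitialized (torch.empty-style).
    Kernel output buffers are fully overwritten, so filling them is wasted work."""
    return is_deterministic and is_uninitialized

# Output buffer of `a + b`: the kernel overwrites every element, so no fill.
assert not should_deterministic_fill(is_deterministic=True, is_uninitialized=False)
# User-facing torch.empty: contents are observable, so fill for determinism.
assert should_deterministic_fill(is_deterministic=True, is_uninitialized=True)
```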

## Approach 1: Tag at codegen time via scheduler node type

The `AllocateLine` in `wrapper.py` already has access to
`V.graph.scheduler.current_node`. We check whether the node is a
`NopKernelSchedulerNode` (which corresponds to `torch.empty` and friends) and
store `is_uninitialized` on the `AllocateLine`. This flag is threaded through
`make_buffer_allocation` → `make_allocation` to gate the deterministic fill.

The change is entirely in codegen — four files touched (`wrapper.py`,
`cpp_wrapper_cpu.py`, `cpp_wrapper_cpu_array_ref.py`, and the test file).
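A minimal, torch-free sketch of this tagging; `SchedulerNode`, `NopKernelSchedulerNode`, and `AllocateLine` mirror the names above, but these classes are simplified stand-ins, not inductor's real API:

```python
class SchedulerNode:
    """Stand-in for a scheduler node that runs a real kernel."""

class NopKernelSchedulerNode(SchedulerNode):
    """Corresponds to torch.empty and friends: allocates but computes nothing."""

class AllocateLine:
    def __init__(self, current_node: SchedulerNode):
        # Tag at codegen time: only nop-kernel buffers are truly uninitialized,
        # so only they need the deterministic fill.
        self.is_uninitialized = isinstance(current_node, NopKernelSchedulerNode)

    def codegen(self, is_deterministic: bool) -> str:
        if is_deterministic and self.is_uninitialized:
            return "buf = empty_strided(...); buf.fill_(nan)"
        return "buf = empty_strided(...)"

# torch.empty-style buffer gets the fill; an op's output buffer does not.
assert "fill_" in AllocateLine(NopKernelSchedulerNode()).codegen(True)
assert "fill_" not in AllocateLine(SchedulerNode()).codegen(True)
```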

## Approach 2: Add `is_uninitialized` to IR `ComputedBuffer`

An alternative is to set `is_uninitialized` on the IR node itself (e.g.,
`ComputedBuffer` or `Buffer`). The flag would be set during lowering —
`torch.empty` would produce a buffer with `is_uninitialized=True`, while ops
like `add` would default to `False`. The flag then propagates through scheduling
and into codegen where `AllocateLine` reads it from the buffer node.

This pushes the semantic distinction earlier in the pipeline (lowering/IR rather
than codegen) and makes the property available to any pass that inspects the IR,
not just the allocation codegen path.
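Sketched under the same caveat (simplified stand-ins, not inductor's real IR classes), the flag would live on the buffer and be set per lowering:

```python
from dataclasses import dataclass

@dataclass
class Buffer:
    """Stand-in for an inductor IR buffer carrying the proposed flag."""
    name: str
    is_uninitialized: bool = False  # default: op outputs are written by their kernel

def lower_empty(name: str) -> Buffer:
    # torch.empty lowers to a buffer whose contents are genuinely undefined
    return Buffer(name, is_uninitialized=True)

def lower_add(name: str) -> Buffer:
    # add's output buffer is fully overwritten, so it keeps the default False
    return Buffer(name)

assert lower_empty("buf0").is_uninitialized
assert not lower_add("buf1").is_uninitialized
```

Because the flag is attached at lowering time, scheduling and codegen can consume it without re-deriving the distinction from node types.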

…regression

  The previous default of True caused make_allocation callers that don't
  go through AllocateLine (e.g. memory_planning pool allocations) to
  incorrectly trigger the cpp_wrapper deterministic error. Only
  AllocateLine.codegen should opt in with True for NopKernelSchedulerNode
  buffers.
@kshitij12345 (Collaborator Author):

@pytorchbot rebase -b viable/strict

@pytorchmergebot (Collaborator):

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot (Collaborator):

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/174718/head returned non-zero exit code 1

Rebasing (1/17)
Rebasing (2/17)
Rebasing (3/17)
Rebasing (4/17)
Rebasing (5/17)
Rebasing (6/17)
Auto-merging test/inductor/test_torchinductor.py
CONFLICT (content): Merge conflict in test/inductor/test_torchinductor.py
Auto-merging torch/_inductor/codegen/wrapper.py
CONFLICT (content): Merge conflict in torch/_inductor/codegen/wrapper.py
error: could not apply c09c0ed167f... [inductor] fix allocation with deterministic guard
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Could not apply c09c0ed167f... # [inductor] fix allocation with deterministic guard

Raised by https://github.com/pytorch/pytorch/actions/runs/23200777095

EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
…74718)"

This reverts commit 6d12a21.

Reverted pytorch#174718 on behalf of https://github.com/eellison due to inadvertently causes fill_ on inductor allocations, not just user empty_strided ([comment](pytorch#174718 (comment)))

Labels

ci-no-td, ciflow/inductor, ciflow/torchtitan, ciflow/trunk, Merged, module: determinism, module: inductor, open source, release notes: inductor (aoti), Reverted, topic: not user facing, triaged


Development

Successfully merging this pull request may close these issues.

torch.compile ignores torch.use_deterministic_algorithms(True) on empty_like

7 participants