Add UUID-based cache key support for pre-grad custom passes#177403

Closed
frgossen wants to merge 5 commits into gh/frgossen/10/base from gh/frgossen/10/head

Conversation

@frgossen (Contributor)

@frgossen frgossen commented Mar 13, 2026

Stack from ghstack (oldest at bottom):

pre_grad_custom_pass was the only custom pass config without UUID-based
cache key integration. It was excluded from config serialization but not
handled specially via UUID extraction, so its effect was only captured
indirectly through the resulting FX graph. This meant two different
passes producing the same graph could incorrectly share a cache entry.

Align pre_grad_custom_pass with post-grad and joint passes: change its
type to CustomGraphPassType, add it to _cache_config_ignore_prefix (so
the UUID is extracted explicitly), include it in FxGraphHashDetails, and
validate it in _check_can_cache.
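To participate in this scheme, a custom pass must expose a stable UUID that changes whenever its behavior changes. Below is a minimal pure-Python sketch of the pattern: a class mimicking the `CustomGraphPass` protocol (the real base class lives in `torch._inductor.custom_graph_pass`), plus a hypothetical `cache_key_details` helper illustrating how a hash-details structure like `FxGraphHashDetails` can fold the UUID into the cache key rather than relying on the transformed graph alone. The class and helper names here are illustrative, not the actual torch internals.

```python
import hashlib


class MyPreGradPass:
    """Illustrative stand-in for a CustomGraphPass subclass.

    The real protocol requires __call__(graph) to mutate the FX
    graph in place and uuid() to return a stable identifier.
    """

    def __call__(self, graph):
        # ... mutate the FX graph in place ...
        pass

    def uuid(self):
        # Stable key: hash of a version string the author bumps
        # whenever the pass's behavior changes (hypothetical scheme;
        # hashing the pass's source file is another common choice).
        return hashlib.sha256(b"my_pre_grad_pass-v1").hexdigest()


def cache_key_details(config_hash: str, pre_grad_pass) -> tuple:
    # Sketch: fold the pass UUID into the cache key explicitly, so
    # two different passes cannot share an entry even if they happen
    # to produce the same graph. None means no custom pass is set.
    pass_uuid = pre_grad_pass.uuid() if pre_grad_pass is not None else None
    return (config_hash, pass_uuid)
```

With this shape, a UUID-less pass (`uuid()` returning `None`) is what `_check_can_cache` would reject, since its effect could not be distinguished in the key.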

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo @Lucaskabela

@pytorch-bot

pytorch-bot bot commented Mar 13, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/177403

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 6361bb2 with merge base 6a461fe:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

frgossen added a commit that referenced this pull request Mar 13, 2026
ghstack-source-id: 09ddb53
Pull Request resolved: #177403
@pytorch-bot

pytorch-bot bot commented Mar 13, 2026

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@pytorchmergebot (Collaborator)

Starting merge as part of PR stack under #177428

pytorchmergebot pushed a commit that referenced this pull request Mar 18, 2026
…77429)

Add a pre_grad_pass_timing config ("early", "late", or "default") that
controls when pre-grad passes run relative to the AOT autograd cache lookup.

- "early": passes run before cache lookup, so they execute on every compile
(including cache hits) and the cache key reflects the already-transformed
graph.
- "late": passes run after cache lookup (only on cache miss); requires
custom passes to provide a UUID for the cache key.
- "default": automatically resolves to "late" when possible (no custom pass,
or a custom pass with a UUID), and falls back to "early" when the custom
pass has no UUID.

Explicitly setting "late" with a UUID-less custom pass now raises a
RuntimeError instead of silently bypassing the cache. The existing
test_pre_grad_passes_called_on_cache_miss_only test is renamed and
pinned to "late" timing, and new tests cover early timing, both default
timing branches, and the error case.
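The resolution rules above can be sketched as a small function. This is an illustrative reconstruction of the logic described in the commit message, not the actual torch internals; the function name and the shape of the custom-pass argument are assumptions.

```python
def resolve_pre_grad_pass_timing(timing: str, custom_pass) -> str:
    """Sketch of how pre_grad_pass_timing resolves (illustrative).

    timing      -- "early", "late", or "default"
    custom_pass -- object with a uuid() method, or None
    """
    has_uuid = custom_pass is not None and custom_pass.uuid() is not None

    if timing == "late":
        # Explicit "late" with a UUID-less custom pass is an error
        # rather than a silent cache bypass.
        if custom_pass is not None and not has_uuid:
            raise RuntimeError(
                "pre_grad_pass_timing='late' requires the custom pass "
                "to provide a UUID for the cache key"
            )
        return "late"

    if timing == "early":
        return "early"

    # "default": prefer "late" when it is safe (no custom pass, or a
    # custom pass with a UUID); otherwise fall back to "early".
    return "late" if (custom_pass is None or has_uuid) else "early"
```

Under "early" timing the passes run before the cache lookup on every compile, so the key already reflects the transformed graph and no UUID is needed; "late" skips the passes entirely on a cache hit, which is why the UUID must stand in for them in the key.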

Pull Request resolved: #177429
Approved by: https://github.com/aorenste, https://github.com/zou3519
ghstack dependencies: #177397, #177403, #177428
ryanzhang22 pushed a commit to ryanzhang22/pytorch that referenced this pull request Mar 19, 2026
…177403)

Pull Request resolved: pytorch#177403
Approved by: https://github.com/aorenste, https://github.com/zou3519
ghstack dependencies: pytorch#177397
ryanzhang22 pushed a commit to ryanzhang22/pytorch that referenced this pull request Mar 19, 2026
…torch#177429)

Pull Request resolved: pytorch#177429
Approved by: https://github.com/aorenste, https://github.com/zou3519
ghstack dependencies: pytorch#177397, pytorch#177403, pytorch#177428