[DTensor] Update dtensor_dispatch_inplace instruction count benchmark #177074

Open
wconstab wants to merge 1 commit into gh/wconstab/563/base from gh/wconstab/563/head

@wconstab wconstab commented Mar 10, 2026

Stack from ghstack (oldest at bottom):

The expected count for dtensor_dispatch_inplace (add_) regressed from
56530 to 58710 (~3.9%) after #175795 registered single-dim strategies
for categorized pointwise ops. The regression is on the cached dispatch
path and comes from two sources: an extra dict lookup in the C++
get_runtime_schema_info_for_op (~890 instructions), and a Python heap
layout difference in the cached OutputSharding object (~2860
instructions). Both are minor and not particularly worth fixing. While
the regression is within the 10% CI noise margin, it's better to reset
the counts so we still have our full 10% margin for the future.
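
To make the margin argument concrete, here is a minimal sketch of the arithmetic. It assumes the CI check is a simple relative threshold of 10% around the expected count; the actual benchmark harness may apply its margin differently, and the helper names below are hypothetical rather than taken from the PyTorch codebase.

```python
# Sketch of the margin arithmetic, assuming a plain relative-threshold check.
# relative_change, within_margin and headroom are hypothetical helpers.

OLD_EXPECTED = 56530   # expected count before #175795
NEW_MEASURED = 58710   # count measured after #175795
NOISE_MARGIN = 0.10    # CI noise margin quoted in the description


def relative_change(baseline: int, measured: int) -> float:
    """Fractional change of `measured` relative to `baseline`."""
    return (measured - baseline) / baseline


def within_margin(baseline: int, measured: int, margin: float = NOISE_MARGIN) -> bool:
    """True if `measured` deviates from `baseline` by no more than `margin`."""
    return abs(relative_change(baseline, measured)) <= margin


def headroom(baseline: int, current: int, margin: float = NOISE_MARGIN) -> float:
    """How much further `current` can grow (as a fraction of itself)
    before it exceeds `baseline` by more than `margin`."""
    ceiling = baseline * (1 + margin)
    return (ceiling - current) / current


regression = relative_change(OLD_EXPECTED, NEW_MEASURED)
headroom_without_reset = headroom(OLD_EXPECTED, NEW_MEASURED)
headroom_after_reset = headroom(NEW_MEASURED, NEW_MEASURED)

print(f"regression:             {regression:.1%}")
print(f"passes old baseline:    {within_margin(OLD_EXPECTED, NEW_MEASURED)}")
print(f"headroom without reset: {headroom_without_reset:.1%}")
print(f"headroom after reset:   {headroom_after_reset:.1%}")
```

Under these assumptions, leaving the old expected count in place would allow only about 5.9% of further growth before the check fails, whereas resetting the baseline to 58710 restores the full 10%.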

Authored with Claude.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @chauhang @amjames @Lucaskabela @jataylo

pytorch-bot bot commented Mar 10, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/177074

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 3 Unrelated Failures

As of commit 8cf854e with merge base 3f60bc4:

NEW FAILURE - The following job has failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

wconstab added a commit that referenced this pull request Mar 10, 2026
ghstack-source-id: 1e59629
Pull Request resolved: #177074
@wconstab wconstab requested a review from anshul-si March 13, 2026 13:27
@anshul-si anshul-si left a comment

LGTM

@wconstab
Contributor Author

@pytorchbot merge
