[inductor][bucketing] Fx collectives bucketing of multiple dtypes #162470
IvanKobzarev wants to merge 12 commits into gh/IvanKobzarev/151/base
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162470
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 00ecea8 with merge base 9272437. This comment was automatically generated by Dr. CI and updates every 15 minutes.
cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta ezyang msaroufim voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben
Lowering for aten._to_copy fails because both a fallback and a decomposition are registered for the op:
```
File "/data/users/ivankobzarev/h/pytorch/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper
raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
File "/data/users/ivankobzarev/h/pytorch/torch/_inductor/compile_fx.py", line 990, in _compile_fx_inner
raise InductorError(e, currentframe()).with_traceback(
File "/data/users/ivankobzarev/h/pytorch/torch/_inductor/compile_fx.py", line 974, in _compile_fx_inner
mb_compiled_graph = fx_codegen_and_compile(
File "/data/users/ivankobzarev/h/pytorch/torch/_inductor/compile_fx.py", line 1695, in fx_codegen_and_compile
return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
File "/data/users/ivankobzarev/h/pytorch/torch/_inductor/compile_fx.py", line 1420, in codegen_and_compile
graph.run(*example_inputs)
File "/data/users/ivankobzarev/h/pytorch/torch/_inductor/graph.py", line 937, in run
return super().run(*args)
File "/data/users/ivankobzarev/h/pytorch/torch/fx/interpreter.py", line 174, in run
self.env[node] = self.run_node(node)
File "/data/users/ivankobzarev/h/pytorch/torch/_inductor/graph.py", line 1624, in run_node
result = super().run_node(n)
File "/data/users/ivankobzarev/h/pytorch/torch/fx/interpreter.py", line 256, in run_node
return getattr(self, n.op)(n.target, args, kwargs)
File "/data/users/ivankobzarev/h/pytorch/torch/_inductor/graph.py", line 1233, in call_function
make_fallback(target, layout_constraint=decided_constraint)
File "/data/users/ivankobzarev/h/pytorch/torch/_inductor/lowering.py", line 2080, in make_fallback
assert op not in decompositions or override_decomp, (
torch._inductor.exc.InductorError: AssertionError: both a fallback and a decomp for same op: aten._to_copy.default
```
eellison left a comment:
Now that the other PRs have landed, mind rebasing?
eellison left a comment:
Looks good!
The only blocking question is about multi-dtype reduce_scatter.
```
@unittest.skipIf(not HAS_GPU, "Inductor+gpu needs triton and recent GPU arch")
@unittest.skipIf(not SM80OrLater, "bfloat16")
@parametrize("bucket_mode", ["all_custom_ops_multidtype"])
```
nit: can we update the config to `bucket_mode: Literal[...]` so we know what the options are?

We can. It will be a fairly long list, as we have 2**num_options combinations (custom_ops +/- multidtype, all/fsdp). We should reduce it to 4 for now.
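A minimal sketch of the `Literal`-typed config the nit asks for, assuming four illustrative mode names (the actual option strings in the inductor config may differ):

```python
from typing import Literal, Optional, get_args

# Hypothetical narrowing of bucket_mode; the real option names in the
# inductor config may differ from these four illustrative values.
BucketMode = Literal[
    "all",
    "all_custom_ops",
    "fsdp",
    "all_custom_ops_multidtype",
]

def validate_mode(mode: Optional[str]) -> bool:
    # None means "bucketing disabled"; otherwise the string must be one
    # of the statically declared Literal options.
    return mode is None or mode in get_args(BucketMode)
```

The `Literal` gives type checkers the closed set of options, while `get_args` lets runtime code validate against the same single source of truth.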
```
if s == OrderedSet([torch.bfloat16, torch.float]):  # type: ignore[attr-defined]
    return torch.bfloat16  # type: ignore[attr-defined]
```
Reason for this special case? We could just always choose the lowest-itemsize dtype.

Yes, we can pick the lowest dtype.
```
gm: torch.fx.GraphModule,
bucket_cap_mb_by_bucket_idx: Callable[[int], float],
filter_wait_node: Optional[Callable[[torch.fx.Node], bool]] = None,
mode: Optional[str] = None,
```
Same question: add Literal options?
```
gm: torch.fx.GraphModule,
bucket_cap_mb_by_bucket_idx: Callable[[int], float],
filter_wait_node: Optional[Callable[[torch.fx.Node], bool]] = None,
mode: Optional[str] = None,
"""
...
group_key_fn = (
    _rs_group_key_multidtype if mode and "multidtype" in mode else _rs_group_key
```
I don't think we can do multidtype for reduce_scatter, since NCCL is actually doing a reduction.

Yes, agreed; we could only do it with casts and numerics changes, so I will remove this option.
We could only reduce in the joint uppermost dtype, but that is not what we want :)
```
rs_ins: list[torch.Tensor],
group_size: int,
dtype: torch.dtype,  # type: ignore[name-defined]
numel_mults: list[int],
```
nit: not sure we need this if we always have the assumption that we will view as the lowest-bitwidth dtype.

Custom ops do not support List[dtype], so it cannot be computed inside the op. I changed it to pass the multipliers as List[int] instead; we need them to know how to split the result according to the different dtypes.
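The point above can be sketched in plain Python: with the bucket viewed in the lowest-itemsize dtype, each input contributes `numel * (itemsize // bucket_itemsize)` elements, and those integer multipliers are all the op needs to split the flat result (the helper names here are illustrative, not the PR's actual code):

```python
def numel_mults(itemsizes: list[int], bucket_itemsize: int) -> list[int]:
    # e.g. [fp32, bf16] viewed as bf16 -> multipliers [2, 1].
    return [s // bucket_itemsize for s in itemsizes]

def split_points(numels: list[int], mults: list[int]) -> list[int]:
    # Cumulative element offsets of each input inside the flat
    # bucket-dtype buffer, used to split the collective's output.
    points, acc = [], 0
    for n, m in zip(numels, mults):
        acc += n * m
        points.append(acc)
    return points
```

Passing `list[int]` multipliers sidesteps the custom-op schema limitation while still carrying enough information to recover each input's region of the buffer.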
eellison left a comment:
Looks good, just one comment.
```
def pick_bucket_dtype(dtypes: list[torch.dtype]) -> torch.dtype:  # type: ignore[name-defined]
    assert len(dtypes) > 0
    lowest_dtype = dtypes[0]
```
nit: `return min(dtypes, key=operator.attrgetter("itemsize"))`
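The suggested one-liner can be sketched without torch; `DType` below is a stand-in for `torch.dtype`, which exposes the same `itemsize` attribute:

```python
import operator
from dataclasses import dataclass

# Stand-in for torch.dtype so the sketch runs without torch; real
# torch.dtype objects also expose an .itemsize attribute.
@dataclass(frozen=True)
class DType:
    name: str
    itemsize: int

bfloat16 = DType("bfloat16", 2)
float32 = DType("float32", 4)

def pick_bucket_dtype(dtypes: list[DType]) -> DType:
    # Choose the lowest-itemsize dtype so viewing every input in the
    # bucket dtype never increases the total bytes transmitted.
    assert len(dtypes) > 0
    return min(dtypes, key=operator.attrgetter("itemsize"))
```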
```
_pre_bucket_all_gather.register_fake(_pre_bucket_all_gather_fake)

def _dtype_size_bytes(dtype: torch.dtype) -> int:  # type: ignore[name-defined]
```
```
    rank: int,
) -> list[torch.Tensor]:
    ag_ins = [
        torch._prims.convert_element_type(_ag_in, out_dtype)
```
We should only view the dtype here... Could we make this do collective merging of different dtypes only when it does not increase the total bytes transmitted? Potentially in the future we would want to upcast when the latency cost outweighs the cost of the additional bytes; leave that for a future change?

Here we convert to out_dtypes only for the fused convert-dtype all-gathers.

I'm not sure I follow. Why do we do this? We shouldn't need extra casts.
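The distinction under discussion: a dtype view reinterprets the same bits at no kernel cost, while a convert rewrites the values in a new encoding. A torch-free sketch using `struct` (the function names are illustrative, not the PR's API):

```python
import struct

def view_f32_as_bytes(x: float) -> list[int]:
    # Like Tensor.view(torch.uint8): the same bits reinterpreted,
    # no value change and no cast kernel. One float32 -> 4 bytes.
    return list(struct.pack("f", x))

def convert_f32_to_f16(x: float) -> float:
    # Like _prims.convert_element_type: re-encodes the value, which
    # costs a cast kernel and may lose precision.
    return struct.unpack("e", struct.pack("e", x))[0]
```

Bucketing wants the first operation (free bit reinterpretation into the bucket dtype); the second only belongs in a deliberately fused convert-dtype collective.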
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Pull Request resolved: pytorch#162470. Approved by: https://github.com/eellison
Stack from ghstack (oldest at bottom):
Bucketing of multiple dtypes to be processed in one bucketed collective.
The first target is to bucket bf16 and fp32, but it can already be used with other dtypes.
For now, multidtype bucketing is only supported in "custom_ops" mode;
non-custom_ops needs additional work on the inductor side.
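As an illustration of the idea (not the PR's actual code), the round trip can be sketched with `struct`, using float32 and float16 to stand in for the fp32/bf16 pair: each tensor is flattened to raw bytes, the blobs are concatenated so a single collective can operate on one buffer, and the result is split and reinterpreted back per dtype:

```python
import struct

def bucket(payloads: list[tuple[str, list[float]]]) -> tuple[bytes, list[int]]:
    # payloads: (struct format, values); "f" = float32, "e" = float16.
    # Pack each input and concatenate into one flat byte buffer that a
    # single collective could operate on.
    blobs = [struct.pack(f"{len(v)}{fmt}", *v) for fmt, v in payloads]
    return b"".join(blobs), [len(b) for b in blobs]

def unbucket(buf: bytes, sizes: list[int], fmts: list[str]) -> list[list[float]]:
    # Split the flat buffer and reinterpret each chunk in its own dtype.
    out, off = [], 0
    for size, fmt in zip(sizes, fmts):
        n = size // struct.calcsize(fmt)
        out.append(list(struct.unpack(f"{n}{fmt}", buf[off:off + size])))
        off += size
    return out
```

The real implementation works on tensors and dtype views rather than byte strings, but the shape of the transformation (flatten, concatenate, one collective, split, reinterpret) is the same.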
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @ezyang