support dist.broadcast #7956

Open

zpcore wants to merge 3 commits into master from piz/broadcast

Conversation

@zpcore (Member) commented Sep 5, 2024

Support torch.distributed.broadcast for both the dynamo and non-dynamo paths.

This PR needs pytorch/pytorch#135171 to be merged first.

@zpcore added the usability (Bugs/features related to improving the usability of PyTorch/XLA) and tpuci labels on Sep 5, 2024
@zpcore zpcore marked this pull request as ready for review September 5, 2024 02:04
XLATensorPtr xmask = bridge::GetXlaTensor(mask);
auto masked_input = tensor_methods::mul(xinput, xmask);
auto result = tensor_methods::all_reduce(masked_input, AllReduceType::kSum,
                                         1.0, {}, true);
Collaborator:

nit: name the non-obvious arguments at the end here. Assuming these two are scale and replica groups, /*scale=*/1, /*groups=*/{} (double check the names).
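
For illustration, a hedged sketch of the call above with the suggested argument comments applied; the names scale, groups, and pin_layout are assumptions (per the reviewer's own caveat) and should be checked against the tensor_methods::all_reduce declaration:

  // Hypothetical rewrite of the call with the trailing arguments named;
  // scale, groups, and pin_layout are assumed parameter names.
  auto result = tensor_methods::all_reduce(masked_input, AllReduceType::kSum,
                                           /*scale=*/1.0, /*groups=*/{},
                                           /*pin_layout=*/true);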



@absltest.skipIf(lambda: tpu.num_logical_cores_per_chip() >= 2,
@absltest.skipIf(tpu.num_logical_cores_per_chip() >= 2,
Collaborator:

🤦 thanks



# "broadcast(Tensor self, int src, str tag, int[] ranks, int group_size) -> Tensor",
@torch.library.impl("_c10d_functional::broadcast", "XLA")
Collaborator:

@JackCaoG FYI

at::Tensor mask;
const torch::lazy::BackendDevice& device = xinput->GetDevice();
if (device.ordinal() == src) {
  mask = at::ones_like(input);
Collaborator:

Is there an equivalent to torch.no_grad() in C++? That's the only difference I see between the original Python version and this one.

Member Author (@zpcore):

Searched the docs; we can use the following scope for tensor operations without grad:

  {
    at::NoGradGuard no_grad;
    // tensor operations
  }
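
Tying this back to the hunk above, a minimal sketch of how the guard might wrap the mask construction (input, src, and device are assumed to come from the surrounding function; the zeros_like branch for non-source replicas is an assumption, since the diff only shows the source branch):

  #include <ATen/core/grad_mode.h>  // for at::NoGradGuard (assumed include path)

  at::Tensor mask;
  {
    at::NoGradGuard no_grad;  // disables autograd recording, like torch.no_grad()
    if (device.ordinal() == src) {
      mask = at::ones_like(input);   // source replica keeps its values
    } else {
      mask = at::zeros_like(input);  // assumed: other replicas contribute zeros
    }
  }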

Member Author @zpcore commented Sep 19, 2024:

Does anyone know why we set no grad here:

with torch.no_grad():

@JackCaoG


Labels

usability: Bugs/features related to improving the usability of PyTorch/XLA
