
[SPMD] Introduce high level manual sharding APIs#6931

Merged
alanwaketan merged 5 commits into master from alanwaketan/manual_sharding_api
Apr 17, 2024

Conversation

@alanwaketan
Collaborator

Summary:
This pull request introduces:

  1. enable_manual_sharding: which starts the manual sharding region.
  2. disable_manual_sharding: which ends the manual sharding region.

Test Plan:
PJRT_DEVICE=TPU python test/spmd/test_xla_sharding.py -v -k test_manual_sharding_api_e2e
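The pairing of the two APIs can be illustrated with a plain-Python simulation (the function names mirror the PR's APIs, but the list-based "global tensor" and per-device slicing here are simplified stand-ins for illustration, not the real torch_xla implementation):

```python
# Plain-Python sketch of the manual-sharding region semantics introduced
# by this PR. In real torch_xla these operate on XLA tensors over a device
# mesh; here a "global tensor" is a flat list and a shard is a slice of it.

def enable_manual_sharding(global_tensor, num_devices, device_index):
    """Enter the manual region: hand back this device's local shard."""
    shard_len = len(global_tensor) // num_devices
    start = device_index * shard_len
    return global_tensor[start:start + shard_len]

def disable_manual_sharding(local_shards):
    """Leave the manual region: stitch per-device shards back together."""
    global_tensor = []
    for shard in local_shards:
        global_tensor.extend(shard)
    return global_tensor

if __name__ == "__main__":
    data = list(range(8))
    num_devices = 4
    # Inside the region, each "device" operates on its own shard manually;
    # no automatic sharding propagation happens on its behalf.
    shards = [enable_manual_sharding(data, num_devices, i)
              for i in range(num_devices)]
    shards = [[x * 10 for x in shard] for shard in shards]  # manual compute
    result = disable_manual_sharding(shards)
    print(result)  # [0, 10, 20, 30, 40, 50, 60, 70]
```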

@alanwaketan alanwaketan requested review from jonb377 and yeounoh April 17, 2024 00:46
@alanwaketan alanwaketan self-assigned this Apr 17, 2024
*,
mesh: Mesh = None) -> XLAShardedTensor:
"""
This API enables manual sharding for the given tensor. Manual sharding disables auto sharding propagation and auto
Contributor

"auto" --> "SPMD", think it's important to not confuse.

Contributor

@yeounoh yeounoh left a comment

LGTM, left a comment for comment :)

Collaborator

@jonb377 jonb377 left a comment

LGTM!

"""
mesh = get_global_mesh() if mesh is None else mesh
t = mark_sharding(unwrap_sharded_tensor(t), mesh, partition_spec)
t = torch_xla._XLAC._spmd_full_to_shard_shape(unwrap_sharded_tensor(t))
Collaborator

Can t here be DeviceData?

Collaborator Author

You mean the input? Yes!
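The snippet under discussion calls `_spmd_full_to_shard_shape`, which converts the global (full) shape into the per-device (shard) shape. Assuming an evenly divisible sharding, the shape arithmetic amounts to dividing each sharded dimension by the mesh size along its axis; a sketch with a hypothetical helper (not the torch_xla C++ op):

```python
def full_to_shard_shape(full_shape, mesh_shape, partition_spec):
    """Per-device shard shape for an evenly divisible sharding.

    partition_spec[i] names the mesh axis (an index into mesh_shape)
    that dimension i is sharded over, or None for replicated dims.
    Illustrative helper only; not the real _spmd_full_to_shard_shape.
    """
    shard_shape = []
    for dim, axis in zip(full_shape, partition_spec):
        if axis is None:
            shard_shape.append(dim)  # replicated: keep the full size
        else:
            assert dim % mesh_shape[axis] == 0, "uneven sharding"
            shard_shape.append(dim // mesh_shape[axis])
    return tuple(shard_shape)

# An (8, 128) tensor sharded along dim 0 over a 4-device mesh:
print(full_to_shard_shape((8, 128), (4,), (0, None)))  # (2, 128)
```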

"""
This API enables manual sharding for the given tensor. Manual sharding disables auto sharding propagation and auto
partitioning for the given tensor and all subsequent tensors produced by ops that use the given tensor as
input, and therefore allows the user to manually call collectives for the tensor and subsequent tensors. It
Collaborator

Also just curious - how will we enable collectives in a manual region?

Collaborator Author

XLA cc ops should work by default; just use them as normal. However, we need to teach our cc ops wrappers to be aware of SPMD mode, so that will be phase 2 of the manual sharding work.
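The kind of collective a user would issue by hand inside a manual region can be sketched in plain Python (an all-reduce simulated over per-device shards; real code would call torch_xla's cc ops on XLA tensors):

```python
def manual_all_reduce(per_device_shards):
    """Simulated all-reduce: every device ends up holding the elementwise
    sum of all devices' shards. Inside a manual sharding region the user
    issues this collective explicitly, instead of relying on SPMD
    propagation to insert it automatically."""
    summed = [sum(vals) for vals in zip(*per_device_shards)]
    return [list(summed) for _ in per_device_shards]

shards = [[1, 2], [3, 4], [5, 6], [7, 8]]  # one shard per "device"
print(manual_all_reduce(shards)[0])  # [16, 20]
```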

@alanwaketan alanwaketan merged commit 9b2ac4b into master Apr 17, 2024
@alanwaketan alanwaketan deleted the alanwaketan/manual_sharding_api branch April 17, 2024 18:28
lausannel pushed a commit to AlibabaPAI/xla that referenced this pull request Aug 6, 2024
baoleai pushed a commit to AlibabaPAI/xla that referenced this pull request Aug 6, 2024
3 participants