
Support activation sharding #53

Closed
wonjoo-wj wants to merge 1 commit into optimize_spmd_sharding from wonjoo/activation-sharding

Conversation

Collaborator

@wonjoo-wj wonjoo-wj commented Feb 15, 2024

With pytorch/xla#6524, we can now support activation sharding.

This PR is an example of the changes needed to call the new dynamo Python custom op.

Note that this requires pytorch/xla#6524 to be merged first. Tested locally, llama 2 inference with dynamo+spmd (with activation sharding) is successful: https://gist.github.com/wonjoolee95/a290a68f29c52bd395b16ae6df651531.

Also note that performance has not been tested; this change only verifies functionality, so we still need to find the optimal sharding strategy.
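For context, here is a minimal end-to-end sketch of the annotation call. The module paths (torch_xla.runtime, torch_xla.distributed.spmd) and the 8-device mesh are assumptions for illustration; only the mark_sharding(..., use_dynamo_custom_op=True) call itself comes from this PR.

import torch
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs  # assumed module path for `xs`

xr.use_spmd()  # enable SPMD execution mode

num_devices = xr.global_runtime_device_count()
device_ids = torch.arange(num_devices)
# (data, sequence, model) mesh; the (4, 1, 2) shape assumes 8 devices.
data_model_mesh = xs.Mesh(device_ids, (4, 1, 2))

# Stand-in for the attention output activation in llama/model.py.
output = torch.randn(4, 128, 4096).to(xm.xla_device())

# use_dynamo_custom_op=True routes the annotation through the Python
# custom op from pytorch/xla#6524 so that dynamo can trace it.
xs.mark_sharding(output, data_model_mesh, (0, 1, 2), use_dynamo_custom_op=True)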

Comment thread: llama/model.py
# Build a (data, sequence, model) mesh over all devices and annotate the activation.
num_devices = xr.global_runtime_device_count()
device_ids = torch.arange(num_devices)
data_model_mesh = xs.Mesh(device_ids, (4, 1, 2))
xs.mark_sharding(output, data_model_mesh, (0, 1, 2), use_dynamo_custom_op=True)

@miladm miladm Mar 5, 2024


@wonjoolee95 can you add a discussion of why we keep output = self.wo(output) at line 204 when we want to make the dynamo_mark_sharding call after it?

Collaborator

@yeounoh yeounoh Mar 20, 2024


This is the original implementation; we are just adding the annotations. We have to keep the output projection as-is.
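In other words, the intended ordering around line 204 is projection first, annotation after (a sketch; data_model_mesh as in the diff above):

output = self.wo(output)  # original output projection, unchanged
# The sharding annotation is added after the projection, not in place of it.
xs.mark_sharding(output, data_model_mesh, (0, 1, 2), use_dynamo_custom_op=True)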

@yeounoh yeounoh self-requested a review March 20, 2024 18:25
Comment thread: llama/model.py
# custom python dynamo mark sharding
import torch_xla.experimental.dynamo_mark_sharding
device_ids = [0]
mesh_shape = [1, 1, 1]
Collaborator


Wait, why is the mesh shape [1, 1, 1]?

Collaborator

@yeounoh yeounoh left a comment


This is for testing -- @wonjoolee95

  • Could you remove the "[WONJOO]" debugging probes and make this look more formal, if useful?
  • Also, you have to change the mesh shape. I don't think we are sharding the activation in the current impl? (See the sketch below.)
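For example, a mesh that actually shards would have to span all devices rather than a single one (a sketch, assuming 8 devices; variable names follow the diff above):

num_devices = xr.global_runtime_device_count()  # 8 in this sketch
device_ids = list(range(num_devices))
# The product of the mesh axes must equal num_devices; [1, 1, 1] over
# device_ids = [0] replicates on one device and shards nothing.
mesh_shape = [4, 1, 2]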

@wonjoo-wj
Collaborator Author

Moving this over to #55

@wonjoo-wj wonjoo-wj closed this Mar 20, 2024