
Remove eager mode support from CommTensor #84978

Closed
mrshenli wants to merge 6 commits into gh/mrshenli/335/base from gh/mrshenli/335/head

Conversation

@mrshenli
Contributor

@mrshenli mrshenli commented Sep 14, 2022

Stack from ghstack (oldest at bottom):

We don't need eager mode support (automatic wait on read) for now.
Removing it to simplify the code. We can always add it back if
necessary in the future.

Note that we still need the eager-mode code in `__torch_dispatch__`,
as `make_fx` also runs the ops in eager mode to get the output.
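A torch-free sketch of why that eager path still matters: a `make_fx`-style tracer records each op into a graph and *also* executes it to produce real outputs. `RecordingTracer` below is a hypothetical stand-in for illustration, not the real `make_fx`:

```python
import operator

# Hypothetical stand-in for a make_fx-style tracer: it records each op
# into a graph *and* runs it eagerly to produce a real output.
class RecordingTracer:
    def __init__(self):
        self.graph = []

    def call(self, op, *args):
        self.graph.append((op.__name__, args))
        return op(*args)  # the op still executes eagerly during tracing

tracer = RecordingTracer()
out = tracer.call(operator.add, 1, 2)  # recorded AND evaluated
```

Because the traced op is actually evaluated, any eager-mode branch inside `__torch_dispatch__` is exercised during tracing too.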

@pytorch-bot

pytorch-bot bot commented Sep 14, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/84978

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures, 2 Pending

As of commit 7648f90:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the oncall: distributed Add this issue/PR to distributed oncall triage queue label Sep 14, 2022
Collaborator

@wanchaol wanchaol left a comment


Looks good to me overall; I have some questions about how/when `work.wait()` should be triggered in "eager mode".



def _get_tracer(obj: Any) -> Optional[torch.fx.Tracer]:
    slots = get_proxy_slots(obj)
Collaborator


There's a global fx tracer table, and that's why we can do this trick?

Contributor Author


Yep, exactly. Below is how `proxy_tensor.py` sets the slots:

def set_proxy_slot(obj, tracer, proxy):
    d = obj.__dict__.setdefault(proxy_slot, weakref.WeakKeyDictionary())
    assert isinstance(d, weakref.WeakKeyDictionary)
    d[tracer] = proxy

And it uses a similar approach to get the proxy instance:

# Gets the proxy for a tensor, if it exists.
def get_proxy(obj):
    res = get_proxy_slots(obj)
    if res is None:
        return None
    vals = tuple(res.values())
    assert len(vals) == 1
    return vals[0]

t = tensor._tensor if isinstance(tensor, CommTensor) else tensor
if _get_tracer(t) is None:
    # noop for eager mode
    return tensor
Collaborator


What would eager mode look like then? I think for device mesh collectives we would always wrap the input tensor in CommTensor, so for "eager mode" we just return the same tensor, and users of device mesh would need to manually call `work.wait()` to sync the stream and get the results?

Contributor Author


> users of device mesh would need to manually call `work.wait()` to sync the stream and get the results?

Yep, in eager mode, users/DT will have to explicitly call wait. I assume that is already the case today; otherwise it wouldn't be semantically correct? Let me know if DT relies on CommTensor to correctly wait for communication. In that case, I will revert this change and keep only the cleanup code.
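The explicit-wait contract discussed above can be sketched with the standard library. `Work` here is a stand-in for torch.distributed's work handle, not the real class:

```python
import threading

class Work:  # stand-in for torch.distributed's Work handle
    def __init__(self, fn):
        self._done = threading.Event()
        self._result = None

        def run():
            self._result = fn()
            self._done.set()

        threading.Thread(target=run).start()

    def wait(self):
        # Without automatic wait-on-read, the caller must synchronize
        # explicitly before consuming the result of an async collective.
        self._done.wait()
        return self._result

work = Work(lambda: sum(range(10)))
result = work.wait()  # explicit synchronization point
```

Reading the result before calling `wait()` would race with the background work, which is exactly the hazard the removed automatic wait-on-read used to paper over.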

@facebook-github-bot facebook-github-bot deleted the gh/mrshenli/335/head branch September 18, 2022 14:20
mehtanirav pushed a commit that referenced this pull request Oct 4, 2022
Pull Request resolved: #84978
Approved by: https://github.com/wanchaol

Labels

cla signed · oncall: distributed · topic: not user facing


3 participants