Remove eager mode support from CommTensor #84978
mrshenli wants to merge 6 commits into gh/mrshenli/335/base
Conversation
We don't need eager mode support (automatic wait on read) for now. Removing that to simplify the code. We can always add this back if necessary in the future. [ghstack-poisoned]
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/84978
✅ No Failures, 2 Pending as of commit 7648f90. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
We don't need eager mode support (automatic wait on read) for now. Removing that to simplify the code. We can always add this back if necessary in the future. Note that we still need the eager mode code in `__torch_dispatch__`, as `make_fx` will also run the ops in eager mode to get the output. [ghstack-poisoned]
wanchaol
left a comment
Looks good to me overall; I have some questions about how/when `work.wait()` should be triggered in "eager mode".
def _get_tracer(obj: Any) -> Optional[torch.fx.Tracer]:
    slots = get_proxy_slots(obj)
There's a global fx tracer table, and that's why we can do this trick?
Yep, exactly. Below is how proxy_tensor.py sets the slots:
pytorch/torch/fx/experimental/proxy_tensor.py
Lines 55 to 58 in 35f6a69
A similar approach is used to get the proxy instance:
pytorch/torch/fx/experimental/proxy_tensor.py
Lines 85 to 92 in 35f6a69
t = tensor._tensor if isinstance(tensor, CommTensor) else tensor
if _get_tracer(t) is None:
    # noop for eager mode
    return tensor
What would eager mode look like then? I think for device mesh collectives we would always wrap the input tensor in CommTensor, so in "eager mode" we just return the same tensor, and the user of device mesh would need to manually call `work.wait()` to sync the stream and get the results?
> user of device mesh could need to manually call work.wait() to sync the stream and get the results?
Yep, in eager mode, users/DT will have to explicitly call wait. I assume that is already the case today, otherwise it wouldn't be semantically correct? Let me know if DT relies on CommTensor to correctly wait for communication; in that case, I will revert this change and only keep the cleanup code.
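The eager-mode contract being discussed can be sketched as follows. This is a hedged toy model, not c10d code: `Work` and `fake_allreduce` are hypothetical stand-ins, with a background thread playing the role of the communication stream. The point is that without CommTensor's automatic wait-on-read, the caller must call `work.wait()` before reading the buffer.

```python
# Toy model of the explicit-wait contract; Work/fake_allreduce are
# hypothetical stand-ins, not torch.distributed APIs.
import threading

class Work:
    """Stand-in for a c10d work handle."""
    def __init__(self) -> None:
        self._done = threading.Event()

    def _complete(self) -> None:
        self._done.set()

    def wait(self) -> None:
        # Block until the "communication" has finished.
        self._done.wait()

def fake_allreduce(buf):
    # Pretend the reduction runs asynchronously on a comm stream.
    work = Work()
    def run():
        buf[0] *= 2            # placeholder for the actual reduction
        work._complete()
    threading.Thread(target=run).start()
    return work

buf = [21]
work = fake_allreduce(buf)
work.wait()                    # explicit sync: what callers must do in eager mode
assert buf[0] == 42
```

Reading `buf` before `work.wait()` returns would be the bug that automatic wait-on-read used to prevent; with it removed, the wait becomes the caller's responsibility.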
We don't need eager mode support (automatic wait on read) for now. Removing that to simplify the code. We can always add this back if necessary in the future. Note that we still need the eager mode code in `__torch_dispatch__`, as `make_fx` will also run the ops in eager mode to get the output. Pull Request resolved: #84978 Approved by: https://github.com/wanchaol
Stack from ghstack (oldest at bottom):

We don't need eager mode support (automatic wait on read) for now. Removing that to simplify the code. We can always add this back if necessary in the future.

Note that we still need the eager mode code in `__torch_dispatch__`, as `make_fx` will also run the ops in eager mode to get the output.
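The last note, that a tracer also runs ops eagerly to get real outputs, can be illustrated with a minimal sketch (a toy `traced_call` helper, not the actual `make_fx` implementation): the tracer records each op into a graph and still executes it, because downstream tracing needs concrete outputs to continue.

```python
# Toy sketch of record-and-execute tracing; traced_call is hypothetical,
# not a make_fx API.
graph = []

def traced_call(op_name, fn, *args):
    graph.append((op_name, args))   # record the op for the traced graph
    return fn(*args)                # ...but still run it eagerly for the output

out = traced_call("add", lambda a, b: a + b, 2, 3)
assert out == 5                     # the real result is produced during tracing
assert graph == [("add", (2, 3))]   # and the op was captured in the graph
```

This is why the eager-mode path inside `__torch_dispatch__` cannot be deleted outright: tracing itself exercises it.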