Extend buffer donation aliasing APIs#8721
Conversation
0dc6476 to
03e26ac
Compare
019f145 to
5cb4a6c
Compare
|
I had
but that was failing |
5cb4a6c to
ed28bcd
Compare
ysiraichi
left a comment
There was a problem hiding this comment.
Overall, LGTM. I do think it would be better to leave non-trivial refactorings for another PR. It would make reviewing easier.
That said, I'm not entirely familiar with this buffer donor system. Maybe @tengyifei @lsy323 @vanbasten23 @bhavya01 could help review.
ed28bcd to
6016023
Compare
3dac1fd to
b6a9371
Compare
|
As suggested by @ysiraichi, so we don't block on this needed PR, deferring:
In addition to separating the following as separate PRs (I'll do so shortly after closing this one):
|
b6a9371 to
3904c2c
Compare
Just so we are clear: are you closing this one? Or do you mean after this gets merged? |
I will do these as separate PRs after this one is merged. |
ysiraichi
left a comment
There was a problem hiding this comment.
Just one comment. Looks good otherwise.
Could you also change the PR description so we know exactly what changes this PR is introducing?
tengyifei
left a comment
There was a problem hiding this comment.
Apologies for the delay. I'll be frank I don't understand the internal details that well. In fact nobody on the team does. So I've asked some questions on the PR.
| self.assertEqual(met.metric_data("InputOutputAliasCount")[1], 2.0) | ||
|
|
||
| def test_user_config_donation_with_ltc_donation_overlap(self): | ||
| with alias_with_buffer_donor_config_context(True): |
There was a problem hiding this comment.
What does this config context do?
There was a problem hiding this comment.
It enables the general behavior of buffer donation. In other words, this (_xla_set_enable_alias_with_buffer_donor_config) needs to be set, so that the tensors (with the associated XLA device data backend) that were explicitly marked to be donated (_set_buffer_donation) can be donated in the next execution (i.e. mark step) [1]. It's effectively a general opt-in for the entire donation behavior (arguably we wouldn't need it since _set_buffer_donation is already assuming that the user does intend to explicitly donate it, but I just kept it as is).
#8711 describes the problem and use case, but Jack introduced this in the context of dynamo execution (#6587). Since it's an opt-in, we want to give users the flexibility to explicitly annotate the donation (precisely when they don't need the tensor after mark step). This is particularly helpful with the gradient accumulation (and scan), since we can't do in-place mutations of the fn parameters (aliasing is uncaptured by LTC), so this allows us to mitigate / close that gap. The main change in this PR, to resolve the PFR, is that we enable LTC and buffer donation to coexist (and alleviate the former constrained fallback behavior: if no LTC aliasings + user config is enabled -> donate the buffers). It doesn't alter the original intended behavior, since this heuristic is still directly maintained - but we widen the applications.
[1] XLA HLO donation ref: https://github.com/openxla/xla/blob/1c7d5f62e3b4c90a4b3484429c5042c69ceb5c5b/xla/service/hlo.proto#L479
There was a problem hiding this comment.
Got it. So we need both the config and an explicit donation call.
| self.assertIn('XlaSetBufferDonation', met.counter_names()) | ||
| self.assertEqual(met.counter_value('XlaSetBufferDonation'), 1) | ||
| t1 = t0 + 1 | ||
| torch_xla._XLAC._xla_sync_multi([t0, t1], [str(xla_device)], True, False) |
There was a problem hiding this comment.
What does sync_multi do? Is it the same as mark_step?
There was a problem hiding this comment.
Mostly, but more flexible. It allows me to specify sync_xla_data=False, so that I can ensure that the aliasing is due to the user config (explicit buffer donation), and not auto LTC aliasings. So sync_ltc_data in SyncTensorsConfig will be false, so it won't include the default LTC aliasing.
I'll add a comment, since that's the motivation behind using it over mark_step.
|
|
||
| void SetAliasWithBufferDonorConfig(bool should_alias); | ||
|
|
||
| void SetAliasWithBufferDonorConfig(bool enable_alias); |
There was a problem hiding this comment.
I see you've renamed "should" to "enable". Does this imply some feature change?
There was a problem hiding this comment.
Only in the sense that we allow is to run simultaneously with LTC aliasing, hence it should always be in effect. In this case, "should" for donating buffers (_set_buffer_donation) can be kept, because it may not necessarily donate it, since it depends whether this config is enabled (_xla_set_enable_alias_with_buffer_donor_config). I renamed the parameters for _xla_set_enable_alias_with_buffer_donor_config/SetAliasWithBufferDonorConfig, since we are enabling/allowing explicitly marked tensors (_set_buffer_donation) to be donated, simultaneously with LTC aliasing, so I figured that renaming it to "enable" is more suitable now.
No worries. Thanks for taking a look - let me know if I was able to answer these. I'll revise with the test change asap. |
3904c2c to
1d82b93
Compare
1d82b93 to
575d0c5
Compare
|
@tengyifei Done, PTAL |
|
@tengyifei @ysiraichi Hey folks, can you PTAL, thanks! |
| tensors, coll.indices, parameters_data); | ||
| } | ||
|
|
||
| std::vector<size_t> user_config_buffer_donor_indices; |
There was a problem hiding this comment.
This "user config" name is really unfortunate. It should just be called "explicit donation". I was going to suggest renaming it until I realized this is an existing feature that you exposed.
There was a problem hiding this comment.
Yes, unfortunately these are tied to existing API bindings, so they could already be used by users. We could rename, but we'd have to do it over a few release cycles with a deprecation warning.
In this PR, we modify this behavior to always accommodate explicitly donated buffers, working simultaneously with LTC, IF enabled.
Enables #8711. In particular, it alleviates the fallback aliasing logic for user config. Previously, it would only be applied if auto LTC did not infer any needed aliasing which is a semantically suboptimal surface area. In this PR, we modify this behavior to always accommodate explicitly donated buffers, working simultaneously with LTC, IF enabled.
In this PR, it is out of scope to address the per-device concerns associated with having global (DeviceArenaContext) metadata for all the devices with respect to aliasing and other dynamo execution -specific logic.