Handle multiple inplace update input output aliasing by JackCaoG · Pull Request #7023 · pytorch/xla

JackCaoG · 2024-05-03T17:12:58Z

Fix the bug where aliasing does not happen if an in-place operation is applied multiple times.

Consider the following case without this PR:

# Tensor ID 1, alias ID 1
t1 = torch.randn(5,5).to('xla:0')
# Tensor ID 2, alias ID 1
t1 += 1
# Tensor ID 3, alias ID 2
t1 *= 3

At mark_step time we see that the input buffer has tensor ID 1 while the output alias ID is 2, so we skip donating the input buffer of size (5,5).

xla/torch_xla/csrc/xla_graph_executor.cpp

Lines 1249 to 1253 in d123585

    
    for (size_t i = 0; i < indices.size(); ++i) {
      size_t tensor_index = indices[i];
      int64_t tensor_id = tensors[tensor_index]->data()->alias_id;
      output_tensor_id_map[tensor_id] = i;
    }

xla/torch_xla/csrc/xla_graph_executor.cpp

Lines 1261 to 1269 in d123585

    
    auto it = output_tensor_id_map.find(data_info->tensor_id);
    // Parameter buffer's TensorId in output_tensor_id_map means
    // this buffer is not needed after execution since XLATensor will get a
    // new buffer.
    if (it != output_tensor_id_map.end()) {
      lowering_ctx->builder()->AddBufferDonor(/*param_number=*/i,
                                              /*param_index=*/{});
      buffer_donor_indexs.push_back(i);
    }

The alias ID should track the tensor ID of the input buffer, not the tensor ID of the last base.
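The donor check in the two excerpts above can be imitated with a toy model in plain Python. This is only a sketch: `ToyTensor`, `inplace_update`, `donor_params`, and the `propagate_alias` flag are hypothetical names invented for illustration, and the model ignores everything about the real executor except the ID bookkeeping.

```python
from dataclasses import dataclass


@dataclass
class ToyTensor:
    tensor_id: int  # unique ID, changes on every in-place update
    alias_id: int   # ID that is compared against the input buffer's tensor ID


def donor_params(param_tensor_ids, outputs):
    """Imitate the two C++ loops: map output alias IDs to output positions,
    then mark each parameter whose tensor ID appears in that map as a donor."""
    output_tensor_id_map = {t.alias_id: i for i, t in enumerate(outputs)}
    return [i for i, tid in enumerate(param_tensor_ids)
            if tid in output_tensor_id_map]


def inplace_update(t, next_id, propagate_alias):
    """State after one in-place op on a tensor that has not been executed yet."""
    # propagate_alias=False: alias ID is the tensor ID of the last base.
    # propagate_alias=True:  alias ID is inherited, so it keeps pointing at the
    #                        original input buffer (assumed post-fix behavior).
    alias = t.alias_id if propagate_alias else t.tensor_id
    return ToyTensor(tensor_id=next_id, alias_id=alias)


def run(propagate_alias):
    t1 = ToyTensor(tensor_id=1, alias_id=1)      # t1 = torch.randn(5,5).to('xla:0')
    t1 = inplace_update(t1, 2, propagate_alias)  # t1 += 1
    t1 = inplace_update(t1, 3, propagate_alias)  # t1 *= 3
    # The graph's only parameter is the original buffer with tensor ID 1.
    return donor_params([1], [t1])


old_donors = run(propagate_alias=False)
new_donors = run(propagate_alias=True)
print(old_donors, new_donors)  # [] [0]
```

With the last-base policy the output alias ID ends up as 2, so parameter 0 is never donated; with the inherited alias ID it stays 1 and parameter 0 is selected.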

JackCaoG · 2024-05-03T19:18:35Z

@alanwaketan @wonjoolee95 I think this one is ready for review.

alanwaketan · 2024-05-03T20:02:57Z

  auto input_tensor = bridge::GetXlaTensor(input);
  auto output_tensor = bridge::GetXlaTensor(output);
-  output_tensor->data()->alias_id = input_tensor->GetUniqueId();
+  if (input_tensor->CurrentDataHandle() != nullptr ||


I guess we can always use alias_id?

Haha, that's what I thought, but actually no. Look at my example below:

# x.tensor_id = 1, x.alias_id = 1
x = torch.randn(5,5).to(xla_device())
# x.tensor_id = 2, x.alias_id should be 1
x += 1
xm.mark_step()
# x.tensor_id = 3, x.alias_id should be 2 since the input tensor ID will be 2
# for this graph
x *= 1
xm.mark_step()

If we always use alias_id, the alias_id of x in the second graph would be 1, but we need it to be 2.

In the second execution the input tensor ID is 2, and we need the alias ID to always match the input tensor ID. In other words, we should not carry alias_id across mark_step.

This is a bit tricky: even though the underlying buffer is aliased, we still create a new PjrtBuffer object for x after the first mark_step. That DeviceData object (a wrapper around the PjrtBuffer) will have data_info with tensor_id 2, since x's tensor ID is 2 after the first mark_step.
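The two-step example can be traced with a small toy model. It is a sketch under a stated assumption: the rule used here is "inherit the alias ID while the input is still pending IR, otherwise take the input's own tensor ID". The `has_device_data` field is a hypothetical stand-in for the device-data check in the diff, whose full condition is not shown in this thread.

```python
from dataclasses import dataclass


@dataclass
class ToyTensor:
    tensor_id: int
    alias_id: int
    has_device_data: bool  # True once the tensor is backed by a device buffer


def inplace_update(t, next_id, always_inherit):
    # Assumed rule: an input that is already backed by device data contributes
    # its own tensor ID; a pending input passes its alias ID along.
    if always_inherit or not t.has_device_data:
        alias = t.alias_id
    else:
        alias = t.tensor_id
    return ToyTensor(next_id, alias, has_device_data=False)


def mark_step(t):
    # After execution the tensor is backed by a new device-data object whose
    # data_info records the tensor's current ID; alias_id is left untouched.
    return ToyTensor(t.tensor_id, t.alias_id, has_device_data=True), t.tensor_id


def second_step_alias(always_inherit):
    x = ToyTensor(1, 1, has_device_data=True)  # x = torch.randn(5,5).to(xla_device())
    x = inplace_update(x, 2, always_inherit)   # x += 1
    x, input_id = mark_step(x)                 # input buffer now has tensor ID 2
    x = inplace_update(x, 3, always_inherit)   # x *= 1
    return x.alias_id, input_id


print(second_step_alias(always_inherit=True))   # (1, 2): alias ID is stale
print(second_step_alias(always_inherit=False))  # (2, 2): alias ID matches the input
```

Always inheriting alias_id leaves the second graph's output pointing at tensor ID 1, which no longer matches any input, so the buffer would not be donated.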

I guess resetting alias_id after mark_step is probably very complicated. This is more like a simplified way to achieve that, assuming IR/outputs become DeviceData/inputs.

We can do that too (reset alias_id to the tensor ID after processing the input_output_alias info). That might make this code less confusing, haha.

That sounds like a good follow-up, but feel free to skip it.
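For comparison, the follow-up idea discussed above (reset alias_id to the tensor ID once mark_step has consumed the aliasing info) might look like this in a toy model. Nothing here is from the PR: `mark_step_with_reset` and the other names are hypothetical, and the sketch only tracks IDs.

```python
from dataclasses import dataclass


@dataclass
class ToyTensor:
    tensor_id: int
    alias_id: int


def inplace_update(t, next_id):
    # With a reset at mark_step, every in-place op can simply inherit alias_id.
    return ToyTensor(next_id, t.alias_id)


def mark_step_with_reset(t):
    # Hypothetical: read the alias ID for buffer donation, then reset it so the
    # next graph starts from the tensor's current ID.
    donated_input_id = t.alias_id
    t.alias_id = t.tensor_id
    return donated_input_id


x = ToyTensor(1, 1)          # x = torch.randn(5,5).to(xla_device())
x = inplace_update(x, 2)     # x += 1
first = mark_step_with_reset(x)
x = inplace_update(x, 3)     # x *= 1
second = mark_step_with_reset(x)
print(first, second)  # 1 2
```

In this model each graph's alias ID matches the tensor ID of its input buffer (1, then 2) without a per-op check on the input's state.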

alanwaketan

LGTM.

jeffhataws · 2024-06-04T16:03:12Z

Will this go into 2.4? Any chance it can be backported to 2.3?

JackCaoG · 2024-06-04T17:18:51Z

This will be part of 2.4. We don't do dot releases, so it is unlikely that this one will be in the 2.3 release.

JackCaoG added 3 commits May 3, 2024 17:12

Handle multiple inplace update input output aliasing

25232c5

add test for multiple in place

22d1fb4

add another test for aliasing across mark_step

8c1e37a

JackCaoG requested review from alanwaketan and wonjoo-wj May 3, 2024 17:37

alanwaketan reviewed May 3, 2024

alanwaketan approved these changes May 3, 2024

JackCaoG merged commit e3fc033 into master May 3, 2024

jeffhataws pushed a commit that referenced this pull request May 31, 2024

Handle multiple inplace update input output aliasing (#7023)

2fc7912

JackCaoG merged 3 commits into master from JackCaoG/fix_multi_inplace_update_aliasing
