[Kernel C API] Fix inplacement issue in C API #47356

jzhoulon · 2021-02-24T01:35:20Z

This commit is intended to fix two inplace issue in kernel C API:

RefCountIsOne
tensor.RefCountIsOne() is the condition to decide whether the op can
do inplacement(e.g. forward_input), however, TF_Tensor retrieved through
TF_GetInput(), TF_AllocateOutput() makes the refcount always > 1, in
TF_TensorFromTensor, it will call tensor.CopyFrom() and increment the refcount
TF_TensorBitcastFrom()
Currently, this API only modifies TF_Tensor's buf pointer, but the src c++ tensor it
copied from is not modified, that makes the TF_Tensor inconsistent with src c++ tensor,
see the following example, it allocated an output TF_Tensor and modified its buffer through
TF_TensorBitcastFrom(), however, src c++ tensor in Op_KernelContext is not modified.
```
 // Example:
 TF_Tensor* a = TF_AllocateOutput(ctx, 0); // src c++ tensor: x
 TF_Tensor* b = TF_AllocateTemp(ctx);
 TF_TensorBitcastFrom(b, a); // a->buf_ = b_buf;
 // src c++ tensor x's buf is not modified, thus x->buf_ != a->buf_
```

jzhoulon · 2021-02-24T01:46:31Z

@penpornk This PR aims at solving two inplacement issue in kernel C API, please help to review, thanks very much.

TF_Tensor's buffer refcount retrieved from TF_GetInput and TF_AllocatedOutput are always > 1, that will make TF_ForwardInputOrAllocateOutput, TF_TensorMaybeMove failed.
TF_TensorBitcastFrom only replace TF_Tensor's buffer pointer, if TF_Tensor is copied from a proper's c++ tensor(TF_GetInput, TF_Allocateoutput), proper's c++ tensor will not be changed, that will make TF_Tensor inconsistent with src c++ tensor.

what we changed:

we add a src_tensor_ptr_ in TensorInterface which makes TF_Tensor be able to find its src c++ tensor.
when the TF_Tensor is retrieved from TF_GetInput, TF_AllocateOutput, that means TF_Tensor is used in kernel, its lifecycle is in kernel's compute function, so it no need to add reference count to src c++ tensor. We decrement the refcount after tensor.CopyFrom in TF_TensorFromTensor and increment the refcount by one before the delete of TensorInterface..

penpornk · 2021-02-24T05:59:05Z

@jzhoulon Thank you for the PR! I think @saxenasaurabh owns this part. I'll request his review.

@saxenasaurabh Would you mind helping review this PR? Or could you please suggest other reviewers? Thank you very much!

This commit is intended to fix two inplace issue in kernel C API: 1. RefCountIsOne tensor.RefCountIsOne() is the condition to decide whether the op can do inplacement(e.g. forward_input), however, TF_Tensor retrieved through TF_GetInput(), TF_AllocateOutput() makes the refcount always > 1, in TF_TensorFromTensor, it will call tensor.CopyFrom() and increment the refcount 2. TF_TensorBitcastFrom() Currently, this API only modifies TF_Tensor's buf pointer, but the src c++ tensor it copied from is not modified, that makes the TF_Tensor inconsistent with src c++ tensor, see the following example, it allocated an output TF_Tensor and modified its buffer through TF_TensorBitcastFrom(), however, src c++ tensor in Op_KernelContext is not modified. ```c++ // Example: TF_Tensor* a = TF_AllocateOutput(ctx, 0); // src c++ tensor: x TF_Tensor* b = TF_AllocateTemp(ctx); TF_TensorBitcastFrom(b, a); // a->buf_ = b_buf; // src c++ tensor x's buf is not modified, thus x->buf_ != a->buf_ ```

saxenasaurabh · 2021-03-08T20:43:45Z

cc: @allenlavoie

saxenasaurabh · 2021-03-11T15:47:07Z

Thanks for finding and proposing a solution to this. TF_Tensor holding an additional reference to the Tensor buffer is definitely a problem. However the proposed changes seem a bit complex to me. I wonder if we can change the semantics of TF_Tensor to be const Tensor&. That might require a larger API change.

cc: @rjpower

allenlavoie · 2021-03-11T18:10:04Z

This is mostly just a problem for the C kernel APIs? Making TF_Tensor always a non-owning reference seems pretty drastic (who owns the result of TFE_TensorHandleResolve, and can I feed it into an op attribute which then owns it again?).

What is the blocker for saying TF_Tensor is tensorflow::Tensor and just reinterpret_casting between the types at the C API boundaries? The TensorInterface abstraction we do to for TFRT tensors?

jzhoulon · 2021-03-13T15:42:56Z

@saxenasaurabh @allenlavoie Thanks for the comments, I didn't try to make TF_Tensor always a non-owning reference since the TF_Tensor is also used for python tensor binding. In this PR, if TF_Tensor is get from (TF_GetInput, TF_AllocateOutput), that means TF_Tensor is used as Kernel C API, in this case, TF_Tensor will not increments the reference count of Tensor. In other case(such as TF_Tensor used for python tensor bind, ), the logic is not changed.

gbaned · 2021-03-19T17:51:19Z

@allenlavoie Can you please take a look on the above comment from @jzhoulon. Thanks!

penpornk · 2021-03-19T18:14:07Z

@saxenasaurabh and I had a discussion with the team and we think adding "explicit IncRef/DecRef/Borrow semantics on TF_Tensor" might be more suitable for this. This would need an RFC and will miss TF 2.5.

@jzhoulon Would you mind closing this PR for now? We can reopen it later if it becomes relevant. (Although I think that's unlikely given the RFC direction.)

jzhoulon · 2021-03-20T02:54:16Z

Sure, looking forward the RFC!

google-ml-butler bot added the size:M CL Change Size: Medium label Feb 24, 2021

google-cla bot added the cla: yes label Feb 24, 2021

gbaned self-assigned this Feb 24, 2021

gbaned requested a review from sanjoy February 24, 2021 03:41

jzhoulon changed the title ~~Fix inplacement issue in C API~~ [Kernel C API] Fix inplacement issue in C API Feb 24, 2021

penpornk requested a review from saxenasaurabh February 24, 2021 05:58

sanjoy removed their request for review February 25, 2021 06:10

gbaned added the awaiting review Pull request awaiting review label Mar 1, 2021

tensorflowbutler removed the awaiting review Pull request awaiting review label Mar 10, 2021

gbaned added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Mar 19, 2021

jzhoulon closed this Mar 20, 2021

penpornk mentioned this pull request May 10, 2021

RFC: TensorFlow on DirectML tensorflow/community#243

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Kernel C API] Fix inplacement issue in C API #47356

[Kernel C API] Fix inplacement issue in C API #47356

Uh oh!

jzhoulon commented Feb 24, 2021

Uh oh!

jzhoulon commented Feb 24, 2021

Uh oh!

penpornk commented Feb 24, 2021

Uh oh!

saxenasaurabh commented Mar 8, 2021

Uh oh!

saxenasaurabh commented Mar 11, 2021

Uh oh!

allenlavoie commented Mar 11, 2021

Uh oh!

jzhoulon commented Mar 13, 2021

Uh oh!

gbaned commented Mar 19, 2021

Uh oh!

penpornk commented Mar 19, 2021 •

edited

Loading

Uh oh!

jzhoulon commented Mar 20, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

[Kernel C API] Fix inplacement issue in C API #47356

[Kernel C API] Fix inplacement issue in C API #47356

Uh oh!

Conversation

jzhoulon commented Feb 24, 2021

Uh oh!

jzhoulon commented Feb 24, 2021

Uh oh!

penpornk commented Feb 24, 2021

Uh oh!

saxenasaurabh commented Mar 8, 2021

Uh oh!

saxenasaurabh commented Mar 11, 2021

Uh oh!

allenlavoie commented Mar 11, 2021

Uh oh!

jzhoulon commented Mar 13, 2021

Uh oh!

gbaned commented Mar 19, 2021

Uh oh!

penpornk commented Mar 19, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jzhoulon commented Mar 20, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

penpornk commented Mar 19, 2021 •

edited

Loading