Skip to content

Conversation

@jzhoulon
Copy link
Contributor

This commit is intended to fix two inplace issue in kernel C API:

  1. RefCountIsOne
    tensor.RefCountIsOne() is the condition to decide whether the op can
    do inplacement(e.g. forward_input), however, TF_Tensor retrieved through
    TF_GetInput(), TF_AllocateOutput() makes the refcount always > 1, in
    TF_TensorFromTensor, it will call tensor.CopyFrom() and increment the refcount

  2. TF_TensorBitcastFrom()
    Currently, this API only modifies TF_Tensor's buf pointer, but the src c++ tensor it
    copied from is not modified, that makes the TF_Tensor inconsistent with src c++ tensor,
    see the following example, it allocated an output TF_Tensor and modified its buffer through
    TF_TensorBitcastFrom(), however, src c++ tensor in Op_KernelContext is not modified.

     // Example:
     TF_Tensor* a = TF_AllocateOutput(ctx, 0); // src c++ tensor: x
     TF_Tensor* b = TF_AllocateTemp(ctx);
     TF_TensorBitcastFrom(b, a); // a->buf_ = b_buf;
     // src c++ tensor x's buf is not modified, thus x->buf_ != a->buf_

@google-ml-butler google-ml-butler bot added the size:M CL Change Size: Medium label Feb 24, 2021
@google-cla google-cla bot added the cla: yes label Feb 24, 2021
@jzhoulon
Copy link
Contributor Author

@penpornk This PR aims at solving two inplacement issue in kernel C API, please help to review, thanks very much.

  1. TF_Tensor's buffer refcount retrieved from TF_GetInput and TF_AllocatedOutput are always > 1, that will make TF_ForwardInputOrAllocateOutput, TF_TensorMaybeMove failed.
  2. TF_TensorBitcastFrom only replace TF_Tensor's buffer pointer, if TF_Tensor is copied from a proper's c++ tensor(TF_GetInput, TF_Allocateoutput), proper's c++ tensor will not be changed, that will make TF_Tensor inconsistent with src c++ tensor.

what we changed:

  1. we add a src_tensor_ptr_ in TensorInterface which makes TF_Tensor be able to find its src c++ tensor.
  2. when the TF_Tensor is retrieved from TF_GetInput, TF_AllocateOutput, that means TF_Tensor is used in kernel, its lifecycle is in kernel's compute function, so it no need to add reference count to src c++ tensor. We decrement the refcount after tensor.CopyFrom in TF_TensorFromTensor and increment the refcount by one before the delete of TensorInterface..

@gbaned gbaned self-assigned this Feb 24, 2021
@gbaned gbaned requested a review from sanjoy February 24, 2021 03:41
@jzhoulon jzhoulon changed the title Fix inplacement issue in C API [Kernel C API] Fix inplacement issue in C API Feb 24, 2021
@penpornk
Copy link
Member

@jzhoulon Thank you for the PR! I think @saxenasaurabh owns this part. I'll request his review.

@saxenasaurabh Would you mind helping review this PR? Or could you please suggest other reviewers? Thank you very much!

This commit is intended to fix two inplace issue in kernel C API:
1. RefCountIsOne
   tensor.RefCountIsOne() is the condition to decide whether the op can
   do inplacement(e.g. forward_input), however, TF_Tensor retrieved through
   TF_GetInput(), TF_AllocateOutput() makes the refcount always > 1, in
   TF_TensorFromTensor, it will call tensor.CopyFrom() and increment the refcount

2. TF_TensorBitcastFrom()
   Currently, this API only modifies TF_Tensor's buf pointer, but the src c++ tensor it
   copied from is not modified, that makes the TF_Tensor inconsistent with src c++ tensor,
   see the following example, it allocated an output TF_Tensor and modified its buffer through
   TF_TensorBitcastFrom(), however, src c++ tensor in Op_KernelContext is not modified.
   ```c++
  // Example:
    TF_Tensor* a = TF_AllocateOutput(ctx, 0); // src c++ tensor: x
    TF_Tensor* b = TF_AllocateTemp(ctx);
    TF_TensorBitcastFrom(b, a); // a->buf_ = b_buf;
    // src c++ tensor x's buf is not modified, thus x->buf_ != a->buf_
   ```
@sanjoy sanjoy removed their request for review February 25, 2021 06:10
@gbaned gbaned added the awaiting review Pull request awaiting review label Mar 1, 2021
@saxenasaurabh
Copy link
Member

cc: @allenlavoie

@tensorflowbutler tensorflowbutler removed the awaiting review Pull request awaiting review label Mar 10, 2021
@saxenasaurabh
Copy link
Member

Thanks for finding and proposing a solution to this. TF_Tensor holding an additional reference to the Tensor buffer is definitely a problem. However the proposed changes seem a bit complex to me. I wonder if we can change the semantics of TF_Tensor to be const Tensor&. That might require a larger API change.

cc: @rjpower

@allenlavoie
Copy link
Member

This is mostly just a problem for the C kernel APIs? Making TF_Tensor always a non-owning reference seems pretty drastic (who owns the result of TFE_TensorHandleResolve, and can I feed it into an op attribute which then owns it again?).

What is the blocker for saying TF_Tensor is tensorflow::Tensor and just reinterpret_casting between the types at the C API boundaries? The TensorInterface abstraction we do to for TFRT tensors?

@jzhoulon
Copy link
Contributor Author

@saxenasaurabh @allenlavoie Thanks for the comments, I didn't try to make TF_Tensor always a non-owning reference since the TF_Tensor is also used for python tensor binding. In this PR, if TF_Tensor is get from (TF_GetInput, TF_AllocateOutput), that means TF_Tensor is used as Kernel C API, in this case, TF_Tensor will not increments the reference count of Tensor. In other case(such as TF_Tensor used for python tensor bind, ), the logic is not changed.

@gbaned
Copy link
Contributor

gbaned commented Mar 19, 2021

@allenlavoie Can you please take a look on the above comment from @jzhoulon. Thanks!

@gbaned gbaned added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Mar 19, 2021
@penpornk
Copy link
Member

penpornk commented Mar 19, 2021

@saxenasaurabh and I had a discussion with the team and we think adding "explicit IncRef/DecRef/Borrow semantics on TF_Tensor" might be more suitable for this. This would need an RFC and will miss TF 2.5.

@jzhoulon Would you mind closing this PR for now? We can reopen it later if it becomes relevant. (Although I think that's unlikely given the RFC direction.)

@jzhoulon
Copy link
Contributor Author

Sure, looking forward the RFC!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla: yes size:M CL Change Size: Medium stat:awaiting tensorflower Status - Awaiting response from tensorflower

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants