Skip to content

Conversation

@mr-salty
Copy link
Contributor

@mr-salty mr-salty commented Jul 28, 2020

If an operation which has TransactionSelector::begin set in order to
implicitly begin a transaction fails, explicitly call BeginTransaction.

If that fails, invalidate the Transaction so all subsequent operations
fail - most importantly Commit, which is necessary to maintain atomicity
since the first operation did not execute.

Otherwise, retry the original operation using the transaction ID
obtained from BeginTransaction. That may fail or succeed, but even if
it fails, it may be a "non-constraint error" (read errors and certain
types of write/dml errors) which permit the rest of the Transaction to
succeed. Beginning the transaction leaves it to the backend to make
that decision.

Fixes #4516


This change is Reviewable

If an operation which has `TransactionSelector::begin` set in order to
implicitly begin a transaction fails, explicitly call `BeginTransaction`.

If that fails, invalidate the `Transaction` so all subsequent operations
fail - most importantly `Commit`, which is necessary to maintain atomicity
since the first operation did not execute.

Otherwise, retry the original operation using the transaction ID
obtained from `BeginTransaction`. That may fail or succeed, but even if
it fails, it may be a "non-constraint error" (read errors and certain
types of write/dml errors) which permit the rest of the `Transaction` to
succeed. Beginning the transaction leaves it to the backend to make
that decision.

Fixes googleapis#4516
@mr-salty mr-salty requested a review from devbww July 28, 2020 05:46
@mr-salty mr-salty requested a review from a team as a code owner July 28, 2020 05:46
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Jul 28, 2020
@mr-salty
Copy link
Contributor Author

let me take a look at the build failures... it passed all unit tests and integration tests before I committed.

@codecov
Copy link

codecov bot commented Jul 28, 2020

Codecov Report

Merging #4706 into master will decrease coverage by 0.06%.
The diff coverage is 93.63%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4706      +/-   ##
==========================================
- Coverage   92.37%   92.30%   -0.07%     
==========================================
  Files         904      911       +7     
  Lines       69792    70074     +282     
==========================================
+ Hits        64471    64685     +214     
- Misses       5321     5389      +68     
Impacted Files Coverage Δ
google/cloud/spanner/internal/connection_impl.cc 93.05% <82.92%> (-3.00%) ⬇️
...gle/cloud/spanner/internal/connection_impl_test.cc 98.24% <100.00%> (+0.15%) ⬆️
...cloud/spanner/internal/logging_result_set_reader.h 0.00% <0.00%> (-100.00%) ⬇️
...torage/internal/logging_resumable_upload_session.h 0.00% <0.00%> (-100.00%) ⬇️
...gle/cloud/storage/internal/range_from_pagination.h 3.84% <0.00%> (-84.62%) ⬇️
google/cloud/storage/internal/object_streambuf.h 17.64% <0.00%> (-82.36%) ⬇️
google/cloud/bigtable/testing/mock_admin_client.h 2.56% <0.00%> (-69.24%) ⬇️
...gle/cloud/storage/internal/curl_download_request.h 16.66% <0.00%> (-66.67%) ⬇️
google/cloud/spanner/retry_policy.h 41.66% <0.00%> (-58.34%) ⬇️
google/cloud/spanner/instance.h 50.00% <0.00%> (-50.00%) ⬇️
... and 82 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8814c51...34d8a10. Read the comment docs.

@mr-salty
Copy link
Contributor Author

PTAL - forgot to check ok() in one place before dereferencing (and the first test failures I looked at sent me down the wrong rathole, it looked like some gmock issue)... should have looked at asan first - and also #4707 which might have made this more obvious.

@mr-salty
Copy link
Contributor Author

well, apparently the gmock thing was an actual issue. It sometimes thinks one of the GrpcReader calls is a sequence prerequisite for one of the SpannerStub calls, I don't understand why (is there some threading involved here? even that doesn't seem to fit the symptoms) but I may be holding InSequence incorrectly, especially combined with a lambda. I think this more explicit version should work but I'm not able to reproduce the problem reliably.

In any case, I think it's an issue with the test and not the code so I think you can still review it. If the latest commit doesn't fix it I'll take another look in the morning.

@mr-salty
Copy link
Contributor Author

looks like that fixed the test issues although Brad and I were discussing a change to handing the case where we get an "OK" response and no transaction - we should probably handle that differently... although I think that could be done in a follow-up PR, it's a corner case that probably won't occur in practice (and the checked in code does not handle it so it's not a regression).

@mr-salty
Copy link
Contributor Author


google/cloud/spanner/internal/connection_impl.cc, line 457 at r2 (raw file):

Previously, devbww (Bradley White) wrote…

I don't think we should do this if the response was OK.

my earlier comment: Brad and I were discussing a change to handing the case where we get an "OK" response and no transaction - we should probably handle that differently... although I think that could be done in a follow-up PR, it's a corner case that probably won't occur in practice (and the checked in code does not handle it so it's not a regression).

Copy link
Contributor Author

@mr-salty mr-salty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 7 files reviewed, 7 unresolved discussions (waiting on @devbww)


google/cloud/spanner/internal/connection_impl.cc, line 457 at r2 (raw file):

Previously, mr-salty (Todd Derr) wrote…

my earlier comment: Brad and I were discussing a change to handing the case where we get an "OK" response and no transaction - we should probably handle that differently... although I think that could be done in a follow-up PR, it's a corner case that probably won't occur in practice (and the checked in code does not handle it so it's not a regression).

Ack


google/cloud/spanner/internal/connection_impl.cc, line 460 at r2 (raw file):

Previously, devbww (Bradley White) wrote…

And, given that the response was not OK, I think this should return response.status().

Ack


google/cloud/spanner/internal/connection_impl.cc, line 464 at r2 (raw file):

Previously, devbww (Bradley White) wrote…

In general, I thought the control flow would look like:

{
  auto request = <setup-request>();
  for (;;) {
    *request.mutable_transaction() = *s;
    auto response = <handle-request>(request);
    if (s->has_begin()) {
      if (!response) {
        auto begin = BeginTransaction(...);
        if (!begin) {
          s = begin.status();
          return response;
        }
        s->set_id(begin->id);
        continue;  // try again with id
      }
      if (response->has_transaction()) {
        s->set_id(response->transaction().id());
      }
    }
    return <handle-response>(response);
  } 
} 

ack. This doesn't handle ok + no transaction, which I think we should per our offline discussion


google/cloud/spanner/internal/connection_impl_test.cc, line 418 at r2 (raw file):

Previously, devbww (Bradley White) wrote…

Nit: I believe we choose to omit the () when there are no parameters.

Done


google/cloud/spanner/internal/connection_impl_test.cc, line 427 at r2 (raw file):

Previously, devbww (Bradley White) wrote…

Is this resolved?

Done


google/cloud/spanner/internal/connection_impl_test.cc, line 453 at r2 (raw file):

Previously, devbww (Bradley White) wrote…

Use StatusIs() matcher?

I was planning to make another pass to clean all these existing uses up


google/cloud/spanner/internal/connection_impl_test.cc, line 751 at r2 (raw file):

Previously, devbww (Bradley White) wrote…

s/ -/ as/? Or, two sentences?

Done.


google/cloud/spanner/internal/connection_impl_test.cc, line 768 at r2 (raw file):

Previously, devbww (Bradley White) wrote…

s/ther/the/

Done.

Copy link
Contributor

@devbww devbww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 7 files reviewed, 3 unresolved discussions (waiting on @devbww)


google/cloud/spanner/internal/connection_impl_test.cc, line 453 at r2 (raw file):

Previously, mr-salty (Todd Derr) wrote…

I was planning to make another pass to clean all these existing uses up

It looked like a new use.

Copy link
Contributor

@devbww devbww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r3.
Reviewable status: 5 of 7 files reviewed, 3 unresolved discussions (waiting on @devbww)

Copy link
Contributor

@devbww devbww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r2, 1 of 1 files at r4.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @devbww)

If we requested a transaction with `TransactionSelector::begin` in a
request but the response does not contain a `Transaction`, that is
an invalid response and we now consistently treat it as such by marking
the transaction as invalid and returning an error. I added a test for
that.

also, if an operation fails, and the subsequent `BeginTransaction` also
fails, return the operation status rather than the `BeginTransaction`
status.
Copy link
Contributor Author

@mr-salty mr-salty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL

Reviewable status: 5 of 7 files reviewed, 3 unresolved discussions (waiting on @devbww)


google/cloud/spanner/internal/connection_impl.cc, line 457 at r2 (raw file):

Previously, mr-salty (Todd Derr) wrote…

Ack

ok, this is now done


google/cloud/spanner/internal/connection_impl.cc, line 460 at r2 (raw file):

Previously, mr-salty (Todd Derr) wrote…

Ack

Done.


google/cloud/spanner/internal/connection_impl.cc, line 464 at r2 (raw file):

Previously, mr-salty (Todd Derr) wrote…

ack. This doesn't handle ok + no transaction, which I think we should per our offline discussion

Done, with the ok + no transaction changes

@mr-salty
Copy link
Contributor Author


google/cloud/spanner/internal/connection_impl_test.cc, line 453 at r2 (raw file):

Previously, devbww (Bradley White) wrote…

It looked like a new use.

Done

Copy link
Contributor Author

@mr-salty mr-salty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 7 files reviewed, 5 unresolved discussions (waiting on @devbww)


google/cloud/spanner/internal/connection_impl.cc, line 407 at r5 (raw file):

Previously, devbww (Bradley White) wrote…

Observation: It still seems strange to me to fail an operation that actually succeeded (despite the missing transaction ID), but I guess I'm happy in the knowledge that it never happens.

If the first streaming read/query response has no metadata we also fail with INTERNAL, this feels like the same thing to me - we got a response that doesn't look like it was supposed to and there's no obvious way to continue.


google/cloud/spanner/internal/connection_impl.cc, line 466 at r5 (raw file):

Previously, devbww (Bradley White) wrote…

Opt: I'd change all the

if (B) {
  S1;
} else {
  S2;
  return;
}
S3;

to

if (!B) {
  S2;
  return;
}
S1;
S3;

Done.

Copy link
Contributor

@devbww devbww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r5, 1 of 1 files at r7.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@mr-salty mr-salty merged commit b5001cf into googleapis:master Jul 29, 2020
@mr-salty mr-salty deleted the acid branch July 29, 2020 06:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla: yes This human has signed the Contributor License Agreement.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

atomicity is violated if the first operation in a RW Transaction fails

3 participants