Skip to content

fix lost tsCache updates#3175

Merged
tbg merged 4 commits intocockroachdb:masterfrom
tbg:repro_3148
Nov 21, 2015
Merged

fix lost tsCache updates#3175
tbg merged 4 commits intocockroachdb:masterfrom
tbg:repro_3148

Conversation

@tbg
Copy link
Copy Markdown
Member

@tbg tbg commented Nov 19, 2015

previously, concurrent reads during the execution of the
splitTrigger could add updates to the timestamp cache
on the old range after it had already been copied to
the new range. With this change, reads happen atomically
with respect to commit triggers.

fixes #3148.

Review on Reviewable

@tamird
Copy link
Copy Markdown
Contributor

tamird commented Nov 19, 2015

Reviewed 2 of 2 files at r1.
Review status: all files reviewed at latest revision, 1 unresolved discussion.


storage/replica_command.go, line 1400 [r1] (raw file):
this seems weird; if this lock isn't necessary, remove it.


Comments from the review on Reviewable.io

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Nov 19, 2015

Review status: all files reviewed at latest revision, 1 unresolved discussion.


storage/replica_command.go, line 1400 [r1] (raw file):
It's not necessary here because I know (or at least am reasonably certain) that nobody can concurrently access it with readMu locked at the time of writing, but tsCache is one of the fields protected by the replica lock and so I should grab it regardless. I can remove the comment if you prefer that.


Comments from the review on Reviewable.io

@tamird
Copy link
Copy Markdown
Contributor

tamird commented Nov 19, 2015

Review status: all files reviewed at latest revision, all discussions resolved.


storage/replica_command.go, line 1400 [r1] (raw file):
Fine as is.


Comments from the review on Reviewable.io

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readMu sounds confusingly similar to the RLock of the embedded RWMutex. Maybe call it readOnlyCmdMu?

I'd add more detail to the comment: "Held in read mode during read-only commands. Held in exclusive mode to prevent read-only commands from executing. Acquired before the embedded RWMutex"

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@bdarnell
Copy link
Copy Markdown
Contributor

Can you add a test that deliberately races a split with a read-only command? It doesn't have to consistently trigger the issue but it would be good to have a more reliable and explicit test than the rare failure of sql.TestConcurrentIncrements. (which only triggered this issue because of the non-obvious fact that table creation leads to an asynchronous split)

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Nov 19, 2015

I'll see what I can do about the test.

@tbg tbg force-pushed the repro_3148 branch 2 times, most recently from eba97ba to 08a4168 Compare November 20, 2015 00:21
@tbg
Copy link
Copy Markdown
Member Author

tbg commented Nov 20, 2015

Review status: 0 of 3 files reviewed at latest revision, 2 unresolved discussions.


storage/replica_command.go, line 481 [r1] (raw file):
I must be thick but to me it still looks like the first batch.Defered action is the unlock, so that should be the very last thing called (i.e. all the other batch.Defered functions should be under the lock).


Comments from the review on Reviewable.io

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Nov 20, 2015

PTAL @bdarnell. I added a test that's flaky as hell before this fix, and also took into account the fact that the batch.Defer() would not run if the commit trigger returned an error, deadlocking the range.

@tamird
Copy link
Copy Markdown
Contributor

tamird commented Nov 20, 2015

Reviewed 2 of 2 files at r2, 2 of 2 files at r3, 1 of 1 files at r4, 2 of 2 files at r5.
Review status: all files reviewed at latest revision, 3 unresolved discussions.


storage/client_split_test.go, line 790 [r3] (raw file):
lets


Comments from the review on Reviewable.io

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Nov 20, 2015

TestRaftRemoveRace fails very reliably on this branch. The new lock must've changed timings in a way that make the errors much more likely. See #3178 #3010.

@bdarnell
Copy link
Copy Markdown
Contributor

LGTM


Reviewed 2 of 2 files at r1, 2 of 2 files at r2, 2 of 2 files at r3, 1 of 1 files at r4.
Review status: 1 of 4 files reviewed at latest revision, 1 unresolved discussion.


storage/replica_command.go, line 481 [r1] (raw file):
No, it's me being thick; I looked at batch.Defer and didn't notice we reversed the list when we called the functions.


Comments from the review on Reviewable.io

tbg added 4 commits November 20, 2015 20:04
previously, concurrent reads during the execution of the
splitTrigger could add updates to the timestamp cache
on the old range after it had already been copied to
the new range. With this change, reads happen atomically
with respect to commit triggers.

fixes cockroachdb#3148.
tbg added a commit that referenced this pull request Nov 21, 2015
@tbg tbg merged commit 02c7b12 into cockroachdb:master Nov 21, 2015
@tbg tbg deleted the repro_3148 branch November 21, 2015 01:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Non-serializable behavior in TestConcurrentIncrements

3 participants