Skip to content

kvserver: fix race in rangefeed processor setup#53295

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
ajwerner:ajwerner/fix-race-in-rangefeed-registration
Aug 24, 2020
Merged

kvserver: fix race in rangefeed processor setup#53295
craig[bot] merged 1 commit intocockroachdb:masterfrom
ajwerner:ajwerner/fix-race-in-rangefeed-registration

Conversation

@ajwerner
Copy link
Copy Markdown
Contributor

Fixes #53183

Release note: None

@ajwerner ajwerner requested review from andreimatei and nvb August 23, 2020 20:36
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@ajwerner ajwerner force-pushed the ajwerner/fix-race-in-rangefeed-registration branch from 02f1350 to 6abcc03 Compare August 23, 2020 20:43
@ajwerner
Copy link
Copy Markdown
Contributor Author

Actually this doesn't totally work.

@ajwerner ajwerner force-pushed the ajwerner/fix-race-in-rangefeed-registration branch from 6abcc03 to a9a9e80 Compare August 23, 2020 21:21
@ajwerner
Copy link
Copy Markdown
Contributor Author

Okay, this is RFAL

@ajwerner ajwerner force-pushed the ajwerner/fix-race-in-rangefeed-registration branch from a9a9e80 to 9f64083 Compare August 23, 2020 21:22
Copy link
Copy Markdown
Contributor

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this changing the semantics of the cleanup function? Before, the semaphore would be released when the iterator was closed. Now, it's released when the registration is canceled. Maybe we could use a sync.Once with the existing iteratorWithCloser structure to avoid needing to pass a new variable through the registration?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @andreimatei, and @nvanbenschoten)


pkg/kv/kvserver/replica_rangefeed.go, line 204 at r1 (raw file):

maybeDisconnectEmptyRangefeed

Want to remove the p == nil handling from this function now?


pkg/kv/kvserver/replica_rangefeed.go, line 295 at r1 (raw file):

	p := r.rangefeedMu.proc
	if p != nil {
		reg, filter := p.Register(span, startTS, catchupIter, nil, withDiff, stream, errC)

Shouldn't we be passing cleanup here? And down below?


pkg/kv/kvserver/rangefeed/processor.go, line 361 at r1 (raw file):

// provided an error when the registration closes.
//
// If the rangefeed is successfully registered (true is returned), the

Both of these new sentences need a bit of love. This one is missing punctuation and the other is truncated.

@ajwerner ajwerner force-pushed the ajwerner/fix-race-in-rangefeed-registration branch from 9f64083 to 4ba37cc Compare August 24, 2020 04:05
@ajwerner
Copy link
Copy Markdown
Contributor Author

Yeah, the thought crossed my mind to just use a sync.Once. It's cleaner and simpler. RFAL

Copy link
Copy Markdown
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @andreimatei)


pkg/kv/kvserver/replica_rangefeed.go, line 176 at r2 (raw file):

		// Finish the iterator limit if we exit before the iterator finishes.
		// The release function will be hooked into the Close method on the
		// iterator below. The sync.Once prevents any races.

Mind spelling out which races?
Also in the commit message pls give a hint that the race was on the iterSemRelease variable.


pkg/kv/kvserver/replica_rangefeed.go, line 177 at r2 (raw file):

		// The release function will be hooked into the Close method on the
		// iterator below. The sync.Once prevents any races.
		var iterSemReleaseOnce sync.Once

I see that we have gotten to this sync.Once after something else has been tried so you can shut me up, but it seems to me that the control flow is simple enough that a sync.Once big hammer shouldn't be necessary; I think ownership of that rate semaphore token is clear. I'm thinking of turning catchUpIterFunc into a proper struct with a Close() that needs to be called by Processor.Register or delegated to the iteratorWithCloser.

Copy link
Copy Markdown
Contributor

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 4 of 5 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)

@ajwerner
Copy link
Copy Markdown
Contributor Author

TFTR!

bors r=nvanbenschoten

@ajwerner
Copy link
Copy Markdown
Contributor Author

bors r-

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 24, 2020

Canceled.

@ajwerner ajwerner force-pushed the ajwerner/fix-race-in-rangefeed-registration branch from 4ba37cc to 8898880 Compare August 24, 2020 15:20
@ajwerner
Copy link
Copy Markdown
Contributor Author

Mind spelling out which races?
Also in the commit message pls give a hint that the race was on the iterSemRelease variable.

Added more commentary around the call site and to the commit title.

I see that we have gotten to this sync.Once after something else has been tried so you can shut me up, but it seems to me that the control flow is simple enough that a sync.Once big hammer shouldn't be necessary; I think ownership of that rate semaphore token is clear. I'm thinking of turning catchUpIterFunc into a proper struct with a Close() that needs to be called by Processor.Register or delegated to the iteratorWithCloser.

@andreimatei be my guest but I'm going to merge this to unflake the tests.

bors r=nvanbenschoten

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 24, 2020

Build failed (retrying...):

@ajwerner
Copy link
Copy Markdown
Contributor Author

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 24, 2020

Already running a review

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 24, 2020

Build failed (retrying...):

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 24, 2020

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

kv/kvnemesis: TestApplier failed

4 participants