Make testserver multi-node more robust. #148
Conversation
RichardJCai
commented
Sep 21, 2022
- Fix race condition in pollListeningURLFile.
- Add Opts for init node timeout and poll listening url.
- InitTimeoutOpt, PollListenURLTimeoutOpt
- Parallelize restart test
pawalt
left a comment
Is there any way we can test the added arguments? Maybe make a testserver configuration we know will time out on init, and time it to make sure the option is propagated properly?
Reviewed all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss, @RichardJCai, and @ZhouXing19)
testserver/testserver.go line 814 at r1 (raw file):
ts.mu.RLock()
nodeDir := fmt.Sprintf("%s%d", ts.baseDir, i)
Why do we need to remove the individual directories if we're going to removeAll after anyway? Is there some case in which we'd fail to remove a single node's directory but not the others?
testserver/testserver_test.go line 463 at r1 (raw file):
for j := 0; j < 3; j++ {
    for {
        port, err := getFreePort()
This is a very edge case, but this could race if another process is attempting to get a port from :0 at the same time. If that process does a TCP listen on :0 between when these ports are allocated and the testserver is started, it could get assigned one of the ports you've allocated here.
I think if you just feed the testserver :0 as the port to listen on, it'll start up guaranteeing port uniqueness, and you won't have to do any of this allocation.
RichardJCai
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @pawalt, @rafiss, and @ZhouXing19)
testserver/testserver.go line 814 at r1 (raw file):
Previously, pawalt (Peyton Walters) wrote…
Why do we need to remove the individual directories if we're going to removeAll after anyway? Is there some case in which we'd fail to remove a single node's directory but not the others?
They're separate directories, not nested. I.e. the base one looks like cockroach-testserver123 and the node directories are cockroach-testserver1230, cockroach-testserver1231, ...
testserver/testserver_test.go line 463 at r1 (raw file):
This is a very edge case, but this could race if another process is attempting to get a port from :0 at the same time. If that process does a TCP listen on :0 between when these ports are allocated and the testserver is started, it could get assigned one of the ports you've allocated here.
Yeah I thought about using :0 but we need to know what address to feed into the --join flag which includes the port. Pretty annoying problem here.
pawalt
left a comment
LGTM. Your call on testing the new arguments. Functionality looks pretty obvious, but never hurts to have some tests.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @rafiss, @RichardJCai, and @ZhouXing19)
testserver/testserver_test.go line 463 at r1 (raw file):
Previously, RichardJCai (Richard Cai) wrote…
This is a very edge case, but this could race if another process is attempting to get a port from :0 at the same time. If that process does a TCP listen on :0 between when these ports are allocated and the testserver is started, it could get assigned one of the ports you've allocated here.
Yeah I thought about using :0 but we need to know what address to feed into the --join flag, which includes the port. Pretty annoying problem here.
Hmmm that is annoying. Thanks for the clarification.
I'm allergic to adding tests for timeouts. Jk, I'll add a simple one.
Actually I'm gonna merge this as-is; adding a test hook for the timeout is somewhat annoying, and I'm gonna try to get the testserver into CRDB before I leave.