Skip to content

roachtest: port docker-based cdc tests to roachtest#31707

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
danhhz:cdc_acceptance
Oct 24, 2018
Merged

roachtest: port docker-based cdc tests to roachtest#31707
craig[bot] merged 1 commit intocockroachdb:masterfrom
danhhz:cdc_acceptance

Conversation

@danhhz
Copy link
Copy Markdown
Contributor

@danhhz danhhz commented Oct 22, 2018

The docker ones occasionally flaked on teamcity. The acceptance test
momentum right now is definitely with roachtests, so move to them so we
get all the hottest bug fixes.

This also helps set up end-to-end avro testing, which was going to be
hard with the docker based tests because some port nonsense. (The full
gory details are in a huge comment in the file this commit deletes.)

Also makes the cdc roachtests work with --local.

Also make the bank workload's update fire more often. Previously it only
did anything if the account had enough balance to cover the transfer.
This models the real world, but in the test (where the invariant is just
that the accounts sum to 0) it was just making extra work.

Closes #30740
Closes #27292

Release note: None

@danhhz danhhz requested review from a team and mrtracy October 22, 2018 20:23
@danhhz
Copy link
Copy Markdown
Contributor Author

danhhz commented Oct 22, 2018

The teamcity failure made me realize that we don't currently have a way to run this on every PR because of the enterprise license requirement. It was using the TestingEnableEnterprise hook before but that doesn't work with a binary. And there are security concerns with doing it via env vars in an every-pr build. Maybe just running it in the nightlies is sufficient. Thoughts @mrtracy?

Copy link
Copy Markdown
Contributor

@mrtracy mrtracy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the inability to run these on every PR; is there anything being captured by the bank test (and existing roachtests) that isn't being effectively captured by unit tests, other than kafka interaction? And, is there any way for us to get any effective kafka sink tests as unit tests?

This roachtest still provides value, I think hooking the validator up to kafka is a good innovation here. However, we need to make sure we're not being too skimpy in our checkin tests.

m.Wait()
}

func runCDCErrors(ctx context.Context, t *test, c *cluster) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need to be a roachtest at all; none of these cases even contact kafka!

})
m.Go(func(ctx context.Context) error {
defer workloadCancel()
l, err := c.l.ChildLogger(`changefeed`)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently we should use t.l.ChildLogger, instead of c.. Don't know why, was just informed of that in my last PR though.

m := newMonitor(workloadCtx, c, crdbNodes)
var doneAtomic int64
m.Go(func(ctx context.Context) error {
err := c.RunE(ctx, workloadNode, `./workload run bank {pgurl:0}`)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this know when to stop? To me this looks like it will run forever.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor Author

@danhhz danhhz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kafka interaction is the only thing unique about these tests at this point. The only way I could think to do that with unit tests is the thing I'm deleting. It was super fragile, hard to extend, and took weeks to get set up, I've really become convinced it wasn't the way to go. (Seriously, go read the huge comment block in the file I deleted in this commit.)

Now that I think about it, we may be able to bring these back on merge-to-master tests, but I'd like to leave that for a a followup.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/cmd/roachtest/cdc.go, line 200 at r1 (raw file):

Previously, mrtracy (Matt Tracy) wrote…

How does this know when to stop? To me this looks like it will run forever.

it's cancelled using the context when the below goroutine exits: defer workloadCancel()


pkg/cmd/roachtest/cdc.go, line 208 at r1 (raw file):

Previously, mrtracy (Matt Tracy) wrote…

Apparently we should use t.l.ChildLogger, instead of c.. Don't know why, was just informed of that in my last PR though.

Oh, must be part of andrei's refactor. Done


pkg/cmd/roachtest/cdc.go, line 275 at r1 (raw file):

Previously, mrtracy (Matt Tracy) wrote…

This doesn't need to be a roachtest at all; none of these cases even contact kafka!

Heh, thanks for pushing me on my laziness. It was connecting to the sink before doing these checks, but that's silly. Done

Took this opportunity to clean up TestChangefeedErrors which was identical between sinkless and enterprise, so it was needlessly running twice.

@benesch
Copy link
Copy Markdown
Contributor

benesch commented Oct 23, 2018

Now that I think about it, we may be able to bring these back on merge-to-master tests, but I'd like to leave that for a a followup.

I'd recommend against doing that, actually. If you fail the merge-to-master build you'll prevent binaries from getting built. (And if you somehow soft fail it, no one will ever notice that it failed.)

Maybe I'm misunderstanding what you meant, though. I think a TeamCity build that separately listened for pushes to master could be workable.

@danhhz
Copy link
Copy Markdown
Contributor Author

danhhz commented Oct 23, 2018

Yeah, I'm not sold on this, so happy to give up on it if it doesn't make sense. Just to be clear we're talking about the same thing, my strawman is the test wouldn't be run while iterating on a PR, but would be run by bors and prevent the merge if we somehow broke it. Also a concern here is having something fail only when bors-ing, which is going to be even more frustrating than usual if it flakes.

The docker ones occasionally flaked on teamcity. The acceptance test
momentum right now is definitely with roachtests, so move to them so we
get all the hottest bug fixes.

This also helps set up end-to-end avro testing, which was going to be
hard with the docker based tests because some port nonsense. (The full
gory details are in a huge comment in the file this commit deletes.)

Also makes the cdc roachtests work with --local.

Also make the bank workload's update fire more often. Previously it only
did anything if the account had enough balance to cover the transfer.
This models the real world, but in the test (where the invariant is just
that the accounts sum to 0) it was just making extra work.

Closes cockroachdb#30740
Closes cockroachdb#27292

Release note: None
@mrtracy
Copy link
Copy Markdown
Contributor

mrtracy commented Oct 23, 2018

This is LGTM, questions about nightly tests aside.

@danhhz
Copy link
Copy Markdown
Contributor Author

danhhz commented Oct 24, 2018

@benesch came up with a great idea about how to use a Vault plugin to solve the every-pr enterprise license problem for good. In the meantime, it's not ideal, but I'm comfortable with only running these tests in the nightly build.

TFTR!

bors r=mrtracy

craig bot pushed a commit that referenced this pull request Oct 24, 2018
31707: roachtest: port docker-based cdc tests to roachtest r=mrtracy a=danhhz

The docker ones occasionally flaked on teamcity. The acceptance test
momentum right now is definitely with roachtests, so move to them so we
get all the hottest bug fixes.

This also helps set up end-to-end avro testing, which was going to be
hard with the docker based tests because some port nonsense. (The full
gory details are in a huge comment in the file this commit deletes.)

Also makes the cdc roachtests work with --local.

Also make the bank workload's update fire more often. Previously it only
did anything if the account had enough balance to cover the transfer.
This models the real world, but in the test (where the invariant is just
that the accounts sum to 0) it was just making extra work.

Closes #30740
Closes #27292

Release note: None

Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Oct 24, 2018

Build succeeded

@craig craig bot merged commit 0fecc22 into cockroachdb:master Oct 24, 2018
Copy link
Copy Markdown
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/cmd/roachtest/cdc.go, line 360 at r2 (raw file):

		Nodes:      nodes(4),
		Stable:     true, // DO NOT COPY to new tests
		SubTests: []testSpec{{

Any reason for a test with a single subtest (as opposed to just a test)? I don't think this arrangement currently carries any semantics; does it?
Asking cause I'm generally trying to get rid of subtests.

Copy link
Copy Markdown
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/cmd/roachtest/cdc.go, line 364 at r2 (raw file):

			Stable: true, // DO NOT COPY to new tests
			Run: func(ctx context.Context, t *test, c *cluster) {
				c.Wipe(ctx)

also, is there a point to this wipe()?

@danhhz
Copy link
Copy Markdown
Contributor Author

danhhz commented Oct 24, 2018

There will be more tests here in the future, but I'm fine making this not a subtest

@andreimatei
Copy link
Copy Markdown
Contributor

andreimatei commented Oct 24, 2018 via email

@danhhz danhhz deleted the cdc_acceptance branch October 25, 2018 15:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

teamcity: failed test: TestCDC acceptanceccl: revisit cdc tests as acceptance tests

5 participants