Skip to content

roachtest: Index overload automation of TPC-E#90005

Closed
andrewbaptist wants to merge 2 commits intocockroachdb:masterfrom
andrewbaptist:221014.min_cpu
Closed

roachtest: Index overload automation of TPC-E#90005
andrewbaptist wants to merge 2 commits intocockroachdb:masterfrom
andrewbaptist:221014.min_cpu

Conversation

@andrewbaptist
Copy link
Copy Markdown

@andrewbaptist andrewbaptist commented Oct 14, 2022

This patch adds the ability to run tests against n2 standard from within roachprod / roachtest.

Release note: None
Epic: None

@andrewbaptist andrewbaptist requested a review from a team as a code owner October 14, 2022 19:19
@andrewbaptist andrewbaptist requested review from herkolategan and smg260 and removed request for a team October 14, 2022 19:19
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@andrewbaptist andrewbaptist force-pushed the 221014.min_cpu branch 3 times, most recently from b14f2d0 to f4c557c Compare October 20, 2022 19:39
@andrewbaptist andrewbaptist changed the title roachprod: Add support for GCE n2-standard roachtest: Index overload automation of TPC-E Oct 24, 2022
@irfansharif irfansharif requested a review from a team October 24, 2022 15:33
@andrewbaptist andrewbaptist force-pushed the 221014.min_cpu branch 2 times, most recently from 73f08d3 to 74543ca Compare October 27, 2022 20:17
Copy link
Copy Markdown
Contributor

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that @dt's working on adding roachtest libraries to snapshot GCE images and use them for quicker turn around times. He's also actually doing it for TPC-E restores I think. This PR should wait until that gets done -- this is far too long running and $$$ to be useful otherwise.

@andrewbaptist
Copy link
Copy Markdown
Author

What if we merged it with the skip test flag - since the ability to run skipped tests is about to merge

Copy link
Copy Markdown
Contributor

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok merging this test as skipped until the GCE snapshotting work but we ought to improve the TPC-E harness first here instead of duplicating a bunch of code (even if this was the easiest thing to do at the start). Left some comments below but the TL;DR is lets make this as good as the TPC-C harness; I think it's doable since it's not a ton of code for either today. Feel free to do this work in whatever PRs/commits make life easiest for you.

"github.com/cockroachdb/errors"
)

// TODO(baptist): This shares a lot of code with tpce.go. Consider ways to refactor.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do this now.


// Wait for the load to run for a while before starting the index build.
select {
case <-time.After(opts.delayBeforeIndex):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look at how the TPC-C harness does things. It's very flexible, and better in many regards. There we have that During callback where test authors can inject whatever sleep they want during a test run, and this sort of thing would apply. So I would suggest improving the TPC-E test harness first as part of this PR and then adding a good use of it. It'll also encourage more TPC-E roachtests from other teams.

t.Status("creating index (~30min)")

db := c.Conn(ctx, t.L(), 1)
_, err := db.ExecContext(ctx, "CREATE INDEX ON tpce.cash_transaction (ct_dts)")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same suggestion as the other PR -- use an explicit, unique name, suffixed perhaps by the timestamp this was run at so that repeated runs are made easy.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized when iterating with that one that not specifying a name gives the flexibility to re-run the identical create index (the system does this automatically) - so its not necessary to pick an explicit name unless you want to drop it in the end. Since I wasn't dropping I didn't pick a name. Let me know if you think I should change this.

})

// This test takes about 1 hours to run.
r.Add(registry.TestSpec{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meh, let's just have the one test above and make sure it runs <1h. And also, within reason, support either --local and/or --skip-init if you use an already set up cluster to keep re-running things without the bulk import. This TPC-E harness also needs to integrate with prometheus/grafana (I see you've done so manually above), let's do that in this PR -- steal as many ideas from the TPC-C harness when doing so.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand this request. I don't think it is possible to create a test that runs <1hr, uses default settings and sees this overload issue (since it needs 1000 L0 files as part of index creation). I wasn't sure how best to approach this, so definitely open to ideas on how to do this (combination of skipped tests, short test that doesn't show the issue or other).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the small test below.

@irfansharif irfansharif requested a review from a team October 28, 2022 15:56
@blathers-crl blathers-crl bot requested a review from irfansharif October 28, 2022 16:58
@irfansharif irfansharif marked this pull request as draft January 9, 2023 15:32
@irfansharif irfansharif removed their request for review January 20, 2023 17:03
@blathers-crl blathers-crl bot requested a review from irfansharif February 28, 2023 15:06
This patch adds the ability to run tests against n2 standard
from within roachprod / roachtest.

Epic: none

Release note: None
@andrewbaptist andrewbaptist force-pushed the 221014.min_cpu branch 4 times, most recently from 13c6837 to 0a24449 Compare March 9, 2023 04:57
This introduces a test which runs an index creation. During the index
creation the latency of foreground traffic spikes even though it is not
actually using the index. The cause of this is contention for resources
between the foreground workload and the index creation at multiple
levels (CPU, IO, Pebble sublevels).

Release note: None

Epic: none
@irfansharif
Copy link
Copy Markdown
Contributor

This is subsumed by #103816.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants