Skip to content

roachtest/cdc: export stats for initial scan test to roachperf#92669

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
jayshrivastava:roachperf-cdc-metrics
Nov 30, 2022
Merged

roachtest/cdc: export stats for initial scan test to roachperf#92669
craig[bot] merged 1 commit intocockroachdb:masterfrom
jayshrivastava:roachperf-cdc-metrics

Conversation

@jayshrivastava
Copy link
Copy Markdown
Contributor

@jayshrivastava jayshrivastava commented Nov 29, 2022

This change updates the cdc/initial_scan_only test to produce a stats.json artifact to be consumed by roachprod. This file contains stats for p99 foreground latency, changefeed throughput, and CPU usage.

Release note: None
Epic: None

image

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you have to set a window for stats collection, I decided to create a start and stop API. The idea is that you call start before starting changefeeds and call export once they finish.

@jayshrivastava jayshrivastava force-pushed the roachperf-cdc-metrics branch 2 times, most recently from 977a6fe to 4f846df Compare November 29, 2022 15:37
@jayshrivastava jayshrivastava marked this pull request as ready for review November 29, 2022 15:37
@jayshrivastava jayshrivastava requested a review from a team as a code owner November 29, 2022 15:37
@jayshrivastava jayshrivastava requested review from herkolategan and smg260 and removed request for a team November 29, 2022 15:37
Copy link
Copy Markdown
Contributor

@samiskin samiskin Nov 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about not even storing a cdcStatsExporter struct at all by just having

func (ct *cdcTester) startStatsCollection() func() {
  promClient, err := clusterstats.SetupCollectorPromClient(ct.ctx, ct.cluster, ct.t.L(), cfg)
	if err != nil {
		ct.t.Errorf("error creating prometheus client for stats collector: %s", err)
	}
  statsCollector := clusterstats.NewStatsCollector(ct.ctx, promClient)
  startTime = timeutil.Now()
  return func() { // this returned function is then called in the test to export to roachperf
    	endTime := timeutil.Now()
	     ...statsCollector.Exporter().Export(...)...
  }
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this better than what I had, but the reason I stored the stats exporter was because it gets created in newCDCTester > startGrafana. I just pushed a small refactor. Let me know if this is better.

Copy link
Copy Markdown
Contributor

@samiskin samiskin Nov 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it need to be created there though rather than just in the function itself? Grafana can be assumed to always be running for an existing cdcTester since we only don't start it when SkipInit is set since the earlier instance should still be running. I feel like its worth just commenting on the function that this requires prometheus.

If it does need to be at the very beginning, I'd rather just have it be a property in cdcTester rather than an extra return value.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created it in startGrafana because we had the prometheus config in scope there. It is weird to initialize it there though. I would also prefer doing it in startStatsCollection. The only issue is that we have to store the promCfg in the cdcTester, which I think is fine.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@jayshrivastava jayshrivastava force-pushed the roachperf-cdc-metrics branch 3 times, most recently from ba63cb7 to 7c6e8ee Compare November 30, 2022 18:41
Copy link
Copy Markdown
Contributor

@samiskin samiskin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Comment on lines +37 to +41
cpuUsageAgg = clusterstats.AggQuery{
Stat: cpuUsage,
Query: "avg(sys_cpu_combined_percent_normalized) * 100",
Tag: "CPU Utilization (%)",
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if this is just the crdb nodes or does it also include the workload node?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. This just measures the CRDB nodes. Combined/normalized refers to normalization by number of cores. The metric description is Current user+system cpu percentage, normalized 0-1 by number of cores.

This change updates the cdc/initial_scan_only test to produce a
`stats.json` artifact to be consumed by roachprod. This file
contains stats for p99 foreground latency, changefeed throughput,
and CPU usage.

Release note: None
@jayshrivastava
Copy link
Copy Markdown
Contributor Author

bors r+

@craig craig bot merged commit 21ec411 into cockroachdb:master Nov 30, 2022
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Nov 30, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants