
Collect aggregates from expvar metrics #8174

Merged
lahsivjar merged 21 commits into elastic:main from lahsivjar:7817_usage_metrics
Jun 1, 2022

Conversation

@lahsivjar
Contributor

@lahsivjar lahsivjar commented May 20, 2022

Motivation/summary

Collect aggregated metrics from the result of expvar. Currently collecting max, min and average.
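The aggregation itself is straightforward; a minimal sketch of such a min/max/mean accumulator (illustrative only, not the code in this PR) could look like:

```go
package main

import "fmt"

// aggregateStats accumulates min, max, and mean over sampled values of one
// expvar metric. This is a hypothetical sketch, not the PR's implementation.
type aggregateStats struct {
	count    int
	min, max float64
	sum      float64
}

// Add records one sample, updating min, max, and the running sum.
func (a *aggregateStats) Add(v float64) {
	if a.count == 0 || v < a.min {
		a.min = v
	}
	if a.count == 0 || v > a.max {
		a.max = v
	}
	a.sum += v
	a.count++
}

// Mean returns the average of all recorded samples (0 if none).
func (a *aggregateStats) Mean() float64 {
	if a.count == 0 {
		return 0
	}
	return a.sum / float64(a.count)
}

func main() {
	var s aggregateStats
	for _, v := range []float64{3, 1, 4, 1, 5} {
		s.Add(v)
	}
	fmt.Println(s.min, s.max, s.Mean()) // prints: 1 5 2.8
}
```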

Checklist

- [ ] Update CHANGELOG.asciidoc
- [ ] Update package changelog.yml (only if changes to apmpackage have been made)
- [ ] Documentation has been updated

How to test these changes

  1. Run a local or cloud APM-Server

  2. Run apmbench test with -detailed flag

    cd systemtest/cmd/apmbench
    go run main.go -server http://localhost:8200 -detailed
    
  3. Observe the following fields in the output: gc_cycles, max_rss, max_goroutines, max_heap_alloc, max_heap_objects and mean_available_indexers

Related issues

Closes #7817

@mergify
Contributor

mergify bot commented May 20, 2022

This pull request does not have a backport label. Could you fix it @lahsivjar? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-7.x is the label to automatically backport to the 7.x branch.
  • backport-7./d is the label to automatically backport to the 7./d branch, where /d stands for the minor version digit.

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip Skip notification from the automated backport with mergify label May 20, 2022
@ghost

ghost commented May 20, 2022

💚 Build Succeeded


Build stats

  • Start Time: 2022-06-01T03:36:07.431+0000

  • Duration: 29 min 36 sec

Test stats 🧪

Test Results
  Failed:  0
  Passed:  4006
  Skipped: 13
  Total:   4019

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /hey-apm : Run the hey-apm benchmark.

  • /package : Generate and publish the docker images.

  • /test windows : Build & tests on Windows.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@lahsivjar lahsivjar force-pushed the 7817_usage_metrics branch 4 times, most recently from e4ebb56 to 4753a44 Compare May 21, 2022 20:27
@lahsivjar lahsivjar force-pushed the 7817_usage_metrics branch from 4753a44 to a0873dc Compare May 22, 2022 07:37
@ghost

ghost commented May 22, 2022

🌐 Coverage report

Name           Metrics % (covered/total)   Diff
Packages       100.0% (42/42)              💚
Files          91.96% (183/199)            👍
Classes        93.421% (426/456)           👍
Methods        89.256% (1080/1210)         👍
Lines          76.846% (13146/17107)       👍 0.041
Conditionals   100.0% (0/0)                💚

@lahsivjar lahsivjar marked this pull request as ready for review May 23, 2022 00:17
@lahsivjar lahsivjar requested a review from a team May 23, 2022 00:17
Member

@axw axw left a comment


@lahsivjar @marclop what do you think about making these metrics opt-in?

I fear the output will become unwieldy if we add so many metrics. Do you anticipate using these regularly, or only (interactively) after diagnosing a drop in throughput?

@lahsivjar
Contributor Author

@axw I agree that the output is getting unwieldy. For automated benchmarks, IMO, we should always index the results to ES, but for other use cases around manual benchmarking I think an opt-in flag for extra metrics would be really useful. @marclop WDYT?

@marclop
Contributor

marclop commented May 24, 2022

@axw I think that would be better, and we should consider categorizing the metrics into essential and optional categories. This may improve the output and reduce the mental strain of parsing all these numbers thrown at you. I should've added some of these ideas to the original issue when I created it and started the discussion then.

There are some metrics that aren't really that useful, particularly when the number is 0; one that comes to mind is error_responses/sec.

Since we mimic the output of go test -bench, we have certain constraints on what we need to publish (name, iterations, and ns/op). Additionally, we print B/op and allocs/op, which are the metrics reported when -benchmem is set in go test -bench.

Of our custom metrics, we don't really need all of them by default. If we're just doing some quick benchmarks, it may be better to limit the number of other metrics that are printed out. I believe it would be more effective to print fewer metrics by default and omit those that are 0. The output could look as below (the final set of metrics to be printed is still TBD):

$ ./apmbench
benchmarkName   100   1000 ns/op  6000 events/sec   1276948 B/op   2831 allocs/op

In case that there are some error responses, we could report those, but they needn't be included in all the results:

$ ./apmbench
benchmarkName   100   1000 ns/op   1 error_responses/sec  6000 events/sec   1276948 B/op   2831 allocs/op

We can introduce a new flag that, when set, displays all the metrics that we're recording; -detailed or a similar flag name could work:

$ ./apmbench -detailed
benchmarkName   100   1000 ns/op   1 error_responses/sec  6000 events/sec   1000 errors/sec 1000 metrics/sec   1000 txs/sec   3000 spans/sec   10 gc/op   1276948 B/op   2831 allocs/op ...

I think the usefulness of the broken-down events/sec (metrics, errors, txs, spans) metrics is not immediately obvious, and we should most likely omit them to free some space.

metric name      Required   Optional
Benchmark Name   x          -
Iterations       x          -
ns/op            x          -
B/op             x          -
allocs/op        x          -
events/sec       x          -
metrics/sec      -          x
spans/sec        -          x
errors/sec       -          x
txs/sec          -          x
gc cycles        -          x (required?)
max RSS          -          x (required?)
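The flag-gating proposed above could be sketched as follows. The result struct, field names, and wiring are hypothetical, for illustration only; the required columns always print, and the optional ones only appear under -detailed, with zero values omitted:

```go
package main

import (
	"flag"
	"fmt"
)

// optionalMetric is one of the opt-in metrics from the table above.
type optionalMetric struct {
	name  string
	value float64
}

// result holds one benchmark output line; the field names are hypothetical.
type result struct {
	name       string
	iterations int
	nsPerOp    float64
	eventsSec  float64
	optional   []optionalMetric
}

// formatResult always prints the required columns, and appends the optional
// ones only when detailed is set, omitting zero-valued metrics.
func formatResult(r result, detailed bool) string {
	s := fmt.Sprintf("%s\t%d\t%.0f ns/op\t%.0f events/sec",
		r.name, r.iterations, r.nsPerOp, r.eventsSec)
	if detailed {
		for _, m := range r.optional {
			if m.value != 0 {
				s += fmt.Sprintf("\t%.0f %s", m.value, m.name)
			}
		}
	}
	return s
}

func main() {
	detailed := flag.Bool("detailed", false, "report optional benchmark metrics")
	flag.Parse()
	r := result{name: "benchmarkName", iterations: 100, nsPerOp: 1000, eventsSec: 6000,
		optional: []optionalMetric{{"txs/sec", 1000}, {"errors/sec", 0}}}
	fmt.Println(formatResult(r, *detailed))
}
```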

@axw
Member

axw commented May 24, 2022

Thanks for the proposal @marclop, that all sounds good to me.

@lahsivjar
Contributor Author

@marclop What is your recommendation on the extra metrics for rss, goroutines, indexers, gc and heap? I see in the table that you have only kept gc cycles and max rss. Is that all we want to keep?

Also, regarding max_rss, I am not sure it will be enough to derive much from. IIUC, max_rss also includes memory not released to the OS, so max_rss might remain the same throughout the benchmark process, especially if the same apm-server process is used in multiple benchmarks.

Contributor

@marclop marclop left a comment


Overall approach looks good. I left a few comments on the overall watcher and map structure.

What is your recommendation on the extra metrics for rss, goroutines, indexers, gc and heap? I see in the table that you have only kept gc cycles and max rss. Is that all we want to keep?

No, I would like to keep all the metrics proposed previously, I just added them to the table to exemplify the addition of the new metrics in that table without writing the entire list.

Also, regarding max_rss, I am not sure if that will be enough to derive a lot of data from. IIUC, max_rss will also include the memory not released to OS and it might be possible that max_rss remains same throughout the benchmark process especially if the same apm-server process is used in multiple benchmarks.

From what I've been able to observe, the Beats RSS is updated fairly regularly. In the current approach, the collector is created on each benchmark and is confined to the lifetime of that benchmark, meaning that the max RSS reading will only cover that benchmark (though if a benchmark that ran before pushed the RSS up considerably, the max reading may still be affected by it).

To provide an updated table, I think we should record the following:

Current

metric name      Required   Optional
Benchmark Name   x          -
Iterations       x          -
ns/op            x          -
B/op             x          -
allocs/op        x          -
events/sec       x          -
metrics/sec      -          x
spans/sec        -          x
errors/sec       -          x
txs/sec          -          x

New

metric name               Required   Optional
gc_cycles                 x          -
max_rss                   x          -
max_goroutines            -          x
max_heap_alloc            -          x
max_heap_objects          -          x
mean_available_indexers   -          x

Like my previous comment mentions, I think we should add a new flag (-detailed or a similar name that makes sense) to control this behavior.

@lahsivjar lahsivjar requested a review from marclop May 24, 2022 13:14
@lahsivjar lahsivjar requested a review from marclop May 25, 2022 04:53
@mergify
Contributor

mergify bot commented May 25, 2022

This pull request is now in conflicts. Could you fix it @lahsivjar? 🙏
To fix this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b 7817_usage_metrics upstream/7817_usage_metrics
git merge upstream/main
git push upstream 7817_usage_metrics

@lahsivjar lahsivjar requested a review from marclop May 25, 2022 15:28
Contributor

@marclop marclop left a comment


LGTM!

lahsivjar and others added 2 commits June 1, 2022 09:53
@lahsivjar lahsivjar enabled auto-merge (squash) June 1, 2022 01:58
@lahsivjar lahsivjar disabled auto-merge June 1, 2022 02:02
@lahsivjar lahsivjar enabled auto-merge (squash) June 1, 2022 02:03
@lahsivjar
Contributor Author

/test

@lahsivjar lahsivjar merged commit 15a2925 into elastic:main Jun 1, 2022
@lahsivjar lahsivjar deleted the 7817_usage_metrics branch June 1, 2022 04:13

Labels

backport-skip Skip notification from the automated backport with mergify v8.4.0


Development

Successfully merging this pull request may close these issues.

benchmarking: Record more APM Server usage metrics

3 participants