
Collect aggregates from expvar metrics #8174

Merged
lahsivjar merged 21 commits into elastic:main from lahsivjar:7817_usage_metrics
Jun 1, 2022

Conversation

@lahsivjar
Contributor

@lahsivjar lahsivjar commented May 20, 2022

Motivation/summary

Collect aggregated metrics from the result of expvar. Currently collecting max, min and average.
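The aggregation itself is straightforward; a minimal sketch of such a min/max/mean accumulator (illustrative only, not the code in this PR) could look like:

```go
package main

import "fmt"

// aggregateStats accumulates min, max, and mean over sampled values of one
// expvar metric. This is a hypothetical sketch, not the PR's implementation.
type aggregateStats struct {
	count    int
	min, max float64
	sum      float64
}

// Add records one sample, updating min, max, and the running sum.
func (a *aggregateStats) Add(v float64) {
	if a.count == 0 || v < a.min {
		a.min = v
	}
	if a.count == 0 || v > a.max {
		a.max = v
	}
	a.sum += v
	a.count++
}

// Mean returns the average of all recorded samples (0 if none).
func (a *aggregateStats) Mean() float64 {
	if a.count == 0 {
		return 0
	}
	return a.sum / float64(a.count)
}

func main() {
	var s aggregateStats
	for _, v := range []float64{3, 1, 4, 1, 5} {
		s.Add(v)
	}
	fmt.Println(s.min, s.max, s.Mean()) // prints: 1 5 2.8
}
```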

Checklist

- [ ] Update CHANGELOG.asciidoc
- [ ] Update package changelog.yml (only if changes to apmpackage have been made)
- [ ] Documentation has been updated

How to test these changes

  1. Run a local or cloud APM-Server

  2. Run apmbench test with -detailed flag

    cd systemtest/cmd/apmbench
    go run main.go -server http://localhost:8200 -detailed
    
  3. Observe the following fields in the output: gc_cycles, max_rss, max_goroutines, max_heap_alloc, max_heap_objects and mean_available_indexers

Related issues

Closes #7817

@mergify
Contributor

mergify bot commented May 20, 2022

This pull request does not have a backport label. Could you fix it @lahsivjar? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-7.x is the label to automatically backport to the 7.x branch.
  • backport-7./d is the label to automatically backport to the 7./d branch, where /d stands for the minor version digit.

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip Skip notification from the automated backport with mergify label May 20, 2022
@ghost

ghost commented May 20, 2022

💚 Build Succeeded


Build stats

  • Start Time: 2022-06-01T03:36:07.431+0000

  • Duration: 29 min 36 sec

Test stats 🧪

Test Results
  Failed:  0
  Passed:  4006
  Skipped: 13
  Total:   4019

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /hey-apm : Run the hey-apm benchmark.

  • /package : Generate and publish the docker images.

  • /test windows : Build & tests on Windows.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@lahsivjar lahsivjar force-pushed the 7817_usage_metrics branch 4 times, most recently from e4ebb56 to 4753a44 Compare May 21, 2022 20:27
@lahsivjar lahsivjar force-pushed the 7817_usage_metrics branch from 4753a44 to a0873dc Compare May 22, 2022 07:37
@ghost

ghost commented May 22, 2022

🌐 Coverage report

Name           Metrics % (covered/total)   Diff
Packages       100.0% (42/42)              💚
Files          91.96% (183/199)            👍
Classes        93.421% (426/456)           👍
Methods        89.256% (1080/1210)         👍
Lines          76.846% (13146/17107)       👍 0.041
Conditionals   100.0% (0/0)                💚

@lahsivjar lahsivjar marked this pull request as ready for review May 23, 2022 00:17
@lahsivjar lahsivjar requested a review from a team May 23, 2022 00:17
Member

@axw axw left a comment


@lahsivjar @marclop what do you think about making these metrics opt-in?

I fear the output will become unwieldy if we add so many metrics. Do you anticipate using these regularly, or only (interactively) after diagnosing a drop in throughput?

@lahsivjar
Contributor Author

@axw I agree that the output is getting unwieldy. For automated benchmarks, IMO, we should always index the results to ES, but for other use cases around manual benchmarking I think an opt-in flag for extra metrics would be really useful. @marclop WDYT?

@marclop
Contributor

marclop commented May 24, 2022

@axw I think that would be better, and we should consider categorizing the metrics into essential and optional categories. This may improve the output and reduce the mental strain of parsing all these numbers thrown at you. I should've added some of these ideas to the original issue when I created it and started the discussion then.

There are some metrics that aren't really that useful, particularly when the number is 0; one that comes to mind is error_responses/sec.

Since we mimic the output of go test -bench, we have certain constraints on what we need to publish (name, iterations, and ns/op). Additionally, we print B/op and allocs/op, which are the metrics reported when -benchmem is set in go test -bench.

Of our custom metrics, we don't really need all of them by default. If we're just doing some quick benchmarks, it may be better to limit the number of other metrics that are printed out. I believe it would be more effective to print fewer metrics by default and omit those that are 0. The output could look as below (the final set of metrics to be printed is still TBD):

$ ./apmbench
benchmarkName   100   1000 ns/op  6000 events/sec   1276948 B/op   2831 allocs/op

In case that there are some error responses, we could report those, but they needn't be included in all the results:

$ ./apmbench
benchmarkName   100   1000 ns/op   1 error_responses/sec  6000 events/sec   1276948 B/op   2831 allocs/op

We can introduce a new flag that, when set, displays all the metrics that we're recording; -detailed or a similar flag name could work:

$ ./apmbench -detailed
benchmarkName   100   1000 ns/op   1 error_responses/sec  6000 events/sec   1000 errors/sec 1000 metrics/sec   1000 txs/sec   3000 spans/sec   10 gc/op   1276948 B/op   2831 allocs/op ...

I think the usefulness of the broken-down events/sec (metrics, errors, txs, spans) metrics is not immediately obvious, and we should most likely omit them to free some space.

metric name      Required   Optional
Benchmark Name   x          -
Iterations       x          -
ns/op            x          -
B/op             x          -
allocs/op        x          -
events/sec       x          -
metrics/sec      -          x
spans/sec        -          x
errors/sec       -          x
txs/sec          -          x
gc cycles        -          x (required?)
max RSS          -          x (required?)
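The flag-gating proposed above could be sketched as follows. The result struct, field names, and wiring are hypothetical, for illustration only; the required columns always print, and the optional ones only appear under -detailed, with zero values omitted:

```go
package main

import (
	"flag"
	"fmt"
)

// optionalMetric is one of the opt-in metrics from the table above.
type optionalMetric struct {
	name  string
	value float64
}

// result holds one benchmark output line; the field names are hypothetical.
type result struct {
	name       string
	iterations int
	nsPerOp    float64
	eventsSec  float64
	optional   []optionalMetric
}

// formatResult always prints the required columns, and appends the optional
// ones only when detailed is set, omitting zero-valued metrics.
func formatResult(r result, detailed bool) string {
	s := fmt.Sprintf("%s\t%d\t%.0f ns/op\t%.0f events/sec",
		r.name, r.iterations, r.nsPerOp, r.eventsSec)
	if detailed {
		for _, m := range r.optional {
			if m.value != 0 {
				s += fmt.Sprintf("\t%.0f %s", m.value, m.name)
			}
		}
	}
	return s
}

func main() {
	detailed := flag.Bool("detailed", false, "report optional benchmark metrics")
	flag.Parse()
	r := result{name: "benchmarkName", iterations: 100, nsPerOp: 1000, eventsSec: 6000,
		optional: []optionalMetric{{"txs/sec", 1000}, {"errors/sec", 0}}}
	fmt.Println(formatResult(r, *detailed))
}
```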

@axw
Member

axw commented May 24, 2022

Thanks for the proposal @marclop, that all sounds good to me.

@lahsivjar
Contributor Author

@marclop What is your recommendation on the extra metrics for rss, goroutines, indexers, gc and heap? I see in the table that you have only kept gc cycles and max rss. Is that all we want to keep?

Also, regarding max_rss, I am not sure it will be enough to derive much from. IIUC, max_rss also includes memory not released to the OS, so max_rss might remain the same throughout the benchmark process, especially if the same apm-server process is used in multiple benchmarks.

Contributor

@marclop marclop left a comment


Overall approach looks good. I left a few comments on the overall watcher and map structure.

What is your recommendation on the extra metrics for rss, goroutines, indexers, gc and heap? I see in the table that you have only kept gc cycles and max rss. Is that all we want to keep?

No, I would like to keep all the metrics proposed previously, I just added them to the table to exemplify the addition of the new metrics in that table without writing the entire list.

Also, regarding max_rss, I am not sure if that will be enough to derive a lot of data from. IIUC, max_rss will also include the memory not released to OS and it might be possible that max_rss remains same throughout the benchmark process especially if the same apm-server process is used in multiple benchmarks.

From what I've been able to observe, the Beats RSS is updated fairly regularly. In the current approach, the collector is created on each benchmark and is confined to the lifetime of that benchmark, meaning that the max RSS reading will only cover that benchmark (though if a benchmark that ran before pushed the RSS up considerably, the max reading may still be affected by it).

To provide an updated table, I think we should record the following:

Current

metric name      Required   Optional
Benchmark Name   x          -
Iterations       x          -
ns/op            x          -
B/op             x          -
allocs/op        x          -
events/sec       x          -
metrics/sec      -          x
spans/sec        -          x
errors/sec       -          x
txs/sec          -          x

New

metric name               Required   Optional
gc_cycles                 x          -
max_rss                   x          -
max_goroutines            -          x
max_heap_alloc            -          x
max_heap_objects          -          x
mean_available_indexers   -          x

Like my previous comment mentions, I think we should add a new flag (-detailed or a similar name that makes sense) to control this behavior.

@lahsivjar lahsivjar requested a review from marclop May 24, 2022 13:14
@lahsivjar lahsivjar requested a review from marclop May 25, 2022 04:53
@mergify
Contributor

mergify bot commented May 25, 2022

This pull request is now in conflicts. Could you fix it @lahsivjar? 🙏
To fix this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b 7817_usage_metrics upstream/7817_usage_metrics
git merge upstream/main
git push upstream 7817_usage_metrics

@lahsivjar lahsivjar requested a review from marclop May 25, 2022 15:28
Contributor

@marclop marclop left a comment


LGTM!

lahsivjar and others added 2 commits June 1, 2022 09:53
@lahsivjar lahsivjar enabled auto-merge (squash) June 1, 2022 01:58
@lahsivjar lahsivjar disabled auto-merge June 1, 2022 02:02
@lahsivjar lahsivjar enabled auto-merge (squash) June 1, 2022 02:03
@lahsivjar
Contributor Author

/test

@lahsivjar lahsivjar merged commit 15a2925 into elastic:main Jun 1, 2022
@lahsivjar lahsivjar deleted the 7817_usage_metrics branch June 1, 2022 04:13

Labels

backport-skip Skip notification from the automated backport with mergify v8.4.0


Development

Successfully merging this pull request may close these issues.

benchmarking: Record more APM Server usage metrics

3 participants