In #7216 @marclop worked on a POC for a new benchmarking tool for APM data ingestion. The POC leverages Opbeans services to create APM events, which are recorded into ndjson files by the new Intake Receiver component. The apmbench loader reads from the ndjson files, and the runner sends the data to an APM Server while also collecting statistics, which are indexed into a dedicated Elasticsearch cluster. A more detailed description can be found in #7216 (comment).
Now that the POC has been built successfully, the next step is to tie all the components together in a maintainable way.
Goals
- [state: on-track] New Benchmark tooling is wired up to run automated benchmarks on a fixed set of testdata on a regular schedule
- [state: moved] Can be integrated into the team's workflow when building new features or refactoring (ad-hoc and automated benchmarks)
- [state: moved] Can easily be used by agent developers to create a new testdata set for a new agent version, for both automated and ad-hoc benchmarks
- [state: moved] Can be used by engineers and developers outside the apm-server team for throughput analysis with a targeted throughput per unit
- [state: on-track] Update the processing & performance guide with up-to-date numbers
Tasks
Intake Receiver (Generation)
None for this milestone
APM Bench (Execution)
apmbench will read the captured events from intake-receiver and load them into memory to allow them to be replayed against the APM Server. We need to modify the existing binary to allow us to measure changes to the APM Server over time.
- events/unit sent to APM Server #7843
- Implement an alternative solution to committing the generated ndjson files in the APM Server repo. As one option, we discussed uploading the generated files to S3 and providing a small uploader tool that is easy to use. ([META] Benchmark 2.0: Write benchmark intake event data upload tool #7844) (moved to out of scope)
- Support feature-specific use cases; at a minimum there should be enough data, or data (trace_ids, ..) being manipulated without losing semantic meaning, to test tail-based sampling. (Support benchmarking specific feature such as Tail Based Sampling #7845) (moved to out of scope)
ECE Test/ESS (Environment)
The new benchmarking framework is going to mainly use ESS testing regions to run the Elastic Stack with APM Server while the benchmarks run, and the deployment will be destroyed after the benchmark suite has finished.
This creates the opportunity to benchmark the APM Server with a very specific set of parameters and greatly facilitates ad-hoc benchmarking as well, since any APM Server developer can easily create the necessary pieces for the benchmarks, run them, and tear them down after they've finished. Additionally, we can leverage the existing ESS Metricbeat metrics that are collected by default, which can be used to proactively debug or monitor the benchmarks when required.
The goal is to store the necessary scripts and automation that create the infrastructure in the apm-server repo and run the benchmarks through a Jenkins job that is scheduled to run daily.
- pprof_enabled in the APM Package #7733
- pprof_enabled toggle in APM integrations UI kibana#131888
- Support benchmarking an unreleased locally built APM Server #7986 (more effort than expected, labeled 8.4-candidate)
- https://github.com/elastic/observability-dev/issues/2014 (stretch goal, moved to out of scope)
Analysis
The analysis part comprises loading the data into a remote long-lived Elasticsearch cluster and building the necessary dashboards to cover the needed use cases.
- gobench to collect the additional metrics (Parse apmbench results, add ES basicauth gobench#1)
- gobench #7856
- benchmarking: Create meaningful dashboards for apmbench results #7868 (labeled 8.4-candidate)
Automation
The automation part should only require setting up the right accounts and credentials for the automation jobs to create or access the required infrastructure.
Documentation
Ad Hoc
- Benchmark the impact of using a higher output MaxRequests default #7718 (stretch goal, moved to out of scope)
- output.elasticsearch.max_requests configurable #7719
- GOMAXPROCS #7967