Description
The new benchmarking framework will take a slightly different route from our current approach: it will leverage either ESS (preferred) or an on-demand ECE environment (ecetest) to run nightly benchmarks that allow us to track APM Server performance over time.
The lowest-effort option is to use a region in ESS where we can benchmark the APM Server's throughput, collect the results, and index them with `gobench` to a long-lived remote Elasticsearch cluster. If there are unforeseen limitations when using ESS to run the benchmarks, we can look into spinning up an on-demand ECE environment instead, but that is much costlier in terms of time, resources, and money.
Considerations
- Keep the benchmarks run by hey-apm for a while, and only decommission them after we're happy with the new approach.
- Run `apmbench` on a machine that is appropriate for the work and as close as possible to the workload (same CSP region).
- The hardware profile must be able to handle a high level of concurrency and provide decent network performance, since we'll be running `apmbench` with a medium to high number of agents.
Approach
Docker image tag
Since the Elastic Stack and APM Server will be running in ESS, the software must be packaged as Docker images. Building these images is out of scope for this issue, but to reduce the risk of running the benchmarks against an upstream version that doesn't completely work, we should have some guarantees in place and a vetting process for the "latest" version.
Since we already have a workflow that updates the Docker image for each of the APM Server's active branches, we could rely on the Docker image tags used in our `docker-compose.yml` file and specify the current image's tag as the Docker image to use in `<elasticsearch|kibana|apm>.config.docker_image` when creating the ESS deployment. See the Terraform provider acceptance test that uses `docker_image`.
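As a rough illustration, the deployment definition could look something like the sketch below, using the `elastic/ec` Terraform provider's `ec_deployment` resource. The region, deployment template, stack version, and image references are placeholders, not vetted values; only the `config.docker_image` override mechanism comes from the proposal above.

```hcl
# Sketch only: region, template, version, and image tags are assumptions.
resource "ec_deployment" "benchmark" {
  name                   = "apm-server-nightly-benchmark"
  region                 = "gcp-us-west2"            # placeholder region
  version                = "8.6.0"                   # placeholder stack version
  deployment_template_id = "gcp-compute-optimized"   # placeholder template

  elasticsearch {
    config {
      # Pin the exact image tag taken from docker-compose.yml
      docker_image = "docker.elastic.co/cloud-release/elasticsearch-cloud-ess:8.6.0"
    }
  }

  kibana {
    config {
      docker_image = "docker.elastic.co/cloud-release/kibana-cloud:8.6.0"
    }
  }

  apm {
    config {
      docker_image = "docker.elastic.co/cloud-release/elastic-agent-cloud:8.6.0"
    }
  }
}
```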
Deployment lifecycle
The most cost-effective and efficient approach is to create a new deployment in ESS and a new VM in the same region with the desired hardware profile for the `apmbench` runner, and upload the credentials that `apmbench` needs to connect to the deployment. After the benchmarks have run and the results have been uploaded to a persistent deployment where we'd store them, the benchmark deployment and the `apmbench` VM should be torn down to cut costs.
The Terraform configuration for the benchmark deployment could live in the APM Server repo, where it could also be used by APM Server developers who wish to benchmark their changes. A limitation, however, is that a cloud Docker image would need to be built and uploaded to allow that testing to take place.
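The runner VM could be declared in the same Terraform configuration so that one `terraform apply` / `terraform destroy` cycle manages the whole ephemeral setup. The sketch below assumes GCP purely as an example; the CSP, machine type, zone, and image are all assumptions to be replaced with whatever matches the chosen ESS region and hardware profile:

```hcl
# Sketch only: CSP, machine type, zone, and image are assumptions.
resource "google_compute_instance" "apmbench_runner" {
  name         = "apmbench-runner"
  machine_type = "c2-standard-16" # compute-optimized, for high agent concurrency
  zone         = "us-west2-a"     # keep in the same region as the ESS deployment

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }

  network_interface {
    network = "default"
    access_config {} # ephemeral public IP for provisioning apmbench + credentials
  }
}
```

Destroying the stack after the nightly run (`terraform destroy`) is what keeps the cost bounded, since neither the deployment nor the VM outlives the benchmark.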
Automation work