Is your feature request related to a problem? Please describe.
APM delivered Agent Configuration in Kibana as beta in 7.3, intentionally deferring some aspects for later phases. The goal of this issue is to pick up where we left off and deliver a more compelling feature towards GA. We are aiming for 7.5.
Two lines of work stand out:
- Provide support for more settings.
- Provide feedback to the user in Kibana.
Describe the solution you'd like
On the first point, ideally we add support for 2-3 more fields. Some reasonable candidates are:
- ACTIVE/RECORDING (depends on [agents] definition of ACTIVE/DISABLED_INSTRUMENTATION #92)
- CAPTURE_BODY
- METRICS_INTERVAL
- IGNORE_URLS
- TRANSACTION_MAX_SPANS
- SPAN_FRAMES_MIN_DURATION
On the second point, a good start would be to show in Kibana whether some configuration was applied by some agent or not. For this, we could use the Etags sent from the agents to know the last configuration value they successfully applied. More precisely:
- Kibana will be generating the Etags (either as content hashes or UIDs), instead of APM Server.
- APM Server will simply pass the Etags from agents to Kibana back and forth.
- Kibana will add a boolean field to the elasticsearch documents, initially false, and flip it to true when it receives an Etag from any one agent matching the Etag in the document.
- That boolean indicates to the UI that some agent is up to date. When the user changes some setting in Kibana, it resets the boolean.
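The flow above could be sketched roughly as follows. This is only an illustration of the idea, not Kibana's actual code: the document shape, field names (`applied_by_agent`), and function names are assumptions, and the Etag here is shown as a content hash (one of the two options mentioned above).

```typescript
// Hypothetical sketch: Kibana generates the Etag as a content hash of the
// settings, and flips a boolean when an agent reports a matching Etag back.
import { createHash } from "crypto";

interface AgentConfigDoc {
  settings: Record<string, string>;
  etag: string;
  applied_by_agent: boolean; // false until a matching Etag comes back
}

// Content-hash variant: any change to a setting yields a new Etag,
// which implicitly "resets" the applied flag for the new document.
function computeEtag(settings: Record<string, string>): string {
  const canonical = JSON.stringify(
    Object.keys(settings).sort().map((k) => [k, settings[k]])
  );
  return createHash("sha256").update(canonical).digest("hex").slice(0, 16);
}

// Called when APM Server forwards the Etag an agent reported on its poll.
function markApplied(doc: AgentConfigDoc, agentEtag: string): AgentConfigDoc {
  return agentEtag === doc.etag ? { ...doc, applied_by_agent: true } : doc;
}
```

With a UID-based Etag instead of a content hash, `computeEtag` would just generate a fresh identifier on every save; the `markApplied` comparison stays the same.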
This approach entails a feedback delay of up to 2 times the POLL_INTERVAL (up to one interval for the agent to fetch the new configuration, and up to another before its next poll reports the matching Etag back), which is acceptable.
One downside is that if some agents succeed and others fail, those failures would be silently ignored. This is (hopefully!) an unlikely scenario.
Another downside is that it is not possible for Server/Kibana to distinguish failure from missing (agent never querying) unless we keep track of how many agents are around. This would add significant complexity. A workaround is through documentation, warning users that if they don't see feedback within 2xPOLL_INTERVAL seconds it is probably because something went wrong.
Another option is that agents send the ephemeral_id upstream, and Kibana shows a count of agents that applied the last configuration based on the ephemeral ids (or maybe even show the ids themselves). This requires agents to calculate/generate their ephemeral_id. Alternatively, agents can use their IP address.
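If agent check-ins were indexed with their Etag and ephemeral_id, counting the agents on the latest configuration could be a simple cardinality aggregation. The index and field names below (`etag`, `agent.ephemeral_id`) are assumptions for illustration, not an existing schema:

```typescript
// Hypothetical Elasticsearch query body: count distinct ephemeral_ids that
// reported the current Etag, i.e. agents known to run the latest config.
const appliedAgentsQuery = (currentEtag: string) => ({
  size: 0, // we only need the aggregation, not the hits
  query: { term: { etag: currentEtag } },
  aggs: {
    agents_applied: { cardinality: { field: "agent.ephemeral_id" } },
  },
});
```

Showing the ids themselves would instead use a `terms` aggregation on the same field.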
Describe alternatives you've considered
We could come up with a way to aggregate data coming from different agents. For instance:
- If all agents "so far" report success, show a "success" visual indicator in the UI.
- If all agents "so far" report failure, show a "failed" visual indicator in the UI.
- If some report success and some report failure, show both indicators.
- We could further help users dig into which ones failed via the aforementioned ephemeral_id, IP address, or similar.
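The aggregation rule above amounts to a small decision function. The report shape and names here are hypothetical, since this alternative deliberately has no schema yet:

```typescript
// Illustrative sketch of the per-config indicator described above.
type AgentReport = { ephemeral_id: string; applied: boolean };

type Indicator = "success" | "failed" | "mixed" | "unknown";

function configIndicator(reports: AgentReport[]): Indicator {
  if (reports.length === 0) return "unknown"; // no agent has checked in yet
  const ok = reports.filter((r) => r.applied).length;
  if (ok === reports.length) return "success";
  if (ok === 0) return "failed";
  return "mixed"; // show both indicators
}

// For the "mixed"/"failed" cases: which agents to surface to the user.
function failingAgents(reports: AgentReport[]): string[] {
  return reports.filter((r) => !r.applied).map((r) => r.ephemeral_id);
}
```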
This would require agents to send data to apm-server (probably to the same endpoint) and define a new schema. A new schema would also allow us to show more information than a boolean, e.g. timestamp of config application, error messages, etc.
I suggest not introducing a new data model until we know more about how the feature is used, what problems users run into, how/why exactly agents might fail to apply configuration, etc.
We also need to decide if we want to support RUM in GA, later, or never. User feedback is probably not practical for the RUM agent.
Note that at the moment Agent Configuration in Kibana is unusable for customers with a Distributed Tracing setup, as the only available setting is SAMPLING_RATE, which in case of DT will be dictated by the RUM agent (almost always).
RUM status is tracked in elastic/apm-agent-rum-js#253
Finally, during the first design we briefly touched on auditing. Some sort of audit log could be achieved, e.g. by creating one elasticsearch document per update and adding information about the user that made the change. However, if we do not plan to add a UI for it, it might be enough to simply log configuration updates in Kibana.
Implementation issues
Kibana
Server
Agents
See background in #4 and #76