Agent Configuration in Kibana, GA #138

@jalvz

Is your feature request related to a problem? Please describe.

APM delivered Agent Configuration in Kibana as beta in 7.3, intentionally deferring some aspects to later phases. The goal of this issue is to pick up where we left off and make the feature more compelling for GA. We are aiming for 7.5.

Two lines of work stand out:

  1. Provide support for more settings.

  2. Provide feedback to the user in Kibana.

Describe the solution you'd like

On the first point, ideally we add support for 2-3 more fields. Some reasonable candidates are:

- SPAN_FRAMES_MIN_DURATION

On the second point, a good start would be to show in Kibana whether a configuration has been applied by at least one agent. For this, we could use the Etags sent by the agents to learn which configuration they last applied successfully. More precisely:

  • Kibana will be generating the Etags (either as content hashes or UIDs), instead of APM Server.
  • APM Server will simply relay Etags between agents and Kibana.
  • Kibana will add a boolean field to the Elasticsearch documents, initially false, and flip it to true when it receives an Etag from any one agent that matches the Etag in the document.
  • That boolean indicates to the UI that some agent is up to date. When the user changes some setting in Kibana, it resets the boolean.
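
The Etag flow in the bullets above can be sketched as follows. This is a minimal illustration, not the Kibana implementation: the `ConfigDocument` class, the `applied_by_agent` field name, and the choice of SHA-256 over canonical JSON as the content hash are all assumptions made for the example.

```python
import hashlib
import json


def make_etag(settings):
    """Content-hash Etag: identical settings always yield the same value."""
    canonical = json.dumps(settings, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ConfigDocument:
    """Hypothetical stand-in for the Elasticsearch document Kibana would store."""

    def __init__(self, settings):
        self.settings = dict(settings)
        self.etag = make_etag(self.settings)
        # Initially false: no agent has confirmed this configuration yet.
        self.applied_by_agent = False

    def update(self, settings):
        """User changes a setting in Kibana: new Etag, boolean is reset."""
        self.settings = dict(settings)
        self.etag = make_etag(self.settings)
        self.applied_by_agent = False

    def report_agent_etag(self, agent_etag):
        """Etag relayed from an agent via APM Server; flip the flag on a match."""
        if agent_etag == self.etag:
            self.applied_by_agent = True
        return self.applied_by_agent


doc = ConfigDocument({"SAMPLING_RATE": 0.5})
doc.report_agent_etag("stale-etag")   # no match, flag stays False
doc.report_agent_etag(doc.etag)       # match, flag becomes True
doc.update({"SAMPLING_RATE": 0.1})    # user edit resets the flag
```

Once the flag is true, a stale Etag from another agent does not reset it, which is the "some agents succeed and others fail" blind spot discussed in this issue.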

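This approach entails a feedback delay of up to 2 × POLL_INTERVAL, which is acceptable: an agent can take up to one interval to fetch a new configuration, and it only sends the matching Etag back on its following poll.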

One downside is that if some agents succeed and others fail, those failures would be silently ignored. This is (hopefully!) an unlikely scenario.

Another downside is that Server/Kibana cannot distinguish failure from absence (an agent that never queries) unless we keep track of how many agents are around. This would add significant complexity. A workaround is through documentation, warning users that if they don't see feedback within 2 × POLL_INTERVAL, something probably went wrong.

Another option is that agents send the ephemeral_id upstream, and Kibana shows a count of agents that applied the last configuration based on the ephemeral ids (or maybe even show the ids themselves). This requires agents to calculate/generate their ephemeral_id. Alternatively, agents can use their IP address.
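
A rough sketch of the counting option, assuming Kibana keeps the set of reported ids per Etag. The `AppliedAgents` class and its method names are hypothetical, and the ids could equally be IP addresses as suggested above.

```python
from collections import defaultdict


class AppliedAgents:
    """Hypothetical tracker: which agents reported which Etag."""

    def __init__(self):
        # Etag -> set of ephemeral ids; a set ignores repeated polls
        # from the same agent.
        self._agents_by_etag = defaultdict(set)

    def report(self, etag, ephemeral_id):
        self._agents_by_etag[etag].add(ephemeral_id)

    def count(self, current_etag):
        """Number of distinct agents that applied the current configuration."""
        return len(self._agents_by_etag.get(current_etag, ()))

    def ids(self, current_etag):
        """The ids themselves, in case the UI wants to list them."""
        return sorted(self._agents_by_etag.get(current_etag, ()))


tracker = AppliedAgents()
tracker.report("etag-v2", "agent-a")
tracker.report("etag-v2", "agent-a")  # same agent polling again
tracker.report("etag-v2", "agent-b")
tracker.report("etag-v1", "agent-c")  # still on the previous configuration
```

Here `tracker.count("etag-v2")` counts two distinct agents. The count still says nothing about agents that never report, so it does not by itself separate failure from absence.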

Describe alternatives you've considered

We could come up with a way to aggregate data coming from different agents. For instance:

  • If all agents "so far" report success, show a "success" visual indicator in the UI.
  • If all agents "so far" report failure, show a "failed" visual indicator in the UI.
  • If some report success and some report failure, show both indicators.
    • We could further help users to dig into which ones failed via the aforementioned ephemeral_id, IP address, or similar.
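
The aggregation rules in this list could look roughly like the sketch below. The report format (a mapping from an agent identifier to a success flag) and the function names are assumptions for illustration; no such schema exists yet.

```python
def aggregate_status(reports):
    """Map per-agent reports to the set of indicators the UI would show.

    reports: dict of agent id -> True (applied) or False (failed),
    covering only the agents that have reported "so far".
    """
    indicators = set()
    if any(ok for ok in reports.values()):
        indicators.add("success")
    if any(not ok for ok in reports.values()):
        indicators.add("failed")
    return indicators


def failed_agents(reports):
    """Ids of agents that reported failure, for drilling down in the UI."""
    return sorted(agent for agent, ok in reports.items() if not ok)


mixed = {"agent-a": True, "agent-b": False, "agent-c": True}
indicators = aggregate_status(mixed)   # both "success" and "failed"
to_inspect = failed_agents(mixed)      # only agent-b
```

With no reports at all the function returns an empty set, so the UI would show neither indicator.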

This would require agents to send data to apm-server (probably to the same endpoint) and would require us to define a new schema. A new schema would also allow us to show more information than a boolean, e.g. the timestamp of config application, error messages, etc.

I suggest not introducing a new data model until we know more about how the feature is used, what problems users run into, how and why exactly agents might fail to apply configuration, etc.


We also need to decide if we want to support RUM in GA, later, or never. User feedback is probably not practical for the RUM agent.

Note that at the moment Agent Configuration in Kibana is unusable for customers with a Distributed Tracing (DT) setup, as the only available setting is SAMPLING_RATE, which with DT is almost always dictated by the RUM agent.

RUM status is tracked in elastic/apm-agent-rum-js#253

Finally, during the first design we briefly touched on auditing. Some sort of audit log could be achieved, e.g. by creating one Elasticsearch document per update and adding information about the user who made the change. However, if we do not plan to add a UI for it, it might be enough to simply log configuration updates in Kibana.

Implementation issues

Kibana
Server
Agents

See background in #4 and #76
