Add option to set duration threshold for spans #259

@felixbarny

Description of the issue

Currently, the experience of our APM solution is not ideal when monitoring an application that generates thousands of spans per request. In elastic/apm-agent-java#1094, a feature has been requested to drop fast-executing spans.

We currently have a default limit of 500 spans in the agents and 1000 spans in the UI to prevent overwhelming the system with too many spans. However, that just cuts off at a certain point without considering which spans are important and which are not.

Proposal

Introduce configuration option span_min_duration
Default: 0ms (don't discard any spans)

Sets the minimum duration of spans. The agent attempts to discard spans that execute faster than this threshold. The attempt fails if the span leads up to a span that can't be discarded.

Spans that propagate the trace context to downstream services, such as outgoing HTTP requests, can't be discarded. Additionally, spans that lead to an error or that may be a parent of async operations can't be discarded.

However, external calls that don't propagate context, such as calls to a database, can be discarded using this threshold.

The timings of discarded spans are still taken into account in breakdown metrics.
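The rules above can be sketched as a single predicate. This is only an illustration, not agent code: the `Span` class, its field names, and the `requests_discard` function are hypothetical, and the check covers a single span in isolation (parent/child relationships are ignored here).

```python
from dataclasses import dataclass


@dataclass
class Span:
    # Hypothetical span model; the field names are illustrative, not an agent API.
    name: str
    duration_ms: float
    propagates_context: bool = False  # e.g. outgoing HTTP request carrying trace headers
    has_error: bool = False           # the span led to an error
    async_parent: bool = False        # may be a parent of async operations


def requests_discard(span: Span, span_min_duration_ms: float) -> bool:
    """Return True if this span, looked at on its own, qualifies for discarding."""
    if span_min_duration_ms <= 0:
        return False  # default of 0ms: don't discard any spans
    if span.propagates_context or span.has_error or span.async_parent:
        return False  # protected spans are kept regardless of duration
    return span.duration_ms < span_min_duration_ms


spans = [
    Span("SELECT FROM users", 2.0),
    Span("GET downstream-service", 3.0, propagates_context=True),
    Span("SELECT FROM orders", 40.0),
]
kept = [s.name for s in spans if not requests_discard(s, 10.0)]
print(kept)
```

With a 10ms threshold, the fast database call is dropped, while the equally fast outgoing HTTP call is kept because it propagates context.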

Deprecating other threshold settings

The Java agent-specific trace_methods_duration_threshold and profiling_inferred_spans_min_duration options can be deprecated, as the new option can also be applied to spans created via trace_methods and to inferred spans.

Limitations

Spans that propagate context to downstream services can't be discarded

We only know whether to discard after the call has ended. At that point, the trace has already continued on the downstream service. Discarding the span for the external request would orphan the transaction of the downstream call.

An argument could be made that this is not a big problem as the trace view then just won't show the downstream transaction. But as this would introduce inconsistencies (e.g. the transaction can be seen in the transaction details view of the service but when viewing the full trace it disappears) I suggest not allowing this for now.

Intermediate spans

Discarding a single span that has both a parent and a child span is not possible as it would lead to orphaned child spans.

However, a whole subtree of spans can be discarded if every span within that subtree requests to be discarded. This means that if a leaf of the tree can't be discarded because it propagates context downstream, none of the spans leading up to it can be discarded either. If the leaf is a span that doesn't propagate context, such as a manually created span or a SQL call, the subtree can be discarded.
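A minimal sketch of the subtree rule, assuming the span tree is available as a whole once the transaction ends. `SpanNode`, `wants_discard`, and `prune` are hypothetical names chosen for this example; a real agent would more likely make the decision incrementally as each span ends.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpanNode:
    # Hypothetical tree node. `wants_discard` stands for "faster than the
    # threshold and not protected (context propagation, error, async)".
    name: str
    wants_discard: bool
    children: List["SpanNode"] = field(default_factory=list)


def prune(node: SpanNode) -> bool:
    """Remove discardable subtrees in place; return True if `node` itself can go."""
    # Visit every child (no short-circuiting) so that discardable siblings
    # of a protected span are still removed.
    results = [prune(child) for child in node.children]
    node.children = [c for c, gone in zip(node.children, results) if not gone]
    # A span can only go if it asks to and no child had to be kept.
    return node.wants_discard and not node.children


def names(node: SpanNode) -> List[str]:
    return [node.name] + [n for c in node.children for n in names(c)]


root = SpanNode("transaction", False, [
    SpanNode("renderTemplate", True, [SpanNode("SELECT FROM templates", True)]),
    SpanNode("fetchProfile", True, [
        SpanNode("GET /profile", False),      # propagates context, must stay
        SpanNode("SELECT FROM cache", True),
    ]),
])
prune(root)
print(names(root))
```

Here the `renderTemplate` subtree disappears entirely, whereas `fetchProfile` survives despite being fast, because removing it would orphan the `GET /profile` span.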

Async spans

If the context of a span is propagated to another thread, it may not be discarded. That is because the other thread might create child spans of the first span even if it has already ended.

What this is not about

For now, we won't provide fine-grained control over which kinds of spans are affected by the threshold, as this gets quite complicated for both users and implementers. Examples of such control would be setting a threshold only for db.sql spans, or setting different thresholds for Elasticsearch spans and spans created by manual instrumentation. This could look like span_min_duration: *=10ms,db=5ms,db.elasticsearch=0ms to discard spans faster than 10ms and DB spans faster than 5ms, except for Elasticsearch spans, which should always be kept.
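To show why this gets complicated, here is a rough sketch of how such a per-type setting might be resolved. Nothing here is part of the proposal: the function names, the accepted duration units (ms, s, m), and the "most specific dotted prefix wins" matching rule are all assumptions made for the example.

```python
import re
from typing import Dict

_UNIT_TO_MS = {"ms": 1, "s": 1000, "m": 60_000}


def parse_duration_ms(text: str) -> int:
    """Parse values such as '10ms' or '5s' into milliseconds (assumed format)."""
    match = re.fullmatch(r"(\d+)(ms|s|m)", text.strip())
    if match is None:
        raise ValueError(f"invalid duration: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_TO_MS[unit]


def parse_thresholds(config: str) -> Dict[str, int]:
    """Turn '*=10ms,db=5ms' into {'*': 10, 'db': 5}."""
    thresholds = {}
    for entry in config.split(","):
        key, _, duration = entry.partition("=")
        thresholds[key.strip()] = parse_duration_ms(duration)
    return thresholds


def threshold_for(span_type: str, thresholds: Dict[str, int]) -> int:
    """Assumed rule: the most specific dotted prefix wins, '*' is the fallback."""
    parts = span_type.split(".")
    for end in range(len(parts), 0, -1):
        key = ".".join(parts[:end])
        if key in thresholds:
            return thresholds[key]
    return thresholds.get("*", 0)


thresholds = parse_thresholds("*=10ms,db=5ms,db.elasticsearch=0ms")
print([threshold_for(t, thresholds)
       for t in ("db.mysql", "db.elasticsearch", "external.http")])
```

Even this small sketch has to pick a precedence rule and a fallback, which is the kind of decision the proposal avoids by offering a single global threshold.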

What we are voting on

Whether the option and its semantics are sensible

This does not have to be implemented right away by all agents. Prioritizing the implementation is a decision each agent team can make on its own, but the Java agent plans to implement this soon.

Vote

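| Agent   | Yes | No | Indifferent | N/A | Link to agent issue         |
|---------|-----|----|-------------|-----|-----------------------------|
| .NET    |     |    |             |     |                             |
| Go      |     |    |             |     |                             |
| Java    |     |    |             |     | elastic/apm-agent-java#1094 |
| Node.js |     |    |             |     |                             |
| Python  |     |    |             |     |                             |
| Ruby    |     |    |             |     |                             |
| RUM     |     |    |             |     |                             |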
