
Reporting and showing cgroup-based metrics #291

@eyalkoren

Description of the issue

APM agents currently send system metrics whose keys and values are aligned with Metricbeat's metricsets. These cover the system.* metricsets and some platform-specific metrics (see the Java agent documentation for an example).
However, these system metrics are inaccurate when monitoring containers. The most obvious error is that agents currently collect the host's total memory rather than the effective cgroup memory limit. There are also considerable differences in used bytes, depending on how they are retrieved, and in CPU usage when measured against the cgroup CPU quota.

Proposed solution

Introducing new cgroup metrics

As a first step, the new metrics will include:

  • system.process.cgroup.memory.mem.limit.bytes
  • system.process.cgroup.memory.mem.usage.bytes

Both metrics are optional and numeric, expressed in bytes.
When a value is not available, the corresponding metric should not be sent.
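A minimal sketch of the "omit when not available" rule, assuming a hypothetical CgroupMemoryReading object that holds whatever the agent managed to read. The output is a simplified flat map from metric key to value, not the actual intake wire format, and cgroupMemorySamples is an illustrative name rather than an existing agent API.

```typescript
// Hypothetical result of reading the cgroup memory files.
interface CgroupMemoryReading {
  limitBytes?: number; // undefined when unlimited or unreadable
  usageBytes?: number; // undefined when unreadable
}

// Builds the metric samples, adding each key only when its value is known,
// so an unavailable metric is omitted instead of being sent as 0 or null.
function cgroupMemorySamples(reading: CgroupMemoryReading): Record<string, number> {
  const samples: Record<string, number> = {};
  if (reading.limitBytes !== undefined) {
    samples["system.process.cgroup.memory.mem.limit.bytes"] = reading.limitBytes;
  }
  if (reading.usageBytes !== undefined) {
    samples["system.process.cgroup.memory.mem.usage.bytes"] = reading.usageBytes;
  }
  return samples;
}
```

For example, cgroupMemorySamples({ usageBytes: 104857600 }) yields a map containing only the usage key, so the limit metric is simply absent from what gets reported.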

In the future, we may extend this to collect and show additional memory metrics, as well as CPU metrics.

APM UI

When cgroup metrics are available, system memory usage will be calculated from them as mem.usage.bytes / mem.limit.bytes. Otherwise, the existing system.memory metrics are used.

NOTE: when a cgroup has no explicit memory limit, the limit read from the corresponding file may be 9223372036854771712 (0x7ffffffffffff000), which effectively means unlimited.
Agents conforming to the spec should not send this value; in that case they should omit the system.process.cgroup.memory.mem.limit.bytes metric.
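The unlimited-value rule can be sketched as a small parser. This assumes the limit file contains a single decimal integer (as cgroup v1's memory.limit_in_bytes does); parseCgroupMemoryLimit is a hypothetical helper, not part of any agent. Treating every value above Number.MAX_SAFE_INTEGER as unlimited is an extra assumption beyond the spec text, made because such values cannot be represented exactly as a JavaScript number anyway.

```typescript
// Value reported when no explicit memory limit is set (0x7ffffffffffff000).
const UNLIMITED_SENTINEL = "9223372036854771712";

// Converts the raw contents of a cgroup memory limit file into a byte count,
// or undefined when the limit metric should be omitted.
function parseCgroupMemoryLimit(raw: string): number | undefined {
  const text = raw.trim();
  // The sentinel means "unlimited": do not report a limit at all.
  if (text === UNLIMITED_SENTINEL) {
    return undefined;
  }
  // Accept only plain non-negative decimal integers.
  if (!/^\d+$/.test(text)) {
    return undefined;
  }
  const bytes = Number(text);
  // Assumption: values too large to be represented exactly are also treated
  // as unlimited; real container limits are far below this bound.
  return bytes <= Number.MAX_SAFE_INTEGER ? bytes : undefined;
}
```

A file containing "536870912\n" (a 512 MiB limit) parses to 536870912, while the sentinel parses to undefined, so the limit metric is left out.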

Formalizing that in pseudocode:

var total = system.process.cgroup.memory.mem.limit.bytes;
if (total == NA) {  
  total = system.memory.total;
}
var used = system.process.cgroup.memory.mem.usage.bytes;
if (used == NA) {
  used = system.memory.total - system.memory.actual.free;
} 
var usage = used / total;
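The pseudocode above can be expressed as a small TypeScript function. This is a sketch, not the actual Kibana implementation: it assumes the metrics of one metricset have already been mapped onto a hypothetical MemoryMetrics object, with undefined standing in for NA.

```typescript
// Hypothetical view of one metricset; undefined means the metric was not sent.
interface MemoryMetrics {
  cgroupLimitBytes?: number;     // system.process.cgroup.memory.mem.limit.bytes
  cgroupUsageBytes?: number;     // system.process.cgroup.memory.mem.usage.bytes
  systemTotalBytes: number;      // system.memory.total
  systemActualFreeBytes: number; // system.memory.actual.free
}

// Computes used / total, preferring each cgroup value when it is present
// and falling back to the host-level system.memory values otherwise.
function memoryUsageRatio(m: MemoryMetrics): number {
  const total = m.cgroupLimitBytes ?? m.systemTotalBytes;
  const used = m.cgroupUsageBytes ?? (m.systemTotalBytes - m.systemActualFreeBytes);
  return used / total;
}
```

With illustrative values of 256 MiB used under a 512 MiB cgroup limit on a 16 GiB host, the function returns 0.5, whereas dividing the same 256 MiB by the host total would give 0.015625, which is the kind of misreading described at the top of this issue.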

Agent implementation details

https://github.com/elastic/apm/blob/master/specs/agents/metrics.md#cgroup-metrics

Related issues

Component     Link to issue
Agents        #292
APM Server    elastic/apm-server#4070
APM UI        elastic/kibana#69679
