Unified Pipeline/output metrics #4663
Conversation
ruflin
left a comment
This is a great change. It heavily simplifies the internal and external handling of the metrics. Makes it very easy to filter by one output type etc.
Makes me wonder if there are other places where we should do the same ;-)
libbeat/monitoring/metrics.go
Outdated
Tricky abbreviation, I only know what it means because of the diff.
oh... to me CAS on atomics is pretty well known/common :)
@urso The wait_shutdown_ok test seems to fail, which worries me a bit. But it is not related to this PR, I think.
@ruflin I checked the test. The test is completely unrelated to this PR. Although the test is failing (because the shutdown timer times out), I verified that the state in the registry still matches expectations.
- use libbeat/common/atomic package
- add monitoring.Uint type
- report pipeline metrics on:
  - libbeat.pipeline....
  - xpack.monitoring.pipeline...
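The dual reporting namespaces in the commit message can be sketched as one underlying counter exposed under two keys. This is a rough illustration with made-up types (`uintMetric` is a stand-in for a `monitoring.Uint`-style counter; the real libbeat monitoring API differs):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// uintMetric is a minimal stand-in for a monitoring.Uint style
// counter (illustrative; not the actual libbeat type).
type uintMetric struct{ v uint64 }

func (u *uintMetric) Inc()        { atomic.AddUint64(&u.v, 1) }
func (u *uintMetric) Get() uint64 { return atomic.LoadUint64(&u.v) }

func main() {
	// One underlying counter, visible under both the libbeat
	// and the xpack.monitoring namespaces.
	published := &uintMetric{}
	registry := map[string]*uintMetric{
		"libbeat.pipeline.events.published":          published,
		"xpack.monitoring.pipeline.events.published": published,
	}

	published.Inc()
	published.Inc()
	fmt.Println(registry["libbeat.pipeline.events.published"].Get())          // 2
	fmt.Println(registry["xpack.monitoring.pipeline.events.published"].Get()) // 2
}
```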
This PR adds metrics support to the publisher pipeline and unifies the metrics used by outputs.
The metrics are registered dynamically and can be removed later on (e.g. the pipeline already removes its metrics on close).
Metrics support is standardised and decoupled from the outputs and the publisher pipeline by introducing a listener/observer object that defines a set of common events. On every event from the outputs, the corresponding metrics (potentially multiple metrics) are updated accordingly.
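The observer idea described above can be sketched roughly like this. All names are illustrative, not the actual libbeat interfaces:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// outputObserver is a hypothetical listener interface: outputs
// report events, and the implementation translates each event
// into one or more metric updates.
type outputObserver interface {
	NewBatch(n int) // a batch of n events was handed to the output
	Acked(n int)    // n events were ACKed
	Failed(n int)   // n events failed
}

// metricsObserver updates a set of counters on each event.
type metricsObserver struct {
	batches, total, acked, failed uint64
}

func (m *metricsObserver) NewBatch(n int) {
	atomic.AddUint64(&m.batches, 1)
	atomic.AddUint64(&m.total, uint64(n))
}
func (m *metricsObserver) Acked(n int)  { atomic.AddUint64(&m.acked, uint64(n)) }
func (m *metricsObserver) Failed(n int) { atomic.AddUint64(&m.failed, uint64(n)) }

func main() {
	var obs outputObserver = &metricsObserver{}
	obs.NewBatch(50)
	obs.Acked(48)
	obs.Failed(2)

	m := obs.(*metricsObserver)
	fmt.Println(m.batches, m.total, m.acked, m.failed) // 1 50 48 2
}
```

The output itself never touches a metric directly; it only reports events, so the same output code works for any pipeline instance.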
The original per-output-type metrics have been removed (no more `output.elasticsearch...` and so on) in favor of a standardized set of metrics.

The observer is passed to the outputs and the publisher pipeline. This is used to collect metrics for different pipeline instances (`xpack` and `libbeat` namespaces).

pipeline metrics:

- pipeline.clients: number of beat.Client instances (internal connections to the pipeline)
- pipeline.events.total: total number of events processed by a client
- pipeline.events.filtered: total number of events removed by processors
- pipeline.events.published: total number of events pushed to the queue/broker
- pipeline.events.failed: total number of events that failed to be pushed to the queue (e.g. disconnect)
- pipeline.events.dropped: total number of events dropped
- pipeline.events.retry: total number of events retried
- pipeline.queue.acked: total number of events ACKed by the event queue/buffer
- pipeline.events.active: (gauge) number of active events in the pipeline

output metrics:
- output.type: configured output type (logstash, elasticsearch, ...)
- output.events.batches: total number of batches processed by the output
- output.events.total: total number of events processed by the output
- output.events.acked: total number of events ACKed by the output
- output.events.failed: total number of events that failed in the output
- output.events.active: (gauge) events sent and waiting for ACK/fail from the output
- output.write.bytes: total number of bytes written by the output
- output.write.errors: total number of I/O errors on write
- output.read.bytes: total number of bytes read
- output.read.errors: total number of I/O errors while waiting for a response from the output
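As a worked example of how the `output.events.active` gauge relates to the totals above: events handed to the output that have neither been ACKed nor failed are still in flight. A small sketch (the struct here is illustrative, not the real registry types):

```go
package main

import "fmt"

// outputCounters mirrors the totals listed above in a plain
// struct for illustration.
type outputCounters struct {
	total, acked, failed uint64
}

// active derives the gauge: events handed to the output minus
// those that have already been ACKed or have failed.
func active(c outputCounters) uint64 {
	return c.total - c.acked - c.failed
}

func main() {
	c := outputCounters{total: 100, acked: 90, failed: 4}
	fmt.Println(active(c)) // prints 6: events still awaiting ACK/fail
}
```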