[system] - Add a flag to the system module to control whether metrics failures mark agent as degraded #42160
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
metricbeat/metricbeat.reference.yml
Outdated
# Enable below config to mark metricset as degraded if partial metrics are emitted
#degrade_on_partial: false
This either needs more documentation or a more specific name. Users reading this will have no idea what partial metrics mean or what can lead to this happening.
I like the hostmetrics receiver's mute_* settings as a model: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetricsreceiver#process
Maybe we can't detect in code the specific type of error yet, but we can at least document the error classes this actually covers. We can also expect we may need or want to extend this to be more specific.
Sure. I'll document this.
@cmacknz I've documented this in detail in reference.yml. I'll also open a separate ticket to update https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-metricset-system-process.html once we're satisfied with the wording here. Let me know what you think.
…into add-degrading-config
cmacknz
left a comment
Thanks, a minor nit on the wording but in reviewing this again I realized documenting this for standalone Beats doesn't really make sense. It has to be documented and exposed in the system integration or nobody can use it.
Co-authored-by: Craig MacKenzie <craig.mackenzie@elastic.co>
@cmacknz I've removed the references to the config from Beats, as it doesn't make sense for standalone Beats. I'll work on a follow-up PR for agent. Let me know if you're good with this PR! Thanks!
cmacknz
left a comment
LGTM, nit on the sleep in one of the unit tests.
… failures mark agent as degraded (#42160)
* chore: initial commit
* doc and test cases
* lint
* remove errors.Is
* chore: notice
* update docs
* Update metricbeat/mb/event.go
Co-authored-by: Craig MacKenzie <craig.mackenzie@elastic.co>
* remove references
* remove sleep
---------
Co-authored-by: Craig MacKenzie <craig.mackenzie@elastic.co>
(cherry picked from commit 7a2f8d4)
… failures mark agent as degraded (#42160) (#42339)
* chore: initial commit
* doc and test cases
* lint
* remove errors.Is
* chore: notice
* update docs
* Update metricbeat/mb/event.go
Co-authored-by: Craig MacKenzie <craig.mackenzie@elastic.co>
* remove references
* remove sleep
---------
Co-authored-by: Craig MacKenzie <craig.mackenzie@elastic.co>
(cherry picked from commit 7a2f8d4)
Co-authored-by: Vihas Makwana <121151420+VihasMakwana@users.noreply.github.com>
Add a new config to mark metricsets as degraded if partial metrics are emitted.
Currently, it's disabled by default and we will only enable it in integration test cases in elastic-agent. Once we're confident enough, we will enable this feature by default in Beats and eventually remove the flag.
Closes #40543