Log monitoring bulk failures #14356
ycombinator merged 9 commits into elastic:master from ycombinator:lb-mon-log-bulk-failures
exported function ItemStatus should have comment or be unexported
should omit 2nd value from range; this loop is equivalent to for i := range ...
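For reference, the lint hint above concerns loops written as `for i, _ := range ...`, where the blank second value is redundant. A minimal sketch of the corrected form follows; the function `failedIndexes` and its input are hypothetical and are not taken from this PR.

```go
package main

import "fmt"

// failedIndexes is a hypothetical helper (not from this PR) that returns
// the positions of all status codes that indicate a failure.
func failedIndexes(statuses []int) []int {
	var failed []int
	// golint flags `for i, _ := range statuses`; dropping the blank
	// second value gives the equivalent, idiomatic loop below.
	for i := range statuses {
		if statuses[i] >= 300 {
			failed = append(failed, i)
		}
	}
	return failed
}

func main() {
	fmt.Println(failedIndexes([]int{201, 403, 200}))
}
```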
Pinging @elastic/stack-monitoring (Stack monitoring)
jenkins, test this
ph left a comment
The code looks OK to me, but I think we should add some tests to cover this behavior, especially in case the remote system changes its behavior. I don't like how the 200 vs. the 403 response code is handled in this scenario.
Looking at the existing code, there are currently no unit tests for the ES reporter, and adding them to the existing Python system tests might be complicated, but it is still worth investigating.
Also, we can surely add a test for BulkReadToItems?
ph left a comment
LGTM. We need to find a better way to handle system tests; I think it's a problem and we need a proposal for it. Maybe we could use a specific docker-compose file for a given set of tests.
Travis CI is green. Jenkins CI failures are unrelated. Merging.
* Log monitoring bulk failures (#14356)
* Log monitoring bulk failures
* Renaming function
* Simplifying type
* Removing extraneous second value
* Adding godoc comments
* Adding CHANGELOG entry
* Clarifying log messages
* WIP: adding unit test stubs
* Fleshing out unit tests
* [DOCS] Deprecate central management (#14104) (#14594)
* State minimum Go version (#14400) (#14598)
* [DOCS] Fix description of rename processor (#14408) (#14600)
* Log monitoring bulk failures (#14356)
* Log monitoring bulk failures
* Renaming function
* Simplifying type
* Removing extraneous second value
* Adding godoc comments
* Adding CHANGELOG entry
* Clarifying log messages
* WIP: adding unit test stubs
* Fleshing out unit tests
* Fixing up CHANGELOG
Resolves #14303.
As reported in #14303, when the Elasticsearch monitoring reporter in libbeat sends a bulk API request to Elasticsearch and that request fails, the errors are currently swallowed. This is because the actual response code for the bulk API request is `200 OK`; the actual errors are embedded in the response body.

This PR teaches the Elasticsearch monitoring reporter to parse the bulk API response and log any errors. For the parsing, the same code as the Elasticsearch output is reused.
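To illustrate the failure mode described above, here is a minimal, self-contained sketch of inspecting a bulk API response body for per-item errors. It is not the code from this PR, which reuses the Elasticsearch output's parser; this sketch uses `encoding/json` instead, and the names `bulkResponse`, `bulkItemResult`, and `bulkFailures` as well as the sample payload are assumptions made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bulkItemResult holds the per-item outcome inside a bulk response.
// Only the fields needed for this sketch are decoded.
type bulkItemResult struct {
	Status int             `json:"status"`
	Error  json.RawMessage `json:"error,omitempty"`
}

// bulkResponse mirrors the top-level shape of a bulk API response:
// an "errors" flag plus one single-key object per item, keyed by action.
type bulkResponse struct {
	Errors bool                        `json:"errors"`
	Items  []map[string]bulkItemResult `json:"items"`
}

// bulkFailures (hypothetical helper) returns one message per failed item.
// An HTTP 200 on the request itself says nothing about these items.
func bulkFailures(body []byte) ([]string, error) {
	var resp bulkResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decoding bulk response: %w", err)
	}
	if !resp.Errors {
		return nil, nil
	}
	var failures []string
	for i, item := range resp.Items {
		for action, result := range item {
			if result.Status >= 300 {
				failures = append(failures, fmt.Sprintf(
					"item %d (%s): status=%d error=%s",
					i, action, result.Status, result.Error))
			}
		}
	}
	return failures, nil
}

func main() {
	// Made-up sample payload: the first item is rejected, the second succeeds.
	body := []byte(`{"took":1,"errors":true,"items":[
		{"index":{"status":403,"error":{"type":"security_exception"}}},
		{"index":{"status":201}}
	]}`)
	failures, err := bulkFailures(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, f := range failures {
		fmt.Println("bulk item failed:", f)
	}
}
```

The point of the sketch is that the caller has to look at the `errors` flag and the per-item `status` values rather than the HTTP status code of the bulk request.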
Testing this PR
1. Start up Elasticsearch with security enabled. Make sure you know the password for the `elastic` superuser.
2. Create a role that grants necessary privileges for managing and writing to `metricbeat-*` indices.
3. Create a user with the above role.
4. Build Metricbeat with this PR.
5. Start Metricbeat with monitoring enabled and specifying the credentials of the above user for the `elasticsearch` output.
6. Verify that `metricbeat-*` indices are being created and populated in Elasticsearch but no `.monitoring-beats-*` indices are being created.
7. Verify that there are warnings in the log like so: