Skip to content

Log monitoring bulk failures#14356

Merged
ycombinator merged 9 commits intoelastic:masterfrom
ycombinator:lb-mon-log-bulk-failures
Nov 14, 2019
Merged

Log monitoring bulk failures#14356
ycombinator merged 9 commits intoelastic:masterfrom
ycombinator:lb-mon-log-bulk-failures

Conversation

@ycombinator
Copy link
Copy Markdown
Contributor

@ycombinator ycombinator commented Oct 31, 2019

Resolves #14303.

As reported in #14303, when the Elasticsearch monitoring reporter in libbeat sends a bulk API request to Elasticsearch, and that request fails, the errors are currently swallowed. This is because the actual response code for the bulk API request is 200 OK; the actual errors are embedded in the request's response body.

This PR teaches the Elasticsearch monitoring reporter to parse the bulk API response and log any errors. For the parsing, the same code as the Elasticsearch output is reused.

Testing this PR

  1. Start up Elasticsearch with security enabled. Make sure you know the password for the elastic superuser.

  2. Create a role that grants necessary privileges for managing and writing to metricbeat-* indices.

    curl -s -u elastic -H 'Content-Type: application/json' 'http://localhost:9200/_security/role/mb_writer' -d '{ "cluster": [ "monitor", "manage_ilm", "manage_index_templates" ], "indices": [ { "names": [ "metricbeat-*" ], "privileges": [ "all" ] } ] }'
    
  3. Create a user with the above role.

    curl -s -u elastic -H 'Content-Type: application/json' 'http://localhost:9200/_security/user/mb_writer' -d '{ "password": "mb_writer", "roles": [ "mb_writer" ] }'
    
  4. Build Metricbeat with this PR.

    cd $GOPATH/src/github.com/elastic/beats/metricbeat
    mage build
    
  5. Start Metricbeat with monitoring enabled and specifying the credentials of the above user for the elasticsearch output.

    ./metricbeat -e -E output.elasticsearch.username=mb_writer -E output.elasticsearch.password=mb_writer -E monitoring.enabled=true
    
  6. Verify that metricbeat-* indices are being created and populated in Elasticsearch but no .monitoring-beats-* indices are being created.

    curl -s -u elastic 'http://localhost:9200/_cat/indices'
    
  7. Verify that there are warnings in the log like so:

    2019-11-01T08:57:43.910-0700    WARN    elasticsearch/client.go:258     monitoring bulk item insert failed (i=0, status=403): {"type":"security_exception","reason":"action [indices:admin/create] is unauthorized for user [mb_writer]"}
    

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported function ItemStatus should have comment or be unexported

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should omit 2nd value from range; this loop is equivalent to for i := range ...

@elasticmachine
Copy link
Copy Markdown
Contributor

Pinging @elastic/stack-monitoring (Stack monitoring)

@ycombinator
Copy link
Copy Markdown
Contributor Author

jenkins, test this

Copy link
Copy Markdown
Contributor

@ph ph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code is OK to me, but I think we should have some tests added to cover that behavior and especially if the remote system changes his behavior. I don't link how the 200 vs the 403 response code is handled in this scenario.

Looking at existing code, there is currently no unit tests for the ES/reporter and adding that to the existing python system tests might be complicated but still worth investigating.

Also for BulkReadToItems we can surely add a test for it?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 nice change

Copy link
Copy Markdown
Contributor

@ph ph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, we need to find a better way with system test, I think its a problem and we need to have a proposal for that. Maybe a way to use a specific docker-compose file for a set of test.

@ycombinator
Copy link
Copy Markdown
Contributor Author

Travis CI is green. Jenkins CI failures are unrelated. Merging.

@ycombinator ycombinator merged commit a9aff6f into elastic:master Nov 14, 2019
ycombinator added a commit that referenced this pull request Nov 19, 2019
* Log monitoring bulk failures (#14356)

* Log monitoring bulk failures

* Renaming function

* Simplifying type

* Removing extraneous second value

* Adding godoc comments

* Adding CHANGELOG entry

* Clarifying log messages

* WIP: adding unit test stubs

* Fleshing out unit tests

* [DOCS] Deprecate central management (#14104) (#14594)

* State minimum Go version (#14400) (#14598)

* [DOCS] Fix description of rename processor (#14408) (#14600)

* Log monitoring bulk failures (#14356)

* Log monitoring bulk failures

* Renaming function

* Simplifying type

* Removing extraneous second value

* Adding godoc comments

* Adding CHANGELOG entry

* Clarifying log messages

* WIP: adding unit test stubs

* Fleshing out unit tests

* Fixing up CHANGELOG
ycombinator added a commit that referenced this pull request Nov 21, 2019
* Log monitoring bulk failures

* Renaming function

* Simplifying type

* Removing extraneous second value

* Adding godoc comments

* Adding CHANGELOG entry

* Clarifying log messages

* WIP: adding unit test stubs

* Fleshing out unit tests
@ycombinator ycombinator deleted the lb-mon-log-bulk-failures branch December 25, 2019 11:08
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
* Log monitoring bulk failures (elastic#14356)

* Log monitoring bulk failures

* Renaming function

* Simplifying type

* Removing extraneous second value

* Adding godoc comments

* Adding CHANGELOG entry

* Clarifying log messages

* WIP: adding unit test stubs

* Fleshing out unit tests

* [DOCS] Deprecate central management (elastic#14104) (elastic#14594)

* State minimum Go version (elastic#14400) (elastic#14598)

* [DOCS] Fix description of rename processor (elastic#14408) (elastic#14600)

* Log monitoring bulk failures (elastic#14356)

* Log monitoring bulk failures

* Renaming function

* Simplifying type

* Removing extraneous second value

* Adding godoc comments

* Adding CHANGELOG entry

* Clarifying log messages

* WIP: adding unit test stubs

* Fleshing out unit tests

* Fixing up CHANGELOG
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Beats will not log monitoring bulk index failure

4 participants