Add Elasticsearch output status reporter#239
Conversation
|
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed
|
| es.healthWatcher.Fail(err.Error()) | ||
| es.logger.Errorf("couldn't add to bulk index request: %v", err) | ||
| // This event couldn't be attempted, so mark it as finished. | ||
| batch.Done(1) |
There was a problem hiding this comment.
Why remove the Done call here? IIRC it was needed for bookkeeping or else the batch won't be fully acknowledged.
There was a problem hiding this comment.
ah, probably removed that by accident...
output/elasticsearch/config_test.go
Outdated
| time.Sleep(time.Millisecond * 100) | ||
| watcher.Fail("simulated failure") | ||
| // should fail | ||
| time.Sleep(time.Millisecond * 600) |
There was a problem hiding this comment.
The sleeps in these tests are some heavy costs to impose on every unit test run. Can we instead make the time callback a parameter of the watcher? Typical use could pass in time.Now (or maybe that could be the default) while unit tests could pass in a mocked timer so it can be tested deterministically without any delay.
There was a problem hiding this comment.
So, I tinkered with this for a bit, but after I started running into race issues between the tests and the main watch loop, I decided it would be easier to just shift things around to make the sleeps faster, which is hopefully good enough.
There was a problem hiding this comment.
Mmm I see what you mean but it makes me sad. Probably the Right thing to do here is still to pass in a mockable helper to generate timer channels or something so we can make it deterministic, but that would require a lot of revisions for something that already mostly works. Maybe we can do a more robust generic helper when we have more components with health to report; I'll approve this one for now :-)
There was a problem hiding this comment.
Yah, agreed, I'm not a fan of it either, and I experimented with adding a few helpers/callbacks, but it just felt like the added code complexity just wasn't worth making the tests fully deterministic, particularly after shortening the sleeps in the test.
|
/test |
faec
left a comment
There was a problem hiding this comment.
lgtm pending some minor tweaks
output/elasticsearch/watcher.go
Outdated
| select { | ||
| case <-ctx.Done(): | ||
| return | ||
| default: |
There was a problem hiding this comment.
How about replace default with <-time.After(hw.waitInterval) and then remove the sleep at the end of the loop? Similar effect but then sleeps still respect context cancellation.
There was a problem hiding this comment.
Oh, good idea!
output/elasticsearch/config_test.go
Outdated
| time.Sleep(time.Millisecond * 100) | ||
| watcher.Fail("simulated failure") | ||
| // should fail | ||
| time.Sleep(time.Millisecond * 600) |
There was a problem hiding this comment.
Mmm I see what you mean but it makes me sad. Probably the Right thing to do here is still to pass in a mockable helper to generate timer channels or something so we can make it deterministic, but that would require a lot of revisions for something that already mostly works. Maybe we can do a more robust generic helper when we have more components with health to report; I'll approve this one for now :-)
What does this PR do?
closes #174
This adds a reporting utility to the elasticsearch output, which accepts a callback that updates the unit state if the output fails for a given number of seconds.
I've tested this, but between #240 and elastic/beats#34319 it's a tad hard to test.
Checklist
CHANGELOG.mdorCHANGELOG-developer.md.