Wavefront output should distinguish between retryable and non-retryable errors#8404
Conversation
|
Hey @fishy . it's up to each output to decide what's retry-able and not. If an output gets an error trying to post a metric and it doesn't want it to be retried, it should log the error and continue, not returning an error from the |
|
@ssoroka Thanks for the feedback, Steven! That sounds reasonable, but the problems are that if we go that route we need to make sure that the output plugin has the actual working logger configured, or the attempt to log and swallow the error will go into a blackhole, or even cause panic. Looking at the plugin code, it does have the logger in the struct, but I'm not entirely sure whether or not it's configured (for example, it's not initialized during the init function: telegraf/plugins/outputs/wavefront/wavefront.go Lines 348 to 355 in ca04106 Log field in wavefront plugin?
|
|
if you're looking at |
07913a6 to
ace138e
Compare
|
Thanks, @ssoroka . I updated the PR to only check retryable inside wavefront plugin. Please take another look. (commit message and PR description also updated accordingly) |
| return fmt.Errorf("Wavefront sending error: %v", err) | ||
| } | ||
| w.Log.Errorf("non-retryable error during Wavefront.Write: %v", err) | ||
| return nil |
There was a problem hiding this comment.
You might not want to return here, and instead finish writing the batches, as some of those metrics are probably still good.
…le errors Currently we assume all wavefront send errors are retryable, and when an error happens during Write function we will reject the buffer and keep retrying the next tick. This means that when an actually non-retryable error happens, we'll just keep getting the same error on every tick, and never flush the buffer. One such error we encountered is "empty metric name" error. Add isRetryable function to detect non-retryable errors, and make it default to assume that all errors are retryable (so it matches the current behavior), but make it possible to mark certain errors as non-retryable. Currently only handled that "empty metric name" error as non-retryable. A support ticket has been filed against wavefront to provide a canonical way to distinguish between retryable and non-retryable errors. Signed-off-by: Yuxuan 'fishy' Wang <yuxuan.wang@reddit.com>
ace138e to
12a7d53
Compare
|
Merged! thank you |
|
@ssoroka Thanks! I assume this will be included in 1.16.3 release? Do you know when will that release happen? |
|
Yep! I think we might have another one in a week or two. Until then it'll be in the nightly release. |
Currently we assume all wavefront send errors are retryable, and when an
error happens during Write function we will reject the buffer and keep
retrying the next tick. This means that when an actually non-retryable
error happens, we'll just keep getting the same error on every tick, and
never flush the buffer.
One such error we encountered is "empty metric name" error.
Add isRetryable function to detect non-retryable errors, and make it
default to assume that all errors are retryable (so it matches the
current behavior), but make it possible to mark certain errors as
non-retryable.
Currently only handled that "empty metric name" error as non-retryable.
A support ticket has been filed against wavefront to provide a canonical
way to distinguish between retryable and non-retryable errors.
Signed-off-by: Yuxuan 'fishy' Wang yuxuan.wang@reddit.com
Required for all PRs: