outlier: fix consecutive 5xx without interleaved success#1493
outlier: fix consecutive 5xx without interleaved success#1493mattklein123 merged 6 commits intomasterfrom
Conversation
| // charged with only 5xx ResponseCodes without an intervening non-5xx code. Therefore, | ||
| // the consecutive_5xx_ counter should be reset to allow the Sink to detect another bout | ||
| // of 5xx from this host. | ||
| consecutive_5xx_ = 0; |
There was a problem hiding this comment.
It's a minor thing, but to avoid multiple threads thrashing I would do this in onConsecutive5xxWorker
| return; | ||
| } | ||
|
|
||
| // There are two outcomes here. The ejection will be enforced, |
There was a problem hiding this comment.
This comment does not fully make sense after the move.
There was a problem hiding this comment.
This comment still doesn't make sense unless I am confused. There is only 1 outcome here. The ejection has been enforced at this point, the host is being ejected, and we are resetting. Please also reflow the comment.
Also, I think the reset should happen after the ejection to avoid a small race condition and re-fire of the event.
There was a problem hiding this comment.
the enforcement decision happens in the ejectHost function, so at this point the ejection can still be enforced or not.
If we do the reset after enforcing the ejection, we leave hosts whose ejection was not enforced in the problematic state.
There was a problem hiding this comment.
OK I see. I just meant move this code to after the ejectHost() call (so end of function). It will happen whether it gets ejected or not but in the case it is ejected there will be no race. Please reflow the comment and you have a small typo below "host keeps been".
There was a problem hiding this comment.
I see as well :). Updated the comment. I think this one flows a bit more clearly. Let me know.
|
@junr03 also please update PR description to final commit message. Thank you. |
mattklein123
left a comment
There was a problem hiding this comment.
looks good. Small nit, please finalize the PR description for the commit message that will be used.
| // Reset the DetectorHostSink's consecutive_5xx_. This counter needs to be reset | ||
| // in order to allow the Sink to detect a bout of consecutive 5xx responses even | ||
| // if the sink is not charged with an interleaved non-5xx code. | ||
| // The reset is done here instead of the Sink's putHttpResponseCode call to prevent |
There was a problem hiding this comment.
Still needs reflow with previous line.
Fixes #1491.
The DetectorHostSink has a consecutive_5xx_ counter that gets incremented every time a 5xx response gets charged to the sink, and reset every time a non-5xx does. Previously if a Sink reached the consecutive 5xx limit, and the host was charged with a consecutive 5xx ejection, the consecutive_5xx_ counter was not been reset. This meant that the Sink would not trigger another consecutive 5xx ejection until a non-5xx code was charged and the counter was reset, and then subsequently increased to the limit. This PR fixes the issue by resetting the counter after the host has been charged with an ejection (whether the ejection is enforced or not).