Skip to content

outlier: fix consecutive 5xx without interleaved success#1493

Merged
mattklein123 merged 6 commits intomasterfrom
fix-consecutive-5xx
Aug 22, 2017
Merged

outlier: fix consecutive 5xx without interleaved success#1493
mattklein123 merged 6 commits intomasterfrom
fix-consecutive-5xx

Conversation

@junr03
Copy link
Copy Markdown
Member

@junr03 junr03 commented Aug 18, 2017

Fixes #1491.

The DetectorHostSink has a consecutive_5xx_ counter that gets incremented every time a 5xx response gets charged to the sink, and reset every time a non-5xx does. Previously if a Sink reached the consecutive 5xx limit, and the host was charged with a consecutive 5xx ejection, the consecutive_5xx_ counter was not been reset. This meant that the Sink would not trigger another consecutive 5xx ejection until a non-5xx code was charged and the counter was reset, and then subsequently increased to the limit. This PR fixes the issue by resetting the counter after the host has been charged with an ejection (whether the ejection is enforced or not).

// charged with only 5xx ResponseCodes without an intervening non-5xx code. Therefore,
// the consecutive_5xx_ counter should be reset to allow the Sink to detect another bout
// of 5xx from this host.
consecutive_5xx_ = 0;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a minor thing, but to avoid multiple threads thrashing I would do this in onConsecutive5xxWorker

return;
}

// There are two outcomes here. The ejection will be enforced,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment does not fully make sense after the move.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment still doesn't make sense unless I am confused. There is only 1 outcome here. The ejection has been enforced at this point, the host is being ejected, and we are resetting. Please also reflow the comment.

Also, I think the reset should happen after the ejection to avoid a small race condition and re-fire of the event.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the enforcement decision happens in the ejectHost function, so at this point the ejection can still be enforced or not.

If we do the reset after enforcing the ejection, we leave hosts whose ejection was not enforced in the problematic state.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I see. I just meant move this code to after the ejectHost() call (so end of function). It will happen whether it gets ejected or not but in the case it is ejected there will be no race. Please reflow the comment and you have a small typo below "host keeps been".

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see as well :). Updated the comment. I think this one flows a bit more clearly. Let me know.

@mattklein123
Copy link
Copy Markdown
Member

@junr03 also please update PR description to final commit message. Thank you.

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. Small nit, please finalize the PR description for the commit message that will be used.

// Reset the DetectorHostSink's consecutive_5xx_. This counter needs to be reset
// in order to allow the Sink to detect a bout of consecutive 5xx responses even
// if the sink is not charged with an interleaved non-5xx code.
// The reset is done here instead of the Sink's putHttpResponseCode call to prevent
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still needs reflow with previous line.

@mattklein123 mattklein123 merged commit b137f0f into master Aug 22, 2017
@mattklein123 mattklein123 deleted the fix-consecutive-5xx branch August 22, 2017 13:58
jpsim pushed a commit that referenced this pull request Nov 28, 2022
Signed-off-by: Mike Schore <mike.schore@gmail.com>
Signed-off-by: JP Simard <jp@jpsim.com>
jpsim pushed a commit that referenced this pull request Nov 29, 2022
Signed-off-by: Mike Schore <mike.schore@gmail.com>
Signed-off-by: JP Simard <jp@jpsim.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Outlier detection: consecutive_5xx_ counter not reset after host ejection.

2 participants