Instead of just a boolean, return the reason for the failure from isConnectedAndHasQuorum() Signed-off-by: Thomas Graf <thomas@cilium.io>
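As a rough illustration of the idea in this commit message, a health check can return an `error` that names the cause rather than a bare boolean. This is only a sketch: the `clientState` type, the sentinel errors, and the function body are hypothetical and are not taken from the Cilium code base.

```go
package main

import (
	"errors"
	"fmt"
)

// clientState is a hypothetical stand-in for what an etcd client knows
// about its connection. It is not a Cilium type.
type clientState struct {
	connected bool
	hasQuorum bool
}

// Hypothetical sentinel errors describing why the check failed.
var (
	errNotConnected = errors.New("not connected to etcd")
	errNoQuorum     = errors.New("etcd quorum unavailable")
)

// connectedAndHasQuorum returns nil when the client is healthy and an
// error naming the cause otherwise, so callers can surface the reason.
func connectedAndHasQuorum(s clientState) error {
	if !s.connected {
		return errNotConnected
	}
	if !s.hasQuorum {
		return errNoQuorum
	}
	return nil
}

func main() {
	// A connected client that has lost quorum.
	if err := connectedAndHasQuorum(clientState{connected: true}); err != nil {
		fmt.Println("unhealthy:", err)
	}
}
```

Callers that only need a yes/no answer can still compare the result against `nil`, while status reporting can include the error text.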
In general, the etcd client library should fail over to a healthy etcd endpoint and quorum errors should automatically resolve. If this does not happen for some reason, report unhealthy status to trigger a restart of the agent. Signed-off-by: Thomas Graf <thomas@cilium.io>
965f374 to
0c3c493
Compare
Contributor
Author
|
test-me-please |
qmonnet
requested changes
Jul 6, 2020
Member
qmonnet
left a comment
One concern on the tests, looks good to me otherwise.
aanm
approved these changes
Jul 6, 2020
Shuffling the list of etcd endpoints on each bootstrap has two positive effects:
* Agents in the cluster will connect to different etcd members.
* In case etcd fails to fail over due to a bug, a Cilium agent restart has a chance to fail over to a healthy etcd member.
Signed-off-by: Thomas Graf <thomas@cilium.io>
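The shuffling described in this commit message could look roughly like the sketch below. The function name `shuffleEndpoints` and the endpoint URLs are hypothetical; the actual change in the PR may be structured differently.

```go
package main

import (
	"fmt"
	"math/rand"
)

// shuffleEndpoints returns a randomly ordered copy of the endpoint list
// and leaves the caller's slice untouched.
// Note: before Go 1.20 the global math/rand source is deterministic
// unless it is seeded explicitly with rand.Seed.
func shuffleEndpoints(endpoints []string) []string {
	shuffled := make([]string, len(endpoints))
	copy(shuffled, endpoints)
	rand.Shuffle(len(shuffled), func(i, j int) {
		shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
	})
	return shuffled
}

func main() {
	// Hypothetical endpoint addresses, for illustration only.
	endpoints := []string{
		"https://etcd-0.example:2379",
		"https://etcd-1.example:2379",
		"https://etcd-2.example:2379",
	}
	fmt.Println(shuffleEndpoints(endpoints))
}
```

Shuffling a copy keeps the configured order available for logging or debugging while the client dials the randomized order.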
Signed-off-by: Thomas Graf <thomas@cilium.io>
0c3c493 to
6316e8b
Compare
Contributor
Author
|
test-me-please |
brb
approved these changes
Jul 8, 2020
The main change introduced by this PR is to fail the Cilium status if etcd quorum cannot be detected for 3 consecutive status intervals. By triggering a restart of the agent, the situation can hopefully be resolved. In addition, the endpoint list is shuffled on bootstrap to further improve the chances of resolving the issue.
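The "3 consecutive status intervals" rule can be sketched as a small counter that resets on any successful check. The names `quorumTracker`, `observe`, and `maxConsecutiveQuorumErrors` are hypothetical, and the sketch assumes the status starts failing on the third consecutive error; the PR itself may count differently.

```go
package main

import (
	"errors"
	"fmt"
)

// maxConsecutiveQuorumErrors mirrors the "3 consecutive status intervals"
// from the PR description. The constant name is hypothetical.
const maxConsecutiveQuorumErrors = 3

// errNoQuorum is a hypothetical error used to simulate a failed check.
var errNoQuorum = errors.New("etcd quorum unavailable")

// quorumTracker counts quorum failures across status intervals.
type quorumTracker struct {
	consecutiveErrors int
}

// observe records the outcome of one status interval. It returns a
// non-nil error only once the failure threshold has been reached, and a
// successful check resets the counter.
func (q *quorumTracker) observe(quorumErr error) error {
	if quorumErr == nil {
		q.consecutiveErrors = 0
		return nil
	}
	q.consecutiveErrors++
	if q.consecutiveErrors >= maxConsecutiveQuorumErrors {
		return fmt.Errorf("quorum check failed %d times in a row: %w",
			q.consecutiveErrors, quorumErr)
	}
	return nil
}

func main() {
	tracker := &quorumTracker{}
	for i := 1; i <= maxConsecutiveQuorumErrors; i++ {
		fmt.Printf("interval %d: status error = %v\n", i, tracker.observe(errNoQuorum))
	}
}
```

Tolerating a few failed intervals gives the etcd client library time to fail over on its own before the agent reports an unhealthy status and is restarted.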