Skip to content

Ensure endpoint validation occurs before initial regeneration#11714

Merged
tgraf merged 2 commits intomasterfrom
pr/tgraf/fix-restore-validation
May 29, 2020
Merged

Ensure endpoint validation occurs before initial regeneration#11714
tgraf merged 2 commits intomasterfrom
pr/tgraf/fix-restore-validation

Conversation

@tgraf
Copy link
Copy Markdown
Contributor

@tgraf tgraf commented May 27, 2020

Due to a performance optimization, validateEndpoint() was changed to use
the pod store instad of contacting the apiserver directly. This required
to make the validation async to wait until the pod cache is available.

This change however caused the endpoint to be exposed and regenerated
before the endpoint was fully validated. This in turn caused the
validation to occur while the regeneration was being performed. When the
validation then failed, the endpoint was deleted while the endpoint was
being regenerated, causing the endpoint to fail with a "JoinEP" message
in the logs.

This commit requires the endpoint to be validated before exposure and
initial regeneration.

This problem only exists in master. No need to backport.

Fixes: #11645
Fixes: #11076
Fixes: #10856
Fixes: #10667
Fixes: #10278
Fixes: 9975bba ("pkg/endpoint: fetch pod and namespace labels from local stores")

@tgraf tgraf added kind/bug This is a bug in the Cilium logic. area/CI Continuous Integration testing issue or flake release-note/misc This PR makes changes that have no direct user impact. ci/flake This is a known failure that occurs in the tree. Please investigate me! labels May 27, 2020
@tgraf tgraf requested a review from a team May 27, 2020 13:57
@tgraf tgraf requested a review from a team as a code owner May 27, 2020 13:57
@tgraf tgraf requested a review from a team May 27, 2020 13:57
@tgraf tgraf force-pushed the pr/tgraf/fix-restore-validation branch from 0e82d79 to a71b026 Compare May 27, 2020 14:00
@tgraf
Copy link
Copy Markdown
Contributor Author

tgraf commented May 27, 2020

test-me-please

@coveralls
Copy link
Copy Markdown

coveralls commented May 27, 2020

Coverage Status

Coverage increased (+0.07%) to 36.913% when pulling 8ff6f77 on pr/tgraf/fix-restore-validation into 5e5c626 on master.

@tgraf
Copy link
Copy Markdown
Contributor Author

tgraf commented May 27, 2020

restart-1.11

Copy link
Copy Markdown
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good to me, besides one question that I have.

Comment thread daemon/cmd/daemon.go Outdated
@tgraf tgraf marked this pull request as draft May 28, 2020 09:11
@tgraf tgraf force-pushed the pr/tgraf/fix-restore-validation branch from a71b026 to facb7a0 Compare May 28, 2020 09:41
@tgraf
Copy link
Copy Markdown
Contributor Author

tgraf commented May 28, 2020

test-me-please

@tgraf tgraf force-pushed the pr/tgraf/fix-restore-validation branch from facb7a0 to 22b8c5d Compare May 28, 2020 09:45
@tgraf
Copy link
Copy Markdown
Contributor Author

tgraf commented May 28, 2020

test-me-please

1 similar comment
@tgraf
Copy link
Copy Markdown
Contributor Author

tgraf commented May 28, 2020

test-me-please

@tgraf
Copy link
Copy Markdown
Contributor Author

tgraf commented May 29, 2020

test-me-please

tgraf added 2 commits May 29, 2020 21:57
A following commit will require to wait on the K8sAPIGroupPodV1Core
cache

Signed-off-by: Thomas Graf <thomas@cilium.io>
Due to a performance optimization, validateEndpoint() was changed to use
the pod store instad of contacting the apiserver directly. This required
to make the validation async to wait until the pod cache is available.

This change however caused the endpoint to be exposed and regenerated
before the endpoint was fully validated. This in turn caused the
validation to occur while the regeneration was being performed. When the
validation then failed, the endpoint was deleted while the endpoint was
being regenerated, causing the endpoint to fail with a "JoinEP" message
in the logs.

This commit requires the endpoint to be validated before exposure and
initial regeneration.

Fixes: #11645
Fixes: #11076
Fixes: #10856
Fixes: #10667
Fixes: #10278
Fixes: 9975bba ("pkg/endpoint: fetch pod and namespace labels from local stores")

Signed-off-by: Thomas Graf <thomas@cilium.io>
@tgraf tgraf force-pushed the pr/tgraf/fix-restore-validation branch from 22b8c5d to 8ff6f77 Compare May 29, 2020 20:01
@tgraf
Copy link
Copy Markdown
Contributor Author

tgraf commented May 29, 2020

test-me-please

@tgraf tgraf marked this pull request as ready for review May 29, 2020 22:56
@tgraf tgraf merged commit 05163ed into master May 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/CI Continuous Integration testing issue or flake ci/flake This is a known failure that occurs in the tree. Please investigate me! kind/bug This is a bug in the Cilium logic. release-note/misc This PR makes changes that have no direct user impact.

Projects

None yet

5 participants