daemon: Fix error logic flow for pod store being out of date #34389

Merged
christarazi merged 1 commit into cilium:main from christarazi:pr/christarazi/fix-sts-case
Aug 21, 2024

Conversation

@christarazi
Member

@christarazi christarazi commented Aug 14, 2024

In the endpoint creation path, if Cilium does not have the K8s Pod
reference in the local store, then the logic is to fetch the latest Pod
directly from the apiserver. However, when that fetch succeeds, the
error variable is not cleared. As a result, Cilium continues down the
"unhappy" path even though it should take the "happy" path.

Relevant log msgs from a sysdump displaying this behavior:

level=info msg="Create endpoint request" addressing="&{10.0.0.133 ba039cfd-7061-4c45-8526-1a5f569d16de default   }" containerID=3bed19eb3aabbb100ec39037a364cf1cbb10ca2666c14faec9893a2dc129844a containerInterface=eth0 datapathConfiguration="&{false false false false false <nil>}" interface=lxc5c692ead6e7a k8sPodName=default/nginx-static-pod-master-node k8sUID=cc8a369e7eac1fc96b6e3b51830c86e9 labels="[]" subsys=daemon sync-build=true
level=warning msg="Detected outdated Pod UID during Endpoint creation. Endpoint creation cannot proceed with an outdated Pod store. Attempting to fetch latest Pod." k8sPodName=default/nginx-static-pod-master-node k8sUID=cc8a369e7eac1fc96b6e3b51830c86e9 subsys=daemon
level=warning msg="Timeout occurred waiting for Pod store, fetching latest Pod via the apiserver." k8sPodName=default/nginx-static-pod-master-node k8sUID=cc8a369e7eac1fc96b6e3b51830c86e9 subsys=daemon
level=warning msg="Unable to fetch kubernetes labels" ciliumEndpointName=/ containerID= containerInterface= datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=0 error="pod store outdated" ipv4= ipv6= k8sPodName=/ subsys=api

Fixes: f6606c6 ("daemon,endpoint,cni: Pass pod UID through CNI ADD")
Signed-off-by: Chris Tarazi <chris@isovalent.com>


Fixes: #34197

@christarazi christarazi added labels Aug 14, 2024:
area/daemon: Impacts operation of the Cilium daemon.
release-note/bug: This PR fixes an issue in a previous release of Cilium.
area/agent: Cilium agent related.
affects/v1.13: This issue affects v1.13 branch
affects/v1.14: This issue affects v1.14 branch
needs-backport/1.15
kind/bug: This is a bug in the Cilium logic.
@christarazi
Member Author

The checkpatch job is failing on a false positive, FYI.

@christarazi
Member Author

/test

@christarazi christarazi marked this pull request as ready for review August 14, 2024 18:53
@christarazi christarazi requested a review from a team as a code owner August 14, 2024 18:53
@christarazi christarazi requested a review from squeed August 14, 2024 18:53
@christarazi christarazi enabled auto-merge August 14, 2024 18:53
Contributor

@squeed squeed left a comment

Extremely lgtm.

@aanm
Member

aanm commented Aug 14, 2024

/test

@christarazi
Member Author

Images failed to build so rerunning tests.

@christarazi
Member Author

/test

@christarazi christarazi force-pushed the pr/christarazi/fix-sts-case branch from 4047b4f to 12a3635 Compare August 20, 2024 23:46
@christarazi
Member Author

/test

@christarazi christarazi added this pull request to the merge queue Aug 21, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Aug 21, 2024
Merged via the queue into cilium:main with commit 143ca65 Aug 21, 2024
@christarazi christarazi deleted the pr/christarazi/fix-sts-case branch August 21, 2024 18:52
@tklauser tklauser mentioned this pull request Aug 27, 2024
13 tasks
@tklauser tklauser mentioned this pull request Aug 27, 2024
4 tasks
@github-actions github-actions bot added backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. backport-done/1.16 The backport for Cilium 1.16.x for this PR is done. and removed backport-pending/1.15 labels Sep 2, 2024

Labels

affects/v1.13: This issue affects v1.13 branch
affects/v1.14: This issue affects v1.14 branch
area/agent: Cilium agent related.
area/daemon: Impacts operation of the Cilium daemon.
backport-done/1.15: The backport for Cilium 1.15.x for this PR is done.
backport-done/1.16: The backport for Cilium 1.16.x for this PR is done.
kind/bug: This is a bug in the Cilium logic.
ready-to-merge: This PR has passed all tests and received consensus from code owners to merge.
release-note/bug: This PR fixes an issue in a previous release of Cilium.

Development

Successfully merging this pull request may close these issues.

Cilium Error: "metadata resolver: pod store out-of-date" for Static Pod in Kubernetes

4 participants