Skip to content

feat : implementation of BGP readiness probe#40335

Closed
rabelmervin wants to merge 6 commits intocilium:mainfrom
rabelmervin:bgp
Closed

feat : implementation of BGP readiness probe#40335
rabelmervin wants to merge 6 commits intocilium:mainfrom
rabelmervin:bgp

Conversation

@rabelmervin
Copy link
Copy Markdown

@rabelmervin rabelmervin commented Jul 2, 2025

Fixes #39468

Add optional BGP readiness probe feedback into cilium-agent health status

@rabelmervin rabelmervin requested review from a team as code owners July 2, 2025 17:24
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jul 2, 2025
@rabelmervin rabelmervin requested review from aanm and squeed July 2, 2025 17:24
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Jul 2, 2025
@maintainer-s-little-helper
Copy link
Copy Markdown

Commit 9558c87 does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Jul 2, 2025
@maintainer-s-little-helper
Copy link
Copy Markdown

Commit 9558c87 does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

Copy link
Copy Markdown
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! One last piece of feedback, then this looks good from my side. Please make sure to squash your commits and remove the merge commit (as discussed on Slack)

@maintainer-s-little-helper
Copy link
Copy Markdown

Commit 9558c87 does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

ipPool: ""
# -- BGP readiness settings
bgpReadiness:
# -- Enable/Disable BGP readiness.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a bit more information here? e.g. "When enabled, Cilium pods will not be marked as Ready until configured BGP sessions have been established"

@squeed
Copy link
Copy Markdown
Contributor

squeed commented Jul 3, 2025

Do we need to worry about a flaky peer causing all pods to go non-Ready? I see that we mark the agent as ready as long as a single peer is up, which is a good idea. We might want to permanently mark the agent as ready once a single session has been fully established. Thoughts (@YutaroHayakawa)?

Copy link
Copy Markdown
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Two formatting issues appeared. Other than that looks good. Please make sure to squash your commits again!

package healthcheck

import (
"context"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The imports here are now no longer properly formatted

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an issue again

@maintainer-s-little-helper
Copy link
Copy Markdown

Commit 9558c87 does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@YutaroHayakawa
Copy link
Copy Markdown
Member

Do we need to worry about a flaky peer causing all pods to go non-Ready? I see that we mark the agent as ready as long as a single peer is up, which is a good idea. We might want to permanently mark the agent as ready once a single session has been fully established. Thoughts (@YutaroHayakawa)?

Yeah, given the use case in the original issue (the agent rollout scenario), this makes sense I think. Because we're only interested in the initial readiness in that case. @pasteley, do you have any thoughts around that?

@pasteley
Copy link
Copy Markdown
Contributor

pasteley commented Jul 4, 2025

Do we need to worry about a flaky peer causing all pods to go non-Ready? I see that we mark the agent as ready as long as a single peer is up, which is a good idea. We might want to permanently mark the agent as ready once a single session has been fully established. Thoughts (@YutaroHayakawa)?

Yeah, given the use case in the original issue (the agent rollout scenario), this makes sense I think. Because we're only interested in the initial readiness in that case. @pasteley, do you have any thoughts around that?

Agree, but does it mean that startupProbe should be used instead?
Something like:

startupProbe:
  httpGet:
    path: /bgp/status
    port: 9879
  failureThreshold: 30
  periodSeconds: 10

@YutaroHayakawa
Copy link
Copy Markdown
Member

Agree, but does it mean that startupProbe should be used instead?

Yeah, makes sense to me. However I think we either need to introduce an endpoint dedicated for the startupProbe or introduce modifier to distinguish startup vs readiness in the /healthz endpoint in that case. Implementing it on the existing /healthz doesn't work because it is also used in the readiness probe.

That's gonna be an overkill for this small use case IMO, so I'd vote for the stateful approach that Casey mentioned (mark BGP "ready" permanently after initial session establishment).

@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Jul 9, 2025
Signed-off-by: rabelmervin <rabelmervin@gmail.com>
Signed-off-by: rabelmervin <rabelmervin@gmail.com>
@dylandreimerink
Copy link
Copy Markdown
Member

/test

Copy link
Copy Markdown
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Please make sure to address the unresolved feedback and squash your commits into one again.

@gandro
Copy link
Copy Markdown
Member

gandro commented Jul 14, 2025

Also note that CI is failing because the Helm and API files need to be re-generated.

@joestringer joestringer added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Jul 17, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jul 17, 2025
@joestringer joestringer marked this pull request as draft July 17, 2025 00:09
Copy link
Copy Markdown
Member

@joestringer joestringer Jul 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the API change necessary if there's just a commandline flag for this?

The intent for API change would be if there is a flag on the cilium-dbg status ... command-line to optionally require BGP readiness in certain scenarios. If instead you're deploying the cilium-agent with a static requirement based on a cilium-agent flag, then you don't need these API changes at all.

@joestringer
Copy link
Copy Markdown
Member

Converting to draft while you address the feedback. When that is complete, use the "ready for review" button at the bottom of the page.

I also added a proposed release note into the description to help communicate this feature more clearly to users.

@github-actions
Copy link
Copy Markdown

This pull request has been automatically marked as stale because it
has not had recent activity. It will be closed if no further activity
occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Aug 16, 2025
@github-actions
Copy link
Copy Markdown

This pull request has not seen any activity since it was marked stale.
Closing.

@github-actions github-actions bot closed this Aug 31, 2025
@rabelmervin
Copy link
Copy Markdown
Author

This pull request has not seen any activity since it was marked stale. Closing.

still working on this

Signed-off-by: rabelmervin <rabelmervin@gmail.com>
Signed-off-by: rabelmervin <rabelmervin@gmail.com>
Signed-off-by: rabelmervin <rabelmervin@gmail.com>
Signed-off-by: rabelmervin <rabelmervin@gmail.com>
@joestringer joestringer reopened this Jan 16, 2026
@github-actions github-actions bot removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jan 17, 2026
@github-actions
Copy link
Copy Markdown

This pull request has been automatically marked as stale because it
has not had recent activity. It will be closed if no further activity
occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Feb 16, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 3, 2026

This pull request has not seen any activity since it was marked stale.
Closing.

@github-actions github-actions bot closed this Mar 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/community-contribution This was a contribution made by a community member. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants