feat(helm): Add readiness and liveness probes to query-scheduler (resolves #1813). #1816

Merged: junhaoliao merged 3 commits into y-scope:main from junhaoliao:k8s-query-scheduler-health on Dec 19, 2025

@junhaoliao (Member)

Description

Note

This PR is part of the ongoing work for #1309. More PRs will be submitted until the Helm chart is complete and fully functional.

Add readiness and liveness probes to the query-scheduler deployment to improve reliability and enable Kubernetes to properly manage the pod lifecycle. This aligns the Helm deployment with the healthcheck settings already defined in the docker-compose configuration.

The probes use:

  • tcpSocket check on port 7000 (the query-scheduler internal port)
  • Standard probe timings from clp.readinessProbeTimings and clp.livenessProbeTimings helpers
  • YAML anchors for DRY configuration (consistent with webui-deployment.yaml pattern)
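
As a rough sketch of what such a container spec fragment could look like (this is not the exact diff; the anchor name `query-scheduler-health-check`, the `nindent` widths, and the assumption that the timing helpers take `.` as their context are illustrative only):

```yaml
# Sketch only: fragment of a container spec such as the one in
# query-scheduler-deployment.yaml. The anchor name and indentation are assumptions.
readinessProbe:
  # Assumed to emit the shared timing fields for readiness checks.
  {{- include "clp.readinessProbeTimings" . | nindent 2 }}
  # Define the TCP check once and anchor it for reuse.
  tcpSocket: &query-scheduler-health-check
    port: 7000
livenessProbe:
  # Assumed to emit the shared timing fields for liveness checks.
  {{- include "clp.livenessProbeTimings" . | nindent 2 }}
  # Reuse the same TCP check through a YAML alias.
  tcpSocket: *query-scheduler-health-check
```

Helm renders the template to text before the YAML is parsed, so the alias resolves to the same `tcpSocket` definition in both probes and the port only has to be stated once.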

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

cd tools/deployment/package-helm
./test.sh

# observed "All jobs completed and services are ready."

@junhaoliao junhaoliao requested a review from a team as a code owner December 19, 2025 04:18

@sitaowang1998 (Contributor) previously approved these changes on Dec 19, 2025 and left a comment:

LGTM.

However, I do feel that in the long term, as our system hits scalability limits, the liveness check will start failing even though the system is still functioning and is just slow. A restart in that case would further degrade the system, and would probably propagate the issue until the whole system stops functioning. This failure mode is well documented in the k8s docs.

A naïve check on a serving port is fine for now, but we need to come up with a better solution that works under heavy load.

@junhaoliao junhaoliao merged commit c7c2ffc into y-scope:main Dec 19, 2025
20 checks passed
davidlion pushed a commit to davidlion/clp that referenced this pull request Jan 17, 2026
@junhaoliao junhaoliao deleted the k8s-query-scheduler-health branch May 7, 2026 19:46
junhaoliao added a commit to junhaoliao/clp that referenced this pull request May 17, 2026
junhaoliao added a commit to junhaoliao/clp that referenced this pull request May 17, 2026