fix(helm): Improve chart test script (fixes #1782): by junhaoliao · Pull Request #1784 · y-scope/clp

junhaoliao · 2025-12-16T02:53:22Z

Fix directory name from results-cache to results_cache to match PV's hostPath.
Add wait_for_pods() function to wait for jobs and pods to be ready.
Replace helm uninstall || true with helm uninstall --ignore-not-found.

Description

Note

This PR is part of the ongoing work for #1309. More PRs will be submitted until the Helm chart is complete and fully functional.

Fix the directory name mismatch introduced in #1771, where the Kubernetes PV hostPath was changed from results-cache to results_cache but the test script wasn't updated to match.

Also improve the Helm test script by adding a wait_for_pods function that waits for all jobs to complete and all non-job pods to be ready before considering the test successful.

Changes include:

Fix directory name from results-cache to results_cache to match the PV hostPath (fixes Update test.sh to use results_cache directory naming #1782)
Add wait_for_pods() function with configurable timeout, poll interval, and wait timeout parameters
Use kubectl wait to wait for job completion and pod readiness
Replace helm uninstall test || true with helm uninstall test --ignore-not-found for cleaner error handling

Checklist

The PR satisfies the contribution guidelines.
This is a breaking change and that has been indicated in the PR title, OR this isn't a
breaking change.
Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Ran the test script locally to verify:
- The wait_for_pods function correctly waits for all pods to be ready
- The script properly times out if pods don't become ready
- Helm installation and cleanup work as expected

Summary by CodeRabbit

Chores
- Enhanced deployment process reliability with improved pod readiness verification before proceeding with installation steps.
- Implemented safer uninstall procedures to prevent errors during deployment cleanup.
- Updated internal deployment infrastructure paths for consistency.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

- Fix directory name from `results-cache` to `results_cache` to match PV hostPath - Add `wait_for_pods()` function to wait for jobs and pods to be ready - Replace `helm uninstall || true` with `helm uninstall --ignore-not-found`

coderabbitai · 2025-12-16T02:53:32Z

Walkthrough

The test script was updated to standardise directory naming from results-cache to results_cache, add a new wait_for_pods function to verify pod readiness after Helm installation, and improve Helm release cleanup with the --ignore-not-found flag.

Changes

Cohort / File(s)	Change Summary
Test Script Enhancement `tools/deployment/package-helm/test.sh`	Added `wait_for_pods()` function to poll for Job completion and Pod readiness with configurable timeouts and progress messaging. Updated directory path from `results-cache` to `results_cache` for naming consistency. Modified Helm uninstall command to include `--ignore-not-found` flag. Integrated `wait_for_pods` invocation after Helm installation.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

wait_for_pods function logic: Verify kubectl wait command syntax, timeout/polling implementation, and error handling are correct
Directory naming consistency: Confirm results_cache aligns with Helm configuration and other deployment scripts

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Out of Scope Changes check	❓ Inconclusive	The pull request includes a wait_for_pods function and helm uninstall command improvements that are enhancements beyond the linked issue scope but support robust testing.	Clarify whether the wait_for_pods function and helm uninstall improvements were intentional enhancements or if they should be addressed separately.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	The pull request fully addresses the primary objective from issue #1782: updating the directory path from results-cache to results_cache in test.sh.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main changes: improving the Helm test script and fixing the directory naming issue (#1782), which aligns with the primary objectives of the pull request.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bbc3b71 and bbe6faf.

📒 Files selected for processing (1)

tools/deployment/package-helm/test.sh (2 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: package-image
GitHub Check: build (macos-15)
GitHub Check: build (ubuntu-24.04)
GitHub Check: lint-check (ubuntu-24.04)
GitHub Check: lint-check (macos-15)

🔇 Additional comments (3)

tools/deployment/package-helm/test.sh (3)

56-56: Correct directory name fix.

This change correctly addresses issue #1782 by changing results-cache to results_cache, aligning the directory name with the PV hostPath rename from PR #1771.

75-75: Improved Helm cleanup command.

The change from helm uninstall || true to helm uninstall test --ignore-not-found is a good improvement. The --ignore-not-found flag is more explicit and only suppresses "not found" errors rather than all errors, providing better error handling.

79-79: Appropriate wait parameters.

The function call with a 300-second overall timeout, 5-second poll interval, and 5-second wait timeout is well-suited for a test environment. This ensures pods are ready before the script proceeds.

coderabbitai · 2025-12-16T03:02:35Z

+# Waits for all jobs to complete and all non-job pods to be ready.
+#
+# @param {int} timeout_seconds Overall timeout in seconds
+# @param {int} poll_interval_seconds Interval between status checks
+# @param {int} wait_timeout_seconds Timeout for each kubectl wait call
+# @return {int} 0 on success, 1 on timeout
+wait_for_pods() {
+    local timeout_seconds=$1
+    local poll_interval_seconds=$2
+    local wait_timeout_seconds=$3
+
+    echo "Waiting for all pods to be ready" \
+        "(timeout=${timeout_seconds}s, poll=${poll_interval_seconds}s," \
+        "wait=${wait_timeout_seconds}s)..."
+
+    # Reset bash built-in SECONDS counter
+    SECONDS=0
+
+    while true; do
+        sleep "${poll_interval_seconds}"
+        kubectl get pods
+
+        if kubectl wait job \
+                --all \
+                --for=condition=Complete \
+                --timeout="${wait_timeout_seconds}s" 2>/dev/null \
+            && kubectl wait pods \
+                --all \
+                --selector='!job-name' \
+                --for=condition=Ready \
+                --timeout="${wait_timeout_seconds}s" 2>/dev/null
+        then
+            echo "All jobs completed and services are ready."
+            return 0
+        fi
+
+        if [[ ${SECONDS} -ge ${timeout_seconds} ]]; then
+            echo "ERROR: Timed out waiting for pods to be ready"
+            return 1
+        fi
+
+        echo "---"
+    done
+}


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Check if the file exists and read the relevant lines if [ -f "tools/deployment/package-helm/test.sh" ]; then echo "File found. Reading lines 9-52:" sed -n '9,52p' "tools/deployment/package-helm/test.sh" else echo "File not found at tools/deployment/package-helm/test.sh" echo "Searching for test.sh files:" find . -name "test.sh" -type f | head -20 fi

Repository: y-scope/clp

Length of output: 1482

🏁 Script executed:

# Verify kubectl wait behavior with no jobs using kubectl documentation and basic testing logic # We'll check the kubectl API behavior for wait command # Check kubectl wait help documentation kubectl wait --help 2>/dev/null | grep -A 20 "for=condition=" || echo "kubectl documentation check" # Test the behavior logically: # According to Kubernetes documentation, kubectl wait with --all # should return 0 when there are no resources matching the selector echo "Checking kubectl wait behavior assumptions..." </shell

Repository: y-scope/clp

Length of output: 199

🌐 Web query:

kubectl wait command behavior when no resources match selector empty namespace

💡 Result:

Short answer

kubectl wait will not block if your selector matches no resources in the given namespace — it returns immediately with "No resources found" / "error: no matching resources found" (it only waits for resources that already exist). [1][2][3]

Why

wait operates on the set of matching resources and only watches existing objects for the requested condition. If the selector/namespace yields an empty set, there's nothing to observe, so the command exits instead of waiting. [1][2]

Workarounds

If you want to wait for a resource to be created, use --for=create. Example: kubectl wait --for=create -l app=foo pod -n myns --timeout=60s.

Alternatively, wait on a higher-level object that already exists (e.g., Deployment with condition=available) or implement a loop that retries kubectl wait until resources appear. [1][2][3]

Sources

kubectl wait reference. [1]

Community discussion / issue explaining immediate return when selector matches nothing. [2]

Stack Overflow examples of the "no matching resources found" behavior and mitigation. [3]

Critical issue: Function fails when namespace has no jobs.

The wait_for_pods function has a logic flaw that causes it to timeout instead of succeeding in deployments without Kubernetes Jobs. When kubectl wait job --all executes with no matching jobs, it immediately returns an error (not success) with "error: no matching resources found". Because the function uses && logic between the job wait and pod wait commands (lines 27-32), the entire condition fails when no jobs exist, forcing the loop to retry until the overall timeout is exceeded.

This breaks the function for common scenarios such as Helm charts that deploy only Pods/Services without Jobs. The function needs to handle empty resource sets gracefully, either by:

Checking if jobs exist before attempting kubectl wait job, or

Using conditional logic to treat "no resources found" as a success condition

The timeout handling logic is sound, but this condition check must be fixed before merging.

🤖 Prompt for AI Agents

In tools/deployment/package-helm/test.sh around lines 9-52, the current compound condition uses `kubectl wait job --all` which fails when there are no jobs and causes the function to timeout; update the logic to treat "no matching resources found" as success by first checking for jobs (e.g., `kubectl get jobs --no-headers -o name` and only run `kubectl wait job` if any exist) or by running the wait and treating an exit indicating no resources as non-fatal (capture exit code/stderr and proceed to the pods wait). Ensure the overall condition succeeds when there are zero jobs so only the pod readiness check blocks progress.

kirkrodrigues

Updated the PR title & body directly.

) - Fix directory name from `results-cache` to `results_cache` to match PV's `hostPath`. - Add `wait_for_pods()` function to wait for jobs and pods to be ready. - Replace `helm uninstall || true` with `helm uninstall --ignore-not-found`.

fix(helm): Improve Helm test script (fixes y-scope#1782):

43d4fa1

- Fix directory name from `results-cache` to `results_cache` to match PV hostPath - Add `wait_for_pods()` function to wait for jobs and pods to be ready - Replace `helm uninstall || true` with `helm uninstall --ignore-not-found`

junhaoliao requested a review from a team as a code owner December 16, 2025 02:53

junhaoliao and others added 4 commits December 15, 2025 21:54

Merge branch 'main' into fix-helm-test

d848b72

add newline

353aef1

chore(helm): Bump chart version to 0.1.2-dev.5.

bba0d20

revert helm version change

bbe6faf

This was referenced Dec 16, 2025

refactor(helm): Refactor createLocalPv: #1771

Merged

refactor(helm): Replace dependency waiting logic with waitFor helper leveraging new job-watcher service account privileges. #1772

Merged

junhaoliao requested a review from kirkrodrigues December 16, 2025 02:59

chore(helm): Bump chart version to 0.1.2-dev.5.

9eb8b50

coderabbitai Bot reviewed Dec 16, 2025

View reviewed changes

junhaoliao mentioned this pull request Dec 16, 2025

feat(helm): Add compression & query job orchestration and the webui to the chart. #1749

Merged

3 tasks

kirkrodrigues changed the title ~~fix(helm): Improve Helm test script (fixes #1782):~~ fix(helm): Improve chart test script (fixes #1782): Dec 16, 2025

kirkrodrigues approved these changes Dec 16, 2025

View reviewed changes

junhaoliao merged commit 49c20df into y-scope:main Dec 16, 2025
22 checks passed

junhaoliao deleted the fix-helm-test branch December 16, 2025 04:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(helm): Improve chart test script (fixes #1782):#1784

fix(helm): Improve chart test script (fixes #1782):#1784
junhaoliao merged 6 commits into
y-scope:mainfrom
junhaoliao:fix-helm-test

junhaoliao commented Dec 16, 2025 •

edited

Loading

Uh oh!

coderabbitai Bot commented Dec 16, 2025 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Dec 16, 2025

Uh oh!

kirkrodrigues left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

junhaoliao commented Dec 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Checklist

Validation performed

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Dec 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Pre-merge checks and finishing touches

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

kirkrodrigues left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

junhaoliao commented Dec 16, 2025 •

edited

Loading

coderabbitai Bot commented Dec 16, 2025 •

edited

Loading