Skip to content

fix: deploy.sh retry logic, CUJ2 doc cleanup, and test reporting guide#404

Merged
mchmarny merged 1 commit intomainfrom
fix/cuj2-timeout-issue
Mar 13, 2026
Merged

fix: deploy.sh retry logic, CUJ2 doc cleanup, and test reporting guide#404
mchmarny merged 1 commit intomainfrom
fix/cuj2-timeout-issue

Conversation

@lockwobr
Copy link
Copy Markdown
Contributor

@lockwobr lockwobr commented Mar 13, 2026

Summary

  • Add retry logic to the deploy.sh
  • Cleaned a regex's in deploy.sh
  • Add INT/TERM signal trap to deploy.sh template so Ctrl+C exits cleanly
  • Document --retries 0 (fail-fast) in deploy.sh usage comment
  • Remove non-existent hf-token-secret.yaml step from CUJ2 doc
  • Add demos/README.md with demo index and session recording guide
  • Add example CUJ2 test report in demos/examples/
  • fix in install script to cross platform with linux

closes #335

Motivation / Context

Fixes:
Related:

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Refactoring (no functional changes)
  • Build/CI/tooling

Component(s) Affected

  • CLI (cmd/aicr, pkg/cli)
  • API server (cmd/aicrd, pkg/api, pkg/server)
  • Recipe engine / data (pkg/recipe)
  • Bundlers (pkg/bundler, pkg/component/*)
  • Collectors / snapshotter (pkg/collector, pkg/snapshotter)
  • Validator (pkg/validator)
  • Core libraries (pkg/errors, pkg/k8s)
  • Docs/examples (docs/, examples/)
  • Other: ____________

Implementation Notes

Testing

# Commands run (prefer `make qualify` for non-trivial changes)
make qualify

Risk Assessment

  • Low — Isolated change, well-tested, easy to revert
  • Medium — Touches multiple components or has broader impact
  • High — Breaking change, affects critical paths, or complex rollout

Rollout notes:

Checklist

  • Tests pass locally (make test with -race)
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality
  • I updated docs if user-facing behavior changed
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S) — GPG signing info

@mchmarny mchmarny self-requested a review March 13, 2026 22:27
Copy link
Copy Markdown
Member

@mchmarny mchmarny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good PR — retry logic, signal handling, CUJ2 doc cleanup, and cert-manager migration are all solid improvements. The cosign version detection simplification is a nice cleanup too.

Two issues need fixing before merge:

Must fix:

  1. retry() off-by-oneMAX_RETRIES=5 only gives 4 retries (5 total runs), not 5 retries (6 total) as documented. The error message also overcounts by 1. Fix: -ge-gt and $((attempt + 1))${attempt}.
  2. apply_ignoring_crd_race lost fail-safe — The old inline code explicitly failed on unrecognized error formats (elif [[ -z "${err_lines}" ]]). The new function silently returns success when errors match neither CRD-race nor real-error patterns. This is a safety regression for unexpected kubectl failure modes.

Nice to have:
3. Validate --retries is numeric (avoid arithmetic errors on bad input)
4. Test assertion for cert-manager values checks "enabled" which is too generic

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 13, 2026

Coverage Report ✅

Metric Value
Coverage 73.4%
Threshold 70%
Status Pass
Coverage Badge
![Coverage](https://img.shields.io/badge/coverage-73.4%25-green)

No Go source files changed in this PR.

- Add retry logic to the deploy.sh
- Cleaned a regex's in deploy.sh
- Add INT/TERM signal trap to deploy.sh template so Ctrl+C exits cleanly
- Document --retries 0 (fail-fast) in deploy.sh usage comment
- Remove non-existent hf-token-secret.yaml step from CUJ2 doc
- Add demos/README.md with demo index and session recording guide
- Add example CUJ2 test report in demos/examples/
@lockwobr lockwobr force-pushed the fix/cuj2-timeout-issue branch from f1ffe2f to 989de3f Compare March 13, 2026 22:35
@lockwobr lockwobr requested a review from mchmarny March 13, 2026 22:36
@mchmarny mchmarny merged commit 3ad60d3 into main Mar 13, 2026
48 checks passed
@mchmarny mchmarny deleted the fix/cuj2-timeout-issue branch March 13, 2026 22:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: deploy.sh is flakey with context deadline exceeded on CRD application

2 participants