fix(e2e): stabilize runtime overrides and Docker auth#4928
Conversation
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
📝 WalkthroughWalkthroughThis PR addresses two independent CI/E2E reliability issues: Docker Hub authentication failures in workflows are now retried up to three times with timeouts and backoff, and a race condition in E2E test config capture is fixed by routing JSON output through a dedicated file descriptor to bypass tee buffering. ChangesDocker Hub Authentication Retry Logic
E2E Config Capture Reliability
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
E2E Advisor RecommendationRequired E2E: Dispatch hint: Auto-dispatched E2E: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
E2E Scenario Advisor RecommendationRequired scenario E2E: None Full scenario advisor summaryE2E Scenario AdvisorBase: Required scenario E2E
Optional scenario E2E
Relevant changed files
|
Selective E2E Results — ✅ All requested jobs passedRun: 27111643288
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/e2e-script-workflow.test.ts (1)
120-123: ⚡ Quick winStrengthen contract coverage for retry warnings and backoff.
Add assertions for the intermediate warning and exponential backoff expression so the tests lock the full retry contract, not only loop/timeout/final fallback.
♻️ Suggested assertion additions
expect(authStep?.run).toContain("for attempt in 1 2 3"); expect(authStep?.run).toContain("timeout 30s docker login"); + expect(authStep?.run).toContain("Docker Hub login attempt ${attempt} failed; retrying."); + expect(authStep?.run).toContain("sleep $((attempt * 5))"); expect(authStep?.run).toContain("Docker Hub login failed after 3 attempts");expect(authStep?.run, name).toContain("for attempt in 1 2 3"); expect(authStep?.run, name).toContain("timeout 30s docker login"); + expect(authStep?.run, name).toContain("Docker Hub login attempt ${attempt} failed; retrying."); + expect(authStep?.run, name).toContain("sleep $((attempt * 5))"); expect(authStep?.run, name).toContain("Docker Hub login failed after 3 attempts");As per coding guidelines, the workflow contract requires warnings on attempts 1-2 and exponential backoff (
sleep $((attempt * 5))), so tests should assert those strings too.Also applies to: 169-172
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e-script-workflow.test.ts` around lines 120 - 123, The test must assert the intermediate retry warnings and backoff expression in addition to the loop/timeout/fallback checks: add expects against authStep?.run (and the other analogous block at the 169-172 range) that .toContain("Docker Hub login failed on attempt 1"), .toContain("Docker Hub login failed on attempt 2") and .toContain("sleep $((attempt * 5))") so the test locks the warnings for attempts 1–2 and the exponential backoff expression.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@test/e2e-script-workflow.test.ts`:
- Around line 120-123: The test must assert the intermediate retry warnings and
backoff expression in addition to the loop/timeout/fallback checks: add expects
against authStep?.run (and the other analogous block at the 169-172 range) that
.toContain("Docker Hub login failed on attempt 1"), .toContain("Docker Hub login
failed on attempt 2") and .toContain("sleep $((attempt * 5))") so the test locks
the warnings for attempts 1–2 and the exponential backoff expression.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e48cb5cc-4529-493f-b99e-d07fe4331bad
📒 Files selected for processing (4)
.github/workflows/e2e-script.yaml.github/workflows/nightly-e2e.yamltest/e2e-script-workflow.test.tstest/e2e/test-runtime-overrides.sh
PR Review AdvisorFindings: 0 needs attention, 3 worth checking, 0 nice ideas Review findings🛠️ Needs attention
🔎 Worth checking
🌱 Nice ideas
Consider writing more tests for
This is an automated advisory review. A human maintainer must make the final merge decision. |
Selective E2E Results — ✅ All requested jobs passedRun: 27111736171
|
Summary
Builds on the fd 3 output bypass from #4924 to stabilize runtime override E2E captures, and extends the same protection to config hash probes that can hit the same short-lived container stdout race. Docker Hub auth now retries transient registry timeouts and falls back to anonymous pulls so unrelated nightly jobs can still execute.
Related Issue
Fixes #4926
Changes
Type of Change
Verification
npx prek run --all-filespassesnpm testpassesnpm run docsbuilds without warnings (doc changes only)Signed-off-by: Carlos Villela cvillela@nvidia.com
Summary by CodeRabbit
Chores
Tests