Skip to content

fix(e2e): stabilize runtime overrides and Docker auth#4928

Merged
cv merged 2 commits into
mainfrom
codex/e2e-flake-hardening
Jun 8, 2026
Merged

fix(e2e): stabilize runtime overrides and Docker auth#4928
cv merged 2 commits into
mainfrom
codex/e2e-flake-hardening

Conversation

@cv

@cv cv commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

Summary

Builds on the fd 3 output bypass from #4924 to stabilize runtime override E2E captures, and extends the same protection to config hash probes that can hit the same short-lived container stdout race. Docker Hub auth now retries transient registry timeouts and falls back to anonymous pulls so unrelated nightly jobs can still execute.

Related Issue

Fixes #4926

Changes

  • Write runtime override config and hash-check probe output directly to fd 3 to avoid entrypoint tee teardown races.
  • Validate and retry captured runtime override configs before extracting required fields.
  • Retry Docker Hub login in the reusable and direct nightly E2E workflows, then warn and continue anonymously after repeated timeouts.
  • Extend workflow contract coverage for Docker Hub retry and fallback behavior.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Carlos Villela cvillela@nvidia.com

Summary by CodeRabbit

  • Chores

    • Strengthened CI/CD pipeline reliability by implementing automatic retry logic for Docker Hub authentication with timeout protection and graceful fallback to anonymous Docker pulls when authentication fails.
  • Tests

    • Expanded test coverage to validate the enhanced Docker authentication retry mechanism and fallback behavior.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@cv cv self-assigned this Jun 8, 2026
@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

This PR addresses two independent CI/E2E reliability issues: Docker Hub authentication failures in workflows are now retried up to three times with timeouts and backoff, and a race condition in E2E test config capture is fixed by routing JSON output through a dedicated file descriptor to bypass tee buffering.

Changes

Docker Hub Authentication Retry Logic

Layer / File(s) Summary
Workflow retry implementation
.github/workflows/e2e-script.yaml, .github/workflows/nightly-e2e.yaml
Both workflows replace single docker login with 3-attempt retry loops using 30s timeout per attempt, logging warnings on intermediate failures and continuing with anonymous pulls on total failure.
Workflow test assertions
test/e2e-script-workflow.test.ts
Test assertions extended to verify that both the reusable runner and direct nightly E2E jobs' authentication steps contain retry-loop syntax, timeout-wrapped docker login, and multi-attempt failure messages.

E2E Config Capture Reliability

Layer / File(s) Summary
Config capture helper functions
test/e2e/test-runtime-overrides.sh
Introduces run_override, run_config_hash_check, valid_config, capture_config (with structured validation and up to 3 retries), and jq_config to extract config fields reliably by routing output to dedicated fd 3.
Override test refactoring
test/e2e/test-runtime-overrides.sh
Baseline setup and all override test scenarios (model, context window, max tokens, reasoning, CORS origins, combined, and rejected override) refactored to use the new reliable capture/validation pipeline and jq_config extraction helpers.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#4697: Added initial Docker Hub guarded login/auth flow and secrets to workflows; this PR updates the same authentication step's run script with retry/timeout logic and matching test assertions.

Suggested labels

E2E, nightly-e2e, Docker, CI/CD, area: ci, area: e2e, fix

Suggested reviewers

  • prekshivyas
  • cjagwani

Poem

🐰 A race condition fled and retries now bloom,
When Docker's gates stumble, we try once more soon,
File descriptors flutter through fd 3 so keen,
The flakiest tests find a fix so serene!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed Title directly addresses both main changes: stabilizing Docker Hub authentication (retry logic) and fixing flaky runtime overrides test (capture reliability).
Linked Issues check ✅ Passed Changes fully implement the proposed fix [#4926]: writing baseline config directly to fd 3, validating with retries, and adding Docker auth retry logic with fallback to anonymous pulls.
Out of Scope Changes check ✅ Passed All changes align with stated objectives: fixing container capture races, adding config validation/retries, implementing Docker auth retries, and extending test coverage.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/e2e-flake-hardening

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: runtime-overrides-e2e
Optional E2E: cloud-inference-e2e, docs-validation-e2e

Dispatch hint: runtime-overrides-e2e

Auto-dispatched E2E: runtime-overrides-e2e via nightly-e2e.yaml at 34d695a8b33bedf98839d40602ebfe8aa4a1955anightly run

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • runtime-overrides-e2e (medium; timeout 45 minutes): This PR changes test/e2e/test-runtime-overrides.sh itself. Run the owning E2E job to verify the new fd-3 capture path, retry/validation behavior, config hash check helper, and all runtime override scenarios still pass in the real Docker image/entrypoint environment.

Optional E2E

  • cloud-inference-e2e (medium; timeout 30 minutes): Useful representative reusable-workflow smoke for .github/workflows/e2e-script.yaml because it runs through the shared E2E script runner used by most nightly jobs. It provides confidence that checkout, env export, secret gating, install, and inference still work with the workflow changes.
  • docs-validation-e2e (low; timeout 15 minutes): Useful low-cost direct nightly job using the shared direct-job Docker Hub auth anchor changed in nightly-e2e.yaml. It helps validate the direct-job path distinct from reusable-workflow jobs.

New E2E recommendations

  • E2E CI workflow runner / Docker Hub authentication (medium): Selective target_ref dispatches usually execute the trusted main workflow definition, so PR changes to workflow YAML retry logic may not be exercised by existing E2E jobs. Add a hermetic workflow-level integration test or dry-run job that mocks docker login failures/successes and verifies retry, timeout, anonymous fallback, and secret-withholding behavior for both reusable and direct-job auth steps.
    • Suggested test: Add an E2E workflow-auth smoke/dry-run job for Docker Hub auth retry and target_ref secret gating.

Dispatch hint

  • Workflow: nightly-e2e.yaml
  • jobs input: runtime-overrides-e2e

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • None. Changed files affect legacy/non-scenario E2E workflows and tests only; no files under test/e2e-scenario/ or .github/workflows/e2e-scenarios*.yaml were changed, and the diff does not affect scenario E2E runner/runtime/catalog behavior.

Optional scenario E2E

  • None.

Relevant changed files

  • None.

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 27111643288
Target ref: 761eb274f125947ba4f0bd01f8b6c81b60a2cb7d
Workflow ref: main
Requested jobs: runtime-overrides-e2e,cloud-inference-e2e
Summary: 0 passed, 0 failed, 0 skipped

Job Result
cloud-inference-e2e ⚠️ cancelled
runtime-overrides-e2e ⚠️ cancelled

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
test/e2e-script-workflow.test.ts (1)

120-123: ⚡ Quick win

Strengthen contract coverage for retry warnings and backoff.

Add assertions for the intermediate warning and exponential backoff expression so the tests lock the full retry contract, not only loop/timeout/final fallback.

♻️ Suggested assertion additions
     expect(authStep?.run).toContain("for attempt in 1 2 3");
     expect(authStep?.run).toContain("timeout 30s docker login");
+    expect(authStep?.run).toContain("Docker Hub login attempt ${attempt} failed; retrying.");
+    expect(authStep?.run).toContain("sleep $((attempt * 5))");
     expect(authStep?.run).toContain("Docker Hub login failed after 3 attempts");
       expect(authStep?.run, name).toContain("for attempt in 1 2 3");
       expect(authStep?.run, name).toContain("timeout 30s docker login");
+      expect(authStep?.run, name).toContain("Docker Hub login attempt ${attempt} failed; retrying.");
+      expect(authStep?.run, name).toContain("sleep $((attempt * 5))");
       expect(authStep?.run, name).toContain("Docker Hub login failed after 3 attempts");

As per coding guidelines, the workflow contract requires warnings on attempts 1-2 and exponential backoff (sleep $((attempt * 5))), so tests should assert those strings too.

Also applies to: 169-172

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-script-workflow.test.ts` around lines 120 - 123, The test must
assert the intermediate retry warnings and backoff expression in addition to the
loop/timeout/fallback checks: add expects against authStep?.run (and the other
analogous block at the 169-172 range) that .toContain("Docker Hub login failed
on attempt 1"), .toContain("Docker Hub login failed on attempt 2") and
.toContain("sleep $((attempt * 5))") so the test locks the warnings for attempts
1–2 and the exponential backoff expression.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/e2e-script-workflow.test.ts`:
- Around line 120-123: The test must assert the intermediate retry warnings and
backoff expression in addition to the loop/timeout/fallback checks: add expects
against authStep?.run (and the other analogous block at the 169-172 range) that
.toContain("Docker Hub login failed on attempt 1"), .toContain("Docker Hub login
failed on attempt 2") and .toContain("sleep $((attempt * 5))") so the test locks
the warnings for attempts 1–2 and the exponential backoff expression.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e48cb5cc-4529-493f-b99e-d07fe4331bad

📥 Commits

Reviewing files that changed from the base of the PR and between 5aa2c95 and 761eb27.

📒 Files selected for processing (4)
  • .github/workflows/e2e-script.yaml
  • .github/workflows/nightly-e2e.yaml
  • test/e2e-script-workflow.test.ts
  • test/e2e/test-runtime-overrides.sh

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 0 needs attention, 3 worth checking, 0 nice ideas
Top item: Clarify fd 3 workaround ownership/removal and overlap with #4924

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • Source-of-truth review needed: test/e2e/test-runtime-overrides.sh fd 3 output bypass: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: test/e2e/test-runtime-overrides.sh adds `>&3` in run_override and run_config_hash_check; scripts/nemoclaw-start.sh establishes fd 3 before redirecting stdout through tee.
  • Clarify fd 3 workaround ownership/removal and overlap with fix(e2e): write config capture to fd 3, bypassing tee race #4924 (test/e2e/test-runtime-overrides.sh:54): The fd 3 capture bypass is a localized workaround for entrypoint tee teardown races and this PR overlaps active PR fix(e2e): write config capture to fd 3, bypassing tee race #4924 on the same file and linked issue. The code identifies the invalid state and uses the right fd 3 boundary, but it does not document when the workaround can be removed, and the comment references the competing PR rather than the linked issue/root source. This creates source-of-truth drift risk if both PRs evolve the same workaround differently.
  • Docker auth retry path lacks behavioral negative coverage (.github/workflows/e2e-script.yaml:184): The workflow trusted-code boundary remains intact and no credential leak was found, but the new Docker Hub login recovery path is covered only by workflow string assertions. For a secret-handling path, behavior-level coverage would better prove failed logins retry exactly as intended, continue without exposing credentials, and preserve the target_ref secret-withholding boundary.
    • Recommendation: Add or identify a small behavioral test using a fake `docker` command/script fixture for the auth shell block that verifies three failed login attempts, generic warnings only, continuation after failure, and no execution of auth when workflow_dispatch supplies an explicit target_ref.
    • Evidence: The retry/fallback logic loops `for attempt in 1 2 3`, runs `timeout 30s docker login`, and warns after failure in .github/workflows/e2e-script.yaml and the nightly anchor. test/e2e-script-workflow.test.ts only checks that these strings are present.

🌱 Nice ideas

  • None.
Consider writing more tests for
  • **Runtime validation** — Docker Hub auth retries exactly three failed docker login attempts and logs only generic warnings before continuing.. The changed workflow and sandbox/E2E capture paths depend on runtime behavior: GitHub Actions secret guards, Docker CLI retry/fallback, and entrypoint fd inheritance. Existing tests are useful contract/string checks plus the E2E script itself, but behavior-level validation would improve confidence.
  • **Runtime validation** — Docker Hub auth skips credential exposure on workflow_dispatch with an explicit target_ref after retry logic is added.. The changed workflow and sandbox/E2E capture paths depend on runtime behavior: GitHub Actions secret guards, Docker CLI retry/fallback, and entrypoint fd inheritance. Existing tests are useful contract/string checks plus the E2E script itself, but behavior-level validation would improve confidence.
  • **Runtime validation** — runtime-overrides capture rejects empty or truncated JSON instead of deriving empty baseline fields.. The changed workflow and sandbox/E2E capture paths depend on runtime behavior: GitHub Actions secret guards, Docker CLI retry/fallback, and entrypoint fd inheritance. Existing tests are useful contract/string checks plus the E2E script itself, but behavior-level validation would improve confidence.
  • **Runtime validation** — runtime-overrides hash probe uses the same fd 3 capture path as config capture.. The changed workflow and sandbox/E2E capture paths depend on runtime behavior: GitHub Actions secret guards, Docker CLI retry/fallback, and entrypoint fd inheritance. Existing tests are useful contract/string checks plus the E2E script itself, but behavior-level validation would improve confidence.
  • **test/e2e/test-runtime-overrides.sh fd 3 output bypass** — The E2E harness now writes config and hash outputs to fd 3 and validates config shape before extraction, but there is no isolated test proving fd 3 inheritance or the hash-probe path.. test/e2e/test-runtime-overrides.sh adds `>&3` in run_override and run_config_hash_check; scripts/nemoclaw-start.sh establishes fd 3 before redirecting stdout through tee.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 27111736171
Target ref: 34d695a8b33bedf98839d40602ebfe8aa4a1955a
Workflow ref: main
Requested jobs: runtime-overrides-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job Result
runtime-overrides-e2e ✅ success

@cv cv added the v0.0.61 Release target label Jun 8, 2026
@cv cv merged commit 40d68d7 into main Jun 8, 2026
33 checks passed
@cv cv deleted the codex/e2e-flake-hardening branch June 8, 2026 02:41
@wscurran wscurran added the bug-fix PR fixes a bug or regression label Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug-fix PR fixes a bug or regression v0.0.61 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

nightly-e2e: runtime-overrides-e2e baseline capture race — tee flush vs container teardown

2 participants