Skip to content

test(openshell): keep gateway drift mocks isolated#4860

Merged
cv merged 2 commits into
mainfrom
4858-gateway-drift-tests-leak-host-docker-state-when-injected-image-ref-is-null
Jun 5, 2026
Merged

test(openshell): keep gateway drift mocks isolated#4860
cv merged 2 commits into
mainfrom
4858-gateway-drift-tests-leak-host-docker-state-when-injected-image-ref-is-null

Conversation

@amata-human

@amata-human amata-human commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes gateway drift detection so injected test dependencies remain authoritative when they return null. This prevents host-process drift tests from accidentally probing local Docker state when modeling a missing cluster gateway container.

Related Issue

Fixes #4858

Changes

  • Preserve an injected null cluster image ref instead of falling back to the real Docker probe.
  • Apply the same dependency-resolution behavior in both cluster-image drift and host-process drift detection.
  • Keep the fix scoped to the drift detector without reworking the existing test suite.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Angel Mata amata@nvidia.com

Summary by CodeRabbit

  • Refactor
    • Improved the reliability of system drift detection mechanisms through more robust internal dependency function validation.

Signed-off-by: Angel Mata <amata@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented Jun 5, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Too much diff to scan? Review this PR in Change Stack to start with the highest-impact changes.

Review Change Stack

📝 Walkthrough

Walkthrough

This PR replaces optional chaining with explicit function-type checks in two drift detection functions to preserve intentional null returns from injected test dependencies, preventing unintended fallback to real host Docker state during testing.

Changes

Gateway drift dependency injection

Layer / File(s) Summary
Explicit function-type checks for gateway cluster image reference
src/lib/adapters/openshell/gateway-drift.ts
getGatewayClusterImageDrift and getGatewayHostProcessDrift now explicitly type-check deps.getGatewayClusterImageRef using typeof === "function" instead of optional chaining, preserving injected null returns as authoritative "no cluster image" signals rather than falling through to host Docker probes.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#4602: Both PRs touch src/lib/adapters/openshell/gateway-drift.ts in the host-process drift path—main PR adds an explicit function-type guard around deps.getGatewayClusterImageRef that affects how getGatewayHostProcessDrift (and cluster drift probing) behaves, aligning with the host-process drift/preflight wiring introduced in PR #4602.

Suggested labels

OpenShell, Docker, bug-fix

Suggested reviewers

  • cv
  • prekshivyas

Poem

🐰 A rabbit hops through drift detection code,
Where null means null—no fallback road,
Injected truths now stand their ground,
No docker probes where tests are found! 🎯

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title references isolated mocks, which directly relates to the core change of preventing injected null test dependencies from falling through to real Docker probes.
Linked Issues check ✅ Passed The code changes replace optional chaining with explicit function checks in both drift detection functions, exactly matching the suggested fix in issue #4858, preserving injected null values and preventing fallback to real Docker probes.
Out of Scope Changes check ✅ Passed All changes are scoped to gateway-drift.ts drift detection logic and directly address the dependency-resolution behavior outlined in the linked issue; no unrelated modifications detected.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch 4858-gateway-drift-tests-leak-host-docker-state-when-injected-image-ref-is-null

Comment @coderabbitai help to get the list of available commands and usage tips.

@amata-human amata-human added the v0.0.60 Release target label Jun 5, 2026
@github-actions

github-actions Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 0 needs attention, 2 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 1 still applies, 0 new items found

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • Source-of-truth review needed: Injected `getGatewayClusterImageRef` fallback in gateway drift detection: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: The two changed sites now branch on `typeof deps.getGatewayClusterImageRef === "function"`; no test file changed.
  • Add direct regression coverage for injected null image refs (src/lib/adapters/openshell/gateway-drift.ts:247): The implementation now preserves an injected `null` from `deps.getGatewayClusterImageRef`, which fixes the source bug. However, this is gateway/sandbox preflight code and the PR changes no test file, so there is still no direct assertion that injected null prevents the real Docker inspection path from being called. Existing tests cover the resulting drift shapes, but not the mock-boundary contract that caused host Docker state to leak into tests.
    • Recommendation: Add focused tests that spy or mock Docker inspection and verify it is not called when `getGatewayClusterImageRef` is provided and returns `null`, especially for `getGatewayClusterImageDrift`, `getGatewayHostProcessDrift`, and the protobuf mismatch drift attachment path.
    • Evidence: The diff replaces `deps.getGatewayClusterImageRef?.(gatewayName) ?? getGatewayClusterImageRef(...)` with an explicit function-present branch at the changed cluster-image and host-process drift sites, but `src/lib/adapters/openshell/gateway-drift.test.ts` was not changed.

🌱 Nice ideas

  • None.
Consider writing more tests for
  • **Runtime validation** — getGatewayClusterImageDrift respects an injected null image ref without calling dockerContainerInspectFormat.. The source fix is narrow and correct, but gateway drift preflight participates in sandbox/runtime safety. Existing unit tests cover named output paths, while direct boundary coverage would guard against reintroducing host Docker state leakage.
  • **Runtime validation** — getGatewayHostProcessDrift treats an injected null cluster image as no cluster container and does not query real Docker before using the injected host-process runtime.. The source fix is narrow and correct, but gateway drift preflight participates in sandbox/runtime safety. Existing unit tests cover named output paths, while direct boundary coverage would guard against reintroducing host Docker state leakage.
  • **Runtime validation** — detectOpenShellStateRpcResultIssue attaches host-process drift with injected null even if Docker inspection would otherwise report a stale cluster image.. The source fix is narrow and correct, but gateway drift preflight participates in sandbox/runtime safety. Existing unit tests cover named output paths, while direct boundary coverage would guard against reintroducing host Docker state leakage.
  • **Add direct regression coverage for injected null image refs** — Add focused tests that spy or mock Docker inspection and verify it is not called when `getGatewayClusterImageRef` is provided and returns `null`, especially for `getGatewayClusterImageDrift`, `getGatewayHostProcessDrift`, and the protobuf mismatch drift attachment path.
  • **Injected `getGatewayClusterImageRef` fallback in gateway drift detection** — Existing tests cover affected result paths, but a direct mock-boundary test proving Docker inspection is not called for injected `null` is still missing.. The two changed sites now branch on `typeof deps.getGatewayClusterImageRef === "function"`; no test file changed.
Since last review details

Current findings:

  • Source-of-truth review needed: Injected `getGatewayClusterImageRef` fallback in gateway drift detection: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: The two changed sites now branch on `typeof deps.getGatewayClusterImageRef === "function"`; no test file changed.
  • Add direct regression coverage for injected null image refs (src/lib/adapters/openshell/gateway-drift.ts:247): The implementation now preserves an injected `null` from `deps.getGatewayClusterImageRef`, which fixes the source bug. However, this is gateway/sandbox preflight code and the PR changes no test file, so there is still no direct assertion that injected null prevents the real Docker inspection path from being called. Existing tests cover the resulting drift shapes, but not the mock-boundary contract that caused host Docker state to leak into tests.
    • Recommendation: Add focused tests that spy or mock Docker inspection and verify it is not called when `getGatewayClusterImageRef` is provided and returns `null`, especially for `getGatewayClusterImageDrift`, `getGatewayHostProcessDrift`, and the protobuf mismatch drift attachment path.
    • Evidence: The diff replaces `deps.getGatewayClusterImageRef?.(gatewayName) ?? getGatewayClusterImageRef(...)` with an explicit function-present branch at the changed cluster-image and host-process drift sites, but `src/lib/adapters/openshell/gateway-drift.test.ts` was not changed.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@github-actions

github-actions Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: gateway-drift-preflight-e2e
Optional E2E: openshell-gateway-upgrade-e2e

Dispatch hint: gateway-drift-preflight-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • gateway-drift-preflight-e2e (low (~15 min)): Direct regression coverage for this module. It exercises stale gateway image drift, host-process gateway binary drift, protobuf mismatch fail-closed behavior, and verifies unsafe sandbox list/state RPCs are blocked before mutation-prone flows proceed.

Optional E2E

  • openshell-gateway-upgrade-e2e (medium-high (~60 min)): Adjacent high-confidence validation of real OpenShell gateway upgrade behavior and live sandbox survival across gateway replacement. Useful because this change is in gateway drift/runtime detection, but it is heavier than the focused preflight regression.

New E2E recommendations

  • Gateway drift dependency-injection robustness (medium): The diff specifically changes optional-call handling to require deps.getGatewayClusterImageRef to be a function before invocation. Existing gateway drift E2E covers production-like drift paths but does not appear to explicitly exercise a malformed/non-function dependency override.
    • Suggested test: Add a focused regression case to test/e2e/test-gateway-drift-preflight.sh or a nearby integration test that injects a non-function getGatewayClusterImageRef dependency and verifies drift preflight does not throw and still falls back to real inspection.

Dispatch hint

  • Workflow: .github/workflows/regression-e2e.yaml
  • jobs input: gateway-drift-preflight-e2e

@github-actions

github-actions Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

E2E Scenario Advisor Recommendation

Required scenario E2E: ubuntu-repo-cloud-openclaw
Optional scenario E2E: None

Dispatch required scenario E2E:

  • gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: medium

Required scenario E2E

  • ubuntu-repo-cloud-openclaw: Touches OpenShell gateway drift/preflight logic used by repo-current cloud OpenClaw onboarding and gateway/sandbox state checks; the baseline Ubuntu OpenClaw scenario is the smallest routed scenario covering the normal Docker gateway path.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Optional scenario E2E

  • None.

Relevant changed files

  • src/lib/adapters/openshell/gateway-drift.ts

@cv cv merged commit bb856a2 into main Jun 5, 2026
19 checks passed
@cv cv deleted the 4858-gateway-drift-tests-leak-host-docker-state-when-injected-image-ref-is-null branch June 5, 2026 20:34
@wscurran wscurran added the chore Build, CI, dependency, or tooling maintenance label Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

chore Build, CI, dependency, or tooling maintenance v0.0.60 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Gateway drift tests leak host Docker state when injected image ref is null

3 participants