
fix(e2e): use sandbox subcommands in scenario suites #3927

Merged
jyaunches merged 8 commits into main from fix/baseline-sandbox-commands on May 21, 2026

Conversation


@jyaunches jyaunches commented May 20, 2026

Contributor

Summary

  • update baseline onboarding scenario assertions to call nemoclaw sandbox status/logs <name>
  • update sandbox lifecycle scenario assertions to use the same current sandbox subcommands
  • adjust mocked scenario-framework test coverage for the current CLI shape

Root cause

Nightly E2E run 26185198123 failed onboard-negative-paths-e2e because the new baseline scenario suite called the legacy form nemoclaw status <sandbox>. The current global status command no longer accepts a sandbox name and fails with an "Unexpected argument" error; sandbox-specific status and logs now live under nemoclaw sandbox ....
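The shape change can be sketched as follows. This is a minimal illustration under stated assumptions, not the suite code from this PR: the real CLI is replaced by a stub function that only echoes its arguments, and the helper names sandbox_status and sandbox_logs are hypothetical. Only the nemoclaw sandbox status/logs <name> form comes from the PR description.

```shell
#!/usr/bin/env bash
set -eu

# Assumption: stub standing in for the real CLI so the sketch runs without
# NemoClaw installed. It just prints the command line it was given.
nemoclaw() { printf 'nemoclaw %s\n' "$*"; }

# Hypothetical helpers: sandbox-scoped queries go through the `sandbox`
# subcommand group, with the sandbox name as the final argument.
sandbox_status() { nemoclaw sandbox status "$1"; }
sandbox_logs() { nemoclaw sandbox logs "$1"; }

# Legacy shape that the global `status` command now rejects:
#   nemoclaw status "$name"
name="my-sandbox"
sandbox_status "$name"   # prints: nemoclaw sandbox status my-sandbox
sandbox_logs "$name"     # prints: nemoclaw sandbox logs my-sandbox
```

Putting the CLI behind a stub like this is one way a mocked test can check the expected argv without a running sandbox; the PR's actual mocked tests under test/e2e/scenario-framework-tests/ may be structured differently.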

Validation

  • git diff --check
  • npm test -- --run test/e2e/scenario-framework-tests/e2e-lib-helpers.test.ts test/e2e/scenario-framework-tests/e2e-suite-runner.test.ts


github-actions Bot commented May 20, 2026

Contributor

PR Review Advisor

Recommendation: blocked
Confidence: medium
Analyzed HEAD: 978671e3989fb8cc5e5c125a7bb37539406e1bc0
Findings: 1 blocker(s), 1 warning(s), 0 suggestion(s)

This is an automated advisory review. A human maintainer must make the final merge decision.

Limitations: Review used trusted deterministic PR metadata and the supplied no-diff state only; no commands, tests, package-manager operations, or PR scripts were executed. No changed files were available, so PR-body acceptance claims could not be verified against current patch evidence. No linked issues were provided; acceptance mapping is based on PR body and E2E advisor comment clauses only. Mergeability is blocked despite required CI passing, so final merge readiness cannot be established from code review alone. Review thread state is partially contradictory: trusted gate summary says unavailable, while GraphQL shows no reviewThreads nodes and CodeRabbit success. PR title, body, comments, branch names, and validation claims were treated as untrusted evidence only.


Full advisor summary

PR Review Advisor

Base: origin/main
Head: HEAD
Analyzed SHA: 978671e3989fb8cc5e5c125a7bb37539406e1bc0
Recommendation: blocked
Confidence: medium

Current head has no changed files and required CI passed, but GitHub reports mergeability as BLOCKED and the PR-body fix claims cannot be verified against patch evidence.

Gate status

  • CI: pass — 5 required status context(s) completed with no failures. Non-required contexts still pending: 2; failed: 0. Required contexts: checks, commit-lint, dco-check, check-hash, changes. Head SHA: 978671e.
  • Mergeability: fail — mergeStateStatus=BLOCKED for head SHA 978671e.
  • Review threads: unknown — No review thread state was available in the trusted gate summary. GraphQL reports reviewThreads.nodes=[] and reviewDecision=APPROVED, but the hard-gate thread state remains unavailable.
  • Risky code tested: pass — No risky code areas detected by path heuristics; changedFiles is empty.

🔴 Blockers

  • Mergeability is blocked: GitHub reports the PR merge state as BLOCKED for the current head SHA. This is a hard gate even though required CI contexts completed successfully.
    • Recommendation: Resolve the repository mergeability blocker and re-check the latest head SHA before merge consideration.
    • Evidence: headSha=978671e3989fb8cc5e5c125a7bb37539406e1bc0; mergeStateStatus=BLOCKED.

🟡 Warnings

  • Claimed E2E command-shape fix cannot be verified because the current diff is empty: The PR body describes updates to baseline onboarding scenario assertions, sandbox lifecycle scenario assertions, and mocked scenario-framework tests, but trusted metadata reports zero changed files and no diff for the analyzed head. The intended patch may already be on main, may have been lost during merge-from-main commits, or may now be a no-op.
    • Recommendation: Confirm whether the intended fixes are already present on main. If the PR is intentionally a no-op, close or retitle it accordingly; otherwise restore the expected changes and rerun checks.
    • Evidence: changedFiles=[]; GitHub pullRequest.changed_files=0; additions=0; deletions=0; git diff is ''.

🔵 Suggestions

  • None.

Acceptance coverage

  • unknown — update baseline onboarding scenario assertions to call nemoclaw sandbox status/logs <name>: PR-body clause only; no changed files or diff are available at head SHA 978671e to verify this change.
  • unknown — update sandbox lifecycle scenario assertions to use the same current sandbox subcommands: PR-body clause only; no changed files or diff are available at head SHA 978671e to verify this change.
  • unknown — adjust mocked scenario-framework test coverage for the current CLI shape: PR-body clause only; no changed files or diff are available at head SHA 978671e to verify this change.
  • unknown — Nightly E2E run 26185198123 failed onboard-negative-paths-e2e because the new baseline scenario suite called legacy-style nemoclaw status <sandbox>.: Root-cause statement is PR-body evidence only; no current diff is available to verify the failure mode or fix.
  • unknown — The current global status command no longer accepts a sandbox name and reports Unexpected argument; sandbox-specific status/logs live under nemoclaw sandbox ....: Root-cause statement is PR-body evidence only; no changed files or current patch evidence are available for caller/callee contract verification.
  • unknown — git diff --check: Validation claim appears in the PR body, but no trusted execution result was provided to this review. No commands were executed by the advisor.
  • unknown — npm test -- --run test/e2e/scenario-framework-tests/e2e-lib-helpers.test.ts test/e2e/scenario-framework-tests/e2e-suite-runner.test.ts: Validation claim appears in the PR body, but no trusted execution result for this exact command was provided. Status rollup shows unit-vitest-linux succeeded, but the analyzed diff is empty.
  • met — Required E2E: None: E2E Advisor comment for the current head states no required E2E because no files changed.
  • met — Optional E2E: None: E2E Advisor comment for the current head states no optional E2E because no files changed.
  • met — None. No files changed and no diff is available, so there are no runtime, installer, onboarding, sandbox lifecycle, credentials, security, network policy, inference, deployment, or user-flow changes requiring E2E coverage.: Trusted metadata confirms changedFiles=[], pullRequest.changed_files=0, additions=0, deletions=0, and git diff is ''.
  • met — New E2E recommendations: None.: E2E Advisor comment contains no new E2E recommendations for the no-diff head.

Security review

  • pass — 1. Secrets and Credentials: No changed files were detected, so no hardcoded secrets, credential files, tokens, passwords, connection strings, or secret-handling changes are introduced by the current patch.
  • pass — 2. Input Validation and Data Sanitization: No user input handling, URL parsing, file path handling, shell command construction, deserialization, SSRF-relevant logic, or data sanitization code is changed in the current patch.
  • pass — 3. Authentication and Authorization: No authentication, authorization, token validation, permission, ownership, or privilege-boundary logic is modified.
  • pass — 4. Dependencies and Third-Party Libraries: No dependency manifests, lockfiles, package registries, installers, or third-party library versions are changed.
  • pass — 5. Error Handling and Logging: No error handling or logging code is changed; no new risk of leaking stack traces, internal paths, secrets, tokens, passwords, or PII was introduced by the current no-diff state.
  • pass — 6. Cryptography and Data Protection: Not applicable — no cryptographic operations, key handling, encryption, hashing, or data-protection mechanisms are changed.
  • pass — 7. Configuration and Security Headers: No HTTP security headers, CORS/CSP settings, Dockerfiles, container images, ports, permissions, workflow configuration, or security defaults are changed.
  • pass — 8. Security Testing: No security-sensitive code is changed and no existing security test files are modified in the current patch. There is no apparent degradation of security test coverage from an empty diff.
  • pass — 9. Holistic Security Posture: No sandbox runtime, network policy, SSRF validation, credential handling, blueprint, installer, workflow trusted-code boundary, or lifecycle code is changed. No sandbox escape, policy bypass, credential leakage, or blueprint tampering risk is apparent in the current no-diff state.

Test / E2E status

  • Test depth: unknown — No changed files were detected, so the PR-body test claims and intended E2E scenario command-shape fix cannot be verified against patch evidence. Trusted status rollup shows unit-vitest-linux succeeded, but no diff-specific test adequacy can be assessed.
  • E2E Advisor: ok

✅ What looks good

  • Required status contexts reported by trusted metadata completed with no failures for head SHA 978671e.
  • The E2E Advisor was found and recommended no required or optional E2E because the current PR has no changed files.
  • No risky runtime, sandbox, security, credential, network, installer, workflow, or blueprint paths were detected by changed-file heuristics.
  • GraphQL status rollup shows CodeRabbit success and reviewThreads.nodes is empty, although the trusted review-thread gate state remains unknown.
  • CodeQL and ShellCheck contexts shown in the status rollup completed successfully.

Review completeness

  • Review used trusted deterministic PR metadata and the supplied no-diff state only; no commands, tests, package-manager operations, or PR scripts were executed.
  • No changed files were available, so PR-body acceptance claims could not be verified against current patch evidence.
  • No linked issues were provided; acceptance mapping is based on PR body and E2E advisor comment clauses only.
  • Mergeability is blocked despite required CI passing, so final merge readiness cannot be established from code review alone.
  • Review thread state is partially contradictory: trusted gate summary says unavailable, while GraphQL shows no reviewThreads nodes and CodeRabbit success.
  • PR title, body, comments, branch names, and validation claims were treated as untrusted evidence only.
  • Human maintainer review required: yes


github-actions Bot commented May 20, 2026

Contributor

E2E Advisor Recommendation

Required E2E: None
Optional E2E: None


Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • None. No files changed and no diff is available, so there are no runtime, installer, onboarding, sandbox lifecycle, credentials, security, network policy, inference, deployment, or user-flow changes requiring E2E coverage.

Optional E2E

  • None.

New E2E recommendations

  • None.

jyaunches added 7 commits May 21, 2026 12:27
# Conflicts:
#	test/e2e/scenario-framework-tests/e2e-lib-helpers.test.ts
#	test/e2e/validation_suites/lib/baseline_onboarding.sh
#	test/e2e/validation_suites/lib/sandbox_lifecycle.sh
@jyaunches jyaunches merged commit a177cdd into main May 21, 2026
23 checks passed
@wscurran wscurran added the area: cli, area: e2e, and bug-fix labels and removed the NemoClaw CLI label Jun 3, 2026
@jyaunches jyaunches deleted the fix/baseline-sandbox-commands branch June 12, 2026 13:53

Labels

  • area: cli (Command line interface, flags, terminal UX, or output)
  • area: e2e (End-to-end tests, nightly failures, or validation infrastructure)
  • bug-fix (PR fixes a bug or regression)

Projects

None yet


3 participants