Skip to content

test: eliminate source-shape assertions#2829

Merged
cv merged 15 commits into
mainfrom
autoresearch/source-shape-zero-final/01-eliminate-source-shape-tests
May 1, 2026
Merged

test: eliminate source-shape assertions#2829
cv merged 15 commits into
mainfrom
autoresearch/source-shape-zero-final/01-eliminate-source-shape-tests

Conversation

@cv

@cv cv commented May 1, 2026

Copy link
Copy Markdown
Collaborator

Summary

Replace production-source-text assertions across the NemoClaw test suite with behavior coverage and add a budgeted source-shape detector to prevent regressions. The final detector budget is zero source-shape cases, assertions, and files.

Changes

  • Add scripts/find-source-shape-tests.ts, source-shape:scan, source-shape:check, and a pre-commit budget hook.
  • Ratchet ci/source-shape-test-budget.json to zero source-shape cases.
  • Replace source-shape checks in CLI, installer, credentials, sandbox startup, DNS/proxy, deploy, and provisioning tests with subprocess, temp-fixture, shell-snippet, emitted-file, stdout/stderr, and direct helper behavior assertions.
  • Remove redundant source scans where equivalent behavior coverage already exists.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • make docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

AI Disclosure

  • AI-assisted — tool: Pi coding agent

Signed-off-by: Carlos Villela cvillela@nvidia.com

Summary by CodeRabbit

  • New Features

    • Enforced source-shape test budget via a project pre-commit check, CI budget, and a new repo-scanning CLI.
  • Tests

    • Broad refactor converting many brittle source-text checks into runtime/execution-style tests for more reliable behavior validation, tighter credential/environment propagation checks, and improved installer/startup/script integration coverage.
  • Chores

    • Added npm scripts to emit metrics and run the source-shape budget check.

Add a source-shape detector and budget gate, then replace production-source-text assertions across the CLI/plugin test suite with behavior coverage. Behavior replacements use subprocess fakes, temp fixtures, shell snippet execution, emitted files, stdout/stderr/exit-code assertions, and direct helper behavior instead of asserting on implementation strings.

Experiments: kept runs #1-#88
Metric: source_shape_cases 317 → 0 (-100%); source_shape_assertions 819 → 0; source_shape_files 23 → 0
@cv cv self-assigned this May 1, 2026
@coderabbitai

coderabbitai Bot commented May 1, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Introduces a source-shape scanner CLI, budget JSON, npm scripts and a pre-commit hook to enforce a test-budget; concurrently refactors many brittle text-inspection tests into runtime execution-based tests, and adds/updates several unit tests tightening credential and deploy behavior assertions.

Changes

Cohort / File(s) Summary
Pre-commit & NPM scripts
\.pre-commit-config.yaml, package.json
Adds project pre-commit hook that runs npm run source-shape:check and two npm scripts that run the scanner in --metrics and --check modes.
Budget config
ci/source-shape-test-budget.json
New JSON budget file setting "maxSourceShapeCases": 0.
Source-shape scanner CLI
scripts/find-source-shape-tests.ts
New TypeScript CLI that parses test files (TS API), infers source-read variables and assertions, emits JSON/metrics, and enforces budget when run with --check.
Large-scale test refactors
test/...
test/cli.test.ts, test/credential-exposure.test.ts, test/credentials.test.ts, test/dns-proxy.test.ts, test/fetch-guard-patch-regression.test.ts, test/http-proxy-fix-sync.test.ts, test/install-preflight.test.ts, test/local-inference-setup.test.ts, test/nemoclaw-start.test.ts, test/nemotron-inference-fix.test.ts, test/onboard.test.ts, test/policies.test.ts, test/runner.test.ts, test/sandbox-init.test.ts, test/sandbox-provisioning.test.ts, test/seccomp-guard.test.ts, test/secret-redaction.test.ts, test/service-env.test.ts, test/shellquote-sandbox.test.ts, test/shields.test.ts, test/slack-token-rewriter-sync.test.ts, test/wsl2-probe-timeout.test.ts
Replaces many static source/text inspections with runtime execution tests (extracted shell blocks, generated wrappers, stubbed binaries, process harnesses, filesystem and IO assertions).
Test removals / simplified checks
test/exec-approvals-path-regression.test.ts, test/security-c2-dockerfile-injection.test.ts
Removes/shortens prior static-inspection tests; replaces with placeholders or simplified runtime validations.
Unit / integration test updates
nemoclaw/src/blueprint/runner.test.ts, src/lib/deploy.test.ts
Tightens credential-subprocess env assertions and adds deploy-path tests (brev create/rsync/SSH host-key, VM install, sandbox connect, error reporting).
Misc test helpers & harnesses
dist/..., src/..., various test helper files
Multiple tests now import/require dist modules, spawn helper processes, and use temporary PATH/fixtures instead of scanning source text.

Sequence Diagram

sequenceDiagram
    participant Dev as Developer
    participant Pre as Pre-commit Hook
    participant NPM as npm
    participant CLI as Scanner CLI
    participant AST as AST / Heuristics
    participant CFG as Budget Config
    participant Exit as Process Exit

    Dev->>Pre: git commit
    Pre->>NPM: run `npm run source-shape:check`
    NPM->>CLI: invoke scanner CLI
    CLI->>AST: parse repo, analyze tests
    AST->>AST: detect source-read vars & assertions
    CLI->>CFG: read `ci/source-shape-test-budget.json`
    CLI->>AST: compare detected cases to budget
    alt budget exceeded
        CLI->>Exit: set exit code = 1
        Exit->>Pre: fail hook
        Pre->>Dev: reject commit
    else budget OK
        CLI->>Exit: set exit code = 0
        Exit->>Pre: pass hook
        Pre->>Dev: commit proceeds
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

refactor

Poem

A rabbit nibbles at the test tree,
Replaces greps with sandboxed spree.
Hooks now count each source-shaped case,
Scans the code, then guards the place.
Wiggle whiskers — commit with glee. 🐇

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'test: eliminate source-shape assertions' is fully related to the main change. The PR substantially refactors test files to remove source-shape (source-text) based assertions throughout the test suite and replaces them with behavior-based testing.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch autoresearch/source-shape-zero-final/01-eliminate-source-shape-tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (7)
test/fetch-guard-patch-regression.test.ts (2)

87-92: ⚡ Quick win

Run the generated patch script in strict bash mode.

Line 88 writes a script without set -euo pipefail; intermediate failures inside a multi-command block can be missed and still report success.

Proposed change
-    fs.writeFileSync(scriptPath, ["#!/usr/bin/env bash", command].join("\n"), { mode: 0o700 });
+    fs.writeFileSync(
+      scriptPath,
+      ["#!/usr/bin/env bash", "set -euo pipefail", command].join("\n"),
+      { mode: 0o700 },
+    );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/fetch-guard-patch-regression.test.ts` around lines 87 - 92, The
generated test script is missing strict bash options, so update the script
content written by fs.writeFileSync (the script created at scriptPath) to
include a strict mode header (e.g., add "set -euo pipefail" immediately after
the shebang) so any intermediate failures in multi-command blocks cause the
script to fail; keep the shebang (#!/usr/bin/env bash) and then insert the
strict flags before writing the command string that is later executed via
spawnSync.

83-87: ⚡ Quick win

Use replaceAll for the OpenClaw dist-path rewrite.

Line 86 only rewrites the first occurrence. If the Dockerfile patch block gains another matching path, the test can hit the real absolute path unexpectedly.

Proposed change
-    ).replace("/usr/local/lib/node_modules/openclaw/dist", dist);
+    ).replaceAll("/usr/local/lib/node_modules/openclaw/dist", dist);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/fetch-guard-patch-regression.test.ts` around lines 83 - 87, The test
constructs the Docker command into the variable command using
dockerRunCommandBetween and then calls .replace to rewrite the OpenClaw dist
path, but .replace only replaces the first occurrence; change that call to use a
global replace (e.g., replaceAll) so every instance of
"/usr/local/lib/node_modules/openclaw/dist" is rewritten to the test-local dist
value; update the expression that assigns command (the code calling .replace) to
use replaceAll with the same search string and the dist variable so no stray
absolute paths remain.
test/sandbox-provisioning.test.ts (1)

166-170: 💤 Low value

Consider making whitespace normalization more robust.

The .replace('python3 -c " ', 'python3 -c "') fix depends on exact Dockerfile formatting. If the Dockerfile's line continuation style changes, this could silently break the test.

Consider normalizing whitespace more broadly in dockerRunCommandBetween, or adding a comment explaining why this specific fix is needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sandbox-provisioning.test.ts` around lines 166 - 170, The test's fragile
post-processing uses a specific string replace after calling
dockerRunCommandBetween, which will break if spacing/line-continuation in the
Dockerfile changes; instead update dockerRunCommandBetween to normalize
whitespace and line-continuations (collapse consecutive spaces, trim
leading/trailing spaces, and join backslash-continuation lines) so returned
command is stable, or at minimum replace the ad-hoc .replace call with a broader
normalization (e.g., collapse multiple spaces around python3 -c and remove stray
space after the opening quote) and add a short comment in the test referencing
dockerRunCommandBetween to explain why normalization is required.
test/service-env.test.ts (1)

202-238: 🏗️ Heavy lift

These tests still key off nemoclaw-start.sh comment anchors.

Both cases locate a block with indexOf(...) and execute src.slice(...). That means a benign reorder or comment rewrite in the entrypoint can fail the test without any behavioral regression, which keeps the suite coupled to source shape. Prefer driving the real script with controlled env/paths, or extracting these shell helpers into directly invokable files.

Also applies to: 323-367

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/service-env.test.ts` around lines 202 - 238, The test currently relies
on fragile string anchors (start/end, src.slice) and writes a temporary shell
snippet (tmpFile) to assert SSL_CERT_FILE handling; instead, change the test to
drive the real entrypoint or extracted helper directly by executing the actual
nemoclaw-start.sh (or refactor the shell helper into a committed, invokable
script) with controlled environment and temporary paths, replacing the
indexOf/src.slice logic and temporary concatenation (writeFileSync/execFileSync)
with a direct execFileSync call against the real script (passing a fake CA
bundle path via env and asserting output), or load the shell helper file and
invoke it as a standalone script so the test verifies behavior not source
layout.
test/sandbox-init.test.ts (1)

512-543: 🏗️ Heavy lift

This still depends on nemoclaw-start.sh source layout.

Locating the block via _SANDBOX_INIT= and # Harden: limit process count means a harmless refactor of the entrypoint can break the test even if it still sources sandbox-init.sh correctly. That keeps the test coupled to production text shape; a more robust version would invoke the real script path (or a thin wrapper around it) and assert the sentinel effect without slicing by marker text.

scripts/find-source-shape-tests.ts (1)

77-85: ⚡ Quick win

Use readdirSync(..., { withFileTypes: true }) to avoid an extra statSync per entry.

This script runs in local hooks/CI paths; cutting one syscall per directory entry improves scan latency with no behavior change.

Suggested refactor
-  for (const entry of readdirSync(dir)) {
-    const abs = join(dir, entry);
+  for (const entry of readdirSync(dir, { withFileTypes: true })) {
+    const abs = join(dir, entry.name);
     if (isSkippedPath(abs)) continue;

-    const stats = statSync(abs);
-    if (stats.isDirectory()) {
+    if (entry.isDirectory()) {
       yield* walkFiles(abs);
-    } else if (stats.isFile()) {
+    } else if (entry.isFile()) {
       yield abs;
     }
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/find-source-shape-tests.ts` around lines 77 - 85, The loop in
walkFiles uses readdirSync(dir) plus statSync(abs) per entry; replace it by
calling readdirSync(dir, { withFileTypes: true }) and iterate Dirent objects so
you can avoid statSync: compute abs with join(dir, dirent.name), continue when
isSkippedPath(abs), then use dirent.isDirectory() to recurse with yield*
walkFiles(abs) or dirent.isFile() to yield abs. Update references in
scripts/find-source-shape-tests.ts (walkFiles) accordingly and remove the
now-unnecessary statSync call.
test/credential-exposure.test.ts (1)

145-167: ⚡ Quick win

Extend the secret non-forwarding assertion to plugin env builder too.

Line 153 only verifies buildCliSubprocessEnv(). If plugin env filtering regresses to include NVIDIA_API_KEY, this test still passes. Please assert both builders here, similar to the TLS parity tests.

Suggested patch
   try {
     process.env.NVIDIA_API_KEY = "nvapi-secret-should-not-leak";
     process.env.PATH = `/tmp/nemoclaw-fake-bin:${process.env.PATH || ""}`;
-    const env = buildCliSubprocessEnv();
-    expect(env.NVIDIA_API_KEY).toBeUndefined();
-    expect(env.PATH).toContain("/tmp/nemoclaw-fake-bin");
+    for (const buildSubprocessEnv of [buildCliSubprocessEnv, buildPluginSubprocessEnv]) {
+      const env = buildSubprocessEnv();
+      expect(env.NVIDIA_API_KEY).toBeUndefined();
+      expect(env.PATH).toContain("/tmp/nemoclaw-fake-bin");
+    }
   } finally {
Based on learnings: Security-sensitive code paths (credential sanitization, SSRF validation, policy enforcement) require extra test coverage to maintain security integrity.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/credential-exposure.test.ts` around lines 145 - 167, The current test
only checks buildCliSubprocessEnv for leaking NVIDIA_API_KEY; update the test to
also call buildPluginSubprocessEnv() and assert the same expectations
(expect(env.NVIDIA_API_KEY).toBeUndefined() and
expect(env.PATH).toContain("/tmp/nemoclaw-fake-bin")) so both subprocess env
builders are covered; locate the assertions around buildCliSubprocessEnv in the
"subprocess env builder does not spread full process.env into subprocesses" test
and duplicate the checks for buildPluginSubprocessEnv.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/find-source-shape-tests.ts`:
- Around line 324-337: The detector misses direct expect(readFileSync(...))
assertions because assertionFromCall only tests isReadFileCall on the expect
call (expectBase) and checks textual membership against sourceVars; fix by
inspecting the actual subject expression node: grab expectBase.arguments[0] as
an Expression (not only getText) and set referencesSource =
[...sourceVars].some(name => textContainsIdentifier(subjectText, name)) OR
isReadFileCall(subjectNode); then allow the assertion when either condition is
true. Apply the same change to the analogous check referenced around the other
occurrence (lines noted) so direct expect(readFileSync(...)) assertions are
detected.
- Around line 350-359: The isTestCall function currently only checks direct
Identifier and single PropertyAccessExpression names and misses wrapped/chained
forms like it.only, test.each(...)(...), or obj.test.only; update isTestCall
(and add a small helper e.g. getRootIdentifierName) to recursively unwrap
CallExpression and PropertyAccessExpression nodes: if node.expression is a
CallExpression, recurse into its expression; if PropertyAccessExpression,
recurse into its expression but also check the final name; finally when you
reach an Identifier check for "it" or "test". Ensure isTestCall accepts
ts.CallExpression and uses that helper so wrapped forms are detected.

In `@test/exec-approvals-path-regression.test.ts`:
- Around line 7-13: The test case it("unified layout and mutable-default
permissions are covered by behavior suites", ...) currently contains a no-op
assertion expect(true).toBe(true); replace this placeholder with a real
regression check that exercises the exec-approvals path (for example, invoke the
function/fixture that triggers exec-approvals behavior and assert expected side
effects or outputs), or if you cannot implement such an assertion here, remove
the entire test block to avoid a misleading passing test; ensure the replacement
references the actual helper/fixture names used to exercise exec-approvals so
the test genuinely guards the regression.

In `@test/nemoclaw-start.test.ts`:
- Around line 760-768: The harness currently in runSlackGuardHarness injects the
guard script but doesn't enable the sandbox gate so SLACK_GUARD_EOF returns
early; modify the generated inline script so it sets
process.env.OPENSHELL_SANDBOX = "1" (or otherwise enables the sandbox) before
inserting the extractGuardScript() content so the Slack guard actually installs
and the subsequent assertions exercise the interception path.

In `@test/runner.test.ts`:
- Around line 703-707: Replace the current existence-only assertions in the
"installers expose explicit, reviewable scripts instead of piping curl to shell"
test with a runtime harness: locate the test (the it(...) block) and execute
both ../install.sh and ../scripts/install.sh via child_process.spawn/exec with a
controlled env that injects a fake "curl" and "sh" into PATH (or uses PATH
overrides) to detect any attempt to pipe to shell; assert the fake curl is never
executed (or that the script exits without invoking external network fetches)
and/or that the script explicitly reads local files rather than invoking
subprocesses that match the "curl | sh" pattern—mirror the installer harness
pattern used elsewhere in the test suite to stub external commands and assert
calls (use the test name and the path.join(import.meta.dirname, "..",
"install.sh") and path.join(import.meta.dirname, "..", "scripts", "install.sh")
targets to locate the scripts).
- Around line 638-661: The test is nondeterministic because the stub's PATH
includes "/usr/bin" which may contain a real gh binary and it doesn't assert
process exit status; update the stub to set PATH to only the controlled tmpBin
(remove "/usr/bin:/bin") so the test cannot accidentally hit the gh-present
branch, and add an explicit assertion on spawnSync's result.status (use the
local variable result and assert result.status === 0 or the expected exit code)
after spawning to ensure the script executed successfully; locate and modify the
variables/script snippet named scriptPath, tmpBin, stub and check result.status
from spawnSync in the same test block.

In `@test/sandbox-provisioning.test.ts`:
- Around line 206-209: The spawnSync call that executes the sourced script (the
variable sourceScript) can hang because it lacks a timeout; update the options
object passed to spawnSync (the call where env is set to { ...process.env,
NEMOCLAW_CODEX_ACP_HOME: tmp }) to include a timeout (use the same 5000 ms used
elsewhere in this test file) so that the test fails fast instead of blocking
indefinitely.

In `@test/wsl2-probe-timeout.test.ts`:
- Around line 66-105: The helper runProbeWithCurlStatuses should freeze
platform.isWsl so retry-arg calculations stay hermetic: inside the function,
require the platform module used by
getValidationProbeCurlArgs/onboard-inference-probes, save the original
platform.isWsl, set platform.isWsl = () => false (or a boolean) before requiring
probeOpenAiLikeEndpoint, then restore the original isWsl in the finally block
(alongside restoring httpProbe.runCurlProbe and clearing require.cache). This
ensures probeOpenAiLikeEndpoint and getValidationProbeCurlArgs compute the
expected 20/30 retry args regardless of the real host.

---

Nitpick comments:
In `@scripts/find-source-shape-tests.ts`:
- Around line 77-85: The loop in walkFiles uses readdirSync(dir) plus
statSync(abs) per entry; replace it by calling readdirSync(dir, { withFileTypes:
true }) and iterate Dirent objects so you can avoid statSync: compute abs with
join(dir, dirent.name), continue when isSkippedPath(abs), then use
dirent.isDirectory() to recurse with yield* walkFiles(abs) or dirent.isFile() to
yield abs. Update references in scripts/find-source-shape-tests.ts (walkFiles)
accordingly and remove the now-unnecessary statSync call.

In `@test/credential-exposure.test.ts`:
- Around line 145-167: The current test only checks buildCliSubprocessEnv for
leaking NVIDIA_API_KEY; update the test to also call buildPluginSubprocessEnv()
and assert the same expectations (expect(env.NVIDIA_API_KEY).toBeUndefined() and
expect(env.PATH).toContain("/tmp/nemoclaw-fake-bin")) so both subprocess env
builders are covered; locate the assertions around buildCliSubprocessEnv in the
"subprocess env builder does not spread full process.env into subprocesses" test
and duplicate the checks for buildPluginSubprocessEnv.

In `@test/fetch-guard-patch-regression.test.ts`:
- Around line 87-92: The generated test script is missing strict bash options,
so update the script content written by fs.writeFileSync (the script created at
scriptPath) to include a strict mode header (e.g., add "set -euo pipefail"
immediately after the shebang) so any intermediate failures in multi-command
blocks cause the script to fail; keep the shebang (#!/usr/bin/env bash) and then
insert the strict flags before writing the command string that is later executed
via spawnSync.
- Around line 83-87: The test constructs the Docker command into the variable
command using dockerRunCommandBetween and then calls .replace to rewrite the
OpenClaw dist path, but .replace only replaces the first occurrence; change that
call to use a global replace (e.g., replaceAll) so every instance of
"/usr/local/lib/node_modules/openclaw/dist" is rewritten to the test-local dist
value; update the expression that assigns command (the code calling .replace) to
use replaceAll with the same search string and the dist variable so no stray
absolute paths remain.

In `@test/sandbox-provisioning.test.ts`:
- Around line 166-170: The test's fragile post-processing uses a specific string
replace after calling dockerRunCommandBetween, which will break if
spacing/line-continuation in the Dockerfile changes; instead update
dockerRunCommandBetween to normalize whitespace and line-continuations (collapse
consecutive spaces, trim leading/trailing spaces, and join
backslash-continuation lines) so returned command is stable, or at minimum
replace the ad-hoc .replace call with a broader normalization (e.g., collapse
multiple spaces around python3 -c and remove stray space after the opening
quote) and add a short comment in the test referencing dockerRunCommandBetween
to explain why normalization is required.

In `@test/service-env.test.ts`:
- Around line 202-238: The test currently relies on fragile string anchors
(start/end, src.slice) and writes a temporary shell snippet (tmpFile) to assert
SSL_CERT_FILE handling; instead, change the test to drive the real entrypoint or
extracted helper directly by executing the actual nemoclaw-start.sh (or refactor
the shell helper into a committed, invokable script) with controlled environment
and temporary paths, replacing the indexOf/src.slice logic and temporary
concatenation (writeFileSync/execFileSync) with a direct execFileSync call
against the real script (passing a fake CA bundle path via env and asserting
output), or load the shell helper file and invoke it as a standalone script so
the test verifies behavior not source layout.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f5d8537f-19c8-4962-8e5e-b361100e14f0

📥 Commits

Reviewing files that changed from the base of the PR and between f9d21af and 6d08ab7.

📒 Files selected for processing (30)
  • .pre-commit-config.yaml
  • ci/source-shape-test-budget.json
  • nemoclaw/src/blueprint/runner.test.ts
  • package.json
  • scripts/find-source-shape-tests.ts
  • src/lib/deploy.test.ts
  • test/cli.test.ts
  • test/credential-exposure.test.ts
  • test/credentials.test.ts
  • test/dns-proxy.test.ts
  • test/exec-approvals-path-regression.test.ts
  • test/fetch-guard-patch-regression.test.ts
  • test/http-proxy-fix-sync.test.ts
  • test/install-preflight.test.ts
  • test/local-inference-setup.test.ts
  • test/nemoclaw-start.test.ts
  • test/nemotron-inference-fix.test.ts
  • test/onboard.test.ts
  • test/policies.test.ts
  • test/runner.test.ts
  • test/sandbox-init.test.ts
  • test/sandbox-provisioning.test.ts
  • test/seccomp-guard.test.ts
  • test/secret-redaction.test.ts
  • test/security-c2-dockerfile-injection.test.ts
  • test/service-env.test.ts
  • test/shellquote-sandbox.test.ts
  • test/shields.test.ts
  • test/slack-token-rewriter-sync.test.ts
  • test/wsl2-probe-timeout.test.ts
💤 Files with no reviewable changes (1)
  • test/security-c2-dockerfile-injection.test.ts

Comment thread scripts/find-source-shape-tests.ts
Comment thread scripts/find-source-shape-tests.ts Outdated
Comment thread test/exec-approvals-path-regression.test.ts Outdated
Comment thread test/nemoclaw-start.test.ts
Comment thread test/runner.test.ts
Comment thread test/runner.test.ts Outdated
Comment thread test/sandbox-provisioning.test.ts
Comment thread test/wsl2-probe-timeout.test.ts
cv added 2 commits May 1, 2026 01:23
Add a source-shape detector and budget gate, then replace production-source-text assertions across the CLI/plugin test suite with behavior coverage. Behavior replacements use subprocess fakes, temp fixtures, shell snippet execution, emitted files, stdout/stderr/exit-code assertions, and direct helper behavior instead of asserting on implementation strings.

Experiments: kept runs #1-#88
Metric: source_shape_cases 317 → 0 (-100%); source_shape_assertions 819 → 0; source_shape_files 23 → 0

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/shellquote-sandbox.test.ts`:
- Around line 9-13: The test currently uses CommonJS loading via createRequire
and require; remove the createRequire(import.meta.url) call and any require(...)
usages and switch to ESM imports instead: replace static CommonJS imports with
top-level import statements (e.g., import shellquote or other modules directly)
and where dynamic loading was used, use await import('module') inside the test
(e.g., in your describe/it blocks) to load modules asynchronously; update any
references to the old require variable and ensure the file uses only ESM syntax
(no createRequire or require).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 901c1b54-24e7-4fb7-9e6d-828ef084bc40

📥 Commits

Reviewing files that changed from the base of the PR and between 6d08ab7 and 4494999.

📒 Files selected for processing (6)
  • test/cli.test.ts
  • test/install-preflight.test.ts
  • test/onboard.test.ts
  • test/policies.test.ts
  • test/sandbox-provisioning.test.ts
  • test/shellquote-sandbox.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • test/policies.test.ts
  • test/sandbox-provisioning.test.ts

Comment thread test/shellquote-sandbox.test.ts Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
test/shellquote-sandbox.test.ts (1)

163-169: 💤 Low value

Consider simplifying type narrowing with non-null assertion.

The as unknown as double cast is a workaround for TypeScript not narrowing after Vitest's expect().not.toBeNull(). Since the assertion on line 163 already validates non-null, you can use the non-null assertion operator directly for cleaner code.

♻️ Proposed simplification
       expect(captured).not.toBeNull();
-      const capturedRun = captured as unknown as {
-        command: string[];
-        opts: { ignoreError?: boolean };
-      };
-      expect(capturedRun.command).toEqual(["/tmp/custom-openshell", "--version"]);
-      expect(capturedRun.opts.ignoreError).toBe(true);
+      expect(captured!.command).toEqual(["/tmp/custom-openshell", "--version"]);
+      expect(captured!.opts.ignoreError).toBe(true);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/shellquote-sandbox.test.ts` around lines 163 - 169, The test uses a
double cast "as unknown as" to coerce `captured` into the `capturedRun` shape
after `expect(captured).not.toBeNull()`; replace that double cast by using the
non-null assertion operator to simplify narrowing (i.e., assign `capturedRun`
from `captured!` and keep the shape `{ command: string[]; opts: { ignoreError?:
boolean } }`), so subsequent assertions on `capturedRun.command` and
`capturedRun.opts.ignoreError` remain valid.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@test/shellquote-sandbox.test.ts`:
- Around line 163-169: The test uses a double cast "as unknown as" to coerce
`captured` into the `capturedRun` shape after `expect(captured).not.toBeNull()`;
replace that double cast by using the non-null assertion operator to simplify
narrowing (i.e., assign `capturedRun` from `captured!` and keep the shape `{
command: string[]; opts: { ignoreError?: boolean } }`), so subsequent assertions
on `capturedRun.command` and `capturedRun.opts.ignoreError` remain valid.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: bca5e815-7957-42e7-9ea2-7572bb9bfc77

📥 Commits

Reviewing files that changed from the base of the PR and between 4494999 and 0e2d923.

📒 Files selected for processing (2)
  • test/runner.test.ts
  • test/shellquote-sandbox.test.ts

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/shellquote-sandbox.test.ts`:
- Around line 20-179: The test still performs a source-shape assertion by
reading src/lib/onboard.ts and regex-scanning for run()/runCapture() template
literals; remove that last static source inspection from
test/shellquote-sandbox.test.ts and replace it with behavior-based checks (use
the existing runtime mocks around createSandbox, run(), runCapture(), and
runCaptureOpenshell to assert correct behavior) or move the policy enforcement
into scripts/find-source-shape-tests.ts; specifically, eliminate any code that
reads src/lib/onboard.ts and any regex assertions about run()/runCapture() and
instead assert the same contract by verifying calls to runner.run,
runner.runFile, runner.runCapture, and the exported runCaptureOpenshell
function.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b9fd20f1-65e4-45c3-b282-22826575099d

📥 Commits

Reviewing files that changed from the base of the PR and between 0e2d923 and 05bb310.

📒 Files selected for processing (1)
  • test/shellquote-sandbox.test.ts

Comment thread test/shellquote-sandbox.test.ts Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
test/wsl2-probe-timeout.test.ts (1)

79-80: ⚡ Quick win

Fail fast when scripted probe responses are exhausted.

Line 79 and Line 155 currently default to success/last-result when extra calls happen. That can mask unintended retries. Throwing when the scripted queue is exceeded makes these tests much more regression-sensitive.

Proposed patch
-        const status = statuses[index++] ?? 0;
+        const status = statuses[index++];
+        if (status === undefined) {
+          throw new Error(`Unexpected extra probe invocation at call ${index}`);
+        }
@@
-        return results[index++] ?? results[results.length - 1];
+        const next = results[index++];
+        if (!next) {
+          throw new Error(`Unexpected extra probe invocation at call ${index}`);
+        }
+        return next;

Also applies to: 155-156

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/wsl2-probe-timeout.test.ts` around lines 79 - 80, The test currently
masks extra probe calls by defaulting to a success/last-result when the scripted
arrays are exhausted; update the logic around the scripted response consumption
(the statuses array access using index and the results/lastResult access later)
to throw an error when index is out of bounds instead of using the
nullish/default fallback. Concretely, replace the expressions that use
"statuses[index++] ?? 0" and the similar "results/lastResult" fallback with an
explicit bounds check that throws a clear error like "No more scripted probe
responses" when index >= statuses.length (and similarly for results.length)
before advancing the index and returning the scripted value.
test/nemoclaw-start.test.ts (1)

576-582: 💤 Low value

Consider consolidating duplicate helper functions.

extractShellFunction (lines 576-582, 678-684) and extractFunction (lines 1221-1227) duplicate the top-level extractShellFunctionFromSource (lines 44-50). You could reuse the existing helper:

-  function extractShellFunction(name: string): string {
-    const match = src.match(new RegExp(`${name}\\(\\) \\{([\\s\\S]*?)^\\}`, "m"));
-    if (!match) {
-      throw new Error(`Expected ${name} in scripts/nemoclaw-start.sh`);
-    }
-    return `${name}() {${match[1]}\n}`;
-  }
+  const extractShellFunction = (name: string) => extractShellFunctionFromSource(src, name);

Regarding the static analysis ReDoS warnings: these are false positives since name always comes from hardcoded string literals in the test code (e.g., "apply_model_override", "_read_gateway_token").

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/nemoclaw-start.test.ts` around lines 576 - 582, There are duplicate
helpers—extractShellFunction and extractFunction—that replicate
extractShellFunctionFromSource; refactor tests to call the existing
extractShellFunctionFromSource instead of redefining those functions (search for
extractShellFunction and extractFunction and replace their bodies/calls with
calls to extractShellFunctionFromSource), and leave a comment near the call
sites noting that the ReDoS static-analysis warnings are false positives because
name is a hardcoded literal (e.g., "apply_model_override",
"_read_gateway_token").
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/wsl2-probe-timeout.test.ts`:
- Around line 77-89: The mock for httpProbe.runCurlProbe must return objects
that exactly match the ProbeResult shape; update both branches in the mocked
function (the successful branch returning { ok: true, ... } and the failure
branch returning { ok: false, ... }) to include all ProbeResult fields (notably
stderr and ensure body is present) so tests mirror production contract; also
update the other fixtures referenced (the ones around the statuses array and
fixtures at the other mentioned spots) to include the same full set of fields so
all test doubles consistently return complete ProbeResult objects.

---

Nitpick comments:
In `@test/nemoclaw-start.test.ts`:
- Around line 576-582: There are duplicate helpers—extractShellFunction and
extractFunction—that replicate extractShellFunctionFromSource; refactor tests to
call the existing extractShellFunctionFromSource instead of redefining those
functions (search for extractShellFunction and extractFunction and replace their
bodies/calls with calls to extractShellFunctionFromSource), and leave a comment
near the call sites noting that the ReDoS static-analysis warnings are false
positives because name is a hardcoded literal (e.g., "apply_model_override",
"_read_gateway_token").

In `@test/wsl2-probe-timeout.test.ts`:
- Around line 79-80: The test currently masks extra probe calls by defaulting to
a success/last-result when the scripted arrays are exhausted; update the logic
around the scripted response consumption (the statuses array access using index
and the results/lastResult access later) to throw an error when index is out of
bounds instead of using the nullish/default fallback. Concretely, replace the
expressions that use "statuses[index++] ?? 0" and the similar
"results/lastResult" fallback with an explicit bounds check that throws a clear
error like "No more scripted probe responses" when index >= statuses.length (and
similarly for results.length) before advancing the index and returning the
scripted value.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 21b64f1b-a626-41c0-83e0-0bdb54883795

📥 Commits

Reviewing files that changed from the base of the PR and between 05bb310 and b2d9d34.

📒 Files selected for processing (7)
  • scripts/find-source-shape-tests.ts
  • test/exec-approvals-path-regression.test.ts
  • test/nemoclaw-start.test.ts
  • test/runner.test.ts
  • test/sandbox-provisioning.test.ts
  • test/shellquote-sandbox.test.ts
  • test/wsl2-probe-timeout.test.ts
💤 Files with no reviewable changes (1)
  • test/exec-approvals-path-regression.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • test/shellquote-sandbox.test.ts
  • test/sandbox-provisioning.test.ts

Comment thread test/wsl2-probe-timeout.test.ts
@cv cv requested review from ericksoa and jyaunches May 1, 2026 19:19
…hape-zero-final/01-eliminate-source-shape-tests

# Conflicts:
#	test/sandbox-provisioning.test.ts
@cv cv enabled auto-merge (squash) May 1, 2026 22:09
…hape-zero-final/01-eliminate-source-shape-tests

# Conflicts:
#	test/nemoclaw-start.test.ts
@cv cv merged commit a05d44b into main May 1, 2026
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: e2e End-to-end tests, nightly failures, or validation infrastructure feature PR adds or expands user-visible functionality

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants