Skip to content

test(cli): subprocess integration test harness + regression suite for opencode run #28230

Merged
kitlangton merged 5 commits into dev from worktree-run-integration-harness
May 18, 2026
Conversation

@kitlangton kitlangton commented May 18, 2026


Closes the integration-test gap for the opencode run command. Today every cli/run/*.test.ts is a unit test of an extracted helper — nothing exercises the full handler end-to-end. Bugs that span argv → server boot → SDK call → event consumption → exit (like the original /event race or #27371's invalid-model hang) were invisible to in-process tests.

What's in this PR

Harness (test/lib/run-process.ts, test/lib/test-provider.ts)

  • withRunFixture(fn) — provisions a TestLLMServer running in-process at a random port, an isolated tmpdir for HOME/XDG_*, and a typed CLI invoker.
  • runIt.live(name, fixture => effect) — test-runner wrapper that's it.live + withRunFixture in one. Saves one nesting level at every call site.
  • OpencodeCli object exposed by the fixture:
    • opencode.run(message, opts?) — typed builder for opencode run invocations.
    • opencode.spawn(argv, opts?) — escape hatch for arbitrary CLI args.
    • opencode.expectExit(result, code) — assertion helper that dumps captured stderr/stdout on mismatch.
    • opencode.parseJsonEvents(stdout) — parses --format json line-delimited output.
  • testProviderConfig(url) — shared between the new harness and the existing httpapi-sdk.test.ts (extracted from a near-duplicate).

Configuration flows through opencode's built-in test affordances — OPENCODE_CONFIG_CONTENT (inline JSON), OPENCODE_TEST_HOME, OPENCODE_DISABLE_PROJECT_CONFIG, OPENCODE_PURE, plus the OPENCODE_DISABLE_* flags that suppress auto-update / auto-compact / models-fetch noise. No config files written.
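
A minimal sketch of how such an environment could be assembled. The variable names come from the paragraph above; the flag values (`"1"`), the XDG paths, the port, and the provider config shape are assumptions for illustration, not opencode's documented contract, and `isolatedEnv` here is a stand-in for whatever the harness actually does.

```typescript
// Sketch only: config shape and flag values are assumptions.
type ProviderConfig = Record<string, unknown>

function isolatedEnv(home: string, config: ProviderConfig) {
  return {
    // Inline JSON config, so no config file has to be written to disk.
    OPENCODE_CONFIG_CONTENT: JSON.stringify(config),
    // Point home-directory lookups at a throwaway directory.
    OPENCODE_TEST_HOME: home,
    HOME: home,
    XDG_CONFIG_HOME: `${home}/.config`,
    XDG_DATA_HOME: `${home}/.local/share`,
    // Test flags named in the PR description; "1" is an assumed value.
    OPENCODE_DISABLE_PROJECT_CONFIG: "1",
    OPENCODE_PURE: "1",
  }
}

const exampleEnv = isolatedEnv("/tmp/opencode-test-home", {
  // Hypothetical provider entry pointing at the fake LLM server.
  provider: { test: { options: { baseURL: "http://127.0.0.1:4096/v1" } } },
})
```

Passing the config as an environment variable keeps each test's configuration next to the test itself and avoids any dependence on config-file discovery.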

Tests (test/cli/run/run-process.test.ts)

  1. Happy path — prompt completes, output reaches stdout, exit 0.
  2. Unknown-model regression for #27371 (fix(run): restore non-interactive exit behavior): exits nonzero and finishes in under 15s of wall-clock time. A reintroduced hang would expire the inner timeout, fail the durationMs assertion, and be reported distinctly from a real SDK error.
  3. Mid-stream LLM error contract lock-in — when llm.fail(...) errors the SSE stream after the prompt was accepted, opencode currently exits 0. Captures the contract so a future cleanup (flipping session errors to nonzero exit) is explicit.
  4. --format json shape — emits one JSON object per line on stdout. Each event has type and sessionID. At least one text event with the LLM's response. Locks in the wire shape for CI scripts and tooling.

All 4 pass locally in ~10s serial.
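
The distinction test 2 relies on (a hang versus an ordinary error exit) can be sketched as a small classifier. The `RunResult` shape below is hypothetical; the real type in test/lib/run-process.ts may carry different fields.

```typescript
// Hypothetical result shape, for illustration only.
interface RunResult {
  exitCode: number | null // null when the process was killed by a signal
  signal: string | null
  durationMs: number
}

type Outcome = "ok" | "error-exit" | "hang"

function classify(result: RunResult, budgetMs: number): Outcome {
  // Killed by the harness timeout, or over the wall-clock budget: treat as a hang.
  if (result.signal !== null || result.durationMs >= budgetMs) return "hang"
  // Otherwise the process exited on its own; the exit code decides.
  return result.exitCode === 0 ? "ok" : "error-exit"
}

// A fast nonzero exit is the expected outcome for an unknown model.
const unknownModel = classify({ exitCode: 1, signal: null, durationMs: 3_200 }, 15_000)
```

Checking both the exit code and the duration is what lets a regression show up as "hang" rather than as a generic failure.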

Verified

  • bun run test test/cli/run/run-process.test.ts — 4/4 green
  • bun typecheck — clean
  • bun run test test/server/httpapi-sdk.test.ts -t "streams sync-backed" — still green after the testProviderConfig extraction
  • A Windows CI failure on the latest run is a pre-existing scrollback.surface.test.ts flake unrelated to this PR — my new test passes on Windows in 4.2s.

Why subprocess and not in-process

In-process tests are faster but miss exactly the bugs we're worried about: argv parsing, signal handling, exit codes from the OS perspective, server auto-start, the SDK consuming a real SSE stream over a socket. Subprocess testing is the right tier for "integration" — it's what would have caught the /event race had it existed.

Per-test cost is ~3-5 seconds (opencode startup). Acceptable for a small focused suite. If CI cares later, a shared warm server via --attach is the natural next step.
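
For readers unfamiliar with this tier, a rough sketch of a subprocess invoker built on Node's `child_process.spawn`. The name `spawnCli` and the result shape are hypothetical, and the actual harness is Effect-based, so treat this as an illustration of the technique only. The spawn has to be asynchronous because the fake LLM server runs in the same process as the test and must keep serving requests while the child runs.

```typescript
import { spawn } from "node:child_process"

// Hypothetical result shape, for illustration only.
interface SpawnResult {
  exitCode: number | null
  signal: string | null
  stdout: string
  stderr: string
  durationMs: number
}

function spawnCli(
  bin: string,
  argv: string[],
  env: Record<string, string> = {},
  timeoutMs = 30_000,
): Promise<SpawnResult> {
  return new Promise((resolve, reject) => {
    const started = Date.now()
    const child = spawn(bin, argv, {
      env: { ...process.env, ...env },
      stdio: ["ignore", "pipe", "pipe"],
    })
    let stdout = ""
    let stderr = ""
    child.stdout?.setEncoding("utf8").on("data", (chunk: string) => { stdout += chunk })
    child.stderr?.setEncoding("utf8").on("data", (chunk: string) => { stderr += chunk })
    // Inner timeout: a hung child is killed and reported with a signal instead of an exit code.
    const timer = setTimeout(() => child.kill("SIGKILL"), timeoutMs)
    child.once("error", (err) => {
      clearTimeout(timer)
      reject(err)
    })
    // "close" fires after the process has ended and its stdio streams are closed.
    child.once("close", (exitCode, signal) => {
      clearTimeout(timer)
      resolve({ exitCode, signal, stdout, stderr, durationMs: Date.now() - started })
    })
  })
}
```

The exit code and signal here come from the operating system, which is exactly the perspective an in-process test cannot observe.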

Follow-ups (not in this PR)

  • Generalize the harness to withCliFixture so other commands (serve, acp, auth) can reuse the same pattern.
  • Smoke tests across all CLI commands as a behavioral fingerprint — useful safety net for the eventual Effect CLI migration.
  • Apply the run-exit cleanup that flips session errors to nonzero exit — test 3 captures the current contract, so the change becomes explicit rather than invisible.

Phase 1 of a broader effort to close the integration test gap for the
run command. Today every test under cli/run/*.test.ts is a unit test of
an extracted helper — nothing exercises the RunCommand handler end-to-
end. Bugs that span argv → server boot → SDK call → event consumption
→ exit (like #27371 or the /event race) are invisible to in-process
tests.

This commit adds:

- `test/lib/run-process.ts` — a `withRunFixture` helper that provisions
  a TestLLMServer running in-process, an isolated tmpdir for HOME/XDG,
  and a `runOpencode(args)` spawn function. The CLI subprocess talks to
  the fake LLM over real HTTP at a random port. Configuration flows
  through `OPENCODE_CONFIG_CONTENT` (inline JSON env var), bypassing
  file-search complexity. Background work (auto-update, auto-compact,
  models fetch, external plugins) is disabled via opencode's built-in
  test env vars.

- `test/cli/run/run-process.test.ts` — one smoke test that proves the
  harness wires up correctly: spawn `opencode run "say hi"` against a
  TestLLMServer queued with a single text response, assert exit 0 and
  the response appears on stdout.

The smoke test runs in ~4s. With this harness in place, Phase 2 will
add the regression test suite (invalid-model hang, JSON format,
midstream errors, --command path).

Two ergonomic changes after the first pass:

1. Replace the freestanding `runOpencode(["run", "--model", modelID, msg])`
   with a typed builder on the fixture: `opencode.run(msg, opts?)`. The
   fixture defaults the model so tests don't repeat it, and flags like
   `format`, `agent`, `command`, `printLogs` are typed instead of stringly.
   `opencode.spawn(argv)` stays as the escape hatch for arbitrary args.

2. Introduce `runIt.live(name, fixture => effect)` that wraps
   `it.live(name, () => withRunFixture(fixture))`. Saves one nesting
   level + the arrow-to-fixture closure at every call site.

`expectExit` and `RunResult` move under the `opencode` namespace returned
by the fixture (no separate top-level export to chase).

Before:

  it.live("happy path", () =>
    withRunFixture(({ llm, modelID, runOpencode }) =>
      Effect.gen(function*() {
        yield* llm.text("hello")
        const result = yield* runOpencode(
          ["run", "--model", modelID, "say hi"],
          { timeoutMs: 30_000 },
        )
        expectExit(result, 0, "happy path")
        expect(result.stdout).toContain("hello")
      })))

After:

  runIt.live("happy path", ({ llm, opencode }) =>
    Effect.gen(function*() {
      yield* llm.text("hello")
      const result = yield* opencode.run("say hi")
      opencode.expectExit(result, 0)
      expect(result.stdout).toContain("hello")
    }))
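
A sketch of what a typed builder like `opencode.run` might do under the hood when turning options into argv. The option names follow the list above; the `--agent` and `--print-logs` flag spellings, the default model ID, and the function name `buildRunArgv` are assumptions for illustration.

```typescript
// Option names mirror the commit message; flag spellings are assumed.
interface RunOptions {
  model?: string
  format?: "default" | "json"
  agent?: string
  command?: string
  printLogs?: boolean
}

function buildRunArgv(
  message: string,
  opts: RunOptions = {},
  defaultModel = "test/test-model", // hypothetical fixture default
): string[] {
  // The fixture supplies the model so individual tests do not repeat it.
  const argv = ["run", "--model", opts.model ?? defaultModel]
  if (opts.format) argv.push("--format", opts.format)
  if (opts.agent) argv.push("--agent", opts.agent)
  if (opts.command) argv.push("--command", opts.command)
  if (opts.printLogs) argv.push("--print-logs")
  // The prompt goes last, after all flags.
  argv.push(message)
  return argv
}

const jsonArgv = buildRunArgv("say hi", { format: "json" })
```

Typing the options means a misspelled flag fails at compile time instead of surfacing as a confusing CLI error inside a test.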

Two small wins from a simplify-pass review:

1. The fake-LLM provider config was duplicated between
   test/lib/run-process.ts (new) and test/server/httpapi-sdk.test.ts
   (existing). Same shape, modulo whitespace and parameter name.
   Extracted to test/lib/test-provider.ts and re-used from both.

2. Added a comment on `runIt` explaining why only `.live` is exposed —
   subprocess tests must use the real clock; TestClock can't drive a
   child process. Future readers won't have to wonder if `.only`/`.skip`
   were oversights.

Other simplify-pass findings reviewed and skipped:
- Using `tmpdirScoped` would add a ChildProcessSpawner layer dependency
  for code paths that don't need git; current 4-line mkdir+cleanup is
  simpler.
- Collapsing the argv conditionals via `.concat([])` is a style
  preference, not clearer.
- `OPENCODE_AUTH_CONTENT` is conceptually part of "test env isolation",
  not a separate concern; staying in `isolatedEnv`.
- Switching tmpdir naming to crypto.randomUUID(): the current code uses
  the same Math.random pattern as the existing tmpdirScoped helper, and
  collisions are theoretical given bun:test's default serial execution.
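
For illustration, the "mkdir + cleanup" pattern mentioned above could look roughly like this synchronous sketch. `withTempHome` is a hypothetical name and the directory prefix is made up; the real harness presumably ties cleanup to an Effect scope rather than a try/finally.

```typescript
import { existsSync, mkdirSync, rmSync } from "node:fs"
import { tmpdir } from "node:os"
import { join } from "node:path"

function withTempHome<T>(fn: (home: string) => T): T {
  // Math.random suffix, matching the naming pattern described above.
  const home = join(tmpdir(), `opencode-run-test-${Math.random().toString(36).slice(2)}`)
  mkdirSync(home, { recursive: true })
  try {
    return fn(home)
  } finally {
    // Runs as soon as fn returns, so this sketch only suits synchronous callbacks.
    rmSync(home, { recursive: true, force: true })
  }
}

const existedDuringCallback = withTempHome((home) => existsSync(home))
```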

@kitlangton kitlangton marked this pull request as ready for review May 18, 2026 21:21

Three new tests using the harness from earlier commits:

1. Unknown-model regression for #27371 — used to hang forever waiting on
   a session.status === idle event that never arrived. Asserts both
   nonzero exit and wall-clock time under 15s (a hang would expire the
   timeout and produce a distinct signal-killed failure).

2. Mid-stream LLM error contract lock-in — when llm.fail(...) errors the
   SSE response after the prompt was accepted, opencode currently exits
   0. Captures that as the contract so a future cleanup (e.g. flipping
   session.error events to nonzero exit) is explicit.

3. --format json shape — emits one JSON object per line on stdout. Each
   event has `type` and `sessionID`. At least one `text` event with the
   LLM response. Locks in the wire shape for CI scripts and tooling.

Total: 4 tests, 10.8s in serial.

The `--format json` regression test was parsing stdout into events with
six inline lines (split, trim, filter, parse, then validating). All
three simplify-pass reviewers flagged this as a reusable helper.

Move it to OpencodeCli as `opencode.parseJsonEvents(stdout)`. The test
collapses to a single call. Any future --format json test gets the same
parsing for free, including the "throws loudly on malformed line" check.
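
A sketch of the split / trim / filter / parse / validate steps described above, assuming only what the tests assert: every event is a JSON object with string `type` and `sessionID` fields. The `JsonEvent` type and the error messages are illustrative, not the helper's actual code.

```typescript
// Only `type` and `sessionID` are named in the PR; other fields are left open.
interface JsonEvent {
  type: string
  sessionID: string
  [key: string]: unknown
}

function parseJsonEvents(stdout: string): JsonEvent[] {
  return stdout
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line, index) => {
      let value: unknown
      try {
        value = JSON.parse(line)
      } catch {
        // Fail loudly so stray non-JSON output on stdout cannot hide.
        throw new Error(`malformed JSON in event ${index + 1}: ${line}`)
      }
      if (typeof value !== "object" || value === null) {
        throw new Error(`event ${index + 1} is not a JSON object: ${line}`)
      }
      const event = value as Record<string, unknown>
      if (typeof event["type"] !== "string" || typeof event["sessionID"] !== "string") {
        throw new Error(`event ${index + 1} lacks type or sessionID: ${line}`)
      }
      return event as JsonEvent
    })
}

const sampleEvents = parseJsonEvents(
  '{"type":"text","sessionID":"s1","text":"hello"}\n\n{"type":"step_finish","sessionID":"s1"}\n',
)
```

The event type names in the sample input are made up for the example; blank lines are skipped, and anything else that is not a well-formed event raises an error.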

Other simplify-pass findings reviewed and skipped:
- expectDurationUnder helper — only one test would use it; premature.
- Tighter outer test timeouts — the per-test durationMs assertion
  already detects hangs at the subprocess level; outer timeout is
  pure safety net.
- Streaming JSON parser — "O(n²)" claim doesn't apply at our scale
  (<100 events per test).
- Parallel test execution — TestLLMServer-style singletons in the
  existing test pattern make cross-test pollution likely; not worth
  the risk for marginal CI speedup.

@kitlangton kitlangton changed the title from "test(cli): subprocess integration test harness for opencode run [phase 1]" to "test(cli): subprocess integration test harness + regression suite for opencode run" May 18, 2026
@kitlangton kitlangton merged commit 0f3d168 into dev May 18, 2026
10 of 12 checks passed
@kitlangton kitlangton deleted the worktree-run-integration-harness branch May 18, 2026 22:32
AIALRA-0 pushed a commit to AIALRA-0/opencode-turn-engine that referenced this pull request Jun 10, 2026
AIALRA-0 pushed a commit to AIALRA-0/opencode-turn-engine that referenced this pull request Jun 10, 2026
avion23 pushed a commit to avion23/opencode that referenced this pull request Jun 10, 2026