test(cli): subprocess integration test harness + regression suite for opencode run #28230
Merged
Phase 1 of a broader effort to close the integration test gap for the run command. Today every test under cli/run/*.test.ts is a unit test of an extracted helper; nothing exercises the RunCommand handler end-to-end. Bugs that span argv → server boot → SDK call → event consumption → exit (like #27371 or the /event race) are invisible to in-process tests.

This commit adds:

- `test/lib/run-process.ts`: a `withRunFixture` helper that provisions a TestLLMServer running in-process, an isolated tmpdir for HOME/XDG, and a `runOpencode(args)` spawn function. The CLI subprocess talks to the fake LLM over real HTTP at a random port. Configuration flows through `OPENCODE_CONFIG_CONTENT` (inline JSON env var), bypassing file-search complexity. Background work (auto-update, auto-compact, models fetch, external plugins) is disabled via opencode's built-in test env vars.
- `test/cli/run/run-process.test.ts`: one smoke test that proves the harness wires up correctly. It spawns `opencode run "say hi"` against a TestLLMServer queued with a single text response, then asserts exit 0 and that the response appears on stdout.

The smoke test runs in ~4s. With this harness in place, Phase 2 will add the regression test suite (invalid-model hang, JSON format, midstream errors, --command path).
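To make the isolation idea concrete, here is a minimal sketch of how such an environment could be assembled. It is not the code in `test/lib/run-process.ts`: the function name `sketchIsolatedEnv`, its input shape, and the XDG subdirectory layout are assumptions for illustration. Only `HOME`, the XDG variables, and `OPENCODE_CONFIG_CONTENT` come from the description above.

```typescript
import { join } from "node:path"

// Hypothetical input shape; the real fixture may differ.
interface IsolatedEnvInput {
  home: string // tmpdir standing in for the user's home directory
  config: unknown // inline config, serialized rather than written to disk
}

// Builds the env overrides for the CLI subprocess. A caller would typically
// merge this over a trimmed copy of process.env so PATH and friends survive.
function sketchIsolatedEnv(input: IsolatedEnvInput) {
  const { home, config } = input
  return {
    HOME: home,
    // Conventional XDG locations, re-rooted under the tmpdir (assumed layout).
    XDG_CONFIG_HOME: join(home, ".config"),
    XDG_DATA_HOME: join(home, ".local", "share"),
    XDG_CACHE_HOME: join(home, ".cache"),
    XDG_STATE_HOME: join(home, ".local", "state"),
    // Inline JSON config, so no config file search is involved.
    OPENCODE_CONFIG_CONTENT: JSON.stringify(config),
  }
}
```

Passing the config as a string keeps the test free of config files on disk, which is the property the commit message is after.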
Two ergonomic changes after the first pass:
1. Replace the freestanding `runOpencode(["run", "--model", modelID, msg])`
with a typed builder on the fixture: `opencode.run(msg, opts?)`. The
fixture defaults the model so tests don't repeat it, and flags like
`format`, `agent`, `command`, `printLogs` are typed instead of stringly.
`opencode.spawn(argv)` stays as the escape hatch for arbitrary args.
2. Introduce `runIt.live(name, fixture => effect)` that wraps
`it.live(name, () => withRunFixture(fixture))`. Saves one nesting
level + the arrow-to-fixture closure at every call site.
`expectExit` and `RunResult` move under the `opencode` namespace returned
by the fixture (no separate top-level export to chase).
Before:

    it.live("happy path", () =>
      withRunFixture(({ llm, modelID, runOpencode }) =>
        Effect.gen(function* () {
          yield* llm.text("hello")
          const result = yield* runOpencode(
            ["run", "--model", modelID, "say hi"],
            { timeoutMs: 30_000 },
          )
          expectExit(result, 0, "happy path")
          expect(result.stdout).toContain("hello")
        })))

After:

    runIt.live("happy path", ({ llm, opencode }) =>
      Effect.gen(function* () {
        yield* llm.text("hello")
        const result = yield* opencode.run("say hi")
        opencode.expectExit(result, 0)
        expect(result.stdout).toContain("hello")
      }))
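
A typed builder like `opencode.run(msg, opts?)` ultimately has to turn its options into an argv array. The sketch below shows one way that translation might look. The names `SketchRunOptions` and `buildRunArgv` are hypothetical, and the flag spellings `--agent` and `--print-logs` are assumptions inferred from the option names; only `--model`, `--format`, and `--command` appear in this PR's text.

```typescript
// Hypothetical option bag mirroring the typed flags named above.
interface SketchRunOptions {
  model?: string
  format?: "default" | "json"
  agent?: string
  command?: string
  printLogs?: boolean
}

// Translates a message plus typed options into CLI arguments.
// The fixture-level default model is used unless the test overrides it.
function buildRunArgv(
  message: string,
  defaults: { model: string },
  opts: SketchRunOptions = {},
): string[] {
  const argv = ["run", "--model", opts.model ?? defaults.model]
  if (opts.format) argv.push("--format", opts.format)
  if (opts.agent) argv.push("--agent", opts.agent) // assumed flag spelling
  if (opts.command) argv.push("--command", opts.command)
  if (opts.printLogs) argv.push("--print-logs") // assumed flag spelling
  argv.push(message)
  return argv
}
```

With the model defaulted, `buildRunArgv("say hi", { model: modelID })` yields the same four arguments the "Before" snippet spelled out by hand.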
Two small wins from a simplify-pass review:

1. The fake-LLM provider config was duplicated between test/lib/run-process.ts (new) and test/server/httpapi-sdk.test.ts (existing). Same shape, modulo whitespace and parameter name. Extracted to test/lib/test-provider.ts and re-used from both.
2. Added a comment on `runIt` explaining why only `.live` is exposed: subprocess tests must use the real clock; TestClock can't drive a child process. Future readers won't have to wonder if `.only`/`.skip` were oversights.

Other simplify-pass findings reviewed and skipped:

- Using `tmpdirScoped` would add a ChildProcessSpawner layer dependency for code paths that don't need git; current 4-line mkdir+cleanup is simpler.
- Collapsing the argv conditionals via `.concat([])` is a style preference, not clearer.
- `OPENCODE_AUTH_CONTENT` is conceptually part of "test env isolation", not a separate concern; staying in `isolatedEnv`.
- crypto.randomUUID() for tmpdir naming: the current code uses the same Math.random pattern as the existing tmpdirScoped helper; collisions are theoretical given bun:test's default serial execution.
Three new tests using the harness from earlier commits:

1. Unknown-model regression for #27371: used to hang forever waiting on a session.status === idle event that never arrived. Asserts both a nonzero exit and wall-clock time under 15s (a hang would instead run out the timeout and surface as a different, signal-killed failure).
2. Mid-stream LLM error contract lock-in: when llm.fail(...) errors the SSE response after the prompt was accepted, opencode currently exits 0. The test captures that as the contract, so a future cleanup (e.g. flipping session.error events to nonzero exit) has to change it explicitly.
3. --format json shape: emits one JSON object per line on stdout. Each event has `type` and `sessionID`. At least one `text` event carries the LLM response. Locks in the wire shape for CI scripts and tooling.

Total: 4 tests, 10.8s in serial.
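
The unknown-model test relies on telling a hang apart from a genuine error exit. A minimal sketch of that distinction follows; `SketchOutcome`, `Verdict`, and `classifyRun` are hypothetical names, and the sketch assumes the only source of signals is the harness timeout killing the child.

```typescript
// Hypothetical summary of one finished subprocess run.
interface SketchOutcome {
  exitCode: number | null // null when the process died from a signal
  signal: string | null // e.g. "SIGKILL" when the harness timeout fired
  durationMs: number // wall-clock time from spawn to exit
}

type Verdict = "clean-exit" | "error-exit" | "hang"

// A run counts as a hang if it was signal-killed (assumed to be the timeout)
// or if it finished on its own but blew the wall-clock budget.
function classifyRun(outcome: SketchOutcome, budgetMs: number): Verdict {
  if (outcome.signal !== null || outcome.exitCode === null) return "hang"
  if (outcome.durationMs >= budgetMs) return "hang"
  return outcome.exitCode === 0 ? "clean-exit" : "error-exit"
}
```

Under this reading, the regression test expects `"error-exit"` with a 15s budget, while the original bug would have produced `"hang"`.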
The `--format json` regression test was parsing stdout into events with six inline lines (split, trim, filter, parse, then validating). All three simplify-pass reviewers flagged this as a reusable helper.

Move it to OpencodeCli as `opencode.parseJsonEvents(stdout)`. The test collapses to a single call. Any future --format json test gets the same parsing for free, including the "throws loudly on malformed line" check.

Other simplify-pass findings reviewed and skipped:

- expectDurationUnder helper: only one test would use it; premature.
- Tighter outer test timeouts: the per-test durationMs assertion already detects hangs at the subprocess level; outer timeout is pure safety net.
- Streaming JSON parser: "O(n²)" claim doesn't apply at our scale (<100 events per test).
- Parallel test execution: TestLLMServer-style singletons in the existing test pattern make cross-test pollution likely; not worth the risk for marginal CI speedup.
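
The split, trim, filter, parse, validate sequence described above can be sketched as a standalone function. This is an illustration under assumptions, not the helper that landed in the PR: the name `sketchParseJsonEvents`, the `SketchJsonEvent` type, and the error messages are invented here, and only the `type` and `sessionID` fields are taken from the commit text.

```typescript
// Minimal event shape: the two fields every event is said to carry.
interface SketchJsonEvent {
  type: string
  sessionID: string
  [key: string]: unknown
}

// Parses line-delimited JSON from stdout, skipping blank lines and
// throwing on the first line that is not a well-formed event.
function sketchParseJsonEvents(stdout: string): SketchJsonEvent[] {
  return stdout
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line, index) => {
      let parsed: unknown
      try {
        parsed = JSON.parse(line)
      } catch {
        throw new Error(`malformed JSON on non-empty line ${index + 1}: ${line}`)
      }
      if (typeof parsed !== "object" || parsed === null) {
        throw new Error(`expected a JSON object on non-empty line ${index + 1}`)
      }
      const event = parsed as Record<string, unknown>
      if (typeof event.type !== "string" || typeof event.sessionID !== "string") {
        throw new Error(`event on non-empty line ${index + 1} lacks type/sessionID`)
      }
      return event as SketchJsonEvent
    })
}
```

Failing on the first bad line, rather than silently dropping it, is what lets a test notice stray non-JSON output on stdout.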
AIALRA-0 pushed a commit to AIALRA-0/opencode-turn-engine that referenced this pull request (Jun 10, 2026)
AIALRA-0 pushed a commit to AIALRA-0/opencode-turn-engine that referenced this pull request (Jun 10, 2026)
avion23 pushed a commit to avion23/opencode that referenced this pull request (Jun 10, 2026)
Closes the integration-test gap for the `opencode run` command. Today every `cli/run/*.test.ts` is a unit test of an extracted helper; nothing exercises the full handler end-to-end. Bugs that span argv → server boot → SDK call → event consumption → exit (like the original `/event` race or #27371's invalid-model hang) were invisible to in-process tests.

What's in this PR

Harness (`test/lib/run-process.ts`, `test/lib/test-provider.ts`)

- `withRunFixture(fn)`: provisions a `TestLLMServer` running in-process at a random port, an isolated tmpdir for `HOME`/`XDG_*`, and a typed CLI invoker.
- `runIt.live(name, fixture => effect)`: test-runner wrapper that's `it.live` + `withRunFixture` in one. Saves one nesting level at every call site.
- `OpencodeCli` object exposed by the fixture:
  - `opencode.run(message, opts?)`: typed builder for `opencode run` invocations.
  - `opencode.spawn(argv, opts?)`: escape hatch for arbitrary CLI args.
  - `opencode.expectExit(result, code)`: assertion helper that dumps captured stderr/stdout on mismatch.
  - `opencode.parseJsonEvents(stdout)`: parses `--format json` line-delimited output.
- `testProviderConfig(url)`: shared between the new harness and the existing `httpapi-sdk.test.ts` (extracted from a near-duplicate).

Configuration flows through opencode's built-in test affordances: `OPENCODE_CONFIG_CONTENT` (inline JSON), `OPENCODE_TEST_HOME`, `OPENCODE_DISABLE_PROJECT_CONFIG`, `OPENCODE_PURE`, plus the `OPENCODE_DISABLE_*` flags that suppress auto-update / auto-compact / models-fetch noise. No config files written.

Tests (`test/cli/run/run-process.test.ts`)

- Smoke test: spawns `opencode run "say hi"` against a `TestLLMServer` queued with a single text response; asserts exit 0 and that the response appears on stdout.
- Unknown-model regression for #27371: asserts a nonzero exit and wall-clock time under 15s. A hang would fail the `durationMs` assertion, and report distinctly from a real SDK error.
- Mid-stream LLM error contract lock-in: when `llm.fail(...)` errors the SSE stream after the prompt was accepted, opencode currently exits 0. Captures the contract so a future cleanup (flipping session errors to nonzero exit) is explicit.
- `--format json` shape: emits one JSON object per line on stdout. Each event has `type` and `sessionID`. At least one `text` event with the LLM's response. Locks in the wire shape for CI scripts and tooling.

All 4 pass locally in ~10s serial.

Verified

- `bun run test test/cli/run/run-process.test.ts`: 4/4 green
- `bun typecheck`: clean
- `bun run test test/server/httpapi-sdk.test.ts -t "streams sync-backed"`: still green after the `testProviderConfig` extraction
- `scrollback.surface.test.ts` flake unrelated to this PR; my new test passes on Windows in 4.2s.

Why subprocess and not in-process

In-process tests are faster but miss exactly the bugs we're worried about: argv parsing, signal handling, exit codes from the OS perspective, server auto-start, the SDK consuming a real SSE stream over a socket. Subprocess testing is the right tier for "integration"; it's what would have caught the `/event` race had it existed.

Per-test cost is ~3-5 seconds (opencode startup). Acceptable for a small focused suite. If CI cares later, a shared warm server via `--attach` is the natural next step.

Follow-ups (not in this PR)

- `withCliFixture` so other commands (`serve`, `acp`, `auth`) can reuse the same pattern.
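
For readers wondering what an exit-code assertion that "dumps captured stderr/stdout on mismatch" might look like, here is a rough sketch. The names `SketchRunResult` and `sketchExpectExit`, the optional label parameter, and the message layout are assumptions; the real `opencode.expectExit` may be shaped differently.

```typescript
// Hypothetical result shape for a finished subprocess.
interface SketchRunResult {
  exitCode: number | null
  stdout: string
  stderr: string
}

// Throws with the full captured output when the exit code is unexpected,
// so a CI failure carries enough context to debug without a rerun.
function sketchExpectExit(result: SketchRunResult, expected: number, label = "opencode"): void {
  if (result.exitCode === expected) return
  throw new Error(
    [
      `${label}: expected exit ${expected}, got ${result.exitCode}`,
      "--- stdout ---",
      result.stdout,
      "--- stderr ---",
      result.stderr,
    ].join("\n"),
  )
}
```

Putting the captured streams in the error message matters more for subprocess tests than in-process ones, since there is no stack trace from inside the child to fall back on.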