What task are you trying to do?
Cut the wall time of `ci / unit` on PRs. It currently spends ~4 minutes per PR run, which adds friction on every iteration.
What do you do today?
Recent run on PR #72 (representative):
- Total job time: 232s
- Checkout + setup + cache + install: 26s
- `bun turbo test:ci`: 186s
- Artifact upload + cache post: 20s
Per-package breakdown of the 186s test block:
| Package |
Tests |
Files |
Runtime |
| `@opencode-ai/app` |
371 |
66 |
1.97s |
| `@opencode-ai/desktop-electron` |
29 |
12 |
32.7s |
| `opencode` |
2006 |
178 |
172.8s |
| upstream builds (sdk/plugin/app) |
— |
— |
~40s |
`opencode` owns roughly 93% of the test time. `desktop-electron` runs ~1.1s/test on 29 tests, which suggests real I/O or process spawning. Also observed on the same run: the turbo cache step reported `Cache not found for input keys: turbo-Linux-...` with all three restore-key fallbacks missing. A YAML-only PR (where no package source actually changed) should in principle cache-hit every `test:ci` task and finish in seconds, but today it runs the full suite from scratch.
What would a good result look like?
- Docs / YAML-only PRs finish `ci / unit` in under 45 seconds via turbo cache hits.
- PRs that only touch a single package run `test:ci` for that package only (plus its dependents), not the whole workspace.
- Full-cold runs (first touch after cache eviction) drop meaningfully by parallelising `bun test` across cores.
- Each lever below is verified with a before/after benchmark in the PR description, not "looks right to me".
Which audience does this matter to most?
Maintainers (CI velocity).
Extra context
Optimization levers, ranked by expected win
- P1 — Fix turbo cache hit rate. The cache missed on a YAML-only PR that should have hit. Likely causes to investigate: `passThroughEnv: ["*"]` on every `test:ci` task in `turbo.json` (overly broad env invalidation), the `${{ github.sha }}` suffix in the cache key combined with restore-key prefixes that may not match across workspaces, and missing per-task `inputs` declarations. Verify: run a no-op docs PR against a warmed cache and confirm `test:ci` shows as `FULL TURBO`.
- P2 — Use `turbo run --filter` with affected-packages detection. `bun turbo test:ci --filter=...[origin/dev]` would skip packages unaffected by the current diff. Needs careful review of how turbo computes "affected" for workspaces that import each other. Verify: PR touching only `packages/app/` should not trigger `opencode:test:ci`.
- P3 — Increase `bun test` file-level concurrency. 172.8s / 178 files ≈ 0.97s/file suggests files are running close to serial on the 4-vCPU runner. Try `--concurrency=4` or the equivalent runtime flag. Verify: benchmark opencode tests with and without the flag locally on a 4-core container, confirm no flakiness regression.
- P3 — Sandbox or mock the slow desktop-electron tests. 1.1s/test on 29 tests is high and likely reflects process spawning or real FS access. Identify the worst offenders with `bun test --timeout --verbose` and convert to unit-level mocks where possible.
Out of scope
- Cross-OS caching (Windows is deferred per the Phase 1 macOS-only scope in AGENTS.md).
- Parallel jobs via a test matrix: turbo already parallelises packages within one job; splitting into multiple jobs adds runner cold-start overhead that likely eats the savings.
Sequencing recommendation
Benchmark before any change. Land the turbo cache fix first in its own PR (biggest win, lowest correctness risk). The `--filter` and concurrency changes are each their own PR with before/after numbers.
What task are you trying to do?
Cut the wall time of `ci / unit` on PRs. It currently spends ~4 minutes per PR run, which adds friction on every iteration.
What do you do today?
Recent run on PR #72 (representative):
Per-package breakdown of the 186s test block:
`opencode` owns roughly 93% of the test time. `desktop-electron` runs ~1.1s/test on 29 tests, which suggests real I/O or process spawning. Also observed on the same run: the turbo cache step reported `Cache not found for input keys: turbo-Linux-...` with all three restore-key fallbacks missing. A YAML-only PR (where no package source actually changed) should in principle cache-hit every `test:ci` task and finish in seconds, but today it runs the full suite from scratch.
What would a good result look like?
Which audience does this matter to most?
Maintainers (CI velocity).
Extra context
Optimization levers, ranked by expected win
Out of scope
Sequencing recommendation
Benchmark before any change. Land the turbo cache fix first in its own PR (biggest win, lowest correctness risk). The `--filter` and concurrency changes are each their own PR with before/after numbers.