Problem
PawWork CI is currently too flaky. Recent dev runs often fail in ci / unit even when the triggering commit does not touch the failing code path. The current unit job also runs all package tests through one large bun turbo test:ci command, so one flaky package makes the whole required CI gate look broken and slows down diagnosis.
Evidence
A recent failure on commit 935ab331ffc835c0066ee8377891ea905192c97c came from ci / unit, while typecheck, desktop-smoke, and codeql passed. The commit only bumped versions from 0.2.4 to 0.2.5, but the resulting full cache miss in the unit job exposed two flaky tests in packages/opencode: seed e2e script > exits cleanly after creating the seeded session and tool.edit > editing existing files > emits change event for existing files.
Earlier recent failures had different causes, including app test import resolution errors around @solidjs/router. That pattern suggests the CI surface has multiple unstable points rather than one deterministic product regression.
Scope
The first pass should keep the existing required check name ci / check and avoid a full release or branch-protection redesign. It should split the Linux unit gate by package, add Windows unit as a non-blocking signal, fix only the two currently known flaky tests, preserve the existing docs-only behavior, and adjust CI concurrency so that dev runs do not cancel each other and leave misleading red or cancelled entries in the history.
Proposed changes
- Split the current unit job into unit-app, unit-opencode, and unit-desktop. Each Linux package unit job should preserve Turbo dependency semantics, for example bun turbo test:ci --filter=@opencode-ai/app, bun turbo test:ci --filter=opencode, and bun turbo test:ci --filter=@opencode-ai/desktop-electron, unless the implementation preserves the same dependency builds another way.
- Publish package-specific JUnit reports and artifacts for each Linux unit job.
- Keep the existing changes docs-only filter. ci / check should continue to depend on changes, and should gate typecheck, unit-app, unit-opencode, and unit-desktop.
- Add a Windows unit signal job that runs the full package unit command on Windows, likely bun turbo test:ci, with continue-on-error: true. This job should also depend on changes and skip docs-only changes. It must remain outside the blocking aggregate or be explicitly ignored by it, so Windows flakes are visible but do not block PRs or dev.
- Change CI concurrency so dev runs are not cancelled by later pushes, while PR runs still cancel stale attempts.
- Fix the known flaky tests:
  - packages/opencode/test/config/seed-e2e.test.ts: avoid the tight 5-second abort race and expose stdout/stderr when the seed process exits non-zero.
  - packages/opencode/test/tool/edit.test.ts: make the bus event assertion deterministic instead of depending on a callback timing race.
- Update workflow contract tests so future CI edits cannot forget to pin actions, disable checkout credential persistence, upload JUnit reports, preserve docs-only behavior, preserve Turbo-filtered package unit semantics, or include required unit jobs in the aggregate check.
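One way to make an event assertion deterministic, as proposed for edit.test.ts, is to attach the listener before triggering the action and then await a promise for the matching event, rather than checking a flag set by a callback. The sketch below illustrates that pattern only. TinyBus, nextEvent, demo, and the "file.changed" event name are hypothetical stand-ins, since the real bus API in packages/opencode is not shown here.

```typescript
// Hypothetical stand-in for the real event bus; only subscribe/publish are modelled.
type Listener<T> = (payload: T) => void

class TinyBus<T> {
  private listeners = new Set<Listener<T>>()

  // Register a listener and return a function that removes it again.
  subscribe(fn: Listener<T>): () => void {
    this.listeners.add(fn)
    return () => {
      this.listeners.delete(fn)
    }
  }

  // Deliver the payload synchronously to every current listener.
  publish(payload: T): void {
    for (const fn of Array.from(this.listeners)) fn(payload)
  }
}

// Resolve with the first payload that satisfies `match`, or reject after `timeoutMs`.
// The listener is attached synchronously, before this function returns.
function nextEvent<T>(bus: TinyBus<T>, match: (payload: T) => boolean, timeoutMs = 5000): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    let unsubscribe: () => void = () => {}
    const timer = setTimeout(() => {
      unsubscribe()
      reject(new Error(`no matching event within ${timeoutMs} ms`))
    }, timeoutMs)
    unsubscribe = bus.subscribe((payload) => {
      if (!match(payload)) return
      clearTimeout(timer)
      unsubscribe()
      resolve(payload)
    })
  })
}

// Usage sketch: subscribe first, then trigger, then await the promise.
async function demo(): Promise<string> {
  const bus = new TinyBus<{ type: string; file: string }>()
  const changed = nextEvent(bus, (e) => e.type === "file.changed")
  bus.publish({ type: "file.changed", file: "a.txt" }) // stands in for the edit under test
  const event = await changed
  return event.file
}
```

Because the subscription exists before the action runs, the assertion no longer depends on whether the callback fires before or after the test reaches its expect call. A missing event surfaces as a rejected promise with a clear message instead of a silent timing failure.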
Non-goals
- Do not redesign release workflows.
- Do not make Windows unit blocking in this pass.
- Do not move e2e into the required CI gate in this pass.
- Do not broadly rewrite all watcher, timeout, env, or tmp-dir tests. Fix only the known flaky tests unless a direct blocker appears during implementation.
Acceptance criteria
- Linux package unit jobs are separately visible in GitHub Actions.
- Linux package unit jobs preserve Turbo dependency build semantics.
- ci / check remains the single required aggregate status for code CI.
- Docs-only behavior is preserved: when changes.outputs.docs_only == 'true', ci / check still passes as it does today, and the new unit jobs do not run unnecessary tests for docs-only changes.
- Windows unit test signal runs but does not block merge.
- The two known flaky tests are deterministic locally and in CI.
- Workflow contract tests cover the new CI shape.
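A contract check for the aggregate gate could look roughly like the sketch below. It assumes the workflow file has already been parsed into a plain object and that the aggregate job key is check. The Workflow and WorkflowJob types, REQUIRED_NEEDS, and missingFromAggregate are hypothetical names introduced for illustration, not the existing contract test code.

```typescript
// Minimal shape of a parsed workflow; only the fields this check reads (assumed).
interface WorkflowJob {
  needs?: string | string[]
}
interface Workflow {
  jobs: Record<string, WorkflowJob>
}

// Jobs the aggregate is expected to wait on, taken from the proposed changes.
const REQUIRED_NEEDS = ["changes", "typecheck", "unit-app", "unit-opencode", "unit-desktop"]

// Return the required job names that the aggregate job does not list under `needs`.
// If the aggregate job itself is missing, every required name is reported.
function missingFromAggregate(
  workflow: Workflow,
  aggregate = "check",
  required: string[] = REQUIRED_NEEDS,
): string[] {
  const job = workflow.jobs[aggregate]
  if (job === undefined) return [...required]
  // `needs` may be a single string or a list in workflow YAML; normalise to a list.
  const needs = typeof job.needs === "string" ? [job.needs] : (job.needs ?? [])
  return required.filter((name) => !needs.includes(name))
}
```

A contract test would then assert that missingFromAggregate returns an empty list for the parsed CI workflow, so that dropping unit-desktop from the aggregate, for example, fails with the missing job name in the output.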