
[Bug] Improve CI stability and split package unit jobs #121

@Astro-Han


Problem

PawWork CI is currently too flaky. Recent dev runs often fail in ci / unit even when the triggering commit does not touch the failing code path. The unit job also runs every package's tests through a single bun turbo test:ci command, so one flaky package turns the whole required CI gate red and makes it slow to work out which package actually failed.

Evidence

The most recent failure, on commit 935ab331ffc835c0066ee8377891ea905192c97c, came from ci / unit, while typecheck, desktop-smoke, and codeql passed. The commit only bumped versions from 0.2.4 to 0.2.5, but the resulting full unit cache miss exposed two flaky tests in packages/opencode: seed e2e script > exits cleanly after creating the seeded session and tool.edit > editing existing files > emits change event for existing files.

Other recent failures had different causes, including import resolution errors around @solidjs/router in the app tests. This pattern suggests that the CI surface has several unstable points rather than a single deterministic product regression.

Scope

The first pass should keep the existing required check name ci / check and avoid a full release or branch-protection redesign. It should split the Linux unit gate by package, add Windows unit tests as a non-blocking signal, fix only the two currently known flaky tests, preserve the existing docs-only behavior, and adjust CI concurrency so that dev runs do not cancel each other and leave misleading red or cancelled entries in the run history.
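A minimal TypeScript model of the intended concurrency rule, for illustration only. The fields on CiEvent mirror what a workflow can read as github.event_name, github.workflow, github.sha, and github.event.pull_request.number, but the type and the concurrencyFor function are hypothetical. The reasoning behind keying dev pushes by commit: GitHub Actions documents that a newly queued run replaces any run still pending in the same concurrency group even when cancel-in-progress is false, so a shared dev group alone would not stop later pushes from displacing earlier runs.

```typescript
// Hypothetical model of the fields a workflow can read as github.event_name,
// github.workflow, github.sha and github.event.pull_request.number.
type CiEvent =
  | { eventName: "pull_request"; workflow: string; sha: string; prNumber: number }
  | { eventName: "push"; workflow: string; sha: string };

type Concurrency = { group: string; cancelInProgress: boolean };

function concurrencyFor(e: CiEvent): Concurrency {
  if (e.eventName === "pull_request") {
    // All runs for one PR share a group; a newer push cancels the stale run.
    return { group: `${e.workflow}-pr-${e.prNumber}`, cancelInProgress: true };
  }
  // Each pushed commit (e.g. on dev) gets its own group, so a later push can
  // neither cancel a running dev run nor replace one that is still pending.
  return { group: `${e.workflow}-push-${e.sha}`, cancelInProgress: false };
}
```

In the workflow file this would roughly correspond to a concurrency group built from the PR number for pull_request events and from github.sha otherwise, with cancel-in-progress set to github.event_name == 'pull_request'.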

Proposed changes

  • Split the current unit job into unit-app, unit-opencode, and unit-desktop. Each Linux package unit job should preserve Turbo dependency semantics, for example with bun turbo test:ci --filter=@opencode-ai/app, bun turbo test:ci --filter=opencode, and bun turbo test:ci --filter=@opencode-ai/desktop-electron, unless the implementation builds the same dependencies another way.
  • Publish package-specific JUnit reports and artifacts for each Linux unit job.
  • Keep the existing docs-only filter in the changes job. ci / check should continue to depend on changes and should gate typecheck, unit-app, unit-opencode, and unit-desktop.
  • Add a Windows unit signal job that runs the full package unit command on Windows, likely bun turbo test:ci, with continue-on-error: true. This job should also depend on changes and skip docs-only changes. It must remain outside the blocking aggregate or be explicitly ignored by it, so Windows flakes are visible but do not block PRs or dev.
  • Change CI concurrency so dev runs are not cancelled by later pushes, while PR runs still cancel stale attempts.
  • Fix the known flaky tests:
    • packages/opencode/test/config/seed-e2e.test.ts: avoid the tight 5-second abort race and expose stdout/stderr when the seed process exits with a non-zero code.
    • packages/opencode/test/tool/edit.test.ts: make the bus event assertion deterministic instead of depending on a callback timing race.
  • Update workflow contract tests so future CI edits cannot forget to pin actions, disable checkout credential persistence, upload JUnit reports, preserve docs-only behavior, preserve Turbo-filtered package unit semantics, or include required unit jobs in the aggregate check.
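The two flaky-test fixes above could follow patterns like the ones below, sketched in TypeScript under stated assumptions. waitForEvent is meant to be called before the edit runs; it resolves on the first matching event or rejects after an explicit deadline, instead of asserting that a callback has already fired. EventBus, its subscribe method, and the event shape are hypothetical stand-ins for the real bus in packages/opencode. assertCleanExit covers the seed-e2e requirement to surface stdout and stderr on a non-zero exit; ProcessResult is a hypothetical shape for whatever the test's process runner returns, and the replacement for the 5-second timeout is left to the implementation.

```typescript
type Unsubscribe = () => void;

// Hypothetical event bus; the real bus API in packages/opencode may differ.
interface EventBus<E> {
  subscribe(handler: (event: E) => void): Unsubscribe;
}

// Resolve with the first matching event, or reject after timeoutMs.
// Because the subscription exists before the caller triggers the edit,
// an event emitted during the edit call or on a later tick is not missed.
function waitForEvent<E>(bus: EventBus<E>, match: (event: E) => boolean, timeoutMs: number): Promise<E> {
  return new Promise<E>((resolve, reject) => {
    let settled = false;
    let unsubscribe: Unsubscribe | undefined;
    const timer = setTimeout(() => {
      if (settled) return;
      settled = true;
      unsubscribe?.();
      reject(new Error(`no matching event within ${timeoutMs} ms`));
    }, timeoutMs);
    unsubscribe = bus.subscribe((event) => {
      if (settled || !match(event)) return;
      settled = true;
      clearTimeout(timer);
      unsubscribe?.(); // still undefined if the handler runs inside subscribe(); handled below
      resolve(event);
    });
    if (settled) unsubscribe();
  });
}

// Hypothetical result shape for a finished child process.
type ProcessResult = { exitCode: number | null; stdout: string; stderr: string };

// Fail with the child's full output so the CI log shows why the script exited.
function assertCleanExit(label: string, result: ProcessResult): void {
  if (result.exitCode === 0) return;
  throw new Error(
    [
      `${label} exited with code ${result.exitCode}`,
      "--- stdout ---",
      result.stdout.trimEnd(),
      "--- stderr ---",
      result.stderr.trimEnd(),
    ].join("\n"),
  );
}
```

In the edit test, the promise would be created before invoking the edit tool and awaited after it, so the assertion no longer depends on when the bus delivers the event relative to the test's own callbacks.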

Non-goals

  • Do not redesign release workflows.
  • Do not make Windows unit blocking in this pass.
  • Do not move e2e into the required CI gate in this pass.
  • Do not broadly rewrite all watcher, timeout, env, or tmp-dir tests. Fix only the known flaky tests unless a direct blocker appears during implementation.

Acceptance criteria

  • Linux package unit jobs are separately visible in GitHub Actions.
  • Linux package unit jobs preserve Turbo dependency build semantics.
  • ci / check remains the single required aggregate status for code CI.
  • Docs-only behavior is preserved: when changes.outputs.docs_only == 'true', ci / check still passes as it does today, and the new unit jobs do not run unnecessary tests for docs-only changes.
  • The Windows unit signal runs but does not block merges.
  • The two known flaky tests pass deterministically, both locally and in CI.
  • Workflow contract tests cover the new CI shape.
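A minimal sketch of such a contract check in TypeScript, assuming the CI workflow has already been parsed from YAML into a plain object (the parsing step is not shown). The job IDs check and unit-windows are hypothetical; changes, typecheck, the unit-* names, and the Turbo filters come from this issue. The docs-only and JUnit checks are textual approximations: a job counts as docs-only aware if it needs changes and its if: expression mentions docs_only, and as publishing JUnit if it has an actions/upload-artifact step.

```typescript
// Minimal shape of a parsed workflow; only the fields this sketch inspects.
type Step = { uses?: string; run?: string; with?: Record<string, unknown> };
type Job = { needs?: string | string[]; if?: string; "continue-on-error"?: boolean; steps?: Step[] };
type Workflow = { jobs: Record<string, Job> };

// Hypothetical job IDs for the aggregate and Windows jobs; adjust to the real workflow.
const AGGREGATE_JOB = "check";
const WINDOWS_JOB = "unit-windows";
const GATED_JOBS = ["changes", "typecheck", "unit-app", "unit-opencode", "unit-desktop"];
const LINUX_UNIT_FILTERS: Record<string, string> = {
  "unit-app": "--filter=@opencode-ai/app",
  "unit-opencode": "--filter=opencode",
  "unit-desktop": "--filter=@opencode-ai/desktop-electron",
};

const needsOf = (job: Job): string[] =>
  job.needs === undefined ? [] : Array.isArray(job.needs) ? job.needs : [job.needs];

// Textual approximation of "depends on changes and skips docs-only changes".
const skipsDocsOnly = (job: Job): boolean =>
  needsOf(job).includes("changes") && (job.if ?? "").includes("docs_only");

// Returns every contract violation instead of stopping at the first one.
function checkWorkflowContract(wf: Workflow): string[] {
  const problems: string[] = [];

  for (const [id, job] of Object.entries(wf.jobs)) {
    for (const step of job.steps ?? []) {
      const uses = step.uses;
      if (uses === undefined || uses.startsWith("./")) continue; // local actions are not pinned
      if (!/@[0-9a-f]{40}$/.test(uses)) problems.push(`${id}: action not pinned to a commit SHA: ${uses}`);
      if (uses.startsWith("actions/checkout@") && step.with?.["persist-credentials"] !== false) {
        problems.push(`${id}: checkout must set persist-credentials: false`);
      }
    }
  }

  for (const [id, filter] of Object.entries(LINUX_UNIT_FILTERS)) {
    const job = wf.jobs[id];
    if (job === undefined) {
      problems.push(`missing job ${id}`);
      continue;
    }
    const steps = job.steps ?? [];
    const command = `bun turbo test:ci ${filter}`;
    if (!steps.some((s) => (s.run ?? "").includes(command))) problems.push(`${id}: must run ${command}`);
    if (!steps.some((s) => (s.uses ?? "").startsWith("actions/upload-artifact@"))) {
      problems.push(`${id}: must upload its JUnit report`);
    }
    if (!skipsDocsOnly(job)) problems.push(`${id}: must depend on changes and skip docs-only changes`);
  }

  const windows = wf.jobs[WINDOWS_JOB];
  if (windows === undefined) {
    problems.push(`missing job ${WINDOWS_JOB}`);
  } else {
    if (windows["continue-on-error"] !== true) problems.push(`${WINDOWS_JOB}: must set continue-on-error: true`);
    if (!skipsDocsOnly(windows)) problems.push(`${WINDOWS_JOB}: must depend on changes and skip docs-only changes`);
  }

  const aggregate = wf.jobs[AGGREGATE_JOB];
  if (aggregate === undefined) {
    problems.push(`missing aggregate job ${AGGREGATE_JOB}`);
  } else {
    const needs = needsOf(aggregate);
    for (const id of GATED_JOBS) if (!needs.includes(id)) problems.push(`${AGGREGATE_JOB}: must need ${id}`);
    if (needs.includes(WINDOWS_JOB)) problems.push(`${AGGREGATE_JOB}: must not need ${WINDOWS_JOB}`);
  }

  return problems;
}
```

Collecting all problems rather than throwing on the first one lets a single test run report every way a CI edit drifted from the contract.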

Metadata


Assignees

No one assigned

Labels

P1 (High priority), bug (Something isn't working), ci (Continuous integration / GitHub Actions), flaky-test (Non-deterministic test failure)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
