Symptom
The test "tool execution produces non-empty session diff (snapshot race)" flakes on Windows CI.
- File: packages/opencode/test/session/snapshot-tool-race.test.ts:246
- Seen: Attempt 1 of PR #8 validation run, 3516ms
Context
Post PR #8 residual flake. The test polls SessionSummary.diff() 50 times × 100ms (= 5s total budget) expecting a non-empty diff after a tool execution, because summarize() is explicitly fire-and-forget.
// Poll for diff — summarize() is fire-and-forget
let diff: Awaited<ReturnType<typeof SessionSummary.diff>> = []
for (let i = 0; i < 50; i++) {
  diff = yield* Effect.promise(() => SessionSummary.diff({ sessionID: session.id }))
  if (diff.length > 0) break
  yield* Effect.sleep("100 millis")
}
expect(diff.length).toBeGreaterThan(0)
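If the polling approach is kept, the loop above could be factored into a helper with an explicit wall-clock budget instead of a fixed iteration count. The following is a minimal sketch in plain TypeScript. `pollUntil` and its option names are hypothetical and not part of the opencode codebase. It also returns the attempt count and elapsed time, so a failing CI run shows how much of the budget was actually consumed.

```typescript
// Hypothetical helper (not from the opencode codebase).
// Calls `probe` repeatedly until `done` accepts the result or the
// wall-clock budget is used up, then returns the last value seen
// together with basic timing information.
function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms))
}

async function pollUntil<T>(
  probe: () => Promise<T>,
  done: (value: T) => boolean,
  opts: { budgetMs: number; intervalMs: number },
): Promise<{ value: T; attempts: number; elapsedMs: number }> {
  const start = Date.now()
  let attempts = 0
  for (;;) {
    const value = await probe()
    attempts++
    const elapsedMs = Date.now() - start
    // Stop on success, or once the budget is exhausted. The caller
    // decides how to assert on the returned value.
    if (done(value) || elapsedMs >= opts.budgetMs) {
      return { value, attempts, elapsedMs }
    }
    await sleep(opts.intervalMs)
  }
}
```

In the test, the probe would be `() => SessionSummary.diff({ sessionID: session.id })` and the predicate `(d) => d.length > 0`, wrapped in `Effect.promise` as in the existing loop. The budget value itself is a judgment call and would need to be measured on the Windows runners.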
Root cause to investigate
The 5s poll budget is apparently not enough on Windows for summarize() to finish its async write-back. Options:
- Increase poll count/budget (simple, but masking slow Windows behavior)
- Remove the fire-and-forget pattern: have summarize() return a promise the test can await (design fix; tightens semantics; may leak into production behavior if summarize() is meant to be non-blocking for users)
- Add a test-only hook that signals when summarization completes
Option 2 (making summarize() awaitable) is the right long-term fix if fire-and-forget was only for CLI ergonomics, not a load-bearing product decision.
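One way to get a deterministic completion signal without blocking production callers is to track the in-flight work and expose a way to await it. The sketch below only illustrates that idea. `BackgroundTasks`, `settled()`, the `diffs` map, and this stand-in `summarize()` are all hypothetical and do not reflect how opencode actually implements summarization.

```typescript
// Hypothetical sketch: none of these names come from the opencode codebase.
class BackgroundTasks {
  private inFlight = new Set<Promise<void>>()
  readonly errors: unknown[] = []

  // Start a task and track it. The returned promise never rejects:
  // failures are recorded in `errors` so fire-and-forget callers
  // cannot trigger unhandled rejections.
  run(task: () => Promise<void>): Promise<void> {
    const p: Promise<void> = task()
      .catch((err: unknown) => {
        this.errors.push(err)
      })
      .then(() => {
        this.inFlight.delete(p)
      })
    this.inFlight.add(p)
    return p
  }

  // Resolves once every tracked task has finished, including tasks
  // that were started while waiting.
  async settled(): Promise<void> {
    while (this.inFlight.size > 0) {
      await Promise.all(Array.from(this.inFlight))
    }
  }
}

const tasks = new BackgroundTasks()
const diffs = new Map<string, string[]>()

// Stand-in for summarize(): starts the write-back and returns immediately,
// so callers that do not care about completion stay non-blocking.
function summarize(sessionID: string): void {
  void tasks.run(async () => {
    // Simulated slow asynchronous write-back.
    await new Promise<void>((resolve) => setTimeout(resolve, 20))
    diffs.set(sessionID, ["file.txt"])
  })
}
```

With something along these lines, the test could call `await tasks.settled()` after the tool execution and then assert on the diff once, with no polling. Whether such a handle should be exposed only to tests or become part of the public contract is the product question raised above.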