feat(opencode): observability for session export (abort provenance + title trace + release version)#563
Conversation
📝 WalkthroughWalkthroughAdds structured diagnostics for session aborts and title generation, threads InterruptMeta through Runner and SessionRunState, records diagnostics during prompt cancel/title flows, merges persisted diagnostics in the processor, aggregates them in Export snapshots, and injects OPENCODE_VERSION at build time with tests. ChangesInterrupt and Title Generation Diagnostics
Sequence Diagram(s)sequenceDiagram
participant Caller
participant SessionRunState
participant Runner
participant PromptLoop
Caller->>SessionRunState: cancel(sessionID, meta?)
SessionRunState->>Runner: cancelWith(meta)
Runner->>Runner: normalize & record InterruptMeta
Runner->>PromptLoop: resolveInterrupt(meta) / stop shell
PromptLoop->>Runner: onInterrupt(meta) (writes abort diagnostics)
Runner->>Caller: cancellation completed (onInterrupt result or Cancelled)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request enhances diagnostic capabilities by tracking session aborts and title generation events. It introduces an InterruptMeta interface, updates the Runner and SessionRunState to handle cancellation metadata, and extends the assistant message schema to store these diagnostics. Additionally, the export functionality now collects and includes abort and title generation summaries. Reviewers suggested optimizing performance by consolidating tree walks in the export logic and using targeted message lookups instead of fetching entire session histories to reduce overhead.
Astro-Han
left a comment
There was a problem hiding this comment.
Opus second-pass review — PR #563
Conditional pass. One P2 to address before merge, three P3 recommendations.
Strengths
- Commit split is clean: commit 1 is pure diagnostics surface (cancel metadata into
assistant.diagnostics.abort,ensureTitletrace intotitle_generation, top-level aggregation in export); commit 2 is the single-intent build fix for #560. runner.tschange is backward compatible:cancelstill exists (cancel = cancelWith()), so existing callsites stay green.ensureTitle()call timing is not changed —Effect.orDieis swapped forEffect.exitso success/failure paths can record traces, but the first-turn fork behavior matches the brief (diagnostic-only PR, no mitigation).- Commit 2 contract test (
test/script/build-node.test.ts) is the right lightweight gate: asserts the source containsOPENCODE_VERSION+Script.version, which is enough to lock the regression without standing up a heavier integration test. - Field naming is consistent (snake_case in export schema / camelCase in
InterruptMetainterface).
[P2] lastInterruptMeta race on concurrent cancels (packages/opencode/src/effect/runner.ts)
let lastInterruptMeta: InterruptMeta | undefined
const cancelWith = (meta) =>
SynchronizedRef.modify(ref, (st) => {
lastInterruptMeta = meta ? { ...meta, recordedAt: ... } : { recordedAt: ... }
switch (st._tag) { ... }
})Scenario:
cancel1(soft) callscancelWith(metaA)→modifywritesmetaAtolastInterruptMeta, returns the cancel effect.- Before
cancel1'sonInterruptreadslastInterruptMeta, a secondcancel2(hard) enters →SynchronizedRefserializes themodify, sometaBoverwritesmetaA. cancel1'sonInterruptreadsmetaBand attributes it to the cancel that actually originated frommetaA.
SynchronizedRef serializes the modify callback, but the modify only records meta + returns the effect; the actual fiber interrupt + Deferred.await happens after the modify returns, leaving a window for the second cancel to enter and overwrite meta.
Severity is "diagnostic false attribution" — the user-visible behavior is unaffected, but the next reproduction's abort metadata could be mis-attributed if cancels collide.
Suggested fix: bind meta to the cancel event rather than to the Runner closure. Options:
- Capture meta inside the state itself (e.g.
{ _tag: "Cancelling", meta }), and haveonInterruptread from state instead of a closure variable. - Pass meta as a parameter into the cancel effect, then thread it to
onInterruptdirectly. - Or, if you want to minimize churn, document the limitation with a comment: "best-effort recent meta; may be substituted under concurrent cancels".
[P3] recordTitleTrace polling for the assistant message (packages/opencode/src/session/prompt.ts)
for (let attempt = 0; attempt < 50; attempt++) {
const messages = yield* sessions.messages({ sessionID: input.session.id })
assistant = messages.find(m => m.info.role === "assistant" && m.info.parentID === firstInfo.id)
if (assistant) break
yield* Effect.sleep("10 millis")
}50 × 10ms = 500ms upper bound. If the assistant message has not been persisted within 500ms (e.g. title generation completes very fast or assistant message creation is delayed for any reason), the trace is silently dropped.
This is intentional best-effort resilience for a diagnostic surface, but please add a single log line on dropout (e.g. elog.warn("title trace dropped: assistant not found after 500ms")) so a missing trace is observable from logs rather than just absent from the export.
[P3] InterruptMeta.viaCtxAbort is always false in v1
The single cancel callsite explicitly sets viaCtxAbort: false. There is no path in this PR that sets it to true. If this field is reserved for future ctx.abort-driven cancels, please annotate that so the next reproduction reader does not mistake "always false" as evidence that the race cannot originate from ctx.abort. Otherwise, hook into ctx.abort paths and set it correctly in a follow-up.
[P3] Missing explicit title_generation_state correlation in abort diagnostic
The diagnostics record abort.recorded_at and title_generation.completed_at as separate fields. To answer the actual RCA question ("was ensureTitle still in flight when the main run was aborted?"), the reader has to compute the time delta between two structured fields.
Consider adding a snapshot field to abort:
abort: {
...
title_generation_state: "not_started" | "in_flight" | "completed_before_abort" | "completed_after_abort"
}
so the next reproduction directly shows the temporal relationship without manual joining. Not blocking; can be a follow-up.
Verdict
P2 should be addressed before merge (a few lines to bind meta to the cancel event or to add a documenting comment). The three P3 items can be folded into the same fixup commit or opened as follow-up issues — your call.
Once the P2 is resolved, this PR is ready for engineering final.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
packages/opencode/test/script/build-node.test.ts (1)
5-12: ⚡ Quick winConsider verifying build output or runtime behavior.
The test currently performs static source checks, confirming that
build-node.tscontains the expected define names and sources. While this serves as a contract test, it doesn't verify that:
- The build actually outputs these constants
- The constants are consumed correctly at runtime
- Exported metadata contains the actual release version (the root cause from issue
#560)A more robust test could verify the built output or, ideally, an end-to-end check that exported session metadata contains the expected version.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opencode/test/script/build-node.test.ts` around lines 5 - 12, Update the test "build-node injects both release version and channel defines" to verify runtime/build output instead of only static source strings: build or import the compiled output of script/build-node.ts (or require the built module after invoking the build step used by your tooling), then assert that the runtime exports or session metadata include the actual OPENCODE_VERSION and OPENCODE_CHANNEL values and that Script.version and Script.channel reflect them; locate references to OPENCODE_VERSION, OPENCODE_CHANNEL, Script.version and Script.channel in the test and replace the fs.readFile assertions with assertions that inspect the built module's exported metadata or the runtime session object to confirm the real values.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/opencode/src/session/export.ts`:
- Around line 157-181: sanitizeDiagnostics currently misses redacting the newly
exported diagnostics error text fields; update sanitizeDiagnostics to also
redact diagnostics.aborts[].error_message and
diagnostics.title_generations[].error_message (and the duplicate handling around
the other occurrence at the later block) the same way other error fields are
sanitized — e.g., replace with undefined or a constant redaction placeholder —
so sanitized snapshots never include runtime error text; locate the sanitization
logic in sanitizeDiagnostics and add these two fields to the redact list for
both aborts and title_generations handling.
In `@packages/opencode/src/session/run-state.ts`:
- Around line 78-91: The interrupt metadata can be lost when a runner is removed
before interruptMeta() reads it; modify the cancel logic in
SessionRunState.cancel to persist the InterruptMeta outside the transient runner
(e.g., store it into a durable map on the InstanceState or embed it in the
status via status.set) before calling existing.cancelWith(meta), and change
SessionRunState.interruptMeta to first check the live runner
(data.runners.get(sessionID)) and then fall back to that persisted location (the
new data.interruptMetaMap or status entry) so the provenance survives runner
lifecycle races.
---
Nitpick comments:
In `@packages/opencode/test/script/build-node.test.ts`:
- Around line 5-12: Update the test "build-node injects both release version and
channel defines" to verify runtime/build output instead of only static source
strings: build or import the compiled output of script/build-node.ts (or require
the built module after invoking the build step used by your tooling), then
assert that the runtime exports or session metadata include the actual
OPENCODE_VERSION and OPENCODE_CHANNEL values and that Script.version and
Script.channel reflect them; locate references to OPENCODE_VERSION,
OPENCODE_CHANNEL, Script.version and Script.channel in the test and replace the
fs.readFile assertions with assertions that inspect the built module's exported
metadata or the runtime session object to confirm the real values.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 29f2e803-190d-4889-b9cb-acb7258749fb
📒 Files selected for processing (10)
packages/opencode/script/build-node.tspackages/opencode/src/effect/runner.tspackages/opencode/src/session/export.tspackages/opencode/src/session/message-v2.tspackages/opencode/src/session/processor.tspackages/opencode/src/session/prompt.tspackages/opencode/src/session/run-state.tspackages/opencode/test/script/build-node.test.tspackages/opencode/test/session/export.test.tspackages/opencode/test/session/prompt-effect.test.ts
Astro-Han
left a comment
There was a problem hiding this comment.
Engineering final review from @GPT-X.
Current PR head reviewed: 8204e0bec.
I found one blocking issue before merge:
P2: sanitized exports leak the new diagnostics error text fields
The PR adds top-level export diagnostics:
diagnostics.aborts[].error_messagediagnostics.title_generations[].error_message
However, Export.sanitizeSnapshot() currently delegates to sanitizeDiagnostics(), and sanitizeDiagnostics() only redacts diagnostics.loop.last.attemptedInput. It returns the rest of diagnostics unchanged. That means a sanitized session export can still contain raw runtime error text from abort provenance or title generation diagnostics.
This is a real privacy/sanitization boundary issue, not just a presentation nit. The rest of the sanitize path already redacts assistant errors, tool errors, tool inputs, metadata, paths, and diffs; these new diagnostic error text fields need the same treatment.
Suggested minimal fix:
- Update
sanitizeDiagnostics()to preserve the current loop redaction behavior while also mapping:aborts[].error_messagethroughredact("abort-error-message", ...)title_generations[].error_messagethroughredact("title-generation-error-message", ...)
- Add a regression test under
export.test.tsthat provessanitizeSnapshot()redacts both new fields.
The Opus P2 fix for per-run interrupt metadata is otherwise clean: the metadata is now bound to the run/shell handle via per-handle refs, and the new runner regression test covers the overwrite race.
Other notes, non-blocking:
- The
build-node.test.tsstatic source contract is acceptable for this PR boundary. A full build-output/runtime test would be heavier and is not necessary for the #560 one-line define regression. - The
deriveSnapshotDiagnostics()multiple walks andsessions.messages()polling comments are performance/ergonomics follow-ups, not merge blockers for this diagnostic PR. - The P3 diagnostics ergonomics items are already tracked in #564.
Local verification I ran:
bun --cwd packages/opencode test test/effect/runner.test.ts test/session/export.test.ts test/session/prompt-effect.test.ts test/script/build-node.test.ts --timeout 30000-> 112 passbun --cwd packages/opencode typecheck-> passgit diff --check dev...HEAD-> pass
Please fix the sanitize gap, rerun the targeted export tests/typecheck/diff-check, then ping me for a short re-final.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/opencode/src/session/prompt.ts`:
- Around line 365-395: The recordTitleTrace implementation (Effect.fn
"SessionPrompt.recordTitleTrace") races with message creation by polling only 50
times, so on slow turns it may never attach title_generation diagnostics to the
assistant message; change the flow to persist the trace durably first (e.g.,
save into the session or turn-level diagnostics via sessions.updateSession or a
dedicated traces store using sessionID/firstInfo.id) and then, when the
assistant message is found via sessions.messages, merge that persisted trace
into the assistant's diagnostics and remove/mark the persisted entry;
alternatively, delay attaching until the assistant exists by subscribing to
message creation events instead of fixed polling — update recordTitleTrace,
ensureTitle callers, and the sessions update/merge logic so the trace is
reliably stored and later attached to the assistant's diagnostics.
- Around line 1992-2021: The onInterrupt handler is updating whichever assistant
is globally last (via lastAssistant(input.sessionID)), which can annotate the
previous turn if the current turn's assistant scaffold hasn't been created yet;
instead, tie the abort to the active assistant for this run: record the current
run's assistant id when the assistant scaffold/parent is created (e.g., store
runAssistantId in the run scope) and in onInterrupt lookup and update that
specific assistant using sessions.updateMessage({ ... }) (or
sessions.getMessage(runAssistantId) before updating) rather than calling
lastAssistant(input.sessionID); ensure interruptedSessions.add(input.sessionID)
remains and return the updated assistant object fetched by id.
- Around line 451-454: The code records recordTitleTrace({ applied: true,
success: true }) unconditionally even when sessions.setTitle(...) failed; change
the flow so you only call recordTitleTrace with applied: true when
sessions.setTitle returns successfully. Specifically, run sessions.setTitle({
sessionID: input.session.id, title: t }) and check its success (handle the
Effect result or use Effect.fold/Effect.match/Effect.map/Effect.catch to detect
failure), log the error via elog.error on failure, and only invoke
recordTitleTrace({ completedAt, success: true, applied: true }) inside the
success branch — otherwise call recordTitleTrace with applied: false or success:
false as appropriate. Ensure the change touches the sessions.setTitle and
recordTitleTrace call sites shown.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 44bf5f82-560a-4d7b-903f-c42cada7cad5
📒 Files selected for processing (4)
packages/opencode/src/effect/runner.tspackages/opencode/src/session/prompt.tspackages/opencode/src/session/run-state.tspackages/opencode/test/effect/runner.test.ts
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/opencode/test/session/export.test.ts (1)
692-777: ⚡ Quick winUse an isolated
tmpdirfor this new integration-style test.This test writes session/message state while bound to shared
projectRoot; usingawait using+tmpdir({ git: true })will improve isolation and reduce cross-test coupling.Proposed adjustment
- test("collects abort and title generation diagnostics from assistant messages", async () => { - await Instance.provide({ - directory: projectRoot, + test("collects abort and title generation diagnostics from assistant messages", async () => { + await using project = await tmpdir({ git: true }) + await Instance.provide({ + directory: project.path, fn: async () => { const root = await SessionNs.create({ title: "diag summary" }) const userID = MessageID.make("msg_diag_user") const assistantID = MessageID.make("msg_diag_assistant") @@ - path: { cwd: projectRoot, root: projectRoot }, + path: { cwd: project.path, root: project.path },As per coding guidelines: "Use the
tmpdirfunction ... to create temporary directories for tests with automatic cleanup. Useawait usingsyntax to ensure automatic cleanup..."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opencode/test/session/export.test.ts` around lines 692 - 777, The test "collects abort and title generation diagnostics from assistant messages" currently uses the shared projectRoot; change it to create an isolated temporary directory using await using tmpdir({ git: true }) and pass that tmpdir path into Instance.provide instead of projectRoot. Locate the test function and wrap the body with await using (const { path: tmp } = await tmpdir({ git: true })) or equivalent, update SessionNs.create/SessionNs.updateMessage calls that reference projectRoot to use the tmp path, and keep AppRuntime.runPromise(Export.session(root.id)) and the cleanup (SessionNs.remove) unchanged so the test runs in an isolated tmpdir with automatic cleanup.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@packages/opencode/test/session/export.test.ts`:
- Around line 692-777: The test "collects abort and title generation diagnostics
from assistant messages" currently uses the shared projectRoot; change it to
create an isolated temporary directory using await using tmpdir({ git: true })
and pass that tmpdir path into Instance.provide instead of projectRoot. Locate
the test function and wrap the body with await using (const { path: tmp } =
await tmpdir({ git: true })) or equivalent, update
SessionNs.create/SessionNs.updateMessage calls that reference projectRoot to use
the tmp path, and keep AppRuntime.runPromise(Export.session(root.id)) and the
cleanup (SessionNs.remove) unchanged so the test runs in an isolated tmpdir with
automatic cleanup.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 61bf58d3-1291-4ba2-8a47-06babe0981f3
📒 Files selected for processing (2)
packages/opencode/src/session/export.tspackages/opencode/test/session/export.test.ts
|
Engineering final re-review from @GPT-X. Reviewed head: The sanitization fix is correct:
Local verification I ran:
I also re-checked the latest CodeRabbit comments. I cannot give final approval yet because one remaining issue is still a real P2 diagnostic correctness risk. P2:
|
|
Engineering final re-review from @GPT-X. Reviewed head: Result: pass. I do not see any remaining P0/P1/P2 code blockers. What I re-checked:
Local verification I ran:
Non-blocking follow-ups / closeout notes:
Merge gate: wait for the remaining GitHub check ( |
Summary
Add session-export observability for abort provenance and background title generation, and inject the embedded server release version into the exported runtime metadata.
Why
We need the next exported session to answer a concrete question instead of another guess: who triggered the abort, where did it propagate, and was the first-turn background title stream active at the same time. Separately, issue #560 showed that exported sessions from release builds were still reporting
app_version: "local", which made user-side RCA less trustworthy.Related Issue
Closes #560.
Human Review Status
Pending. A human should make the final merge decision after reviewing the final diff and verification evidence.
Review Focus
SessionPrompt.cancel->SessionRunState/Runner-> prompt-looponInterrupt-> assistant diagnostics -> export summaries.diagnostics.abortsanddiagnostics.title_generations, plus assistant-level schema compatibility.packages/opencode/script/build-node.tsnow injectsOPENCODE_VERSIONalongsideOPENCODE_CHANNEL.Risk Notes
Low to medium. This PR is diagnostics-focused and does not change
ensureTitle()timing or cancellation behavior, but it does touch shared runner/session plumbing and the exported schema. The title-generation trace currently has export/schema coverage rather than a full end-to-end prompt harness assertion, to keep this diagnostics PR out of race-prone behavioral testing.How To Verify
Screenshots or Recordings
None. No visible UI change.
Checklist
dev, and my PR title and commit messages use Conventional Commits in EnglishSummary by CodeRabbit
New Features
Bug Fixes
Tests