Skip to content

feat(opencode): observability for session export (abort provenance + title trace + release version)#563

Merged
Astro-Han merged 6 commits into
devfrom
slock/session-export-observability
May 12, 2026
Merged

feat(opencode): observability for session export (abort provenance + title trace + release version)#563
Astro-Han merged 6 commits into
devfrom
slock/session-export-observability

Conversation

@Astro-Han

@Astro-Han Astro-Han commented May 12, 2026

Copy link
Copy Markdown
Owner

Summary

Add session-export observability for abort provenance and background title generation, and inject the embedded server release version into the exported runtime metadata.

Why

We need the next exported session to answer a concrete question instead of another guess: who triggered the abort, where did it propagate, and was the first-turn background title stream active at the same time. Separately, issue #560 showed that exported sessions from release builds were still reporting app_version: "local", which made user-side RCA less trustworthy.

Related Issue

Closes #560.

Human Review Status

Pending. A human should make the final merge decision after reviewing the final diff and verification evidence.

Review Focus

  1. The cancel provenance path: SessionPrompt.cancel -> SessionRunState / Runner -> prompt-loop onInterrupt -> assistant diagnostics -> export summaries.
  2. The export surface: top-level diagnostics.aborts and diagnostics.title_generations, plus assistant-level schema compatibility.
  3. The embedded/node build fix: packages/opencode/script/build-node.ts now injects OPENCODE_VERSION alongside OPENCODE_CHANNEL.

Risk Notes

Low to medium. This PR is diagnostics-focused and does not change ensureTitle() timing or cancellation behavior, but it does touch shared runner/session plumbing and the exported schema. The title-generation trace currently has export/schema coverage rather than a full end-to-end prompt harness assertion, to keep this diagnostics PR out of race-prone behavioral testing.

How To Verify

Diff check: pass
Typecheck: pass
Focused tests: 114 passed across runner, prompt-effect, export, and build-node contract tests
- bun --cwd packages/opencode test test/effect/runner.test.ts test/session/export.test.ts test/session/prompt-effect.test.ts test/script/build-node.test.ts --timeout 30000
- bun --cwd packages/opencode typecheck
- git diff --check

Screenshots or Recordings

None. No visible UI change.

Checklist

  • Human review status is stated above as pending, approved, or not required
  • I linked the related issue, or stated why there is no issue
  • This PR has type, primary area, and priority labels, or I requested maintainer labeling
  • I described the review focus and any meaningful risks
  • I listed the relevant verification steps and the key result for each
  • I did not introduce unrelated refactors, dependencies, generated files, or file changes beyond the stated scope
  • I manually checked visible UI or copy changes when needed, with screenshots or recordings
  • I considered macOS and Windows impact for platform, packaging, updater, signing, paths, shell, or permissions changes
  • I called out docs, release notes, dependencies, permissions, credentials, deletion behavior, generated content, or local file changes when relevant
  • I reviewed the final diff for unrelated changes and suspicious dependency changes
  • I am targeting dev, and my PR title and commit messages use Conventional Commits in English

Summary by CodeRabbit

  • New Features

    • Sessions now record and propagate rich interrupt metadata and annotate assistant messages with structured abort and title-generation diagnostics.
    • Build output now embeds the application version and channel constants.
  • Bug Fixes

    • Persisted assistant diagnostics are merged when finalizing messages; exported error details are redacted during sanitization.
  • Tests

    • Added tests for interrupt metadata propagation, diagnostics collection/redaction, and build output injection.

Review Change Stack

@Astro-Han Astro-Han added enhancement New feature or request harness Model harness, prompts, tool descriptions, and session mechanics P2 Medium priority labels May 12, 2026
@coderabbitai

coderabbitai Bot commented May 12, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

Adds structured diagnostics for session aborts and title generation, threads InterruptMeta through Runner and SessionRunState, records diagnostics during prompt cancel/title flows, merges persisted diagnostics in the processor, aggregates them in Export snapshots, and injects OPENCODE_VERSION at build time with tests.

Changes

Interrupt and Title Generation Diagnostics

Layer / File(s) Summary
Interrupt metadata infrastructure
packages/opencode/src/effect/runner.ts
Introduce InterruptMeta, thread interruptMeta refs through run/shell handles, add cancelWith(meta?) and expose cancel as alias.
SessionRunState wiring
packages/opencode/src/session/run-state.ts
Import InterruptMeta, forward cancel(meta?) to runner.cancelWith, and update ensureRunning/startShell onInterrupt signatures to accept optional metadata.
Message diagnostic schema
packages/opencode/src/session/message-v2.ts
Extend AssistantMessage.diagnostics with optional abort and title_generation sub-objects containing structured fields.
Processor persisted diagnostics merge
packages/opencode/src/session/processor.ts
When finalizing assistant messages, re-fetch persisted assistant message and merge its existing diagnostics into the in-flight message before adding llm_trace.
Export diagnostics collectors
packages/opencode/src/session/export.ts
Extend Export.Snapshot["diagnostics"] and deriveSnapshotDiagnostics to compute and include optional aborts and title_generations arrays via new collectors; update sanitizeDiagnostics to redact error_message.
Prompt capture and title generation
packages/opencode/src/session/prompt.ts
cancel() passes structured interrupt payload to state; ensureTitle() records title-generation timing and traces; loop() onInterrupt now accepts metadata and writes structured abort diagnostics; adjust shell cancellation callback.
Build and tests
packages/opencode/script/build-node.ts, packages/opencode/test/script/build-node.test.ts, packages/opencode/test/session/export.test.ts, packages/opencode/test/session/prompt-effect.test.ts, packages/opencode/test/effect/runner.test.ts
Inject OPENCODE_VERSION into Bun build defines; add/adjust tests to verify build injection, abort/title export collection, prompt abort diagnostics shape, and cancelWith metadata propagation.

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant SessionRunState
  participant Runner
  participant PromptLoop

  Caller->>SessionRunState: cancel(sessionID, meta?)
  SessionRunState->>Runner: cancelWith(meta)
  Runner->>Runner: normalize & record InterruptMeta
  Runner->>PromptLoop: resolveInterrupt(meta) / stop shell
  PromptLoop->>Runner: onInterrupt(meta) (writes abort diagnostics)
  Runner->>Caller: cancellation completed (onInterrupt result or Cancelled)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Possibly related PRs

Poem

🐰 I hopped through code at break of dawn,

I marked the cancel, I logged the time upon,
Titles and aborts tucked in neat,
Exports now carry every heartbeat,
Version stamped — a tidy little song.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Linked Issues check ⚠️ Warning The PR addresses issue #560 by injecting OPENCODE_VERSION into the build, enabling exported metadata to reflect actual release versions instead of 'local'. However, the PR review notes a blocking issue: onInterrupt can attach abort diagnostics to the previous assistant when a cancel arrives before the new assistant scaffold exists, undermining the PR's abort-provenance observability goal. Add regression test confirming abort diagnostics attach to the current assistant, not previous ones; implement the suggested fix to tie abort writes to the current run's assistant via parentID matching or message ID capture.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main changes: observability additions for session export (abort provenance, title trace) and release version injection.
Out of Scope Changes check ✅ Passed All changes are in-scope: abort/title-generation diagnostics collection/export, runner interrupt-metadata propagation, build version injection, and corresponding test coverage. No unrelated refactors or dependencies detected.
Description check ✅ Passed The PR description is comprehensive, structured per template, and addresses all required sections including Summary, Why, Related Issue, Review Focus, Risk Notes, verification steps, and completed checklist.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch slock/session-export-observability

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances diagnostic capabilities by tracking session aborts and title generation events. It introduces an InterruptMeta interface, updates the Runner and SessionRunState to handle cancellation metadata, and extends the assistant message schema to store these diagnostics. Additionally, the export functionality now collects and includes abort and title generation summaries. Reviewers suggested optimizing performance by consolidating tree walks in the export logic and using targeted message lookups instead of fetching entire session histories to reduce overhead.

Comment thread packages/opencode/src/session/export.ts
Comment thread packages/opencode/src/session/processor.ts
Comment thread packages/opencode/src/session/prompt.ts

@Astro-Han Astro-Han left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opus second-pass review — PR #563

Conditional pass. One P2 to address before merge, three P3 recommendations.

Strengths

  • Commit split is clean: commit 1 is pure diagnostics surface (cancel metadata into assistant.diagnostics.abort, ensureTitle trace into title_generation, top-level aggregation in export); commit 2 is the single-intent build fix for #560.
  • runner.ts change is backward compatible: cancel still exists (cancel = cancelWith()), so existing callsites stay green.
  • ensureTitle() call timing is not changed — Effect.orDie is swapped for Effect.exit so success/failure paths can record traces, but the first-turn fork behavior matches the brief (diagnostic-only PR, no mitigation).
  • Commit 2 contract test (test/script/build-node.test.ts) is the right lightweight gate: asserts the source contains OPENCODE_VERSION + Script.version, which is enough to lock the regression without standing up a heavier integration test.
  • Field naming is consistent (snake_case in export schema / camelCase in InterruptMeta interface).

[P2] lastInterruptMeta race on concurrent cancels (packages/opencode/src/effect/runner.ts)

let lastInterruptMeta: InterruptMeta | undefined

const cancelWith = (meta) =>
  SynchronizedRef.modify(ref, (st) => {
    lastInterruptMeta = meta ? { ...meta, recordedAt: ... } : { recordedAt: ... }
    switch (st._tag) { ... }
  })

Scenario:

  1. cancel1 (soft) calls cancelWith(metaA)modify writes metaA to lastInterruptMeta, returns the cancel effect.
  2. Before cancel1's onInterrupt reads lastInterruptMeta, a second cancel2 (hard) enters → SynchronizedRef serializes the modify, so metaB overwrites metaA.
  3. cancel1's onInterrupt reads metaB and attributes it to the cancel that actually originated from metaA.

SynchronizedRef serializes the modify callback, but the modify only records meta + returns the effect; the actual fiber interrupt + Deferred.await happens after the modify returns, leaving a window for the second cancel to enter and overwrite meta.

Severity is "diagnostic false attribution" — the user-visible behavior is unaffected, but the next reproduction's abort metadata could be mis-attributed if cancels collide.

Suggested fix: bind meta to the cancel event rather than to the Runner closure. Options:

  • Capture meta inside the state itself (e.g. { _tag: "Cancelling", meta }), and have onInterrupt read from state instead of a closure variable.
  • Pass meta as a parameter into the cancel effect, then thread it to onInterrupt directly.
  • Or, if you want to minimize churn, document the limitation with a comment: "best-effort recent meta; may be substituted under concurrent cancels".

[P3] recordTitleTrace polling for the assistant message (packages/opencode/src/session/prompt.ts)

for (let attempt = 0; attempt < 50; attempt++) {
  const messages = yield* sessions.messages({ sessionID: input.session.id })
  assistant = messages.find(m => m.info.role === "assistant" && m.info.parentID === firstInfo.id)
  if (assistant) break
  yield* Effect.sleep("10 millis")
}

50 × 10ms = 500ms upper bound. If the assistant message has not been persisted within 500ms (e.g. title generation completes very fast or assistant message creation is delayed for any reason), the trace is silently dropped.

This is intentional best-effort resilience for a diagnostic surface, but please add a single log line on dropout (e.g. elog.warn("title trace dropped: assistant not found after 500ms")) so a missing trace is observable from logs rather than just absent from the export.

[P3] InterruptMeta.viaCtxAbort is always false in v1

The single cancel callsite explicitly sets viaCtxAbort: false. There is no path in this PR that sets it to true. If this field is reserved for future ctx.abort-driven cancels, please annotate that so the next reproduction reader does not mistake "always false" as evidence that the race cannot originate from ctx.abort. Otherwise, hook into ctx.abort paths and set it correctly in a follow-up.

[P3] Missing explicit title_generation_state correlation in abort diagnostic

The diagnostics record abort.recorded_at and title_generation.completed_at as separate fields. To answer the actual RCA question ("was ensureTitle still in flight when the main run was aborted?"), the reader has to compute the time delta between two structured fields.

Consider adding a snapshot field to abort:

abort: {
  ...
  title_generation_state: "not_started" | "in_flight" | "completed_before_abort" | "completed_after_abort"
}

so the next reproduction directly shows the temporal relationship without manual joining. Not blocking; can be a follow-up.

Verdict

P2 should be addressed before merge (a few lines to bind meta to the cancel event or to add a documenting comment). The three P3 items can be folded into the same fixup commit or opened as follow-up issues — your call.

Once the P2 is resolved, this PR is ready for engineering final.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
packages/opencode/test/script/build-node.test.ts (1)

5-12: ⚡ Quick win

Consider verifying build output or runtime behavior.

The test currently performs static source checks, confirming that build-node.ts contains the expected define names and sources. While this serves as a contract test, it doesn't verify that:

  • The build actually outputs these constants
  • The constants are consumed correctly at runtime
  • Exported metadata contains the actual release version (the root cause from issue #560)

A more robust test could verify the built output or, ideally, an end-to-end check that exported session metadata contains the expected version.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/test/script/build-node.test.ts` around lines 5 - 12, Update
the test "build-node injects both release version and channel defines" to verify
runtime/build output instead of only static source strings: build or import the
compiled output of script/build-node.ts (or require the built module after
invoking the build step used by your tooling), then assert that the runtime
exports or session metadata include the actual OPENCODE_VERSION and
OPENCODE_CHANNEL values and that Script.version and Script.channel reflect them;
locate references to OPENCODE_VERSION, OPENCODE_CHANNEL, Script.version and
Script.channel in the test and replace the fs.readFile assertions with
assertions that inspect the built module's exported metadata or the runtime
session object to confirm the real values.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/opencode/src/session/export.ts`:
- Around line 157-181: sanitizeDiagnostics currently misses redacting the newly
exported diagnostics error text fields; update sanitizeDiagnostics to also
redact diagnostics.aborts[].error_message and
diagnostics.title_generations[].error_message (and the duplicate handling around
the other occurrence at the later block) the same way other error fields are
sanitized — e.g., replace with undefined or a constant redaction placeholder —
so sanitized snapshots never include runtime error text; locate the sanitization
logic in sanitizeDiagnostics and add these two fields to the redact list for
both aborts and title_generations handling.

In `@packages/opencode/src/session/run-state.ts`:
- Around line 78-91: The interrupt metadata can be lost when a runner is removed
before interruptMeta() reads it; modify the cancel logic in
SessionRunState.cancel to persist the InterruptMeta outside the transient runner
(e.g., store it into a durable map on the InstanceState or embed it in the
status via status.set) before calling existing.cancelWith(meta), and change
SessionRunState.interruptMeta to first check the live runner
(data.runners.get(sessionID)) and then fall back to that persisted location (the
new data.interruptMetaMap or status entry) so the provenance survives runner
lifecycle races.

---

Nitpick comments:
In `@packages/opencode/test/script/build-node.test.ts`:
- Around line 5-12: Update the test "build-node injects both release version and
channel defines" to verify runtime/build output instead of only static source
strings: build or import the compiled output of script/build-node.ts (or require
the built module after invoking the build step used by your tooling), then
assert that the runtime exports or session metadata include the actual
OPENCODE_VERSION and OPENCODE_CHANNEL values and that Script.version and
Script.channel reflect them; locate references to OPENCODE_VERSION,
OPENCODE_CHANNEL, Script.version and Script.channel in the test and replace the
fs.readFile assertions with assertions that inspect the built module's exported
metadata or the runtime session object to confirm the real values.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 29f2e803-190d-4889-b9cb-acb7258749fb

📥 Commits

Reviewing files that changed from the base of the PR and between 26416f1 and 5735988.

📒 Files selected for processing (10)
  • packages/opencode/script/build-node.ts
  • packages/opencode/src/effect/runner.ts
  • packages/opencode/src/session/export.ts
  • packages/opencode/src/session/message-v2.ts
  • packages/opencode/src/session/processor.ts
  • packages/opencode/src/session/prompt.ts
  • packages/opencode/src/session/run-state.ts
  • packages/opencode/test/script/build-node.test.ts
  • packages/opencode/test/session/export.test.ts
  • packages/opencode/test/session/prompt-effect.test.ts

Comment thread packages/opencode/src/session/export.ts
Comment thread packages/opencode/src/session/run-state.ts Outdated

@Astro-Han Astro-Han left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Engineering final review from @GPT-X.

Current PR head reviewed: 8204e0bec.

I found one blocking issue before merge:

P2: sanitized exports leak the new diagnostics error text fields

The PR adds top-level export diagnostics:

  • diagnostics.aborts[].error_message
  • diagnostics.title_generations[].error_message

However, Export.sanitizeSnapshot() currently delegates to sanitizeDiagnostics(), and sanitizeDiagnostics() only redacts diagnostics.loop.last.attemptedInput. It returns the rest of diagnostics unchanged. That means a sanitized session export can still contain raw runtime error text from abort provenance or title generation diagnostics.

This is a real privacy/sanitization boundary issue, not just a presentation nit. The rest of the sanitize path already redacts assistant errors, tool errors, tool inputs, metadata, paths, and diffs; these new diagnostic error text fields need the same treatment.

Suggested minimal fix:

  • Update sanitizeDiagnostics() to preserve the current loop redaction behavior while also mapping:
    • aborts[].error_message through redact("abort-error-message", ...)
    • title_generations[].error_message through redact("title-generation-error-message", ...)
  • Add a regression test under export.test.ts that proves sanitizeSnapshot() redacts both new fields.

The Opus P2 fix for per-run interrupt metadata is otherwise clean: the metadata is now bound to the run/shell handle via per-handle refs, and the new runner regression test covers the overwrite race.

Other notes, non-blocking:

  • The build-node.test.ts static source contract is acceptable for this PR boundary. A full build-output/runtime test would be heavier and is not necessary for the #560 one-line define regression.
  • The deriveSnapshotDiagnostics() multiple walks and sessions.messages() polling comments are performance/ergonomics follow-ups, not merge blockers for this diagnostic PR.
  • The P3 diagnostics ergonomics items are already tracked in #564.

Local verification I ran:

  • bun --cwd packages/opencode test test/effect/runner.test.ts test/session/export.test.ts test/session/prompt-effect.test.ts test/script/build-node.test.ts --timeout 30000 -> 112 pass
  • bun --cwd packages/opencode typecheck -> pass
  • git diff --check dev...HEAD -> pass

Please fix the sanitize gap, rerun the targeted export tests/typecheck/diff-check, then ping me for a short re-final.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/opencode/src/session/prompt.ts`:
- Around line 365-395: The recordTitleTrace implementation (Effect.fn
"SessionPrompt.recordTitleTrace") races with message creation by polling only 50
times, so on slow turns it may never attach title_generation diagnostics to the
assistant message; change the flow to persist the trace durably first (e.g.,
save into the session or turn-level diagnostics via sessions.updateSession or a
dedicated traces store using sessionID/firstInfo.id) and then, when the
assistant message is found via sessions.messages, merge that persisted trace
into the assistant's diagnostics and remove/mark the persisted entry;
alternatively, delay attaching until the assistant exists by subscribing to
message creation events instead of fixed polling — update recordTitleTrace,
ensureTitle callers, and the sessions update/merge logic so the trace is
reliably stored and later attached to the assistant's diagnostics.
- Around line 1992-2021: The onInterrupt handler is updating whichever assistant
is globally last (via lastAssistant(input.sessionID)), which can annotate the
previous turn if the current turn's assistant scaffold hasn't been created yet;
instead, tie the abort to the active assistant for this run: record the current
run's assistant id when the assistant scaffold/parent is created (e.g., store
runAssistantId in the run scope) and in onInterrupt lookup and update that
specific assistant using sessions.updateMessage({ ... }) (or
sessions.getMessage(runAssistantId) before updating) rather than calling
lastAssistant(input.sessionID); ensure interruptedSessions.add(input.sessionID)
remains and return the updated assistant object fetched by id.
- Around line 451-454: The code records recordTitleTrace({ applied: true,
success: true }) unconditionally even when sessions.setTitle(...) failed; change
the flow so you only call recordTitleTrace with applied: true when
sessions.setTitle returns successfully. Specifically, run sessions.setTitle({
sessionID: input.session.id, title: t }) and check its success (handle the
Effect result or use Effect.fold/Effect.match/Effect.map/Effect.catch to detect
failure), log the error via elog.error on failure, and only invoke
recordTitleTrace({ completedAt, success: true, applied: true }) inside the
success branch — otherwise call recordTitleTrace with applied: false or success:
false as appropriate. Ensure the change touches the sessions.setTitle and
recordTitleTrace call sites shown.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 44bf5f82-560a-4d7b-903f-c42cada7cad5

📥 Commits

Reviewing files that changed from the base of the PR and between 5735988 and 8204e0b.

📒 Files selected for processing (4)
  • packages/opencode/src/effect/runner.ts
  • packages/opencode/src/session/prompt.ts
  • packages/opencode/src/session/run-state.ts
  • packages/opencode/test/effect/runner.test.ts

Comment thread packages/opencode/src/session/prompt.ts
Comment thread packages/opencode/src/session/prompt.ts
Comment thread packages/opencode/src/session/prompt.ts

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
packages/opencode/test/session/export.test.ts (1)

692-777: ⚡ Quick win

Use an isolated tmpdir for this new integration-style test.

This test writes session/message state while bound to shared projectRoot; using await using + tmpdir({ git: true }) will improve isolation and reduce cross-test coupling.

Proposed adjustment
-  test("collects abort and title generation diagnostics from assistant messages", async () => {
-    await Instance.provide({
-      directory: projectRoot,
+  test("collects abort and title generation diagnostics from assistant messages", async () => {
+    await using project = await tmpdir({ git: true })
+    await Instance.provide({
+      directory: project.path,
       fn: async () => {
         const root = await SessionNs.create({ title: "diag summary" })
         const userID = MessageID.make("msg_diag_user")
         const assistantID = MessageID.make("msg_diag_assistant")
@@
-            path: { cwd: projectRoot, root: projectRoot },
+            path: { cwd: project.path, root: project.path },

As per coding guidelines: "Use the tmpdir function ... to create temporary directories for tests with automatic cleanup. Use await using syntax to ensure automatic cleanup..."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/test/session/export.test.ts` around lines 692 - 777, The
test "collects abort and title generation diagnostics from assistant messages"
currently uses the shared projectRoot; change it to create an isolated temporary
directory using await using tmpdir({ git: true }) and pass that tmpdir path into
Instance.provide instead of projectRoot. Locate the test function and wrap the
body with await using (const { path: tmp } = await tmpdir({ git: true })) or
equivalent, update SessionNs.create/SessionNs.updateMessage calls that reference
projectRoot to use the tmp path, and keep
AppRuntime.runPromise(Export.session(root.id)) and the cleanup
(SessionNs.remove) unchanged so the test runs in an isolated tmpdir with
automatic cleanup.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@packages/opencode/test/session/export.test.ts`:
- Around line 692-777: The test "collects abort and title generation diagnostics
from assistant messages" currently uses the shared projectRoot; change it to
create an isolated temporary directory using await using tmpdir({ git: true })
and pass that tmpdir path into Instance.provide instead of projectRoot. Locate
the test function and wrap the body with await using (const { path: tmp } =
await tmpdir({ git: true })) or equivalent, update
SessionNs.create/SessionNs.updateMessage calls that reference projectRoot to use
the tmp path, and keep AppRuntime.runPromise(Export.session(root.id)) and the
cleanup (SessionNs.remove) unchanged so the test runs in an isolated tmpdir with
automatic cleanup.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 61bf58d3-1291-4ba2-8a47-06babe0981f3

📥 Commits

Reviewing files that changed from the base of the PR and between 8204e0b and e69921c.

📒 Files selected for processing (2)
  • packages/opencode/src/session/export.ts
  • packages/opencode/test/session/export.test.ts

@Astro-Han

Copy link
Copy Markdown
Owner Author

Engineering final re-review from @GPT-X.

Reviewed head: e69921c95.

The sanitization fix is correct:

  • sanitizeDiagnostics() now redacts diagnostics.aborts[].error_message.
  • sanitizeDiagnostics() now redacts diagnostics.title_generations[].error_message.
  • The new regression test proves both fields are redacted.

Local verification I ran:

  • bun --cwd packages/opencode test test/session/export.test.ts -t "sanitizeSnapshot redacts abort and title generation error messages" --timeout 30000 -> pass
  • bun --cwd packages/opencode test test/effect/runner.test.ts test/session/export.test.ts test/session/prompt-effect.test.ts test/script/build-node.test.ts --timeout 30000 -> 113 pass
  • bun --cwd packages/opencode typecheck -> pass
  • git diff --check dev...HEAD -> pass

I also re-checked the latest CodeRabbit comments. I cannot give final approval yet because one remaining issue is still a real P2 diagnostic correctness risk.

P2: onInterrupt can write abort diagnostics onto the previous assistant turn

SessionPrompt.loop currently handles interrupt provenance by calling lastAssistant(input.sessionID) inside onInterrupt:

const assistant = yield* lastAssistant(input.sessionID)
if (assistant.info.role === "assistant") {
  yield* sessions.updateMessage({
    ...assistant.info,
    diagnostics: {
      ...(assistant.info.diagnostics ?? {}),
      abort: { ... }
    }
  })
}

This is unsafe when a cancel lands before the current turn's assistant scaffold is created. In that window, the session already has the new user message, but lastAssistant(input.sessionID) can still return the previous assistant turn. The export would then show an abort on the previous assistant, which is false provenance.

This matters because PR #563 is specifically meant to make the next reproduction actionable. A wrong abort target is worse than a missing abort diagnostic: it can send the next RCA toward the wrong turn.

Suggested minimal fix:

  • Tie the abort write to the current run's assistant, not the session-global last assistant.
  • A small acceptable v1 shape is: find the latest current user message, then update only an assistant whose parentID matches that user's id. If no such assistant exists yet, skip writing abort diagnostics instead of annotating a previous assistant.
  • A cleaner shape is to capture the assistant id when the scaffold is created and have onInterrupt update that exact message id.

Please add a regression test for the false-attribution case: a session with a previous assistant is interrupted before the new turn's assistant scaffold exists, and the previous assistant must not receive the new abort diagnostics.

Non-blocking notes

  • The 500 ms recordTitleTrace polling issue is a P3/follow-up item, already aligned with the diagnostics ergonomics follow-up direction.
  • The setTitle() failure path currently records applied: true even if title persistence fails. This is also a diagnostic correctness nit. It is small enough to fold into the prompt fix if convenient, but I would not block the PR on it alone.
  • The static build-node.test.ts contract remains acceptable for this PR boundary; a heavier build-output/runtime test is not required here.

Verdict: sanitize gap is fixed, but final approval is still blocked on the onInterrupt previous-turn false attribution risk.

@Astro-Han

Copy link
Copy Markdown
Owner Author

Engineering final re-review from @GPT-X.

Reviewed head: 5829e496b.

Result: pass. I do not see any remaining P0/P1/P2 code blockers.

What I re-checked:

  • The previous-turn false-attribution risk is fixed. onInterrupt no longer writes abort diagnostics through session-global lastAssistant(sessionID). It now targets the current turn by matching the latest user to an assistant with that parentID, and if the current assistant scaffold does not exist yet it avoids writing the abort diagnostic instead of annotating the previous assistant.
  • The regression test covers the important failure window: previous assistant exists, the new turn is cancelled before its assistant scaffold is persisted, and the previous assistant does not receive the new abort diagnostics.
  • The typecheck-only follow-up is acceptable. The test now uses an explicit test-local Mutable<typeof sessions> cast around the monkey patch and restores the original service method in a finalizer; runtime code is unchanged.
  • The earlier sanitize issue remains fixed: sanitized exports redact the new abort/title-generation error_message fields.
  • The earlier per-run interrupt metadata race remains fixed by per-handle metadata refs.

Local verification I ran:

  • bun --cwd packages/opencode test test/session/prompt-effect.test.ts -t "cancel before current assistant scaffold does not attach abort diagnostics to the previous assistant" --timeout 30000 -> pass
  • bun --cwd packages/opencode test test/effect/runner.test.ts test/session/export.test.ts test/session/prompt-effect.test.ts test/script/build-node.test.ts --timeout 30000 -> 114 pass
  • bun --cwd packages/opencode typecheck -> pass
  • git diff --check dev...HEAD -> pass

Non-blocking follow-ups / closeout notes:

  • The recordTitleTrace 500 ms polling reliability concern stays non-blocking for this diagnostic PR and can remain in the diagnostics ergonomics follow-up.
  • The setTitle() failure path still can report applied: true; that is a diagnostic precision nit, not a merge blocker. Please either add it to the existing follow-up issue or reply to the bot thread with that scope decision before merge.
  • Gemini's tree-walk / lookup performance comments are non-blocking optimization follow-ups.
  • The older run-state lifecycle thread is outdated by the per-run metadata fix and can be resolved after reply.

Merge gate: wait for the remaining GitHub check (unit-opencode) to finish green, then reply/resolve the non-blocking bot threads before merge.

@Astro-Han Astro-Han merged commit 6501ab6 into dev May 12, 2026
23 checks passed
@Astro-Han Astro-Han deleted the slock/session-export-observability branch May 12, 2026 04:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request harness Model harness, prompts, tool descriptions, and session mechanics P2 Medium priority

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Session export metadata shows app_version=local instead of actual release version

1 participant