feat(session): add llm stream failure diagnostics by Astro-Han · Pull Request #771 · Astro-Han/pawwork

Astro-Han · 2026-05-19T15:23:51Z

Summary

Add nested LLM stream diagnostics to assistant traces so failed stream turns preserve safe, structured evidence about the surfaced failure boundary.

Changes include:

stream diagnostics under existing llm_trace.stream with watchdog, timeline, provider, error, and legacy v1 counter semantics;
safe-at-capture error fingerprinting and reviewed-key-only provider correlation;
watchdog/silent-stream recording without changing stream behavior;
overlap classification that prevents watchdog-triggered aborts from being misread as local aborts;
dual-path export sanitization for both top-level diagnostics.llm_traces and session-tree assistant diagnostics.

Why

#754 showed raw terminated after provider progress, while #755 showed pre-progress connect timeout. Existing v1 trace counters could distinguish those symptoms but could not show whether the failure surfaced at the watchdog, local abort, SDK/transport reader, provider stream, or unknown boundary.

Related Issue

Refs #760.
Refs #754, #755, #721.

This PR is diagnostic-only and should not close #754/#755/#721 by itself.

Human Review Status

Pending

Review Focus

Please review against the #760 V6 checklist, especially:

no raw prompt/tool args/headers/body/URLs/local paths/full stacks are persisted in stream diagnostics;
export sanitizer covers both top-level and session-tree copies;
watchdog + abort overlap classifies as watchdog/high only when the watchdog error fired;
post-progress silent-stream timeout remains a non-error success-drain path;
provider correlation only reads reviewed safe keys.

Risk Notes

No visible UI/copy changes. No platform/packaging surface. No docs, dependencies, migrations, generated files, permissions, credentials, deletion behavior, or release-note changes are included.

Skipped conditional checklist items:

Visible UI/manual check: not applicable; diagnostics-only backend/session trace change.
Platform impact check: not applicable; no platform, packaging, shell, permissions, signing, updater, or path behavior changed.
Docs/release/dependencies/etc.: not applicable; no PR-scoped docs, release notes, dependencies, or generated artifacts changed.

How To Verify

Targeted tests: PASS — 63 tests across test/session/llm-trace.test.ts, test/session/export.test.ts, and test/session/llm.test.ts.
Typecheck: PASS — bun run typecheck in packages/opencode completed with tsgo --noEmit and no errors.
Diff check: PASS — git diff --check completed with no whitespace errors.

Screenshots or Recordings

Not applicable; no visible UI changes.

Checklist

How to use this checklist:

Tick a box by replacing [ ] with [x]. Do not edit, add, or remove items.

The bot-applied label items can only be honestly ticked AFTER the PR is opened and the labeler / priority-triage bots have run — return to the PR description and tick them then.

Most items are required. The few that are conditional are explicitly marked (conditional); for those, leave unticked if they truly do not apply and explain why in Risk Notes. All other items must be ticked before requesting human review.

Summary by CodeRabbit

New Features
- Richer LLM stream diagnostics: lifecycle events, abort/watchdog handling, provider progress, and monotonic timing
- Stream failure classification with confidence/evidence and safe provider-correlation extraction
- Snapshot export sanitization now redacts sensitive stream-level fields and embedded per-message traces
Tests
- Expanded tests for sanitization, boundary classification, provider correlation, and timeline/duration robustness

coderabbitai · 2026-05-19T15:24:07Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: cd9ccc4e-cb0e-4d5b-8319-94b37644ebde

📥 Commits

Reviewing files that changed from the base of the PR and between dde73c0 and 3250996.

📒 Files selected for processing (8)

packages/opencode/src/session/export.ts
packages/opencode/src/session/llm-trace/recorder.ts
packages/opencode/src/session/llm-trace/stream-diagnostics.ts
packages/opencode/src/session/llm-trace/types.ts
packages/opencode/src/session/llm.ts
packages/opencode/src/session/processor.ts
packages/opencode/test/session/export.test.ts
packages/opencode/test/session/llm-trace.test.ts

🚧 Files skipped from review as they are similar to previous changes (7)

packages/opencode/src/session/processor.ts
packages/opencode/src/session/llm-trace/types.ts
packages/opencode/src/session/llm-trace/stream-diagnostics.ts
packages/opencode/test/session/export.test.ts
packages/opencode/src/session/llm-trace/recorder.ts
packages/opencode/src/session/export.ts
packages/opencode/src/session/llm.ts

📝 Walkthrough

Walkthrough

This PR implements end-to-end LLM stream failure diagnostics by adding boundary classification utilities, safe error/correlation fingerprinting, stream lifecycle tracking in the LLMTrace recorder, integration into stream timeout/error handling, export snapshot sanitization, and comprehensive test coverage.

Changes

LLM Stream Failure Diagnostics

Layer / File(s)	Summary
Stream diagnostics types and Recorder contract `packages/opencode/src/session/llm-trace/types.ts`	Introduces `StreamDiagnostics` type with nested attempt, timeline, watchdog, abort, error, and provider structures; extends `Summary` schema with optional `stream` field; expands `Recorder` type with stream lifecycle recording methods.
Boundary classification and diagnostic sanitization `packages/opencode/src/session/llm-trace/stream-diagnostics.ts`	Implements `classifyBoundary` to classify stream failure boundaries from evidence flags, `safeProviderCorrelation` to extract and sanitize provider metadata (IDs, status, safe headers only), `safeErrorFingerprint` to redact sensitive error details, and supporting helpers for header filtering, identifier constraining, and message/stack redaction of URLs, tokens, secrets, and file paths.
LLMTrace namespace exports `packages/opencode/src/session/llm-trace/index.ts`	Exports `classifyBoundary` and `safeProviderCorrelation` as public LLMTrace APIs alongside existing tracing helpers.
Recorder stream lifecycle and diagnostics tracking `packages/opencode/src/session/llm-trace/recorder.ts`	Extends `createRecorder` to initialize and track stream diagnostics state on `beginStream`, record monotonic start timing, update timeline/duration/watchdog/error fields on stream events, classify failure boundaries, fingerprint errors, and sanitize provider correlation on finalize.
Stream handling integration with trace hooks `packages/opencode/src/session/llm.ts`	Expands `StreamInput.trace` to include stream lifecycle hooks; calls `trace.beginStream` with timeout metadata; records watchdog firing and classified stream failure on connect-timeout; wraps iterator `next()` with error/completion tracing; records provider progress transitions.
Provider error event recording in processor `packages/opencode/src/session/processor.ts`	Records provider error events in stream error handler, capturing error, optional metadata, and wall-clock/monotonic timestamps before re-throwing; records abort provenance on interrupt.
Export snapshot sanitization for stream diagnostics `packages/opencode/src/session/export.ts`	Sanitizes LLM trace stream diagnostics in both message-level `info.diagnostics.llm_trace` and snapshot-level `diagnostics.llm_traces` by redacting error fingerprints and sanitizing provider correlation fields via new local helpers.
Test coverage for stream diagnostics `packages/opencode/test/session/llm-trace.test.ts`, `packages/opencode/test/session/llm.test.ts`, `packages/opencode/test/session/export.test.ts`	Comprehensive tests for boundary classification, error/correlation sanitization, provider error event recording with sensitive data masking, monotonic duration clamping, watchdog behavior in timeout scenarios, and export snapshot sanitization.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

[Task] Thread local abort provenance into LLM stream diagnostics #773: Implements abort-provenance capture and local-abort classification (recordAbortState and local abort boundary refresh).
[Task] Add LLM stream failure diagnostics v2 to session exports #760: Implements LLM stream failure diagnostics v2 (types, classify/sanitize helpers, recorder methods, llm/processor wiring, export sanitization, tests).
[Feature] Add LLM stream diagnostics to local session export #214: Adds LLM stream diagnostics inclusion and sanitization for local session exports.

Possibly related PRs

Astro-Han/pawwork#558: Overlaps on packages/opencode/src/session/llm.ts stream timeout/fail-on-timeout machinery.
Astro-Han/pawwork#234: Related changes to Export.sanitizeSnapshot export/sanitization pipeline.
Astro-Han/pawwork#729: Touches connect-timeout/watchdog flow in llm.ts, overlapping timeout instrumentation.

Suggested labels

enhancement

Poem

🐰 Stream diagnostics bloom so bright,
Boundaries traced in classified light,
Fingerprints tidy, headers pruned,
Watchdogs timed and timelines tuned,
Secrets tucked away—hoppity delight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'feat(session): add llm stream failure diagnostics' directly and specifically describes the main change: adding LLM stream failure diagnostics to the session module.
Description check	✅ Passed	The PR description is well-structured, covers all required template sections with concrete details, includes linked issues, describes review focus and risks, and provides specific verification results.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch pawwork/llm-stream-diagnostics-v6-pr760-dev2

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions

Suggested priority: P2 (includes non-doc, non-test paths outside the low-risk bucket).

P1/P0 are reserved for maintainer confirmation. Please relabel manually if this is a release blocker, security issue, data-loss risk, or updater/runtime failure.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

packages/opencode/src/session/llm-trace/types.ts (1)
160-160: 💤 Low value

Consider using a stricter schema for the stream field.

Using z.any().optional() bypasses Zod validation entirely. Given that StreamDiagnostics is a well-defined type, you could create a corresponding Zod schema (or use z.unknown().optional() with a type assertion) to catch malformed data at parse time rather than silently accepting anything.

If this is intentional to allow schema evolution without breaking older parsers, a comment explaining the rationale would help.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/src/session/llm-trace/types.ts` at line 160, The `stream`
field currently uses `z.any().optional()` which skips validation; replace it
with a stricter schema that matches the defined `StreamDiagnostics` shape
(create a `StreamDiagnosticsSchema` and use `stream:
StreamDiagnosticsSchema.optional()`), or at minimum use `z.unknown().optional()`
if you want validation-lite plus a TypeScript assertion; if the permissive
`z.any()` was intentional for schema evolution, add a brief inline comment on
the `stream` field explaining that rationale. Ensure you reference the existing
`StreamDiagnostics` type when creating the Zod schema so parsing catches
malformed data at parse time.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/opencode/src/session/export.ts`:
- Around line 1019-1031: The export sanitizer currently spreads rawError into
safeError (via "...rawError") which preserves any unexpected/sensitive keys from
stream.error; change the construction of safeError in export.ts so it does NOT
spread rawError — instead build safeError solely from the output of
safeErrorFingerprint(rawError) (or explicitly pick only approved fields) and
pass that through compactObject, ensuring no arbitrary keys from rawError are
forwarded; keep the existing rawProvider -> safeProviderCorrelation logic
unchanged.

In `@packages/opencode/src/session/llm-trace/stream-diagnostics.ts`:
- Around line 135-144: The safeMessage function is missing Linux home directory
redaction; update the function (safeMessage) to also replace Linux home paths by
adding a regex like /\/home\/[^\s)]+/g with "[redacted:path]" alongside the
existing /Users/ and \\Users\\ replacements, keeping the same truncation
behavior (slice 0,1024) and placement near the other path redaction lines so
Linux usernames are scrubbed the same way.

---

Nitpick comments:
In `@packages/opencode/src/session/llm-trace/types.ts`:
- Line 160: The `stream` field currently uses `z.any().optional()` which skips
validation; replace it with a stricter schema that matches the defined
`StreamDiagnostics` shape (create a `StreamDiagnosticsSchema` and use `stream:
StreamDiagnosticsSchema.optional()`), or at minimum use `z.unknown().optional()`
if you want validation-lite plus a TypeScript assertion; if the permissive
`z.any()` was intentional for schema evolution, add a brief inline comment on
the `stream` field explaining that rationale. Ensure you reference the existing
`StreamDiagnostics` type when creating the Zod schema so parsing catches
malformed data at parse time.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: eb00b8b3-65d9-4df4-8cf7-92902506dd92

📥 Commits

Reviewing files that changed from the base of the PR and between bb7f938 and dde73c0.

📒 Files selected for processing (10)

packages/opencode/src/session/export.ts
packages/opencode/src/session/llm-trace/index.ts
packages/opencode/src/session/llm-trace/recorder.ts
packages/opencode/src/session/llm-trace/stream-diagnostics.ts
packages/opencode/src/session/llm-trace/types.ts
packages/opencode/src/session/llm.ts
packages/opencode/src/session/processor.ts
packages/opencode/test/session/export.test.ts
packages/opencode/test/session/llm-trace.test.ts
packages/opencode/test/session/llm.test.ts

gemini-code-assist

Code Review

This pull request implements a robust LLM stream diagnostics and tracing framework, featuring failure classification, monotonic timing, and a sanitization layer to redact sensitive information like URLs, secrets, and local paths. The review feedback suggests several enhancements: expanding the path redaction regex to support Linux home directories, removing redundant logic in error fingerprinting, refining type signatures for improved accuracy, and implementing guards to prevent overwriting high-confidence error states in the trace recorder.

Astro-Han · 2026-05-19T16:04:08Z

Handled the P2 follow-ups as follows:

Local abort provenance: verified the current gap (recordIteratorError still has no provenance source). Deferred out of this diagnostic PR and tracked in [Task] Thread local abort provenance into LLM stream diagnostics #773 with the requested local-abort / watchdog-overlap / missing-provenance verification scope.
Retry / attempt semantics: fixed in 5e54139 by labeling v1 stream counters as aggregate while nested stream remains the terminal attempt snapshot. Added a recorder regression test for aggregate v1 counters plus terminal nested stream state.
Stream export sanitizer whitelist: fixed in 5e54139 by rebuilding exported stream from explicit stream-level fields instead of spreading arbitrary top-level stream keys. Added malicious top-level stream fixture coverage for both top-level diagnostics and session-tree diagnostics.

Local verification after the fixes:

bun test test/session/llm-trace.test.ts test/session/export.test.ts test/session/llm.test.ts --timeout 30000
bun run typecheck
git diff --check

feat(session): add llm stream failure diagnostics

dde73c0

Astro-Han added P2 Medium priority harness Model harness, prompts, tool descriptions, and session mechanics task Narrow execution, audit, spike, migration, tracking, or upstream follow-up work labels May 19, 2026

github-actions Bot removed the task Narrow execution, audit, spike, migration, tracking, or upstream follow-up work label May 19, 2026

github-actions Bot reviewed May 19, 2026

View reviewed changes

coderabbitai Bot reviewed May 19, 2026

View reviewed changes

Comment thread packages/opencode/src/session/export.ts

Comment thread packages/opencode/src/session/llm-trace/stream-diagnostics.ts

gemini-code-assist Bot reviewed May 19, 2026

View reviewed changes

Astro-Han added the enhancement New feature or request label May 19, 2026

fix(session): harden llm stream diagnostics sanitization

8cd3c2b

Astro-Han mentioned this pull request May 19, 2026

[Task] Thread local abort provenance into LLM stream diagnostics #773

Closed

fix(session): clarify stream diagnostic export boundaries

5e54139

fix(session): thread abort provenance into stream diagnostics

3250996

Astro-Han merged commit c4692b1 into dev May 19, 2026
27 checks passed

Astro-Han deleted the pawwork/llm-stream-diagnostics-v6-pr760-dev2 branch May 19, 2026 16:43

This was referenced May 20, 2026

[Task] Add LLM stream failure diagnostics v2 to session exports #760

Closed

[Task] Design run-level LLM stream diagnostics for recurring terminated failures #783

Closed

feat(session): add run observability diagnostics #788

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(session): add llm stream failure diagnostics#771

feat(session): add llm stream failure diagnostics#771
Astro-Han merged 4 commits into
devfrom
pawwork/llm-stream-diagnostics-v6-pr760-dev2

Astro-Han commented May 19, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 19, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Astro-Han commented May 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Astro-Han commented May 19, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why

Related Issue

Human Review Status

Review Focus

Risk Notes

How To Verify

Screenshots or Recordings

Checklist

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Astro-Han commented May 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Astro-Han commented May 19, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 19, 2026 •

edited

Loading