Skip to content

feat(session): add llm stream failure diagnostics#771

Merged
Astro-Han merged 4 commits into
devfrom
pawwork/llm-stream-diagnostics-v6-pr760-dev2
May 19, 2026
Merged

feat(session): add llm stream failure diagnostics#771
Astro-Han merged 4 commits into
devfrom
pawwork/llm-stream-diagnostics-v6-pr760-dev2

Conversation

@Astro-Han

@Astro-Han Astro-Han commented May 19, 2026

Copy link
Copy Markdown
Owner

Summary

Add nested LLM stream diagnostics to assistant traces so failed stream turns preserve safe, structured evidence about the surfaced failure boundary.

Changes include:

  • stream diagnostics under existing llm_trace.stream with watchdog, timeline, provider, error, and legacy v1 counter semantics;
  • safe-at-capture error fingerprinting and reviewed-key-only provider correlation;
  • watchdog/silent-stream recording without changing stream behavior;
  • overlap classification that prevents watchdog-triggered aborts from being misread as local aborts;
  • dual-path export sanitization for both top-level diagnostics.llm_traces and session-tree assistant diagnostics.

Why

#754 showed raw terminated after provider progress, while #755 showed pre-progress connect timeout. Existing v1 trace counters could distinguish those symptoms but could not show whether the failure surfaced at the watchdog, local abort, SDK/transport reader, provider stream, or unknown boundary.

Related Issue

Refs #760.
Refs #754, #755, #721.

This PR is diagnostic-only and should not close #754/#755/#721 by itself.

Human Review Status

Pending

Review Focus

Please review against the #760 V6 checklist, especially:

  • no raw prompt/tool args/headers/body/URLs/local paths/full stacks are persisted in stream diagnostics;
  • export sanitizer covers both top-level and session-tree copies;
  • watchdog + abort overlap classifies as watchdog/high only when the watchdog error fired;
  • post-progress silent-stream timeout remains a non-error success-drain path;
  • provider correlation only reads reviewed safe keys.

Risk Notes

No visible UI/copy changes. No platform/packaging surface. No docs, dependencies, migrations, generated files, permissions, credentials, deletion behavior, or release-note changes are included.

Skipped conditional checklist items:

  • Visible UI/manual check: not applicable; diagnostics-only backend/session trace change.
  • Platform impact check: not applicable; no platform, packaging, shell, permissions, signing, updater, or path behavior changed.
  • Docs/release/dependencies/etc.: not applicable; no PR-scoped docs, release notes, dependencies, or generated artifacts changed.

How To Verify

Targeted tests: PASS — 63 tests across test/session/llm-trace.test.ts, test/session/export.test.ts, and test/session/llm.test.ts.
Typecheck: PASS — bun run typecheck in packages/opencode completed with tsgo --noEmit and no errors.
Diff check: PASS — git diff --check completed with no whitespace errors.

Screenshots or Recordings

Not applicable; no visible UI changes.

Checklist

How to use this checklist:

  • Tick a box by replacing [ ] with [x]. Do not edit, add, or remove items.
  • The bot-applied label items can only be honestly ticked AFTER the PR is opened and the labeler / priority-triage bots have run — return to the PR description and tick them then.
  • Most items are required. The few that are conditional are explicitly marked (conditional); for those, leave unticked if they truly do not apply and explain why in Risk Notes. All other items must be ticked before requesting human review.
  • Type label — this PR carries exactly one of bug, enhancement, task, documentation. Type labels are author-added; the labeler bot does NOT assign them. Add the label in the GitHub UI, then tick this.
  • Routing labels — this PR carries at least one of app, ui, platform, harness, ci. The labeler bot assigns these on PR open based on changed paths. Confirm the bot's choice (or override if wrong), then tick this.
  • Priority label — this PR carries exactly one of P0, P1, P2, P3. The priority-triage bot suggests one on PR open. Confirm or override, then tick this.
  • Human Review Status above is set to Pending, Approved by @<reviewer>, or Not required: <reason> (default is Pending; "not required" is restricted to bot-authored low-risk PRs).
  • I linked the related issue, or stated in Summary why there is no issue.
  • I described the review focus and any meaningful risks.
  • I replaced the example block in How To Verify with the real verification steps and the key result for each.
  • I did not introduce unrelated refactors, dependencies, generated files, or file changes beyond the stated scope.
  • (conditional) I manually checked visible UI or copy changes when needed, with screenshots or recordings. Leave unticked only if no visible UI or copy changed.
  • (conditional) I considered macOS and Windows impact for platform, packaging, updater, signing, paths, shell, or permissions changes. Leave unticked only if no platform/packaging surface was touched.
  • (conditional) I called out docs, release notes, dependencies, permissions, credentials, deletion behavior, generated content, or local file changes when relevant. Leave unticked only if none of those surfaces was touched.
  • I reviewed the final diff for unrelated changes and suspicious dependency changes.
  • I am targeting dev, and my PR title and commit messages use Conventional Commits in English.

Summary by CodeRabbit

  • New Features

    • Richer LLM stream diagnostics: lifecycle events, abort/watchdog handling, provider progress, and monotonic timing
    • Stream failure classification with confidence/evidence and safe provider-correlation extraction
    • Snapshot export sanitization now redacts sensitive stream-level fields and embedded per-message traces
  • Tests

    • Expanded tests for sanitization, boundary classification, provider correlation, and timeline/duration robustness

Review Change Stack

@Astro-Han Astro-Han added P2 Medium priority harness Model harness, prompts, tool descriptions, and session mechanics task Narrow execution, audit, spike, migration, tracking, or upstream follow-up work labels May 19, 2026
@github-actions github-actions Bot removed the task Narrow execution, audit, spike, migration, tracking, or upstream follow-up work label May 19, 2026
@coderabbitai

coderabbitai Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: cd9ccc4e-cb0e-4d5b-8319-94b37644ebde

📥 Commits

Reviewing files that changed from the base of the PR and between dde73c0 and 3250996.

📒 Files selected for processing (8)
  • packages/opencode/src/session/export.ts
  • packages/opencode/src/session/llm-trace/recorder.ts
  • packages/opencode/src/session/llm-trace/stream-diagnostics.ts
  • packages/opencode/src/session/llm-trace/types.ts
  • packages/opencode/src/session/llm.ts
  • packages/opencode/src/session/processor.ts
  • packages/opencode/test/session/export.test.ts
  • packages/opencode/test/session/llm-trace.test.ts
🚧 Files skipped from review as they are similar to previous changes (7)
  • packages/opencode/src/session/processor.ts
  • packages/opencode/src/session/llm-trace/types.ts
  • packages/opencode/src/session/llm-trace/stream-diagnostics.ts
  • packages/opencode/test/session/export.test.ts
  • packages/opencode/src/session/llm-trace/recorder.ts
  • packages/opencode/src/session/export.ts
  • packages/opencode/src/session/llm.ts

📝 Walkthrough

Walkthrough

This PR implements end-to-end LLM stream failure diagnostics by adding boundary classification utilities, safe error/correlation fingerprinting, stream lifecycle tracking in the LLMTrace recorder, integration into stream timeout/error handling, export snapshot sanitization, and comprehensive test coverage.

Changes

LLM Stream Failure Diagnostics

Layer / File(s) Summary
Stream diagnostics types and Recorder contract
packages/opencode/src/session/llm-trace/types.ts
Introduces StreamDiagnostics type with nested attempt, timeline, watchdog, abort, error, and provider structures; extends Summary schema with optional stream field; expands Recorder type with stream lifecycle recording methods.
Boundary classification and diagnostic sanitization
packages/opencode/src/session/llm-trace/stream-diagnostics.ts
Implements classifyBoundary to classify stream failure boundaries from evidence flags, safeProviderCorrelation to extract and sanitize provider metadata (IDs, status, safe headers only), safeErrorFingerprint to redact sensitive error details, and supporting helpers for header filtering, identifier constraining, and message/stack redaction of URLs, tokens, secrets, and file paths.
LLMTrace namespace exports
packages/opencode/src/session/llm-trace/index.ts
Exports classifyBoundary and safeProviderCorrelation as public LLMTrace APIs alongside existing tracing helpers.
Recorder stream lifecycle and diagnostics tracking
packages/opencode/src/session/llm-trace/recorder.ts
Extends createRecorder to initialize and track stream diagnostics state on beginStream, record monotonic start timing, update timeline/duration/watchdog/error fields on stream events, classify failure boundaries, fingerprint errors, and sanitize provider correlation on finalize.
Stream handling integration with trace hooks
packages/opencode/src/session/llm.ts
Expands StreamInput.trace to include stream lifecycle hooks; calls trace.beginStream with timeout metadata; records watchdog firing and classified stream failure on connect-timeout; wraps iterator next() with error/completion tracing; records provider progress transitions.
Provider error event recording in processor
packages/opencode/src/session/processor.ts
Records provider error events in stream error handler, capturing error, optional metadata, and wall-clock/monotonic timestamps before re-throwing; records abort provenance on interrupt.
Export snapshot sanitization for stream diagnostics
packages/opencode/src/session/export.ts
Sanitizes LLM trace stream diagnostics in both message-level info.diagnostics.llm_trace and snapshot-level diagnostics.llm_traces by redacting error fingerprints and sanitizing provider correlation fields via new local helpers.
Test coverage for stream diagnostics
packages/opencode/test/session/llm-trace.test.ts, packages/opencode/test/session/llm.test.ts, packages/opencode/test/session/export.test.ts
Comprehensive tests for boundary classification, error/correlation sanitization, provider error event recording with sensitive data masking, monotonic duration clamping, watchdog behavior in timeout scenarios, and export snapshot sanitization.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Possibly related PRs

  • Astro-Han/pawwork#558: Overlaps on packages/opencode/src/session/llm.ts stream timeout/fail-on-timeout machinery.
  • Astro-Han/pawwork#234: Related changes to Export.sanitizeSnapshot export/sanitization pipeline.
  • Astro-Han/pawwork#729: Touches connect-timeout/watchdog flow in llm.ts, overlapping timeout instrumentation.

Suggested labels

enhancement

Poem

🐰 Stream diagnostics bloom so bright,
Boundaries traced in classified light,
Fingerprints tidy, headers pruned,
Watchdogs timed and timelines tuned,
Secrets tucked away—hoppity delight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(session): add llm stream failure diagnostics' directly and specifically describes the main change: adding LLM stream failure diagnostics to the session module.
Description check ✅ Passed The PR description is well-structured, covers all required template sections with concrete details, includes linked issues, describes review focus and risks, and provides specific verification results.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch pawwork/llm-stream-diagnostics-v6-pr760-dev2

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested priority: P2 (includes non-doc, non-test paths outside the low-risk bucket).

P1/P0 are reserved for maintainer confirmation. Please relabel manually if this is a release blocker, security issue, data-loss risk, or updater/runtime failure.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
packages/opencode/src/session/llm-trace/types.ts (1)

160-160: 💤 Low value

Consider using a stricter schema for the stream field.

Using z.any().optional() bypasses Zod validation entirely. Given that StreamDiagnostics is a well-defined type, you could create a corresponding Zod schema (or use z.unknown().optional() with a type assertion) to catch malformed data at parse time rather than silently accepting anything.

If this is intentional to allow schema evolution without breaking older parsers, a comment explaining the rationale would help.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/src/session/llm-trace/types.ts` at line 160, The `stream`
field currently uses `z.any().optional()` which skips validation; replace it
with a stricter schema that matches the defined `StreamDiagnostics` shape
(create a `StreamDiagnosticsSchema` and use `stream:
StreamDiagnosticsSchema.optional()`), or at minimum use `z.unknown().optional()`
if you want validation-lite plus a TypeScript assertion; if the permissive
`z.any()` was intentional for schema evolution, add a brief inline comment on
the `stream` field explaining that rationale. Ensure you reference the existing
`StreamDiagnostics` type when creating the Zod schema so parsing catches
malformed data at parse time.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/opencode/src/session/export.ts`:
- Around line 1019-1031: The export sanitizer currently spreads rawError into
safeError (via "...rawError") which preserves any unexpected/sensitive keys from
stream.error; change the construction of safeError in export.ts so it does NOT
spread rawError — instead build safeError solely from the output of
safeErrorFingerprint(rawError) (or explicitly pick only approved fields) and
pass that through compactObject, ensuring no arbitrary keys from rawError are
forwarded; keep the existing rawProvider -> safeProviderCorrelation logic
unchanged.

In `@packages/opencode/src/session/llm-trace/stream-diagnostics.ts`:
- Around line 135-144: The safeMessage function is missing Linux home directory
redaction; update the function (safeMessage) to also replace Linux home paths by
adding a regex like /\/home\/[^\s)]+/g with "[redacted:path]" alongside the
existing /Users/ and \\Users\\ replacements, keeping the same truncation
behavior (slice 0,1024) and placement near the other path redaction lines so
Linux usernames are scrubbed the same way.

---

Nitpick comments:
In `@packages/opencode/src/session/llm-trace/types.ts`:
- Line 160: The `stream` field currently uses `z.any().optional()` which skips
validation; replace it with a stricter schema that matches the defined
`StreamDiagnostics` shape (create a `StreamDiagnosticsSchema` and use `stream:
StreamDiagnosticsSchema.optional()`), or at minimum use `z.unknown().optional()`
if you want validation-lite plus a TypeScript assertion; if the permissive
`z.any()` was intentional for schema evolution, add a brief inline comment on
the `stream` field explaining that rationale. Ensure you reference the existing
`StreamDiagnostics` type when creating the Zod schema so parsing catches
malformed data at parse time.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: eb00b8b3-65d9-4df4-8cf7-92902506dd92

📥 Commits

Reviewing files that changed from the base of the PR and between bb7f938 and dde73c0.

📒 Files selected for processing (10)
  • packages/opencode/src/session/export.ts
  • packages/opencode/src/session/llm-trace/index.ts
  • packages/opencode/src/session/llm-trace/recorder.ts
  • packages/opencode/src/session/llm-trace/stream-diagnostics.ts
  • packages/opencode/src/session/llm-trace/types.ts
  • packages/opencode/src/session/llm.ts
  • packages/opencode/src/session/processor.ts
  • packages/opencode/test/session/export.test.ts
  • packages/opencode/test/session/llm-trace.test.ts
  • packages/opencode/test/session/llm.test.ts

Comment thread packages/opencode/src/session/export.ts
Comment thread packages/opencode/src/session/llm-trace/stream-diagnostics.ts

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements a robust LLM stream diagnostics and tracing framework, featuring failure classification, monotonic timing, and a sanitization layer to redact sensitive information like URLs, secrets, and local paths. The review feedback suggests several enhancements: expanding the path redaction regex to support Linux home directories, removing redundant logic in error fingerprinting, refining type signatures for improved accuracy, and implementing guards to prevent overwriting high-confidence error states in the trace recorder.

Comment thread packages/opencode/src/session/llm-trace/stream-diagnostics.ts
Comment thread packages/opencode/src/session/llm-trace/stream-diagnostics.ts Outdated
Comment thread packages/opencode/src/session/llm-trace/stream-diagnostics.ts Outdated
Comment thread packages/opencode/src/session/llm-trace/recorder.ts
@Astro-Han Astro-Han added the enhancement New feature or request label May 19, 2026
@Astro-Han

Copy link
Copy Markdown
Owner Author

Handled the P2 follow-ups as follows:

  • Local abort provenance: verified the current gap (recordIteratorError still has no provenance source). Deferred out of this diagnostic PR and tracked in [Task] Thread local abort provenance into LLM stream diagnostics #773 with the requested local-abort / watchdog-overlap / missing-provenance verification scope.
  • Retry / attempt semantics: fixed in 5e54139 by labeling v1 stream counters as aggregate while nested stream remains the terminal attempt snapshot. Added a recorder regression test for aggregate v1 counters plus terminal nested stream state.
  • Stream export sanitizer whitelist: fixed in 5e54139 by rebuilding exported stream from explicit stream-level fields instead of spreading arbitrary top-level stream keys. Added malicious top-level stream fixture coverage for both top-level diagnostics and session-tree diagnostics.

Local verification after the fixes:

  • bun test test/session/llm-trace.test.ts test/session/export.test.ts test/session/llm.test.ts --timeout 30000
  • bun run typecheck
  • git diff --check

@Astro-Han Astro-Han merged commit c4692b1 into dev May 19, 2026
27 checks passed
@Astro-Han Astro-Han deleted the pawwork/llm-stream-diagnostics-v6-pr760-dev2 branch May 19, 2026 16:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request harness Model harness, prompts, tool descriptions, and session mechanics P2 Medium priority

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Raw terminated from upstream stream close leaks to assistant message without translation

1 participant