[Feature] Add LLM stream diagnostics to local session export #214

@Astro-Han

What task are you trying to do?

PawWork needs local session exports from #194 to include enough model/API diagnostics to explain failures like an assistant message finishing successfully with token usage but no visible text, tool result, reasoning part, or error. The export should let us distinguish whether the issue happened in the provider API response, the AI SDK stream conversion, or PawWork session persistence without publishing the conversation through the opencode share service.

What do you do today?

Today the local database and logs preserve session/message/part records plus aggregate token usage. In a real session, alibaba-coding-plan-cn/kimi-k2.5 produced finish=stop and output=110 tokens, but the assistant message only had step-start and step-finish parts. There was no saved raw API chunk, no stream event counts, no evidence of whether the output arrived as content or as reasoning_content, and no explicit empty-completion marker. The only richer handoff path is the existing cloud share flow that #194 is replacing, and even that share contains only the already-persisted session parts, not raw LLM stream diagnostics.

What would a good result look like?

Record a lightweight, local, structured diagnostic summary for each assistant run and include it in the local session export from #194. The summary should include provider, model, finish reason, token usage, stream event type counts, whether any text-delta, reasoning-delta, tool call, tool result, or stream error was observed, and an explicit diagnostic flag for finish=stop with no user-visible output. The export should make this readable without requiring raw logs or cloud publishing.
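A minimal sketch of what such a summary could look like, assuming the run can observe a sequence of typed stream events. The `StreamEvent` shape, the event type strings, and the names `StreamDiagnostics` and `summarizeStream` are hypothetical and only illustrate the fields listed above; the real stream parts and session schema may differ. The sketch also assumes that "user-visible output" means non-empty text, a tool call, or a tool result, so a reasoning-only run is still flagged.

```typescript
// Hypothetical event shape; the real AI SDK stream parts may use other names and fields.
type StreamEvent = { type: string; text?: string };

interface TokenUsage {
  input: number;
  output: number;
}

// Hypothetical summary record: counts and booleans only, no conversation content.
interface StreamDiagnostics {
  provider: string;
  model: string;
  finishReason: string | null;
  usage: TokenUsage;
  eventCounts: Record<string, number>;
  sawText: boolean;
  sawReasoning: boolean;
  sawToolCall: boolean;
  sawToolResult: boolean;
  sawStreamError: boolean;
  emptyCompletion: boolean;
}

function summarizeStream(
  provider: string,
  model: string,
  finishReason: string | null,
  usage: TokenUsage,
  events: StreamEvent[],
): StreamDiagnostics {
  // Count events by type without retaining their payloads.
  const eventCounts: Record<string, number> = {};
  for (const e of events) {
    eventCounts[e.type] = (eventCounts[e.type] ?? 0) + 1;
  }
  const saw = (t: string): boolean => (eventCounts[t] ?? 0) > 0;

  // A text delta only counts as visible if it carries non-whitespace text.
  const sawText = events.some(
    (e) => e.type === "text-delta" && (e.text ?? "").trim().length > 0,
  );
  const sawToolCall = saw("tool-call");
  const sawToolResult = saw("tool-result");

  return {
    provider,
    model,
    finishReason,
    usage,
    eventCounts,
    sawText,
    sawReasoning: saw("reasoning-delta"),
    sawToolCall,
    sawToolResult,
    sawStreamError: saw("error"),
    // Assumption: visible output = text, tool call, or tool result.
    emptyCompletion:
      finishReason === "stop" && !sawText && !sawToolCall && !sawToolResult,
  };
}
```

Keeping only counts and booleans means the summary can be persisted next to the assistant message without storing prompt text or raw chunks.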

Which audience does this matter to most?

Both

Extra context

This is a follow-up to #194, not a replacement for it. #194 defines the safer product path: local session export instead of publishing to opncd.ai. This issue defines extra diagnostic content that the export should be able to carry. It also belongs under the harness series #195 and is related to #133, but it is narrower than general loop detection: the focus here is LLM stream and empty-completion diagnosability.

Acceptance criteria

  • Assistant runs persist a lightweight local diagnostic summary that does not upload conversation content by default.
  • The diagnostic summary records provider/model, finish reason, token usage, and stream event type counts.
  • The summary records whether visible text, reasoning, tool calls, tool results, and stream errors were observed.
  • finish=stop with no visible assistant output is explicitly flagged as an empty completion.
  • The local session export planned in #194 ("Replace cloud session sharing with local session export") includes these diagnostics in a readable section.
  • Raw prompt text, raw API chunks, and full tool bodies are not recorded by default unless a separate explicit debug mode is introduced.
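As a rough illustration of the "readable section" criterion, the sketch below renders a diagnostic record as plain text for the export. The `DiagnosticsRecord` shape, the `renderDiagnosticsSection` name, and the output layout are all hypothetical; the actual export format is defined by #194, not here.

```typescript
// Hypothetical persisted record; field names are illustrative only.
interface DiagnosticsRecord {
  provider: string;
  model: string;
  finishReason: string | null;
  outputTokens: number;
  eventCounts: Record<string, number>;
  sawText: boolean;
  sawReasoning: boolean;
  sawToolCall: boolean;
  sawToolResult: boolean;
  sawStreamError: boolean;
  emptyCompletion: boolean;
}

function renderDiagnosticsSection(d: DiagnosticsRecord): string {
  const yesNo = (b: boolean): string => (b ? "yes" : "no");

  // Sort event types so repeated exports of the same record are identical.
  const counts =
    Object.entries(d.eventCounts)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([type, n]) => `${type}=${n}`)
      .join(", ") || "none";

  const lines = [
    "LLM stream diagnostics",
    `  provider/model: ${d.provider}/${d.model}`,
    `  finish reason:  ${d.finishReason ?? "unknown"}`,
    `  output tokens:  ${d.outputTokens}`,
    `  stream events:  ${counts}`,
    `  text: ${yesNo(d.sawText)}, reasoning: ${yesNo(d.sawReasoning)}, ` +
      `tool calls: ${yesNo(d.sawToolCall)}, tool results: ${yesNo(d.sawToolResult)}, ` +
      `stream errors: ${yesNo(d.sawStreamError)}`,
  ];

  // Surface the empty-completion flag explicitly so it cannot be missed.
  if (d.emptyCompletion) {
    lines.push("  flag: EMPTY COMPLETION (finish=stop with no visible assistant output)");
  }
  return lines.join("\n");
}
```

The section contains only metadata, counts, and flags, which keeps it consistent with the rule that raw prompt text, raw API chunks, and full tool bodies stay out of the default export.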

Non-goals

  • Do not reintroduce cloud session sharing.
  • Do not store full raw provider responses by default.
  • Do not fix every empty-completion behavior in this issue; a separate bug can add retry or user-visible fallback once this diagnostic layer identifies the failing boundary.

Metadata

Assignees

No one assigned

Labels

P2 (Medium priority), app (Application behavior and product flows), enhancement (New feature or request), harness (Model harness, prompts, tool descriptions, and session mechanics)