Goal
Session exports should include enough LLM trace diagnostics to explain model-output pipeline failures without requiring ad hoc dev builds, SQLite inspection, or HTTPS proxy capture.
When a model appears to expose hidden reasoning, drop output, misroute tool calls, or stream malformed content, maintainers should be able to inspect the exported session and see where the behavior changed shape:
- request/model configuration
- AI SDK normalized stream events
- PawWork session processor parts
- final stored message/token metadata
Scope
In scope:
- Design a lightweight LLM trace summary captured at runtime and included in session export JSON.
- Correlate trace records by trace_id, session_id, and message_id.
- Capture provider/model identity, selected key provider options, stream event type counts, finish reason, token usage, and final stored part-type counts.
- Include enough detail to distinguish text-delta from reasoning-delta and final text parts from reasoning parts.
- Keep diagnostics safe by default: no API keys, no full prompts, no full model output unless an explicit local debug mode is enabled.
- Define export schema/versioning and retention behavior for recent or relevant trace records.
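As a rough illustration of the capture items above, stream events could be tallied by type without retaining any delta text. This is a minimal sketch, not the AI SDK's actual types: the StreamEvent union and the summarizeStreamEvents name are hypothetical, and only the event type names text-delta and reasoning-delta come from this issue.

```typescript
// Hypothetical event shape for illustration; real AI SDK stream parts carry more fields.
type StreamEvent =
  | { type: "text-delta"; text: string }
  | { type: "reasoning-delta"; text: string }
  | { type: "tool-call"; toolName: string }
  | { type: "finish"; finishReason: string }

interface StreamEventSummary {
  text_delta: number
  reasoning_delta: number
  tool_call: number
  finish_reason: string | null
}

// Counts events by type. The delta text itself is never copied into the summary,
// so the result is safe to export by default.
function summarizeStreamEvents(events: readonly StreamEvent[]): StreamEventSummary {
  const summary: StreamEventSummary = {
    text_delta: 0,
    reasoning_delta: 0,
    tool_call: 0,
    finish_reason: null,
  }
  for (const event of events) {
    switch (event.type) {
      case "text-delta":
        summary.text_delta += 1
        break
      case "reasoning-delta":
        summary.reasoning_delta += 1
        break
      case "tool-call":
        summary.tool_call += 1
        break
      case "finish":
        summary.finish_reason = event.finishReason
        break
    }
  }
  return summary
}
```

Keeping text-delta and reasoning-delta as separate counters is what would let an export show that visible "thinking" arrived as ordinary text rather than as structured reasoning.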
Out of scope for the first pass:
- Fixing any specific Kimi, OpenAI-compatible, or AI SDK provider behavior.
- Adding a full packet capture or MITM proxy flow.
- Persisting complete raw provider responses by default.
- Changing frontend rendering of reasoning blocks.
Relevant files or context
Likely files:
- packages/opencode/src/session/llm.ts
- packages/opencode/src/session/processor.ts
- packages/opencode/src/session/export.ts
- packages/opencode/src/provider/transform.ts
- OpenAI-compatible provider adapter code, if raw chunk field-name summaries are included later
Recent motivating case:
- A Kimi K2.6 session appeared to emit thinking as visible natural language. The existing export showed tokens.reasoning = 0 and only stored text parts, but it could not show whether the upstream stream used content, reasoning_content, or reasoning_text, or whether the AI SDK normalized the content into text-delta.
Suggested trace shape for the design to evaluate:
{
  "message_id": "msg_xxx",
  "session_id": "ses_xxx",
  "trace_id": "msg_xxx",
  "provider": "kimi-for-coding",
  "model": "k2p6",
  "request": {
    "streaming": true,
    "tool_count": 12,
    "reasoning_capability": true,
    "interleaved_field": null
  },
  "stream_events": {
    "text_delta": 42,
    "reasoning_delta": 0,
    "tool_call": 3,
    "finish_reason": "tool-calls"
  },
  "stored_parts": {
    "text": 2,
    "reasoning": 0,
    "tool": 3
  },
  "tokens": {
    "output": 290,
    "reasoning": 0
  }
}
A later optional layer could record OpenAI-compatible raw chunk field-name summaries only, for example delta_keys: ["content"] or delta_keys: ["reasoning_content"], without recording raw text.
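A minimal sketch of that optional layer, assuming OpenAI-compatible streaming chunks of the form { choices: [{ delta: { ... } }] }. The summarizeDeltaKeys name is hypothetical, and where such a hook would sit in the provider adapter is left open.

```typescript
// Collects the distinct field names seen under choices[].delta across raw chunks.
// Only key names are recorded; the values (raw model text) are never read or stored.
function summarizeDeltaKeys(chunks: readonly unknown[]): string[] {
  const keys = new Set<string>()
  for (const chunk of chunks) {
    const choices = (chunk as { choices?: unknown } | null | undefined)?.choices
    if (!Array.isArray(choices)) continue
    for (const choice of choices) {
      const delta = (choice as { delta?: unknown } | null | undefined)?.delta
      if (typeof delta !== "object" || delta === null) continue
      for (const key of Object.keys(delta)) keys.add(key)
    }
  }
  // Sorted so the summary is stable across runs and easy to diff between exports.
  return Array.from(keys).sort()
}
```

For the Kimi case above, a result of ["content"] versus ["reasoning_content"] would indicate which upstream field carried the thinking text before the AI SDK normalized it.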
Verification
- Export a normal session and confirm the export includes LLM trace diagnostics for assistant messages that called the model.
- Verify trace records correlate to existing message/session IDs.
- Verify a streaming text response records text_delta > 0, stored text > 0, and no false reasoning counts.
- Verify a reasoning-capable model path records reasoning_delta and/or stored reasoning parts when structured reasoning is actually emitted.
- Verify exported diagnostics do not include API keys, auth headers, full prompts, or full model output by default.
- Verify older exports remain readable or schema-versioned clearly.
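The "no API keys, auth headers, full prompts" check could be automated with a recursive scan over the exported JSON. This is a sketch under assumptions: the FORBIDDEN_KEYS denylist and the findForbiddenKeys helper are hypothetical, and a key-name scan only catches obviously named fields, not secrets embedded in free-form values.

```typescript
// Hypothetical denylist; the real set of forbidden field names would be decided in the design.
const FORBIDDEN_KEYS = ["api_key", "apikey", "authorization", "prompt"]

// Recursively walks a parsed export and returns the JSON paths of any keys
// whose lowercased name appears in the denylist.
function findForbiddenKeys(value: unknown, path: string = "$"): string[] {
  if (typeof value !== "object" || value === null) return []
  const hits: string[] = []
  if (Array.isArray(value)) {
    value.forEach((item, index) => {
      hits.push(...findForbiddenKeys(item, `${path}[${index}]`))
    })
    return hits
  }
  for (const [key, child] of Object.entries(value)) {
    if (FORBIDDEN_KEYS.includes(key.toLowerCase())) hits.push(`${path}.${key}`)
    hits.push(...findForbiddenKeys(child, `${path}.${key}`))
  }
  return hits
}
```

A test could export a session, parse the JSON, and assert that the returned list is empty when the local debug mode is off.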
Execution mode
Agent should investigate and propose a plan first