[Task] Add LLM trace diagnostics to session exports #454

@Astro-Han

Description

Goal

Session exports should include enough LLM trace diagnostics to explain model-output pipeline failures without requiring ad hoc dev builds, SQLite inspection, or HTTPS proxy capture.

When a model appears to expose hidden reasoning, drop output, misroute tool calls, or stream malformed content, maintainers should be able to inspect the exported session and see where the behavior changed shape:

  • request/model configuration
  • AI SDK normalized stream events
  • PawWork session processor parts
  • final stored message/token metadata

Scope

In scope:

  • Design a lightweight LLM trace summary captured at runtime and included in session export JSON.
  • Correlate trace records by trace_id, session_id, and message_id.
  • Capture provider/model identity, selected key provider options, stream event type counts, finish reason, token usage, and final stored part-type counts.
  • Include enough detail to distinguish text-delta from reasoning-delta and final text parts from reasoning parts.
  • Keep diagnostics safe by default: no API keys, no full prompts, no full model output unless an explicit local debug mode is enabled.
  • Define export schema/versioning and retention behavior for recent or relevant trace records.
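
The "safe by default" rule for provider options could be enforced with an allowlist rather than a blocklist, so unknown fields are dropped instead of leaked. The sketch below is only an illustration: `summarizeProviderOptions`, `SAFE_OPTION_KEYS`, and the specific option names are hypothetical and not taken from the PawWork codebase.

```typescript
// Hypothetical allowlist; the real set of "key provider options" is a design decision.
const SAFE_OPTION_KEYS: readonly string[] = ["temperature", "max_tokens", "reasoning_effort"];

// Strings longer than this are replaced by a length marker (assumed cap).
const MAX_SAFE_STRING = 32;

type SafeValue = string | number | boolean | null;

// Copy only allowlisted keys, and only in a form that cannot carry prompts or secrets.
function summarizeProviderOptions(
  options: Record<string, unknown>,
  allowlist: readonly string[] = SAFE_OPTION_KEYS,
): Record<string, SafeValue> {
  const summary: Record<string, SafeValue> = {};
  for (const key of allowlist) {
    if (!(key in options)) continue;
    const value = options[key];
    if (value === null || typeof value === "number" || typeof value === "boolean") {
      summary[key] = value;
    } else if (typeof value === "string") {
      // Short enum-like strings are kept; long strings are reduced to their length.
      summary[key] = value.length <= MAX_SAFE_STRING ? value : `<string:${value.length}>`;
    } else {
      // Objects, arrays, and functions are recorded by type only.
      summary[key] = `<${typeof value}>`;
    }
  }
  return summary;
}
```

Because anything outside the allowlist is never copied, a newly added provider option stays out of exports until someone opts it in.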

Out of scope for the first pass:

  • Fixing any specific Kimi, OpenAI-compatible, or AI SDK provider behavior.
  • Adding a full packet capture or MITM proxy flow.
  • Persisting complete raw provider responses by default.
  • Changing frontend rendering of reasoning blocks.

Relevant files or context

Likely files:

  • packages/opencode/src/session/llm.ts
  • packages/opencode/src/session/processor.ts
  • packages/opencode/src/session/export.ts
  • packages/opencode/src/provider/transform.ts
  • OpenAI-compatible provider adapter code, if raw chunk field-name summaries are included later

Recent motivating case:

  • A Kimi K2.6 session appeared to emit thinking as visible natural language. The existing export showed tokens.reasoning = 0 and only stored text parts, but it could not show which field the upstream stream used (content, reasoning_content, or reasoning_text), nor whether the AI SDK normalized that content into text-delta events.

Suggested trace shape for the design to evaluate:

{
  "message_id": "msg_xxx",
  "session_id": "ses_xxx",
  "trace_id": "msg_xxx",
  "provider": "kimi-for-coding",
  "model": "k2p6",
  "request": {
    "streaming": true,
    "tool_count": 12,
    "reasoning_capability": true,
    "interleaved_field": null
  },
  "stream_events": {
    "text_delta": 42,
    "reasoning_delta": 0,
    "tool_call": 3,
    "finish_reason": "tool-calls"
  },
  "stored_parts": {
    "text": 2,
    "reasoning": 0,
    "tool": 3
  },
  "tokens": {
    "output": 290,
    "reasoning": 0
  }
}
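
As a rough sketch of how the stream_events block above might be filled in, the function below tallies events by type without retaining any text. The `StreamEventLike` shape and the event type strings other than text-delta and reasoning-delta are assumptions about the normalized stream, not a confirmed AI SDK contract; `countStreamEvents` is a hypothetical name.

```typescript
// Simplified, assumed shape of a normalized stream event; only the fields used here.
interface StreamEventLike {
  type: string;
  finishReason?: string;
}

// Mirrors the "stream_events" object in the suggested trace shape.
interface StreamEventCounts {
  text_delta: number;
  reasoning_delta: number;
  tool_call: number;
  finish_reason: string | null;
}

// Count events by type; event payloads (text, arguments) are never read or stored.
function countStreamEvents(events: Iterable<StreamEventLike>): StreamEventCounts {
  const counts: StreamEventCounts = {
    text_delta: 0,
    reasoning_delta: 0,
    tool_call: 0,
    finish_reason: null,
  };
  for (const event of events) {
    switch (event.type) {
      case "text-delta":
        counts.text_delta++;
        break;
      case "reasoning-delta":
        counts.reasoning_delta++;
        break;
      case "tool-call":
        counts.tool_call++;
        break;
      case "finish":
        counts.finish_reason = event.finishReason ?? null;
        break;
      // Other event types are ignored in this sketch.
    }
  }
  return counts;
}
```

In a real processor the counters would more likely be incremented as events arrive rather than over a buffered list, but the resulting summary would look the same.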

A later optional layer could record OpenAI-compatible raw chunk field-name summaries only, for example delta_keys: ["content"] or delta_keys: ["reasoning_content"], without recording raw text.
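
A minimal sketch of that optional layer, assuming raw chunks follow the common OpenAI-compatible layout of `choices[].delta`: it records which delta fields carried a non-empty value and discards the values themselves. `summarizeDeltaKeys` is a hypothetical helper, and treating null or empty-string fields as absent is an assumption of this sketch.

```typescript
// Collect the names of delta fields that carried a value, never the values.
function summarizeDeltaKeys(chunks: Iterable<unknown>): string[] {
  const keys = new Set<string>();
  for (const chunk of chunks) {
    const choices = (chunk as { choices?: unknown } | null)?.choices;
    if (!Array.isArray(choices)) continue;
    for (const choice of choices) {
      const delta = (choice as { delta?: unknown } | null)?.delta;
      if (delta === null || typeof delta !== "object") continue;
      for (const [key, value] of Object.entries(delta)) {
        // Skip placeholder fields such as content: "" or content: null.
        if (value !== null && value !== undefined && value !== "") keys.add(key);
      }
    }
  }
  // Sorted so exports are stable across runs.
  return Array.from(keys).sort();
}
```

For the motivating Kimi case, a result of `["content", "role"]` versus one that includes `"reasoning_content"` would show whether the provider or the normalization layer merged reasoning into visible text.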

Verification

  • Export a normal session and confirm the export includes LLM trace diagnostics for assistant messages that called the model.
  • Verify trace records correlate to existing message/session IDs.
  • Verify a streaming text response records text_delta > 0, stored text > 0, and no false reasoning counts.
  • Verify a reasoning-capable model path records reasoning_delta and/or stored reasoning parts when structured reasoning is actually emitted.
  • Verify exported diagnostics do not include API keys, auth headers, full prompts, or full model output by default.
  • Verify older exports remain readable, or are clearly distinguished by schema version.
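
The "no API keys, auth headers, full prompts, or full model output" check could be partly automated by walking the exported diagnostics and flagging suspicious fields. The sketch below is one possible test helper under stated assumptions: `findSensitiveFields`, the forbidden key names, and the 200-character cap are all hypothetical choices, and a passing result would not prove the export is safe.

```typescript
// Hypothetical list of key names that should never appear in exported diagnostics.
const FORBIDDEN_KEY = /^(api[_-]?key|authorization|cookie|prompt|messages)$/i;

// Assumed cap: longer strings are treated as possible prompt or model-output text.
const MAX_STRING_LENGTH = 200;

// Recursively walk a JSON-like value and return the paths of suspicious fields.
function findSensitiveFields(value: unknown, path: string = "$"): string[] {
  if (typeof value === "string") {
    return value.length > MAX_STRING_LENGTH ? [`${path} (long string)`] : [];
  }
  if (value === null || typeof value !== "object") return [];
  const isArray = Array.isArray(value);
  const hits: string[] = [];
  for (const [key, child] of Object.entries(value)) {
    const childPath = isArray ? `${path}[${key}]` : `${path}.${key}`;
    if (!isArray && FORBIDDEN_KEY.test(key)) hits.push(`${childPath} (forbidden key)`);
    hits.push(...findSensitiveFields(child, childPath));
  }
  return hits;
}
```

A verification test could export a session, pass the parsed diagnostics to this helper, and fail if the returned list is non-empty.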

Execution mode

Agent should investigate and propose a plan first

Metadata

Assignees

No one assigned

    Labels

    • P2: Medium priority
    • app: Application behavior and product flows
    • harness: Model harness, prompts, tool descriptions, and session mechanics
    • task: Narrow execution, audit, spike, migration, tracking, or upstream follow-up work

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
