Track reasoning token counts separately from output tokens #665

@nabinchha

Context

DataDesigner currently records model token usage as input/output tokens only. For reasoning models, providers often expose reasoning tokens as a breakdown inside output/completion usage rather than as an additional total.

Current code path:

  • Usage only carries input_tokens, output_tokens, total_tokens, and image usage.
  • extract_usage() maps prompt_tokens/completion_tokens or input_tokens/output_tokens, but does not read provider-specific reasoning-token breakdowns.
  • TokenUsageStats only exposes input/output tokens.

Provider behavior observed from docs

  • OpenAI Chat Completions reports completion_tokens_details.reasoning_tokens; those reasoning tokens are included in completion_tokens.
  • OpenAI Responses reports output_tokens_details.reasoning_tokens; those reasoning tokens are included in output_tokens.
  • Anthropic extended thinking is charged as output-token usage, although the visible thinking content may be summarized or omitted.
  • vLLM exposes reasoning content via message.reasoning; we should confirm from a representative vLLM response whether its reported completion_tokens includes reasoning tokens in our supported server configuration.
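For illustration, the two OpenAI usage shapes described above might look like the following. The field names come from the bullets above; the token counts are made-up values, and `visible_output_tokens` is a hypothetical helper, not an existing DataDesigner function.

```python
# Illustrative usage payloads. Field names follow the provider shapes
# described above; the numbers are invented for the example.
chat_completions_usage = {
    "prompt_tokens": 20,
    "completion_tokens": 150,  # already includes the 120 reasoning tokens
    "total_tokens": 170,
    "completion_tokens_details": {"reasoning_tokens": 120},
}

responses_usage = {
    "input_tokens": 20,
    "output_tokens": 150,  # already includes the 120 reasoning tokens
    "total_tokens": 170,
    "output_tokens_details": {"reasoning_tokens": 120},
}


def visible_output_tokens(output_tokens: int, reasoning_tokens: int) -> int:
    """Hypothetical helper: output tokens left once reasoning is subtracted."""
    return output_tokens - reasoning_tokens


answer_tokens = visible_output_tokens(
    chat_completions_usage["completion_tokens"],
    chat_completions_usage["completion_tokens_details"]["reasoning_tokens"],
)
```

Because the reasoning count is a breakdown of the output count, subtracting it gives the final-answer share rather than changing any total.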

Problem

DataDesigner likely reports total provider-billed output tokens correctly for OpenAI- and Anthropic-style usage, but it drops the separate reasoning-token count. As a result, users cannot tell how much of the output usage came from reasoning (hidden or visible) and how much from final-answer tokens.

Proposal

Add optional reasoning-token tracking through the canonical model usage path:

  • Preserve existing output_tokens behavior exactly. output_tokens should continue to mean whatever the provider reports as output/completion tokens.
  • Extend provider usage parsing to capture completion_tokens_details.reasoning_tokens, output_tokens_details.reasoning_tokens, and any top-level provider variant if needed.
  • Add a reasoning_tokens field to canonical usage stats as a separate breakdown only. Do not add it again to output_tokens or total_tokens, since providers already include reasoning tokens in output/completion token counts when they report usage that way.
  • Keep telemetry out of scope for this issue.
  • Add provider-shape tests for OpenAI Chat Completions, OpenAI Responses-style usage, Anthropic thinking usage, and a captured vLLM response if available.
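A minimal sketch of the parsing change proposed above, assuming the provider usage arrives as a plain mapping. `UsageSketch`, `extract_reasoning_tokens`, and `extract_usage_sketch` are hypothetical names for illustration only; the real `Usage` and `extract_usage()` may be shaped differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class UsageSketch:
    """Hypothetical stand-in for the canonical usage record."""

    input_tokens: int
    output_tokens: int
    total_tokens: int
    reasoning_tokens: int | None = None  # breakdown only, never added to totals


def extract_reasoning_tokens(usage: Mapping[str, Any]) -> int | None:
    """Return the provider-reported reasoning-token count, if any."""
    # Nested breakdowns: Chat Completions first, then Responses-style.
    for details_key in ("completion_tokens_details", "output_tokens_details"):
        details = usage.get(details_key)
        if isinstance(details, Mapping):
            value = details.get("reasoning_tokens")
            if isinstance(value, int):
                return value
    # Assumed top-level variant, in case a provider reports it that way.
    top_level = usage.get("reasoning_tokens")
    return top_level if isinstance(top_level, int) else None


def extract_usage_sketch(usage: Mapping[str, Any]) -> UsageSketch:
    """Map provider usage to the canonical shape without touching totals."""
    input_tokens = usage.get("prompt_tokens", usage.get("input_tokens", 0))
    output_tokens = usage.get("completion_tokens", usage.get("output_tokens", 0))
    total_tokens = usage.get("total_tokens", input_tokens + output_tokens)
    return UsageSketch(
        input_tokens=input_tokens,
        output_tokens=output_tokens,  # passed through exactly as reported
        total_tokens=total_tokens,
        reasoning_tokens=extract_reasoning_tokens(usage),
    )
```

Keeping `reasoning_tokens` as `None` when the provider reports nothing lets callers distinguish "not reported" from "zero reasoning tokens".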

Acceptance criteria

  • Existing output_tokens totals remain backward compatible.
  • Reasoning token counts are preserved when providers report them.
  • Reasoning tokens are not double-counted in output_tokens or total_tokens.
  • Telemetry schema/events are not changed as part of this issue.
  • Behavior is documented clearly enough that users understand output_tokens includes reasoning tokens for providers that report usage that way.
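The no-double-counting criterion could be checked at the aggregation level roughly as follows. This is a sketch only: `TokenUsageStatsSketch` and its `record` method are hypothetical and do not reflect the actual `TokenUsageStats` API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TokenUsageStatsSketch:
    """Hypothetical aggregate; the real TokenUsageStats may differ."""

    input_tokens: int = 0
    output_tokens: int = 0
    reasoning_tokens: int | None = None  # stays None until a provider reports it

    @property
    def total_tokens(self) -> int:
        # Reasoning tokens are already inside output_tokens, so they are
        # deliberately left out of this sum.
        return self.input_tokens + self.output_tokens

    def record(
        self,
        *,
        input_tokens: int,
        output_tokens: int,
        reasoning_tokens: int | None = None,
    ) -> None:
        """Accumulate one call's usage."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        if reasoning_tokens is not None:
            self.reasoning_tokens = (self.reasoning_tokens or 0) + reasoning_tokens


stats = TokenUsageStatsSketch()
# A reasoning-model call: 120 of the 150 output tokens were reasoning.
stats.record(input_tokens=20, output_tokens=150, reasoning_tokens=120)
# A call where the provider reported no reasoning breakdown.
stats.record(input_tokens=10, output_tokens=40)

assert stats.output_tokens == 190
assert stats.total_tokens == 220
assert stats.reasoning_tokens == 120
```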
