V4: sampling defaults + thinking-mode parameter wiring #540

@Hmbown

Thesis

DeepSeek V4 recommends temperature=1.0, top_p=1.0 (V3 used lower values for code). Our defaults across the engine and the RLM bridge are still either None, which inherits API defaults that differ between V3 and V4, or hardcoded V3-era values such as 0.4/0.9. Any path still on the old defaults is silently degraded on V4. Apply V4 sampling conditionally; V3 paths keep their existing values.

Current behavior

  • crates/tui/src/core/engine/turn_loop.rs:246 sends temperature: None, top_p: None, inheriting whatever the upstream API defaults to per model.
  • crates/tui/src/rlm/bridge.rs:108-109 hardcodes temperature: Some(0.4_f32), top_p: Some(0.9_f32) for every sub-LLM call inside RLM. Every RLM sub-call is silently running V3 sampling on a V4 child model.
  • Thinking-mode parameter is implicit; no per-turn control surfaced beyond the global reasoning_effort mapping in crates/tui/src/models.rs:32.

Proposed change

  • Conditional sampling defaults. When the request model name matches the V4 family (prefix deepseek-v4- or NIM equivalent), inject temperature: Some(1.0), top_p: Some(1.0) at the engine layer and at the RLM bridge layer. V3 / custom-base-URL / other-provider paths keep current behavior.
  • Wire thinking-mode parameter explicitly where the API surface exposes it. The current reasoning_effort mapping stays as the user-facing axis; the wire serialization should ensure the V4 thinking flag is set correctly on each request.
  • Drop "tokenizer / encoder migration" framing. encode_messages from encoding_dsv4 is a self-host concern; we hit the OpenAI-compatible API and don't run the encoder ourselves. Out of scope.
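A minimal sketch of the conditional defaults described above, assuming a single resolver that both the engine and the RLM bridge could call. The names `Sampling`, `is_v4_family`, and `resolve_sampling` are hypothetical and do not come from the codebase; only the `deepseek-v4-` prefix and the 1.0/1.0 values are taken from this issue. The NIM equivalent is left out because its naming is not specified here.

```rust
/// Sampling parameters as they would be serialized on the wire.
/// `None` means "omit the field and inherit the upstream API default".
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Sampling {
    pub temperature: Option<f32>,
    pub top_p: Option<f32>,
}

/// Hypothetical helper: true when the model name carries the V4 prefix
/// named in the proposal. Matching is a plain, case-sensitive prefix check.
pub fn is_v4_family(model: &str) -> bool {
    model.starts_with("deepseek-v4-")
}

/// Returns V4 sampling for V4 models; otherwise returns `current` unchanged,
/// so V3 and other-provider paths keep whatever they send today.
pub fn resolve_sampling(model: &str, current: Sampling) -> Sampling {
    if is_v4_family(model) {
        Sampling {
            temperature: Some(1.0),
            top_p: Some(1.0),
        }
    } else {
        current
    }
}
```

Under this sketch the engine would pass its current `None`/`None` pair as `current` and the bridge would pass its hardcoded 0.4/0.9 pair, so the non-V4 behavior of each call site is preserved without a second code path.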

Open questions / risks

  • Custom base URL users may serve non-V4 models that match the V4 name prefix — model-family detection should be robust to that, or at minimum opt-out-able.
  • The RLM bridge's child_model defaults to deepseek-v4-flash (crates/tui/src/tools/rlm.rs:24). Confirm this is the only place a sub-call model is set so we don't miss a path.
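One possible shape for the opt-out raised in the first open question, sketched under assumptions: `V4SamplingMode` and `should_apply_v4_sampling` are hypothetical names, and how the mode would be configured (config file, environment variable, CLI flag) is deliberately left open. The sketch treats a custom base URL as a reason to skip V4 sampling even when the name prefix matches, which is consistent with the proposal that custom-base-URL paths keep current behavior.

```rust
/// Hypothetical user-facing switch for V4 sampling injection.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum V4SamplingMode {
    /// Decide from the model name and the provider.
    Auto,
    /// Never inject V4 sampling, regardless of the model name.
    ForceOff,
}

/// Decides whether the 1.0/1.0 defaults should be injected for a request.
/// `custom_base_url` is true when the user has overridden the API endpoint.
pub fn should_apply_v4_sampling(
    model: &str,
    custom_base_url: bool,
    mode: V4SamplingMode,
) -> bool {
    match mode {
        V4SamplingMode::ForceOff => false,
        // A prefix match alone is not trusted on a custom endpoint, because
        // the server behind it may serve a non-V4 model under a V4-like name.
        V4SamplingMode::Auto => !custom_base_url && model.starts_with("deepseek-v4-"),
    }
}
```

A third `ForceOn` variant could cover users who self-host a real V4 model behind a custom URL; it is omitted here because the issue only asks for an opt-out.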

Acceptance signals

  • V4 requests hit the API with 1.0/1.0 sampling.
  • RLM sub-calls use V4 sampling when the child model is V4.
  • V3 / legacy / custom-provider paths unchanged.
  • Test matrix covers: V4 root, V4 sub-call, V3 root, custom-provider root.
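The test matrix above could be written table-driven, roughly as follows. This is a sketch, not the project's test layout: `Case`, `resolve`, `matrix`, and `failing_cases` are hypothetical, `resolve` stands in for whatever resolver the change introduces, and `deepseek-chat` is used only as a placeholder for a non-V4 model name.

```rust
/// (temperature, top_p); `None` means the field is omitted from the request.
type Sampling = (Option<f32>, Option<f32>);

struct Case {
    name: &'static str,
    model: &'static str,
    custom_base_url: bool,
    /// What the call site sends today, before this change.
    current: Sampling,
    /// What should reach the API after this change.
    expected: Sampling,
}

/// Stand-in resolver: V4 sampling only for V4-prefixed models on the
/// default endpoint; everything else passes through unchanged.
fn resolve(model: &str, custom_base_url: bool, current: Sampling) -> Sampling {
    if !custom_base_url && model.starts_with("deepseek-v4-") {
        (Some(1.0), Some(1.0))
    } else {
        current
    }
}

/// The four rows named in the acceptance signals.
fn matrix() -> Vec<Case> {
    let v4 = (Some(1.0), Some(1.0));
    vec![
        Case { name: "V4 root", model: "deepseek-v4-flash", custom_base_url: false,
               current: (None, None), expected: v4 },
        Case { name: "V4 sub-call", model: "deepseek-v4-flash", custom_base_url: false,
               current: (Some(0.4), Some(0.9)), expected: v4 },
        Case { name: "V3 root", model: "deepseek-chat", custom_base_url: false,
               current: (None, None), expected: (None, None) },
        Case { name: "custom-provider root", model: "deepseek-v4-flash", custom_base_url: true,
               current: (None, None), expected: (None, None) },
    ]
}

/// Returns the names of the rows whose resolved sampling differs from `expected`.
fn failing_cases() -> Vec<&'static str> {
    matrix()
        .into_iter()
        .filter(|c| resolve(c.model, c.custom_base_url, c.current) != c.expected)
        .map(|c| c.name)
        .collect()
}
```

Keeping `current` in each row makes the "unchanged" rows explicit: a V3 or custom-provider case passes only if the resolver returns exactly what the call site sent before.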

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request), v0.8.9 (Targeting v0.8.9)
