Thesis
DeepSeek V4 recommends temperature=1.0, top_p=1.0 (V3 used lower values for code). Our defaults across the engine and the RLM bridge are still either None, which inherits API defaults that differ between V3 and V4, or hardcoded V3-era values like 0.4/0.9. Any path still on the old defaults is silently degraded on V4. Apply V4 sampling conditionally; V3 paths keep their existing values.
Current behavior
- crates/tui/src/core/engine/turn_loop.rs:246 sends temperature: None, top_p: None, inheriting whatever the upstream API defaults to per model.
- crates/tui/src/rlm/bridge.rs:108-109 hardcodes temperature: Some(0.4_f32), top_p: Some(0.9_f32) for every sub-LLM call inside RLM. With the default child model (deepseek-v4-flash), every RLM sub-call silently runs V3 sampling on a V4 model.
- Thinking-mode parameter is implicit; no per-turn control is surfaced beyond the global reasoning_effort mapping in crates/tui/src/models.rs:32.
Proposed change
- Conditional sampling defaults. When the request model name matches the V4 family (prefix deepseek-v4- or NIM equivalent), inject temperature: Some(1.0), top_p: Some(1.0) at the engine layer and at the RLM bridge layer. V3 / custom-base-URL / other-provider paths keep current behavior.
- Wire thinking-mode parameter explicitly where the API surface exposes it. The current reasoning_effort mapping stays as the user-facing axis; the wire serialization should ensure the V4 thinking flag is set correctly on each request.
- Drop "tokenizer / encoder migration" framing. encode_messages from encoding_dsv4 is a self-host concern; we hit the OpenAI-compatible API and don't run the encoder ourselves. Out of scope.
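A minimal sketch of the conditional-defaults rule, to make the intended behavior concrete. The names Sampling, is_v4_family, and resolve_sampling are hypothetical and do not exist in the codebase; the NIM-equivalent model name is not known here, so only the deepseek-v4- prefix is matched.

```rust
/// Sampling parameters as sent on the wire; `None` means
/// "inherit whatever the upstream API defaults to".
/// Hypothetical type, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Sampling {
    pub temperature: Option<f32>,
    pub top_p: Option<f32>,
}

/// Assumption: the V4 family is identified purely by the
/// `deepseek-v4-` name prefix. A NIM-style name would need its
/// own match arm once the exact naming is confirmed.
pub fn is_v4_family(model: &str) -> bool {
    model.to_ascii_lowercase().starts_with("deepseek-v4-")
}

/// Returns V4 sampling (1.0 / 1.0) for V4-family models and leaves
/// the caller's existing values untouched for everything else.
pub fn resolve_sampling(model: &str, existing: Sampling) -> Sampling {
    if is_v4_family(model) {
        Sampling {
            temperature: Some(1.0),
            top_p: Some(1.0),
        }
    } else {
        existing
    }
}
```

Under this sketch the engine would pass its current None/None as `existing` and the RLM bridge its current 0.4/0.9, so both layers share one rule and non-V4 paths are unchanged by construction.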
Open questions / risks
- Custom base URL users may serve non-V4 models that match the V4 name prefix — model-family detection should be robust to that, or at minimum opt-out-able.
- The RLM bridge's child_model defaults to deepseek-v4-flash (crates/tui/src/tools/rlm.rs:24). Confirm this is the only place a sub-call model is set so we don't miss a path.
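One possible shape for the opt-out raised in the first risk, sketched under stated assumptions: V4SamplingMode and should_apply_v4_sampling are hypothetical names, and "a custom base URL is configured" is modeled as Some(url), with None standing for the default endpoint. How the real config represents either is not confirmed by this issue.

```rust
/// Hypothetical user-facing switch for the V4 sampling injection.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum V4SamplingMode {
    /// Inject V4 sampling when the model looks like V4 on the default endpoint.
    Auto,
    /// Never inject; always keep the existing sampling values.
    Off,
}

/// Decides whether the 1.0 / 1.0 defaults should be injected.
///
/// Assumption: a custom base URL means the name prefix cannot be
/// trusted, so those requests keep current behavior even if the
/// model name starts with `deepseek-v4-`.
pub fn should_apply_v4_sampling(
    model: &str,
    custom_base_url: Option<&str>,
    mode: V4SamplingMode,
) -> bool {
    if mode == V4SamplingMode::Off {
        return false;
    }
    if custom_base_url.is_some() {
        return false;
    }
    model.starts_with("deepseek-v4-")
}
```

This keeps the prefix check as the only heuristic and makes the custom-base-URL exclusion explicit, which matches the "keep current behavior" rule in the proposed change; whether Off should be a config key, a CLI flag, or both is left open.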
Acceptance signals
- V4 requests hit the API with 1.0/1.0 sampling.
- RLM sub-calls use V4 sampling when the child model is V4.
- V3 / legacy / custom-provider paths unchanged.
- Test matrix covers: V4 root, V4 sub-call, V3 root, custom-provider root.