V4: sampling defaults + thinking-mode parameter wiring #540

## Thesis
DeepSeek V4 recommends `temperature=1.0` and `top_p=1.0`, whereas V3 used lower values for code. Our defaults across the engine and the RLM bridge are still either `None`, which inherits API defaults that differ between V3 and V4, or hardcoded V3-era values such as `0.4`/`0.9`. Any path still on the old defaults is silently degraded on V4. Apply V4 sampling conditionally: V3 paths keep their existing values.

## Current behavior
- `crates/tui/src/core/engine/turn_loop.rs:246` sends `temperature: None, top_p: None`, inheriting whatever the upstream API defaults to per model.
- `crates/tui/src/rlm/bridge.rs:108-109` hardcodes `temperature: Some(0.4_f32), top_p: Some(0.9_f32)` for every sub-LLM call inside RLM. **Every RLM sub-call is silently running V3 sampling on a V4 child model.**
- The thinking-mode parameter is implicit: no per-turn control is surfaced beyond the global `reasoning_effort` mapping in `crates/tui/src/models.rs:32`.

## Proposed change
- **Conditional sampling defaults.** When the request model name matches the V4 family (prefix `deepseek-v4-` or NIM equivalent), inject `temperature: Some(1.0), top_p: Some(1.0)` at the engine layer and at the RLM bridge layer. V3 / custom-base-URL / other-provider paths keep current behavior.
- **Wire thinking-mode parameter explicitly** where the API surface exposes it. The current `reasoning_effort` mapping stays as the user-facing axis; the wire serialization should ensure the V4 thinking flag is set correctly on each request.
- **Drop "tokenizer / encoder migration" framing.** `encode_messages` from `encoding_dsv4` is a self-host concern; we hit the OpenAI-compatible API and don't run the encoder ourselves. Out of scope.
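
A minimal sketch of the conditional default, to make the first bullet concrete. The `Sampling` struct, `is_v4_family`, and `resolve_sampling` are hypothetical names, not existing code in the repo. Only the `deepseek-v4-` prefix comes from this issue. Treating the "NIM equivalent" as an `org/model` name whose last path segment carries the same prefix is an assumption.

```rust
/// Sampling parameters as sent on the wire; `None` means "let the API decide".
/// Hypothetical type for illustration, not the engine's actual request struct.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Sampling {
    pub temperature: Option<f32>,
    pub top_p: Option<f32>,
}

/// Hypothetical family check. Only the `deepseek-v4-` prefix is specified in
/// the issue; stripping an `org/` namespace (NIM-style names) is an assumption.
pub fn is_v4_family(model: &str) -> bool {
    // `rsplit` always yields at least one item, so the fallback is never hit.
    let name = model.rsplit('/').next().unwrap_or(model);
    name.to_ascii_lowercase().starts_with("deepseek-v4-")
}

/// Returns 1.0/1.0 for V4-family models and leaves every other path untouched.
pub fn resolve_sampling(model: &str, existing: Sampling) -> Sampling {
    if is_v4_family(model) {
        Sampling { temperature: Some(1.0), top_p: Some(1.0) }
    } else {
        existing
    }
}
```

Both call sites could route through one helper like this: the engine would pass its current `None`/`None` and the RLM bridge its current `0.4`/`0.9`, so non-V4 behavior stays byte-for-byte the same.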

## Open questions / risks
- Custom base URL users may serve non-V4 models whose names match the V4 prefix. Model-family detection should be robust to that or, at minimum, allow an opt-out.
- The RLM bridge's `child_model` defaults to `deepseek-v4-flash` (`crates/tui/src/tools/rlm.rs:24`). Confirm this is the only place a sub-call model is set so we don't miss a path.
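
One possible shape for the opt-out raised in the first risk, sketched under assumptions: the `V4SamplingMode` enum and the `custom_base_url` flag are hypothetical and do not correspond to existing config keys. In `Auto` mode a custom base URL disables prefix detection, matching the proposal that custom-base-URL paths keep current behavior. The explicit modes let a user override in either direction.

```rust
/// Hypothetical user-facing override for V4 sampling detection.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum V4SamplingMode {
    /// Detect by model name, but only on the default (non-custom) base URL.
    Auto,
    /// Always apply V4 sampling, e.g. a self-hosted V4 behind a custom URL.
    ForceOn,
    /// Never apply V4 sampling, e.g. a non-V4 model with a colliding name.
    ForceOff,
}

/// Decides whether the 1.0/1.0 defaults should be applied to a request.
pub fn should_apply_v4_sampling(
    model: &str,
    custom_base_url: bool,
    mode: V4SamplingMode,
) -> bool {
    match mode {
        V4SamplingMode::ForceOn => true,
        V4SamplingMode::ForceOff => false,
        V4SamplingMode::Auto => !custom_base_url && model.starts_with("deepseek-v4-"),
    }
}
```

The trade-off is that users running a real V4 model behind a custom base URL would need to set the override explicitly, which seems safer than silently changing sampling for a model we cannot identify.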

## Acceptance signals
- V4 requests hit the API with 1.0/1.0 sampling.
- RLM sub-calls use V4 sampling when the child model is V4.
- V3 / legacy / custom-provider paths unchanged.
- Test matrix covers: V4 root, V4 sub-call, V3 root, custom-provider root.
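
The test matrix above could be expressed as a table-driven check along these lines. This is a sketch, not the project's test harness: `resolve` is a stand-in for whatever helper ends up owning the decision, `deepseek-v3-example` is a placeholder model name, and the "existing" values mirror the `None`/`None` engine default and the `0.4`/`0.9` bridge default described under Current behavior.

```rust
/// (temperature, top_p) as sent on the wire.
type Sampling = (Option<f32>, Option<f32>);

/// Stand-in for the real resolver (hypothetical): V4 prefix on the default
/// provider gets 1.0/1.0, everything else keeps its existing values.
fn resolve(model: &str, custom_provider: bool, existing: Sampling) -> Sampling {
    if !custom_provider && model.starts_with("deepseek-v4-") {
        (Some(1.0), Some(1.0))
    } else {
        existing
    }
}

struct Case {
    name: &'static str,
    model: &'static str,
    custom_provider: bool,
    existing: Sampling,
    expected: Sampling,
}

/// One row per entry in the acceptance matrix.
fn matrix() -> Vec<Case> {
    let v4 = (Some(1.0), Some(1.0));
    let bridge_legacy = (Some(0.4), Some(0.9));
    vec![
        Case { name: "V4 root", model: "deepseek-v4-flash", custom_provider: false,
               existing: (None, None), expected: v4 },
        Case { name: "V4 sub-call", model: "deepseek-v4-flash", custom_provider: false,
               existing: bridge_legacy, expected: v4 },
        Case { name: "V3 root", model: "deepseek-v3-example", custom_provider: false,
               existing: (None, None), expected: (None, None) },
        Case { name: "custom-provider root", model: "deepseek-v4-flash", custom_provider: true,
               existing: (None, None), expected: (None, None) },
    ]
}

/// Returns the names of any rows whose resolved sampling differs from `expected`.
pub fn failing_cases() -> Vec<&'static str> {
    matrix()
        .into_iter()
        .filter(|c| resolve(c.model, c.custom_provider, c.existing) != c.expected)
        .map(|c| c.name)
        .collect()
}
```

A real test would assert that `failing_cases()` is empty; returning the failing row names keeps the failure message readable when a row regresses.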
