Context
DeepSeek V4 ships several endpoints and protocol features that the TUI does not yet wrap in tool surfaces, even though they are documented as available behind /beta (per CLAUDE.md's API integration section). Each gap is small on its own, but together they matter: V4 was designed on the assumption that clients use these features, and a "V4-native" frontend should at least audit what is missing before shipping the next major release.
This issue tracks the four most concrete gaps. Each is a v0.9.0 candidate; none are v0.8.11 release blockers.
Gap 1 — Explicit 128-token boundary alignment in prompt construction
V4 caches shared prefixes at 128-token granularity with a ~90% cost discount. Our prompt assembler in crates/tui/src/prompts.rs composes layers in deterministic order (base.md → personality → mode delta → approval policy), which is good for cache stability across sessions, but we don't pad layer boundaries to 128-token alignment. A single-line edit to base.md early in the chain can push a downstream section across a 128-token boundary, silently invalidating the cache for that prefix region.
What "good" looks like:
- Audit current prompt sections for typical token counts at 128-token granularity.
- Add explicit padding between major sections so an N-token edit in section K doesn't ripple into section K+1's cache key.
- Add a test that asserts byte-stable prompt composition still produces token-aligned section boundaries.
Codex's prompt-builder pads more deliberately than ours; their pattern is worth a look.
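The alignment arithmetic behind the bullets above is small enough to sketch. This is a minimal illustration, not the assembler in prompts.rs: the function names are hypothetical, and the per-section token counts are assumed to come from a tokenizer that is not shown here.

```rust
/// Cache granularity described in this issue (tokens per cache block).
const CACHE_BLOCK_TOKENS: usize = 128;

/// Filler tokens needed to bring `token_count` up to the next multiple of
/// `CACHE_BLOCK_TOKENS`; 0 if it is already aligned.
/// (Hypothetical helper, not an existing function in the crate.)
fn padding_to_boundary(token_count: usize) -> usize {
    (CACHE_BLOCK_TOKENS - token_count % CACHE_BLOCK_TOKENS) % CACHE_BLOCK_TOKENS
}

/// Given per-section token counts in composition order, return the token
/// offset at which each section would start if every section were padded
/// out to a block boundary.
fn aligned_section_starts(section_tokens: &[usize]) -> Vec<usize> {
    let mut starts = Vec::with_capacity(section_tokens.len());
    let mut offset = 0;
    for &tokens in section_tokens {
        starts.push(offset);
        offset += tokens + padding_to_boundary(tokens);
    }
    starts
}
```

With sections of 300, 128, and 5 tokens, the padded starts are 0, 384, and 512. Growing the first section to 310 tokens leaves the later starts at 384 and 512, because the padding absorbs the edit. Whether stable offsets are enough to keep downstream cache entries valid depends on how V4 keys its prefix cache, which this sketch does not model.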
Gap 2 — FIM (Fill-In-the-Middle) tool surface
api.deepseek.com/beta supports FIM completion. We don't wrap it in any tool. For small in-line code edits (inserting a function, completing a half-written branch, filling in a stub), FIM is more cache-efficient than apply_patch because the prefix and suffix don't get rewritten — only the middle is generated. It also produces cleaner diffs by construction.
Sketch:
- New tool fim_edit { path, prefix_anchor, suffix_anchor } that calls the FIM endpoint with the file's content split at the anchors.
- Returns the generated middle, applied as a precise edit (no fuzz matching).
- Fallback to apply_patch when the model isn't running on a FIM-capable endpoint.
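The local half of the sketch above (splitting at the anchors and applying the generated middle) could look roughly like this. Everything here is an assumption for illustration: the type and function names are hypothetical, the prefix is assumed to end right after prefix_anchor, the suffix is assumed to begin at suffix_anchor, and any text between the two is assumed to be what the generated middle replaces. The call to the FIM endpoint itself is left out.

```rust
/// Hypothetical result of splitting a file for a FIM request.
#[derive(Debug, PartialEq)]
struct FimSplit<'a> {
    /// Everything up to and including the prefix anchor.
    prefix: &'a str,
    /// Everything from the suffix anchor onward.
    suffix: &'a str,
}

/// Byte offset of `needle` in `haystack`, but only if it has exactly one
/// (non-overlapping) occurrence. Ambiguous or missing anchors yield `None`,
/// which is how this sketch avoids fuzz matching.
fn unique_match(haystack: &str, needle: &str) -> Option<usize> {
    if needle.is_empty() {
        return None;
    }
    let mut hits = haystack.match_indices(needle);
    let (first, _) = hits.next()?;
    if hits.next().is_some() {
        None
    } else {
        Some(first)
    }
}

/// Split `content` into the prefix and suffix to send to a FIM endpoint.
/// Returns `None` if either anchor is not unique or the anchors overlap
/// or appear in the wrong order.
fn split_at_anchors<'a>(
    content: &'a str,
    prefix_anchor: &str,
    suffix_anchor: &str,
) -> Option<FimSplit<'a>> {
    let prefix_end = unique_match(content, prefix_anchor)? + prefix_anchor.len();
    let suffix_start = unique_match(content, suffix_anchor)?;
    if suffix_start < prefix_end {
        return None;
    }
    Some(FimSplit {
        prefix: &content[..prefix_end],
        suffix: &content[suffix_start..],
    })
}

/// Apply the generated middle as an exact edit: prefix + middle + suffix.
fn splice(split: &FimSplit<'_>, middle: &str) -> String {
    format!("{}{}{}", split.prefix, middle, split.suffix)
}
```

Returning `None` on an ambiguous anchor gives the caller a natural place to fall back to apply_patch instead of guessing where the edit belongs.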
Gap 3 — Chat prefix completion tool surface
The same /beta endpoint family supports chat-prefix completion: "continue this assistant message under the same context / persona." This is useful for patterns like "extend this draft," "refine this answer," and "produce the next section in the same voice." We don't have a tool for it, and there's no obvious caller wanting one, so this is the lowest priority of the four, but it is worth filing while the rest of the V4 audit is in scope.
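For reference, a rough sketch of the request body such a tool would build. It assumes the shape DeepSeek's existing beta documentation describes for chat prefix completion (the final message has role "assistant" and carries "prefix": true); that shape should be re-verified against V4 before anything is built on it. The struct, the function names, and the MODEL_ID placeholder are hypothetical, and JSON is assembled by hand only to keep the sketch dependency-free.

```rust
/// One chat message; `prefix` marks an assistant message the model should continue.
struct Message<'a> {
    role: &'a str,
    content: &'a str,
    prefix: bool,
}

/// Minimal JSON string escaping for this sketch (quotes, backslashes, control chars).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a request body whose last message is the assistant draft to continue.
/// Assumption: the endpoint accepts `"prefix": true` on that last message.
fn prefix_completion_body(model: &str, user_prompt: &str, draft: &str) -> String {
    let messages = [
        Message { role: "user", content: user_prompt, prefix: false },
        Message { role: "assistant", content: draft, prefix: true },
    ];
    let rendered: Vec<String> = messages
        .iter()
        .map(|m| {
            let mut obj = format!(
                "{{\"role\":\"{}\",\"content\":\"{}\"",
                m.role,
                json_escape(m.content)
            );
            if m.prefix {
                obj.push_str(",\"prefix\":true");
            }
            obj.push('}');
            obj
        })
        .collect();
    format!(
        "{{\"model\":\"{}\",\"messages\":[{}]}}",
        json_escape(model),
        rendered.join(",")
    )
}
```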
Gap 4 — Strict tool mode opt-in
/beta supports strict tool mode, where the model's tool-call JSON is validated against the declared schema before being delivered to the client. Trade-off: the model is more likely to refuse than to emit a malformed call, which is the right trade for high-stakes tools (apply_patch, exec_shell, anything that mutates the filesystem or runs code). For exploratory tools (grep_files, read_file) the existing flexible mode is fine.
What "good" looks like:
- Per-tool opt-in flag in the tool spec.
- Engine routes opted-in tools through the strict endpoint when the active provider supports it; falls back to flexible mode when not.
- Add a test that pins which tools are opted in.
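A minimal sketch of the three bullets above, assuming a boolean opt-in field on the tool spec. The ToolSpec shape, the ToolMode enum, and the function names are hypothetical and do not reflect the engine's current types; the tool names and their opt-in status follow the examples in this issue.

```rust
/// Which validation mode a tool call is routed through.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToolMode {
    Strict,
    Flexible,
}

/// Hypothetical tool spec with a per-tool strict opt-in flag.
struct ToolSpec {
    name: &'static str,
    strict: bool,
}

/// Illustrative registry: mutating tools opt in, exploratory tools do not.
const TOOLS: &[ToolSpec] = &[
    ToolSpec { name: "apply_patch", strict: true },
    ToolSpec { name: "exec_shell", strict: true },
    ToolSpec { name: "grep_files", strict: false },
    ToolSpec { name: "read_file", strict: false },
];

/// Use strict mode only when the tool opted in AND the active provider
/// supports it; otherwise fall back to flexible mode.
fn route(tool: &ToolSpec, provider_supports_strict: bool) -> ToolMode {
    if tool.strict && provider_supports_strict {
        ToolMode::Strict
    } else {
        ToolMode::Flexible
    }
}

/// Names of opted-in tools, in registry order; a pinning test can compare
/// this against a hard-coded list so changes to the set are deliberate.
fn strict_tool_names() -> Vec<&'static str> {
    TOOLS.iter().filter(|t| t.strict).map(|t| t.name).collect()
}
```

A pinning test then reduces to asserting that `strict_tool_names()` equals `["apply_patch", "exec_shell"]`, so adding or removing a strict tool forces an explicit test update.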
Out of scope
- New theme variants, palette changes, or any UX-visible work — that's tracked elsewhere.
- The handoff-instead-of-compact pattern — separate issue (filed alongside this).
- Auto reasoning_effort + cache-aware adaptation — separate issue.
Reporter
Maintainer-scoped engineering observation (Hmbown), 2026-05-04. Captured during a V4-paper-derived audit of the current tool surface.