V4 endpoint adoption tracker: FIM, chat-prefix, strict tool mode, 128-token alignment #662

@Hmbown

Context

DeepSeek V4 ships several endpoints and protocol features that the TUI doesn't currently wrap in tool surfaces, even though they're documented as available behind /beta (per CLAUDE.md's API integration section). Each gap is small on its own, but together they matter: V4 was designed assuming clients use these features, and a "V4-native" frontend should at least audit what's missing before shipping the next major release.

Tracker for the four most concrete gaps. Each is a v0.9.0 candidate; none are v0.8.11 release blockers.

Gap 1 — Explicit 128-token boundary alignment in prompt construction

V4 caches shared prefixes at 128-token granularity, at a ~90% cost discount. Our prompt assembler in crates/tui/src/prompts.rs composes layers in deterministic order (base.md → personality → mode delta → approval policy), which is good for cache stability across sessions, but we don't pad layer boundaries to 128-token alignment. A single-line edit to base.md early in the chain can push a downstream section across a 128-token boundary, silently invalidating the cache for that prefix region.

What "good" looks like:

  • Audit current prompt sections for typical token counts at 128-token granularity.
  • Add explicit padding between major sections so an N-token edit in section K doesn't ripple into section K+1's cache key.
  • Add a test that asserts byte-stable prompt composition still produces token-aligned section boundaries.
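The padding arithmetic behind the bullets above can be sketched as follows. This is a minimal illustration, not the assembler's actual code: `CACHE_BLOCK`, `padding_to_boundary`, and `aligned_starts` are hypothetical names, token counts are assumed to come from whatever tokenizer the audit uses, and producing filler text that tokenizes to an exact count is left out.

```rust
/// Cache granularity, assumed from the issue text (128 tokens).
const CACHE_BLOCK: usize = 128;

/// Number of filler tokens needed to round `tokens` up to the next
/// multiple of `block`. Returns 0 when `tokens` is already aligned.
/// `block` must be non-zero.
fn padding_to_boundary(tokens: usize, block: usize) -> usize {
    (block - tokens % block) % block
}

/// Given per-section token counts, return the token offset at which each
/// section starts when every section is padded to end on a block boundary.
fn aligned_starts(section_tokens: &[usize], block: usize) -> Vec<usize> {
    let mut starts = Vec::with_capacity(section_tokens.len());
    let mut offset = 0;
    for &n in section_tokens {
        starts.push(offset);
        // `offset` is always a multiple of `block`, so padding the section
        // length alone keeps the next start aligned.
        offset += n + padding_to_boundary(n, block);
    }
    starts
}
```

With section sizes of 300, 40, and 128 tokens, the starts land at 0, 384, and 512. Growing the first section to 310 tokens leaves the later offsets unchanged, because the edit is absorbed by that section's padding. An edit that crosses a block boundary still shifts everything after it.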

Codex's prompt-builder pads more deliberately than ours; their pattern is worth a look.

Gap 2 — FIM (Fill-In-the-Middle) tool surface

api.deepseek.com/beta supports FIM completion. We don't wrap it in any tool. For small inline code edits (inserting a function, completing a half-written branch, filling in a stub), FIM is more cache-efficient than apply_patch because the prefix and suffix aren't rewritten; only the middle is generated. It also produces cleaner diffs by construction.

Sketch:

  • New tool fim_edit { path, prefix_anchor, suffix_anchor } that calls the FIM endpoint with the file's content split at the anchors.
  • Returns the generated middle, applied as a precise edit (no fuzz matching).
  • Fallback to apply_patch when the model isn't running on a FIM-capable endpoint.
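The anchor split and the final splice from the sketch above could look roughly like this. It covers only the local string handling; the network call to the FIM endpoint is omitted. `split_at_anchors` and `apply_middle` are hypothetical helper names. Requiring each anchor to match exactly once is one assumed reading of "no fuzz matching".

```rust
/// Split `content` into the (prefix, suffix) pair a FIM request needs.
/// The prefix runs through the end of `prefix_anchor`; the suffix starts at
/// `suffix_anchor`. Whatever sits between the anchors is what the generated
/// middle will replace. Returns None unless `prefix_anchor` matches exactly
/// once in the file and `suffix_anchor` matches exactly once after it.
fn split_at_anchors<'a>(
    content: &'a str,
    prefix_anchor: &str,
    suffix_anchor: &str,
) -> Option<(&'a str, &'a str)> {
    if prefix_anchor.is_empty() || suffix_anchor.is_empty() {
        return None;
    }
    if content.matches(prefix_anchor).count() != 1 {
        return None; // missing or ambiguous anchor: refuse rather than guess
    }
    let prefix_end = content.find(prefix_anchor)? + prefix_anchor.len();
    let rest = &content[prefix_end..];
    if rest.matches(suffix_anchor).count() != 1 {
        return None;
    }
    let suffix_start = prefix_end + rest.find(suffix_anchor)?;
    Some((&content[..prefix_end], &content[suffix_start..]))
}

/// Reassemble the file from the untouched prefix, the generated middle,
/// and the untouched suffix.
fn apply_middle(prefix: &str, middle: &str, suffix: &str) -> String {
    let mut out = String::with_capacity(prefix.len() + middle.len() + suffix.len());
    out.push_str(prefix);
    out.push_str(middle);
    out.push_str(suffix);
    out
}
```

A `None` result is a natural point to fall back to apply_patch, the same as when the endpoint isn't FIM-capable.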

Gap 3 — Chat prefix completion tool surface

The same /beta endpoint family supports chat-prefix completion: "continue this assistant message under the same context / persona." It is useful for patterns like "extend this draft," "refine this answer," and "produce the next section in the same voice." We don't have a tool for it, and no obvious caller wants one, so this is the lowest priority of the four. It is still worth filing while the rest of the V4 audit is in scope.

Gap 4 — Strict tool mode opt-in

/beta supports strict tool mode, where the model's tool-call JSON is validated against the declared schema before being delivered to the client. Trade-off: the model is more likely to refuse than to emit a malformed call, which is the right trade for high-stakes tools (apply_patch, exec_shell, anything that mutates the filesystem or runs code). For exploratory tools (grep_files, read_file) the existing flexible mode is fine.

What "good" looks like:

  • Per-tool opt-in flag in the tool spec.
  • Engine routes opted-in tools through the strict endpoint when the active provider supports it; falls back to flexible mode when not.
  • Add a test that pins which tools are opted in.
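One possible shape for the opt-in flag, the routing decision, and the pinning check is sketched below. `ToolSpec`, `ToolMode`, `TOOLS`, `select_mode`, and `strict_tool_names` are hypothetical names, not the engine's existing types. The tool list covers only the four tools named in this issue.

```rust
/// How a tool call is routed. Hypothetical type for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToolMode {
    Strict,
    Flexible,
}

/// Minimal stand-in for a tool spec with a per-tool opt-in flag.
struct ToolSpec {
    name: &'static str,
    strict: bool,
}

/// Illustrative registry: mutating tools opt in, exploratory tools do not.
const TOOLS: &[ToolSpec] = &[
    ToolSpec { name: "apply_patch", strict: true },
    ToolSpec { name: "exec_shell", strict: true },
    ToolSpec { name: "grep_files", strict: false },
    ToolSpec { name: "read_file", strict: false },
];

/// Use strict mode only when the tool opted in AND the active provider
/// supports it; otherwise fall back to flexible mode.
fn select_mode(spec: &ToolSpec, provider_supports_strict: bool) -> ToolMode {
    if spec.strict && provider_supports_strict {
        ToolMode::Strict
    } else {
        ToolMode::Flexible
    }
}

/// Names of all opted-in tools, in registry order. A pinning test can
/// compare this against a hard-coded list so opt-in changes are explicit.
fn strict_tool_names(tools: &[ToolSpec]) -> Vec<&'static str> {
    tools.iter().filter(|t| t.strict).map(|t| t.name).collect()
}
```

Comparing `strict_tool_names(TOOLS)` against a literal list in a unit test means adding or removing an opt-in forces a deliberate test change.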

Out of scope

  • New theme variants, palette changes, or any UX-visible work — that's tracked elsewhere.
  • The handoff-instead-of-compact pattern — separate issue (filed alongside this).
  • Auto reasoning_effort + cache-aware adaptation — separate issue.

Reporter

Maintainer-scoped engineering observation (Hmbown), 2026-05-04. Captured during a V4-paper-derived audit of the current tool surface.

Metadata

Assignees

No one assigned

Labels

  • cache-maximalism: DeepSeek V4 cache-maximal context and agent architecture
  • enhancement: New feature or request
  • v0.8.11: Targeting v0.8.11
