V4 endpoint adoption tracker: FIM, chat-prefix, strict tool mode, 128-token alignment #662

## Context

DeepSeek V4 ships several endpoints and protocol features that the TUI currently doesn't wrap in tool surfaces, even though they're documented as available behind `/beta` (per `CLAUDE.md`'s API integration section). Each gap is small on its own but meaningful in aggregate: V4 was designed on the assumption that clients use these features, so a "V4-native" frontend should at least audit what's missing before shipping the next major release.

Tracker for the four most concrete gaps. Each is a v0.9.0 candidate; none are v0.8.11 release blockers.

## Gap 1 — Explicit 128-token boundary alignment in prompt construction

V4 caches shared prefixes at 128-token granularity, with a ~90% cost discount on cache hits. Our prompt assembler in `crates/tui/src/prompts.rs` composes layers in a deterministic order (`base.md → personality → mode delta → approval policy`), which is good for cache stability across sessions, but we don't pad layer boundaries to 128-token alignment. A single-line edit to `base.md` early in the chain can push a downstream section across a 128-token boundary, silently invalidating the cache for that prefix region.

What "good" looks like:

- Audit current prompt sections for typical token counts at 128-token granularity.
- Add explicit padding between major sections so an N-token edit in section K doesn't ripple into section K+1's cache key.
- Add a test that asserts byte-stable prompt composition still produces token-aligned section boundaries.
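A minimal sketch of the padding step, in Rust to match `crates/tui`. It rests on two assumptions: `count_tokens` stands in for the real V4 tokenizer, and the pad unit is assumed to cost exactly one token and never merge with neighbouring text. A BPE tokenizer would need a pad unit verified against it. `pad_layer` and `compose_aligned` are hypothetical names, not existing functions in `prompts.rs`.

```rust
/// Cache granularity taken from the issue text (V4 caches shared
/// prefixes in 128-token blocks).
const CACHE_BLOCK_TOKENS: usize = 128;

/// Pads one prompt layer so its token count is a multiple of
/// CACHE_BLOCK_TOKENS. `count_tokens` is a stand-in for the real V4
/// tokenizer (assumption); `pad_unit` is assumed to add exactly one token
/// per append and to end in whitespace so it never merges with the next layer.
fn pad_layer(layer: &str, count_tokens: &dyn Fn(&str) -> usize, pad_unit: &str) -> String {
    // End every layer on a newline so adjacent layers never fuse at the seam.
    let mut out = layer.trim_end().to_string();
    out.push('\n');
    let remainder = count_tokens(&out) % CACHE_BLOCK_TOKENS;
    if remainder != 0 {
        // Top the layer up to the next 128-token boundary.
        for _ in 0..(CACHE_BLOCK_TOKENS - remainder) {
            out.push_str(pad_unit);
        }
    }
    out
}

/// Composes layers in order (base -> personality -> mode delta -> approval
/// policy) and returns the prompt plus the byte offset where each layer starts.
fn compose_aligned(
    layers: &[&str],
    count_tokens: &dyn Fn(&str) -> usize,
    pad_unit: &str,
) -> (String, Vec<usize>) {
    let mut prompt = String::new();
    let mut starts = Vec::with_capacity(layers.len());
    for layer in layers {
        starts.push(prompt.len());
        prompt.push_str(&pad_layer(layer, count_tokens, pad_unit));
    }
    (prompt, starts)
}
```

With this shape, a few-token edit to the base layer only changes the padding inside that layer. The next layer keeps its token offset as long as the edit doesn't push the base layer past its current 128-token block, and that property is what the proposed test would pin.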

Codex's prompt-builder pads more deliberately than ours; their pattern is worth a look.

## Gap 2 — FIM (Fill-In-the-Middle) tool surface

`api.deepseek.com/beta` supports FIM completion. We don't wrap it in any tool. For small in-line code edits (inserting a function, completing a half-written branch, filling in a stub), FIM is more cache-efficient than `apply_patch` because the prefix and suffix don't get rewritten — only the middle is generated. It also produces cleaner diffs by construction.

Sketch:

- New tool `fim_edit { path, prefix_anchor, suffix_anchor }` that calls the FIM endpoint with the file's content split at the anchors.
- Returns the generated middle, applied as a precise edit (no fuzz matching).
- Falls back to `apply_patch` when the model isn't running on a FIM-capable endpoint.
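A minimal sketch of the anchor handling for the proposed `fim_edit` tool. It assumes each anchor must match exactly once (the "no fuzz matching" rule), that the prefix sent to the endpoint ends at the end of `prefix_anchor`, and that the suffix starts at `suffix_anchor`. Whatever sits between them (for example a `todo!()` stub) is replaced by the generated middle. The HTTP call is omitted. By our reading of the DeepSeek FIM beta docs, the two halves would go in the request's `prompt` and `suffix` fields, but that should be confirmed. `split_at_anchors`, `FimSplit`, `AnchorError`, and `apply_middle` are hypothetical names.

```rust
/// Why an anchor pair was rejected (hypothetical error type).
#[derive(Debug, PartialEq)]
enum AnchorError {
    PrefixNotUnique(usize), // number of matches found
    SuffixNotUnique(usize),
    OutOfOrder,
}

/// File content split around the region the FIM call will fill.
struct FimSplit<'a> {
    prefix: &'a str, // everything up to and including prefix_anchor
    suffix: &'a str, // suffix_anchor and everything after it
}

/// Splits `content` at the anchors, requiring exact, unique matches.
fn split_at_anchors<'a>(
    content: &'a str,
    prefix_anchor: &str,
    suffix_anchor: &str,
) -> Result<FimSplit<'a>, AnchorError> {
    let p_hits = content.matches(prefix_anchor).count();
    if p_hits != 1 {
        return Err(AnchorError::PrefixNotUnique(p_hits));
    }
    let s_hits = content.matches(suffix_anchor).count();
    if s_hits != 1 {
        return Err(AnchorError::SuffixNotUnique(s_hits));
    }
    // Both anchors are known to occur exactly once, so find() succeeds.
    let p_end = content.find(prefix_anchor).unwrap() + prefix_anchor.len();
    let s_start = content.find(suffix_anchor).unwrap();
    if s_start < p_end {
        return Err(AnchorError::OutOfOrder);
    }
    Ok(FimSplit { prefix: &content[..p_end], suffix: &content[s_start..] })
}

/// Rebuilds the file with the generated middle in place of the old region.
fn apply_middle(split: &FimSplit<'_>, middle: &str) -> String {
    format!("{}{}{}", split.prefix, middle, split.suffix)
}
```

Rejecting ambiguous anchors up front gives the precise-edit behaviour the sketch asks for. An `Err` here is also a natural trigger for the `apply_patch` fallback.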

## Gap 3 — Chat prefix completion tool surface

The same `/beta` endpoint family supports chat-prefix completion: "continue this assistant message under the same context / persona." This is useful for patterns like "extend this draft," "refine this answer," or "produce the next section in the same voice." We don't have a tool for it, and there's no obvious caller asking for one, so this is the lowest priority of the four. It's still worth filing while the rest of the V4 audit is in scope.
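A minimal sketch of the request shape. It assumes, from our reading of the DeepSeek beta docs (to be verified), that prefix completion is requested by ending `messages` with an assistant message whose `prefix` flag is true, so the model continues that text. `Message`, `continue_draft`, and `is_valid_prefix_request` are hypothetical names. Serialization and the HTTP call are omitted.

```rust
/// Hypothetical message shape; `prefix` mirrors the flag we understand the
/// chat-prefix beta to use (assumption, verify against the docs).
#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: &'static str,
    content: String,
    prefix: bool,
}

/// Builds a message list asking the model to continue `draft` in the
/// assistant's voice: prior messages are copied with `prefix` cleared,
/// then the draft is appended as a prefix-flagged assistant message.
fn continue_draft(history: &[Message], draft: &str) -> Vec<Message> {
    let mut msgs: Vec<Message> = history
        .iter()
        .cloned()
        .map(|mut m| {
            m.prefix = false;
            m
        })
        .collect();
    msgs.push(Message { role: "assistant", content: draft.to_string(), prefix: true });
    msgs
}

/// Checks the invariant we assume the endpoint enforces: only the final
/// message carries `prefix`, and it is an assistant message.
fn is_valid_prefix_request(msgs: &[Message]) -> bool {
    match msgs.split_last() {
        Some((last, rest)) => {
            last.role == "assistant" && last.prefix && rest.iter().all(|m| !m.prefix)
        }
        None => false,
    }
}
```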

## Gap 4 — Strict tool mode opt-in

`/beta` supports strict tool mode, where the model's tool-call JSON is validated against the declared schema before being delivered to the client. The trade-off: the model is more likely to refuse than to emit a malformed call, which is the right trade for high-stakes tools (`apply_patch`, `exec_shell`, anything that mutates the filesystem or runs code). For exploratory tools (`grep_files`, `read_file`) the existing flexible mode is fine.

What "good" looks like:

- Per-tool opt-in flag in the tool spec.
- The engine routes opted-in tools through the strict endpoint when the active provider supports it, and falls back to flexible mode otherwise.
- Add a test that pins which tools are opted in.
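A sketch of the per-tool opt-in and the routing decision. `ToolSpec`, `strict_opt_in`, `ToolMode`, `resolve_mode`, and the provider capability flag are hypothetical names, not the current tool-spec types. The opted-in set mirrors the examples above: `apply_patch` and `exec_shell` are strict, while `grep_files` and `read_file` stay flexible. How strict mode is marked on the wire (we believe a per-function `strict` flag sent to the `/beta` base URL) should be checked against the DeepSeek docs.

```rust
/// How a tool call is requested (hypothetical enum for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToolMode {
    Strict,
    Flexible,
}

/// Hypothetical slice of a tool spec carrying the proposed opt-in flag.
struct ToolSpec {
    name: &'static str,
    strict_opt_in: bool,
}

/// Opt-in set mirroring the issue text: mutating tools strict,
/// exploratory tools flexible.
fn tool_specs() -> Vec<ToolSpec> {
    vec![
        ToolSpec { name: "apply_patch", strict_opt_in: true },
        ToolSpec { name: "exec_shell", strict_opt_in: true },
        ToolSpec { name: "grep_files", strict_opt_in: false },
        ToolSpec { name: "read_file", strict_opt_in: false },
    ]
}

/// Strict only when the tool opted in AND the active provider supports it;
/// otherwise fall back to flexible mode.
fn resolve_mode(spec: &ToolSpec, provider_supports_strict: bool) -> ToolMode {
    if spec.strict_opt_in && provider_supports_strict {
        ToolMode::Strict
    } else {
        ToolMode::Flexible
    }
}

/// Names of opted-in tools in declaration order; this is the list a
/// pinning test would snapshot.
fn strict_opted_in(specs: &[ToolSpec]) -> Vec<&'static str> {
    specs.iter().filter(|s| s.strict_opt_in).map(|s| s.name).collect()
}
```

Keeping the provider check inside `resolve_mode` means the fallback path needs no special casing at call sites. The pinning test only has to compare `strict_opted_in(&tool_specs())` against a literal list.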

## Out of scope

- New theme variants, palette changes, or any UX-visible work — that's tracked elsewhere.
- The handoff-instead-of-compact pattern — separate issue (filed alongside this).
- Auto reasoning_effort + cache-aware adaptation — separate issue.

## Reporter

Maintainer-scoped engineering observation (Hmbown), 2026-05-04. Captured during a V4-paper-derived audit of the current tool surface.
