feat(cache): slim runtime_prompt to minimal tag, move policy descriptions to system prompt #2874

Merged
Hmbown merged 7 commits into Hmbown:codex/v0.9.0-stewardship from LeoAlex0:feat/slim-runtime-prompt
Jun 7, 2026

@LeoAlex0 LeoAlex0 commented Jun 7, 2026

Summary

Builds on #2801 (merged)

#2801 moved mode/approval policy descriptions out of the byte-stable system prompt into a per-turn <runtime_prompt> transient message. While it solved the prefix-cache invalidation problem, it introduced a recurring cost: the full policy text (~500 tokens) was repeated on every API request — including tool-call continuations within the same turn.

This PR restructures the information architecture: static policy descriptions now live in the frozen system-prompt prefix (sent once per session, cache-hit thereafter), and only a minimal tag is sent per turn.

Scope

  • Add render_runtime_policy_reference(), a static reference block containing all mode and approval policy descriptions plus a tag-interpretation protocol.
  • Insert the reference into the system prompt above the volatile-content boundary (the frozen prefix region).
  • Simplify runtime_prompt_text() from a ~500-token XML block to a ~16-token self-closing tag: <runtime_prompt visibility="internal" mode="yolo" approval="auto"/>.
  • Fix the markdown heading hierarchy in 6 .md files (## → #####) so they nest correctly under #### in the reference section.
  • Remove unused legacy functions: mode_prompt(), approval_prompt_for_mode(), mode_change_runtime_message().
  • Simplify Op::ChangeMode: it no longer persists a separate mode_change event; the next turn's tag carries the current mode.
  • Update and rename affected tests (2 tests rewritten, 5 assertions updated).
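
The slimmed tag from the list above can be sketched as follows. This is a minimal illustration, not the crate's actual code: the `Mode` enum, its `as_tag_value` helper, and the lowercase values "plan" and "agent" are assumptions (only mode="yolo" and approval="auto" appear in this PR), and the approval policy is passed as a plain string because the PR names only some of the policies.

```rust
/// Hypothetical stand-in for the crate's mode type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    Plan,
    Agent,
    Yolo,
}

impl Mode {
    /// Attribute value used in the tag ("plan" and "agent" are assumed spellings).
    fn as_tag_value(self) -> &'static str {
        match self {
            Mode::Plan => "plan",
            Mode::Agent => "agent",
            Mode::Yolo => "yolo",
        }
    }
}

/// Renders the per-turn self-closing tag. No policy text is included;
/// the model is expected to look the values up in the system prompt reference.
pub fn runtime_prompt_text(mode: Mode, approval: &str) -> String {
    format!(
        r#"<runtime_prompt visibility="internal" mode="{}" approval="{}"/>"#,
        mode.as_tag_value(),
        approval
    )
}
```

Calling `runtime_prompt_text(Mode::Yolo, "auto")` yields the tag quoted in the Scope list. Because the output depends only on the two current values, nothing needs to be persisted when the mode changes.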

Not in this slice

  • Policy content is not modified: the text is byte-identical and only heading levels changed.
  • The tag-interpretation protocol is not modified; it is already established in the system prompt reference section.
  • A/B testing of model behavior with the new tag format is follow-up work.

Tradeoff

| Dimension | Before (post-#2801) | After (this PR) |
|---|---|---|
| System prompt size | ~8,000 tokens | ~9,325 tokens |
| One-time miss cost (first request) | ~8,000 | ~9,325 (+1,325) |
| Subsequent cache-hit cost | ~800 (~10%) | ~932 (~10%) |
| Per-request runtime tag | ~487 tokens (miss every time) | ~16 tokens (miss every time) |
| 10-turn session (60 API calls) | 8,000 + 60×487 = 37,220 miss | 9,325 + 60×16 = 10,285 miss |
| Saving | | 26,935 tokens (72%) |
| Break-even | | 3 API calls |
| Model behavior | Sees full policy every turn | Looks up policy from system prompt reference |

Core tradeoff: system prompt grows by 1,325 tokens (one-time miss) in exchange for saving 471 tokens per API call. Net positive after 3 API calls. The model shifts from receiving explicit policy text every turn to looking up rules by tag — the reference section includes an explicit protocol instruction. Structural correctness is verified by tests; production behavioral stability needs follow-up observation.
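
The arithmetic behind these figures can be reproduced with a small cost model. This is a sketch under simplifying assumptions: it counts only uncached ("miss") tokens, ignores the roughly 10% cost of cache-hit tokens, and uses hypothetical function names.

```rust
/// Uncached ("miss") tokens for a session: the system prompt misses once,
/// while the per-turn runtime block misses on every API call.
pub fn session_miss_tokens(system_prompt: u64, per_call: u64, api_calls: u64) -> u64 {
    system_prompt + api_calls * per_call
}

/// Smallest number of API calls after which the larger prefix has paid for
/// itself, i.e. the ceiling of prefix_growth / saved_per_call.
/// `saved_per_call` must be greater than zero.
pub fn break_even_calls(prefix_growth: u64, saved_per_call: u64) -> u64 {
    (prefix_growth + saved_per_call - 1) / saved_per_call
}
```

With the PR's numbers, `session_miss_tokens(8_000, 487, 60)` gives 37,220 and `session_miss_tokens(9_325, 16, 60)` gives 10,285, a difference of 26,935 tokens. `break_even_calls(1_325, 487 - 16)` gives 3, matching the stated break-even point.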

Builds on


Validation

cargo fmt --all                                            # ✅ passed
cargo clippy -p codewhale-tui -- -D warnings               # ✅ passed (0 warnings)
cargo test -p codewhale-tui -- runtime_prompt mode_change  # ✅ 9 passed, 0 failed
cargo check -p codewhale-tui                               # ✅ passed

Files changed: 9 files, +143 / −170 lines

Greptile Summary

This PR restructures the runtime-policy information architecture by moving all mode and approval-policy descriptions from a per-turn <runtime_prompt> XML block (~500 tokens, re-sent on every API call) into a static ## Runtime Policy Reference section in the frozen system-prompt prefix, and replacing the per-turn block with a minimal self-closing tag (~16 tokens). The change saves roughly 72% of per-session miss tokens (break-even at 3 API calls) at the cost of a one-time +1,325 token system-prompt miss.

  • render_runtime_policy_reference() added to prompts.rs — generates a lookup table listing all three modes and all three approval policies; inserted at step 5a of the frozen-prefix composition pipeline so it is byte-stable across turns.
  • runtime_prompt_text() in engine.rs simplified to <runtime_prompt visibility="internal" mode="…" approval="…"/>, removing mode_change_runtime_message and the associated per-mode re-evaluation hints from Op::ChangeMode.
  • Six .md files had heading levels adjusted (## → ##### / ######) to nest correctly under the ####-level subsections in the reference block; two tests were rewritten and five assertions updated to match the new tag format.
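
To make the heading nesting concrete, here is a rough sketch of how such a reference block could be assembled. It is not the function from prompts.rs: the slice-based signature and the "### Modes" / "### Approval policies" group headings are assumptions made for illustration. Only the "## Runtime Policy Reference" title, the ####-level subsections, and the blank line before each heading come from this PR.

```rust
/// Assembles a static lookup section from (name, markdown body) pairs.
/// The bodies are expected to start at heading level ##### so they nest
/// under the #### subsection that this function emits for each entry.
pub fn render_runtime_policy_reference(
    modes: &[(&str, &str)],
    approvals: &[(&str, &str)],
) -> String {
    let mut out = String::from("## Runtime Policy Reference\n");
    // Group headings are hypothetical; the PR only states that entries sit at ####.
    for (title, entries) in [("Modes", modes), ("Approval policies", approvals)] {
        out.push_str(&format!("\n### {title}\n"));
        for (name, body) in entries {
            // A blank line precedes every heading (CommonMark/GFM friendly).
            out.push_str(&format!("\n#### {name}\n\n{}\n", body.trim()));
        }
    }
    out
}
```

The output depends only on its inputs, so repeated calls with the same policy text produce identical bytes, which is the property the frozen prefix relies on for cache hits.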

Confidence Score: 5/5

Safe to merge. The structural refactor is self-consistent: approval-mode constraints (YOLO→auto, Plan→never) are still enforced by approval_mode_for() before runtime_prompt_text() is called, and the reference block is always present in the full system prompt used by the engine.
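
The constraint mentioned above (YOLO forces auto, Plan forces never) can be sketched like this. The signature is hypothetical, and the behavior for Agent mode (passing through whatever the user configured) is an assumption, since the review only names the two forced cases.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    Plan,
    Agent,
    Yolo,
}

/// Resolves the approval value before the runtime tag is rendered.
/// `configured` stands for the user's selected policy (assumed input).
pub fn approval_mode_for(mode: Mode, configured: &str) -> &str {
    match mode {
        Mode::Yolo => "auto",     // stated in the review: YOLO -> auto
        Mode::Plan => "never",    // stated in the review: Plan -> never
        Mode::Agent => configured, // assumption: Agent honors the configured value
    }
}
```

Resolving the approval value first means the tag can never advertise a combination such as mode="plan" approval="auto", regardless of what was configured.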

All changed code paths are covered by the updated tests. The removed mode_change_runtime_message was a behavioral notification, not a structural safety check, and its removal is explicitly acknowledged as a follow-up observation item. No data-loss or session-corruption paths were identified.

crates/tui/src/core/engine/tests.rs — the two rewritten test functions have some weaknesses in assertion coverage for the mode-change path, as noted in inline comments.

Important Files Changed

Filename Overview
crates/tui/src/prompts.rs Core architectural change: adds render_runtime_policy_reference() (all mode+approval descriptions as static lookup table in system prompt), renames render_core_tool_taxonomy_block → render_core_tool_taxonomy_body (no longer emits its own heading), removes mode_prompt/approval_prompt_for_mode/default_approval_mode_for_mode helpers. Reference is inserted at step 5a of the frozen-prefix composition pipeline.
crates/tui/src/core/engine.rs runtime_prompt_text() slimmed from ~500-token XML block to a ~16-token self-closing tag; Op::ChangeMode no longer injects a mode_change runtime event into session history; mode_prompt_marker, approval_prompt_marker, mode_prompt_text, mode_change_runtime_message helpers removed.
crates/tui/src/core/engine/tests.rs Two tests rewritten: change_mode_op_injects_runtime_event → change_mode_op_updates_current_mode_and_emits_status (verifies event structure but no longer validates mode identity in Status content); mode_change_runtime_message_format → current_mode_field_assignment_takes_effect_synchronously (exercises runtime tag reflection but bypasses Op::ChangeMode dispatch). Five assertions updated to match new tag format.
crates/tui/src/prompts/modes/agent.md Top heading ## → #####; sub-headings ## Efficient Approvals and ## Session Longevity → ###### so they nest correctly under the #### agent section in the reference block.
crates/tui/src/prompts/approvals/auto.md Heading changed from ## to ##### so it nests correctly under #### auto in the Runtime Policy Reference.

Sequence Diagram

sequenceDiagram
    participant Engine
    participant SP as "System Prompt (frozen prefix)"
    participant API as "LLM API"
    participant Model

    Note over SP: render_runtime_policy_reference() contains all mode and approval descriptions. Sent once per session.

    Engine->>API: Request (system_prompt + session_messages + runtime_tag)
    Note over API: system_prompt is byte-stable, cache hits after first call
    API->>Model: system_prompt cached ~9325 tokens
    API->>Model: session_messages history
    API->>Model: runtime_prompt_message() ~16 tokens misses every time
    Model->>Model: Looks up mode and approval rules from system prompt reference
    Model->>API: Response or tool calls
    API->>Engine: Response

    Note over Engine: Op::ChangeMode received
    Engine->>Engine: "self.current_mode = NewMode"
    Engine->>Engine: emit_session_updated()
    Note over Engine: Next request tag carries new mode value automatically
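
The flow in the diagram can be approximated in code. The sketch below is illustrative only: the `Engine` struct, its fields, and `change_mode` are simplified stand-ins (the real handler is Op::ChangeMode), `configured_approval` is a hypothetical field, and the "plan" and "agent" tag values are assumed spellings. `messages_with_turn_metadata()` is named in this PR's tests, but its real signature is not shown here.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    Plan,
    Agent,
    Yolo,
}

pub struct Engine {
    pub current_mode: Mode,
    /// Approval value used when the mode does not force one (hypothetical field).
    pub configured_approval: String,
    /// Persisted session history; the runtime tag is never stored here.
    pub session_messages: Vec<String>,
}

impl Engine {
    /// A mode change only updates the field; nothing is appended to history.
    pub fn change_mode(&mut self, new_mode: Mode) {
        self.current_mode = new_mode;
    }

    fn runtime_tag(&self) -> String {
        let (mode, approval) = match self.current_mode {
            Mode::Yolo => ("yolo", "auto"),
            Mode::Plan => ("plan", "never"),
            Mode::Agent => ("agent", self.configured_approval.as_str()),
        };
        format!(r#"<runtime_prompt visibility="internal" mode="{mode}" approval="{approval}"/>"#)
    }

    /// Builds the messages for one request: history plus a transient tag
    /// computed from the current state.
    pub fn messages_with_turn_metadata(&self) -> Vec<String> {
        let mut messages = self.session_messages.clone();
        messages.push(self.runtime_tag());
        messages
    }
}
```

Because the tag is recomputed for every request, a mode switch shows up in the next request without any extra event, and the stored history stays free of <runtime_prompt> entries.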

Reviews (3): Last reviewed commit: "test: add runtime_policy_reference compo..."

…ions to system prompt

- Add render_runtime_policy_reference() in prompts.rs containing all
  mode and approval policy descriptions in the frozen system-prompt
  prefix (sent once per session, cache-hit thereafter).
- Simplify runtime_prompt_text() from ~500-token XML block to a ~16-token
  self-closing tag (<runtime_prompt visibility="internal" mode="..." approval="..."/>).
- Fix markdown heading hierarchy in all prompts/modes/*.md and
  prompts/approvals/*.md (## → #####) to nest correctly under ####.
- Remove now-unused legacy functions: mode_prompt(),
  approval_prompt_for_mode(), mode_change_runtime_message().
- Simplify Op::ChangeMode: no longer persists a mode_change event
  (next turn tag carries the current mode).
- Update and rename affected tests.

Builds on Hmbown#2801. Reduces per-request runtime prompt overhead by 97%
(~471 tokens saved per API call). System prompt grows by ~1325 tokens
in the frozen prefix (one-time miss cost); break-even at 3 API calls.

github-actions Bot commented Jun 7, 2026

Thanks @LeoAlex0 for taking the time to contribute.

This repository is currently observing a maintainer-managed contribution gate in dry-run mode, so this pull request is staying open. When enforcement is enabled, pull requests from contributors who are not listed in .github/APPROVED_CONTRIBUTORS will be closed automatically.

Please read CONTRIBUTING.md for the expected contribution shape. A maintainer can grant PR access by commenting /lgtm on a pull request.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request optimizes token usage by moving detailed mode and approval policy descriptions into a static reference block within the system prompt, replacing the verbose per-turn messages with a minimal <runtime_prompt> tag. Feedback focuses on improving Markdown rendering and hierarchy, specifically by ensuring proper blank lines before headings in the generated policy reference and demoting subheadings in agent.md to level 6 to match the demoted main heading.

LeoAlex0 added 5 commits June 7, 2026 15:15
…ug_assert

- Add proper blank lines (\n\n) before mode headings in
  render_runtime_policy_reference (CommonMark/GFM compliance).
- Demote subheadings in agent.md from ##### to ###### so they
  nest correctly under the demoted main heading.
- Add debug_assert! in taxonomy_body() to loudly fail when
  render_core_tool_taxonomy_block format changes, preventing
  silent heading-hierarchy breakage.
…ender_core_tool_taxonomy_body

- Add render_core_tool_taxonomy_body(mode) that generates the tool
  taxonomy text without the ## Core Tool Taxonomy heading.
- Refactor render_core_tool_taxonomy_block to use the body function
  internally (DRY).
- Delete taxonomy_body() — a downstream strip_prefix hack that
  worked around the source format instead of fixing it.
- Also removes the now-unnecessary debug_assert! (over-defensive,
  since the two functions are co-located in the same file).
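
The refactor described in this commit (a body function plus a thin heading wrapper, instead of stripping the heading downstream) can be sketched as follows. The taxonomy entries are placeholders, not the crate's real text. Omitting run_tests in Plan mode is an assumption inferred from the test name plan_prompt_taxonomy_omits_run_tests, and read_file is a hypothetical tool name.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    Plan,
    Agent,
    Yolo,
}

/// Generates the taxonomy text without any heading, so callers can place it
/// under whatever heading level their section requires.
pub fn render_core_tool_taxonomy_body(mode: Mode) -> String {
    // Placeholder entries; the real taxonomy text is not shown in this PR.
    let mut lines = vec!["- read_file: inspect a file"];
    if mode != Mode::Plan {
        lines.push("- run_tests: run the test suite");
    }
    lines.join("\n")
}

/// Thin wrapper that adds the heading and reuses the body (DRY).
pub fn render_core_tool_taxonomy_block(mode: Mode) -> String {
    format!(
        "## Core Tool Taxonomy\n\n{}",
        render_core_tool_taxonomy_body(mode)
    )
}
```

With the heading owned by the wrapper, no caller has to strip a prefix from the rendered text, which is the fragility the deleted taxonomy_body() helper worked around.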
…dy variant

- Replace the 2 remaining test callers with render_core_tool_taxonomy_body
  (neither test depends on the ## heading — they check content only).
- Delete render_core_tool_taxonomy_block — zero production callers after
  the previous refactor.
- Inline mode_prompt_marker_value and approval_prompt_marker_value into
  runtime_prompt_text (each called exactly once).
- Remove default_approval_mode_for_mode — zero callers.
…e dispatch

- Rename mode_change_op_updates_current_mode_and_emits_session_updated
  to current_mode_field_assignment_takes_effect_synchronously.
- The test directly mutates engine.current_mode, not through Op::ChangeMode.
  The dispatch path is separately covered by
  change_mode_op_updates_current_mode_and_emits_status.
@LeoAlex0 LeoAlex0 marked this pull request as ready for review June 7, 2026 07:30
…eMode tests, fix outdated comments

- Add runtime_policy_reference_is_included_in_full_prompt test to verify
  that render_runtime_policy_reference() output lands in the composed
  system prompt. Guards against silent breakage if the push_str() call
  is accidentally removed (all existing tests would still pass).

- Strengthen change_mode_op_updates_current_mode_and_emits_status:
  destructure SessionUpdated to assert that session messages do NOT
  contain <runtime_prompt> tags after mode change — verifying the core
  invariant that Op::ChangeMode does not write session history.

- Extend current_mode_field_assignment_takes_effect_synchronously:
  now also verifies that messages_with_turn_metadata() produces the
  correct runtime tag (mode="yolo" approval="auto") after a mode
  switch, covering the tag-generation mechanism end-to-end.

- Fix outdated comments in composed_prompt_no_longer_inlines_tool_taxonomy
  and plan_prompt_taxonomy_omits_run_tests: replace stale references to
  deleted <mode_prompt> metadata with accurate descriptions of the
  ## Runtime Policy Reference section.
Hmbown added a commit that referenced this pull request Jun 7, 2026
Mechanical rustfmt of the runtime_prompt tests rewritten in PR #2874
(LeoAlex0). No logic change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Hmbown Hmbown merged commit 3619962 into Hmbown:codex/v0.9.0-stewardship Jun 7, 2026
2 checks passed
@LeoAlex0 LeoAlex0 deleted the feat/slim-runtime-prompt branch June 8, 2026 02:31