feat(cache): slim runtime_prompt to minimal tag, move policy descriptions to system prompt #2874
…ions to system prompt

- Add render_runtime_policy_reference() in prompts.rs containing all mode and approval policy descriptions in the frozen system-prompt prefix (sent once per session, cache-hit thereafter).
- Simplify runtime_prompt_text() from ~500-token XML block to a ~16-token self-closing tag (<runtime_prompt visibility="internal" mode="..." approval="..."/>).
- Fix markdown heading hierarchy in all prompts/modes/*.md and prompts/approvals/*.md (## → #####) to nest correctly under ####.
- Remove now-unused legacy functions: mode_prompt(), approval_prompt_for_mode(), mode_change_runtime_message().
- Simplify Op::ChangeMode: no longer persists a mode_change event (next turn tag carries the current mode).
- Update and rename affected tests.

Builds on Hmbown#2801. Reduces per-request runtime prompt overhead by 97% (~471 tokens saved per API call). System prompt grows by ~1325 tokens in the frozen prefix (one-time miss cost); break-even at 3 API calls.
Thanks @LeoAlex0 for taking the time to contribute. This repository is currently observing a maintainer-managed contribution gate in dry-run mode, so this pull request is staying open. When enforcement is enabled, pull requests from contributors who are not listed in Please read |
Code Review
This pull request optimizes token usage by moving detailed mode and approval policy descriptions into a static reference block within the system prompt, replacing the verbose per-turn messages with a minimal <runtime_prompt> tag. Feedback focuses on improving Markdown rendering and hierarchy, specifically by ensuring proper blank lines before headings in the generated policy reference and demoting subheadings in agent.md to level 6 to match the demoted main heading.
…ug_assert

- Add proper blank lines (\n\n) before mode headings in render_runtime_policy_reference (CommonMark/GFM compliance).
- Demote subheadings in agent.md from ##### to ###### so they nest correctly under the demoted main heading.
- Add debug_assert! in taxonomy_body() to fail loudly when the render_core_tool_taxonomy_block format changes, preventing silent heading-hierarchy breakage.
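The blank-line fix in the first bullet can be illustrated with a small sketch. This is not the PR's `render_runtime_policy_reference`; the function name `render_policy_reference`, its `(heading, body)` input shape, and the sample headings are assumptions made only for illustration.

```rust
/// Hypothetical sketch: assemble a reference section from (heading, body) pairs.
/// Every `####` heading is preceded by a blank line ("\n\n") so it is always
/// separated from the paragraph above it, whichever Markdown renderer is used.
pub fn render_policy_reference(sections: &[(&str, &str)]) -> String {
    let mut out = String::from("## Runtime Policy Reference");
    for (heading, body) in sections {
        // Blank line before the heading, then the heading itself.
        out.push_str("\n\n#### ");
        out.push_str(heading);
        // Blank line between the heading and its body.
        out.push_str("\n\n");
        out.push_str(body.trim());
    }
    out.push('\n');
    out
}
```

Trimming each body before appending keeps stray trailing newlines in the source files from producing runs of three or more newlines in the composed prompt.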
…ender_core_tool_taxonomy_body

- Add render_core_tool_taxonomy_body(mode) that generates the tool taxonomy text without the ## Core Tool Taxonomy heading.
- Refactor render_core_tool_taxonomy_block to use the body function internally (DRY).
- Delete taxonomy_body(), a downstream strip_prefix hack that worked around the source format instead of fixing it.
- Also removes the now-unnecessary debug_assert! (over-defensive, since the two functions are co-located in the same file).
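The body/block split described in this commit might look roughly like the sketch below. The two function names come from the commit message; the `Mode` enum, the signatures, and the taxonomy entries are placeholders, since the real contents of `prompts.rs` are not shown here.

```rust
/// Hypothetical stand-in for the engine's mode type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    Agent,
    Plan,
    Yolo,
}

/// Taxonomy text only, with no heading. The entries are placeholders.
pub fn render_core_tool_taxonomy_body(mode: Mode) -> String {
    let mut lines = vec!["- read_file: inspect files (placeholder entry)"];
    // Assumption for illustration: Plan mode omits the test-running tool.
    if mode != Mode::Plan {
        lines.push("- run_tests: execute the test suite (placeholder entry)");
    }
    lines.join("\n")
}

/// Heading plus body. The heading lives in exactly one place, so callers
/// that want heading-free text call the body function instead of
/// stripping a prefix from this output.
pub fn render_core_tool_taxonomy_block(mode: Mode) -> String {
    format!(
        "## Core Tool Taxonomy\n\n{}",
        render_core_tool_taxonomy_body(mode)
    )
}
```

Building the block from the body removes the need for a downstream `strip_prefix` step, which is the coupling the deleted `taxonomy_body()` depended on.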
…dy variant

- Replace the 2 remaining test callers with render_core_tool_taxonomy_body (neither test depends on the ## heading; they check content only).
- Delete render_core_tool_taxonomy_block: zero production callers after the previous refactor.

(untitled)

- Inline mode_prompt_marker_value and approval_prompt_marker_value into runtime_prompt_text (each called exactly once).
- Remove default_approval_mode_for_mode: zero callers.

…e dispatch

- Rename mode_change_op_updates_current_mode_and_emits_session_updated to current_mode_field_assignment_takes_effect_synchronously.
- The test directly mutates engine.current_mode, not through Op::ChangeMode. The dispatch path is separately covered by change_mode_op_updates_current_mode_and_emits_status.

…eMode tests, fix outdated comments

- Add runtime_policy_reference_is_included_in_full_prompt test to verify that render_runtime_policy_reference() output lands in the composed system prompt. Guards against silent breakage if the push_str() call is accidentally removed (all existing tests would still pass).
- Strengthen change_mode_op_updates_current_mode_and_emits_status: destructure SessionUpdated to assert that session messages do NOT contain <runtime_prompt> tags after mode change, verifying the core invariant that Op::ChangeMode does not write session history.
- Extend current_mode_field_assignment_takes_effect_synchronously: now also verifies that messages_with_turn_metadata() produces the correct runtime tag (mode="yolo" approval="auto") after a mode switch, covering the tag-generation mechanism end-to-end.
- Fix outdated comments in composed_prompt_no_longer_inlines_tool_taxonomy and plan_prompt_taxonomy_omits_run_tests: replace stale references to deleted <mode_prompt> metadata with accurate descriptions of the ## Runtime Policy Reference section.
Mechanical rustfmt of the runtime_prompt tests rewritten in PR #2874 (LeoAlex0). No logic change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary

#2801 moved mode/approval policy descriptions out of the byte-stable system prompt into a per-turn `<runtime_prompt>` transient message. While that solved the prefix-cache invalidation problem, it introduced a recurring cost: the full policy text (~500 tokens) was repeated on every API request, including tool-call continuations within the same turn.

This PR restructures the information architecture: static policy descriptions now live in the frozen system-prompt prefix (sent once per session, cache-hit thereafter), and only a minimal tag is sent per turn.
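As a rough illustration of the minimal per-turn tag, the sketch below renders the self-closing element quoted in this PR. The tag format and the function name `runtime_prompt_text` come from the PR text; the enum types, their variants, and the signature are assumptions, and only the approval values named in the PR (`auto`, `never`) are modelled.

```rust
/// Hypothetical stand-in for the engine's mode type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    Agent,
    Plan,
    Yolo,
}

/// Hypothetical stand-in for the approval policy. The PR mentions three
/// policies but names only these two, so the third is left out here.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Approval {
    Auto,
    Never,
}

impl Mode {
    fn as_tag_value(self) -> &'static str {
        match self {
            Mode::Agent => "agent",
            Mode::Plan => "plan",
            Mode::Yolo => "yolo",
        }
    }
}

impl Approval {
    fn as_tag_value(self) -> &'static str {
        match self {
            Approval::Auto => "auto",
            Approval::Never => "never",
        }
    }
}

/// Render the per-turn tag. The policy descriptions themselves are not
/// included; the model is expected to look them up in the system prompt.
pub fn runtime_prompt_text(mode: Mode, approval: Approval) -> String {
    format!(
        r#"<runtime_prompt visibility="internal" mode="{}" approval="{}"/>"#,
        mode.as_tag_value(),
        approval.as_tag_value()
    )
}
```

Because the tag carries only two attribute values, it can change every turn without touching the cached prefix.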
Scope

- Add `render_runtime_policy_reference()`: a static reference block containing all mode + approval policy descriptions, plus a tag-interpretation protocol.
- Simplify `runtime_prompt_text()` from a ~500-token XML block to a ~16-token self-closing tag: `<runtime_prompt visibility="internal" mode="yolo" approval="auto"/>`.
- Fix the heading hierarchy of 6 `.md` files (`##` → `#####`) so they nest correctly under `####` in the reference section.
- Remove the now-unused legacy functions `mode_prompt()`, `approval_prompt_for_mode()`, `mode_change_runtime_message()`.
- Simplify `Op::ChangeMode`: it no longer persists a separate mode_change event; the next turn tag carries the current mode.

Not in this slice
Tradeoff

Core tradeoff: the system prompt grows by 1,325 tokens (a one-time cache miss) in exchange for saving 471 tokens per API call. The change is net positive after 3 API calls. The model shifts from receiving explicit policy text every turn to looking up rules by tag; the reference section includes an explicit protocol instruction for that lookup. Structural correctness is verified by tests; production behavioral stability needs follow-up observation.
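The break-even figure follows from the two numbers quoted above: 1,325 / 471 ≈ 2.81, so the third call is the first one at which cumulative savings exceed the one-time cost. A minimal sketch of that arithmetic follows; the function names are illustrative, and the model deliberately ignores any price difference between cached and uncached tokens.

```rust
/// Smallest number of API calls at which cumulative per-call savings
/// cover the one-time prefix cost (ceiling division).
pub fn break_even_calls(one_time_cost: u32, saving_per_call: u32) -> u32 {
    assert!(saving_per_call > 0, "saving_per_call must be positive");
    (one_time_cost + saving_per_call - 1) / saving_per_call
}

/// Net tokens saved after `calls` API calls; negative means the one-time
/// cost has not yet been recovered.
pub fn net_tokens_saved(calls: u32, one_time_cost: u32, saving_per_call: u32) -> i64 {
    i64::from(calls) * i64::from(saving_per_call) - i64::from(one_time_cost)
}
```

With the PR's figures, `break_even_calls(1325, 471)` is 3, `net_tokens_saved(2, 1325, 471)` is -383, and `net_tokens_saved(3, 1325, 471)` is 88.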
Builds on

- #2801 (`feat(cache): project mode prompts per request`): merged predecessor that established the per-turn `<runtime_prompt>` architecture.

Validation
Files changed: 9 files, +143 / −170 lines
Greptile Summary
This PR restructures the runtime-policy information architecture by moving all mode and approval-policy descriptions from a per-turn `<runtime_prompt>` XML block (~500 tokens, re-sent on every API call) into a static `## Runtime Policy Reference` section in the frozen system-prompt prefix, and replacing the per-turn block with a minimal self-closing tag (~16 tokens). The change saves roughly 72% of per-session miss tokens (break-even at 3 API calls) at the cost of a one-time +1,325 token system-prompt miss.

- `render_runtime_policy_reference()` added to `prompts.rs`: generates a lookup table listing all three modes and all three approval policies; inserted at step 5a of the frozen-prefix composition pipeline so it is byte-stable across turns.
- `runtime_prompt_text()` in `engine.rs` simplified to `<runtime_prompt visibility="internal" mode="…" approval="…"/>`, removing `mode_change_runtime_message` and the associated per-mode re-evaluation hints from `Op::ChangeMode`.
- Six `.md` files had heading levels adjusted (`##` → `#####`/`######`) to nest correctly under the `####`-level subsections in the reference block; two tests were rewritten and five assertions updated to match the new tag format.

Confidence Score: 5/5

Safe to merge. The structural refactor is self-consistent: approval-mode constraints (YOLO→auto, Plan→never) are still enforced by `approval_mode_for()` before `runtime_prompt_text()` is called, and the reference block is always present in the full system prompt used by the engine.

All changed code paths are covered by the updated tests. The removed `mode_change_runtime_message` was a behavioral notification, not a structural safety check, and its removal is explicitly acknowledged as a follow-up observation item. No data-loss or session-corruption paths were identified.

crates/tui/src/core/engine/tests.rs: the two rewritten test functions have some weaknesses in assertion coverage for the mode-change path, as noted in inline comments.
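The approval-mode constraints mentioned in the review (YOLO→auto, Plan→never) could be enforced along the lines of the sketch below. Only the function name `approval_mode_for` and the two forced mappings come from the review text; the types, the signature, and the pass-through behaviour for Agent mode are assumptions.

```rust
/// Hypothetical stand-in for the engine's mode type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    Agent,
    Plan,
    Yolo,
}

/// Hypothetical stand-in for the approval policy; only the two values
/// named in the review are modelled.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Approval {
    Auto,
    Never,
}

/// Resolve the effective approval policy for a mode. YOLO and Plan override
/// whatever was requested; Agent is assumed to keep the requested policy.
pub fn approval_mode_for(mode: Mode, requested: Approval) -> Approval {
    match mode {
        Mode::Yolo => Approval::Auto,
        Mode::Plan => Approval::Never,
        Mode::Agent => requested,
    }
}
```

Resolving the policy before the tag is rendered means the tag can never advertise a combination such as `mode="plan" approval="auto"`, even though the tag itself carries no validation logic.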
Important Files Changed
Sequence Diagram
sequenceDiagram
    participant Engine
    participant SP as "System Prompt (frozen prefix)"
    participant API as "LLM API"
    participant Model
    Note over SP: render_runtime_policy_reference() contains all mode and approval descriptions. Sent once per session.
    Engine->>API: Request (system_prompt + session_messages + runtime_tag)
    Note over API: system_prompt is byte-stable, cache hits after first call
    API->>Model: system_prompt cached ~9325 tokens
    API->>Model: session_messages history
    API->>Model: runtime_prompt_message() ~16 tokens misses every time
    Model->>Model: Looks up mode and approval rules from system prompt reference
    Model->>API: Response or tool calls
    API->>Engine: Response
    Note over Engine: Op::ChangeMode received
    Engine->>Engine: "self.current_mode = NewMode"
    Engine->>Engine: emit_session_updated()
    Note over Engine: Next request tag carries new mode value automatically

Reviews (3): Last reviewed commit: "test: add runtime_policy_reference compo..."