Remove Normal mode and consolidate to Agent by Hmbown · Pull Request #4 · Hmbown/CodeWhale

Hmbown · 2026-03-12T16:17:54Z

Summary

- remove Normal as a visible mode and consolidate compatibility behavior into Agent
- normalize legacy default_mode = "normal" settings and report the normalized saved value in /set
- carry the current TUI, onboarding, layout, and palette refinements in the approved worktree into one PR
- restore the Alt+4 -> Plan shortcut and add focused regression coverage

Validation

cargo fmt --all --check
cargo check
cargo test --workspace --all-features
cargo test -p deepseek-tui alt_4_switches_to_plan_mode
cargo test -p deepseek-tui ctrl_alt_4_focuses_agents_sidebar_without_switching_modes

Keep legacy /normal and settings fallback behavior mapped to Agent, align docs around the three visible modes, and include the current TUI and onboarding refinements in this worktree. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR removes Normal as a visible TUI mode, consolidates legacy/compat behavior into Agent, and carries a set of TUI/layout/palette refinements (composer panel, status strip, transcript spacing, onboarding copy), with regression tests for Alt+4 mode switching behavior.

Changes:

- Remove visible Normal mode, keep hidden /normal alias and normalize default_mode = "normal" → agent (including /set reporting the normalized saved value).
- Refresh TUI UI/UX: composer becomes a bordered panel with density controls; status strip becomes a concise summary + optional detail lines; transcript separators become configurable spacing.
- Add configuration knobs (calm_mode, low_motion, composer_density, transcript_spacing) and regression coverage for Alt+4 behaviors.
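
A minimal sketch of the legacy-setting normalization described in the changes above. The helper names `parse_default_mode` and `normalized_setting` are hypothetical, the lowercase strings `"plan"` and `"yolo"` are assumed spellings, and treating unknown values as Agent is an assumption based on the fallback behavior mentioned for `runtime_threads.rs`.

```rust
/// The three visible modes after this PR (Normal is no longer a variant).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AppMode {
    Plan,
    Agent,
    Yolo,
}

/// Hypothetical helper: parse a saved `default_mode` value.
pub fn parse_default_mode(raw: &str) -> AppMode {
    match raw.trim().to_ascii_lowercase().as_str() {
        "plan" => AppMode::Plan,
        "yolo" => AppMode::Yolo,
        // "agent", the legacy "normal", and (by assumption) any unknown
        // value all resolve to Agent.
        _ => AppMode::Agent,
    }
}

/// Hypothetical helper: the normalized string that would be saved and
/// reported back by `/set`.
pub fn normalized_setting(raw: &str) -> &'static str {
    match parse_default_mode(raw) {
        AppMode::Plan => "plan",
        AppMode::Agent => "agent",
        AppMode::Yolo => "yolo",
    }
}
```

With this shape, `/set default_mode normal --save` would store and report `agent` rather than echoing the legacy value.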

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 1 comment.


| File | Description |
| --- | --- |
| docs/MODES.md | Updates mode documentation to reflect 3 visible modes + compatibility behavior. |
| docs/CONFIGURATION.md | Documents `default_mode` normalization and clarifies visible modes and upgrade notes. |
| crates/tui/src/tui/widgets/mod.rs | Adds empty-state rendering and reworks composer rendering/layout helpers (panel, padding, density). |
| crates/tui/src/tui/widgets/header.rs | Reworks header to show segmented mode “tabs” and workspace/model context. |
| crates/tui/src/tui/views/mod.rs | Removes `/normal` from help grouping; adds config rows and refreshes help text. |
| crates/tui/src/tui/ui/tests.rs | Adds regression tests for `Alt+4` shortcuts; updates status layout tests to new API/behavior. |
| crates/tui/src/tui/ui.rs | Consolidates `Alt+4` handling, updates mode toggles, overhauls status indicator rendering, adds low-motion polling/animation controls, updates footer hinting, and changes tool-details pager behavior. |
| crates/tui/src/tui/transcript.rs | Replaces horizontal separators with configurable blank-line spacing (`TranscriptSpacing`). |
| crates/tui/src/tui/onboarding/welcome.rs | Simplifies welcome copy and removes ASCII logo. |
| crates/tui/src/tui/onboarding/mod.rs | Wraps onboarding in a bordered/padded panel and updates tips copy. |
| crates/tui/src/tui/history.rs | Major transcript rendering refresh: calmer tool cards, low-motion symbols, condensed thinking blocks, and new render helpers. |
| crates/tui/src/tui/command_palette.rs | Removes `normal` from directly-executed palette commands (keeps as hidden compatibility elsewhere). |
| crates/tui/src/tui/app.rs | Removes `AppMode::Normal`, adds `ComposerDensity`/`TranscriptSpacing`, adds `calm_mode`/`low_motion`, and updates mode cycling semantics. |
| crates/tui/src/tools/registry.rs | Updates comment to remove Normal-mode mention. |
| crates/tui/src/settings.rs | Adds new settings, normalizes legacy `normal` → `agent`, and validates new density/spacing keys. |
| crates/tui/src/runtime_threads.rs | Drops explicit parsing of `normal` mode (falls back to Agent). |
| crates/tui/src/prompts.rs | Treats “normal” system prompt as Agent; keeps `NORMAL_PROMPT` as dead-code-allowed constant. |
| crates/tui/src/palette.rs | Adds new palette constants and removes `MODE_NORMAL`. |
| crates/tui/src/hooks.rs | Updates ModeChange doc comment to reflect visible modes. |
| crates/tui/src/commands/mod.rs | Removes `/normal` from public command list but keeps it as a hidden compatibility command. |
| crates/tui/src/commands/core.rs | Updates home dashboard mode tips and tests to remove Normal. |
| crates/tui/src/commands/config.rs | Implements hidden `/normal` alias → Agent and `/set default_mode normal --save` normalized reporting + test. |
| README.md | Updates README to “three visible modes” and removes Normal references. |
| .trimtab/init-trimtab-protocol.md | Updates Trimtab protocol note for modes + compatibility aliasing. |


Copilot · 2026-03-12T16:22:14Z

+        let top_padding = if self.app.input.is_empty() {
+            let empty_lines = if self.app.input_history.is_empty() && input_rows_budget > 1 {
+                2
+            } else {
+                1
+            };
+            composer_vertical_padding(empty_lines, input_rows_budget)
+        } else {
+            let visible_count = wrap_input_lines(&self.app.input, content_width)
+                .len()
+                .max(1);
+            composer_vertical_padding(visible_count.min(input_rows_budget), input_rows_budget)
+        };


ComposerWidget::cursor_pos() computes top_padding for an empty input using empty_lines = 2 when input_history is empty, but render() always renders exactly 1 placeholder line in that state. This makes the cursor Y offset inconsistent with the rendered content (cursor ends up one row above the placeholder), which will be visible on a fresh install / first run.

Consider deriving top_padding from the same input_lines.len() logic used in render() (or centralizing the padding calculation in a helper) so render() and cursor_pos() stay in sync for the empty-input case.

Suggested change

-        let top_padding = if self.app.input.is_empty() {
-            let empty_lines = if self.app.input_history.is_empty() && input_rows_budget > 1 {
-                2
-            } else {
-                1
-            };
-            composer_vertical_padding(empty_lines, input_rows_budget)
-        } else {
-            let visible_count = wrap_input_lines(&self.app.input, content_width)
-                .len()
-                .max(1);
-            composer_vertical_padding(visible_count.min(input_rows_budget), input_rows_budget)
-        };
+        let visible_count = wrap_input_lines(&self.app.input, content_width)
+            .len()
+            .max(1);
+        let top_padding =
+            composer_vertical_padding(visible_count.min(input_rows_budget), input_rows_budget);
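
One way to keep `render()` and `cursor_pos()` in sync, as the comment suggests, is to route both through a single helper. The sketch below is illustrative only: the body of `composer_vertical_padding` is an assumed stand-in (simple vertical centering), and `visible_rows` and `top_padding` are hypothetical names, not functions from the PR.

```rust
/// Assumed stand-in for the real helper: centre `visible` rows inside
/// `budget` rows and return the padding above them.
fn composer_vertical_padding(visible: usize, budget: usize) -> usize {
    budget.saturating_sub(visible) / 2
}

/// Hypothetical shared helper: an empty input still occupies one row
/// (the placeholder), and content never exceeds the row budget.
fn visible_rows(wrapped_line_count: usize, budget: usize) -> usize {
    wrapped_line_count.max(1).min(budget)
}

/// Both the renderer and the cursor calculation would call this, so the
/// empty-input case cannot diverge between them.
pub fn top_padding(wrapped_line_count: usize, budget: usize) -> usize {
    composer_vertical_padding(visible_rows(wrapped_line_count, budget), budget)
}
```

Because the empty input and a one-line input resolve to the same row count, the cursor lands on the placeholder row in both cases.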

gemini-code-assist · 2026-03-12T16:22:28Z

Summary of Changes


This pull request focuses on simplifying the user experience by removing the 'Normal' mode and consolidating its functionality into the 'Agent' mode. It also includes several UI/UX improvements and restores a key shortcut. The changes ensure consistent behavior across different configurations and enhance the overall usability of the application.

Highlights

- Mode Consolidation: Removed the 'Normal' mode, streamlining the user interface and consolidating its functionality into the 'Agent' mode.
- Legacy Settings Normalization: Normalized legacy default_mode = "normal" settings to 'Agent' mode, ensuring consistent behavior across different configurations.
- UI/UX Improvements: Integrated TUI, onboarding, layout, and palette refinements from the approved worktree, enhancing the user experience.
- Shortcut Restoration: Restored the Alt+4 -> Plan shortcut and added focused regression coverage to ensure its reliability.




devin-ai-integration

Devin Review found 1 potential issue.


devin-ai-integration · 2026-03-12T16:24:15Z

🟡 low_motion setting ignored for Tool cells in default configuration

When show_tool_details is true (default) and calm_mode is false (default), HistoryCell::Tool cells fall through to the _ => self.lines(width) catch-all in lines_with_options() at crates/tui/src/tui/history.rs:156. The lines() method hard-codes low_motion: false at crates/tui/src/tui/history.rs:113, so the user's low_motion preference is silently discarded. This means running-tool spinner animations will still animate even when the user has explicitly enabled reduced motion. The low_motion flag only takes effect for Tool cells when either show_tool_details=false or calm_mode=true, which are non-default paths.

Affected code path

lines_with_options handles Tool explicitly only for !show_tool_details and calm_mode guards — both pass options.low_motion correctly. But the common case (both defaults) falls through to self.lines(width) which calls cell.lines_with_motion(width, false).

(Refers to line 156)
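
A reduced sketch of the fix this finding points toward: the catch-all arm forwards `options.low_motion` instead of a hard-coded `false`. The types here are simplified stand-ins; the spinner frames, `FRAME_MS`, and the collapsed "details hidden" arm are assumptions for illustration, not the code in `history.rs`.

```rust
const SPINNER_FRAMES: [char; 4] = ['|', '/', '-', '\\'];
const FRAME_MS: u64 = 100; // assumed frame duration

pub struct RenderOptions {
    pub show_tool_details: bool,
    pub calm_mode: bool,
    pub low_motion: bool,
}

/// Simplified stand-in for the real enum.
pub enum HistoryCell {
    Tool { name: String, elapsed_ms: u64 },
    Text(String),
}

impl HistoryCell {
    fn lines_with_motion(&self, low_motion: bool) -> Vec<String> {
        match self {
            HistoryCell::Tool { name, elapsed_ms } => {
                // With low_motion the spinner stays on its first frame.
                let frame = if low_motion {
                    0
                } else {
                    (*elapsed_ms / FRAME_MS) as usize % SPINNER_FRAMES.len()
                };
                vec![format!("{} {}", SPINNER_FRAMES[frame], name)]
            }
            HistoryCell::Text(text) => vec![text.clone()],
        }
    }

    pub fn lines_with_options(&self, options: &RenderOptions) -> Vec<String> {
        match self {
            // Simplification: the two guarded paths are collapsed into one arm.
            HistoryCell::Tool { name, .. }
                if !options.show_tool_details || options.calm_mode =>
            {
                vec![format!("{} (details hidden)", name)]
            }
            // Default path: forward the user's preference rather than `false`.
            _ => self.lines_with_motion(options.low_motion),
        }
    }
}
```

With the default `show_tool_details = true` and `calm_mode = false`, a running tool now renders a frozen first frame whenever `low_motion` is enabled.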


gemini-code-assist

Code Review

This pull request successfully removes the 'Normal' mode and consolidates its behavior into 'Agent' mode, which simplifies the application's state. The backward compatibility for settings and commands is well-handled. The accompanying TUI overhaul is extensive, introducing a more refined and configurable user interface with a new color palette, redesigned components, and improved layout logic. The changes are substantial but appear to be well-structured. I've added a couple of comments with suggestions to improve code cleanliness and maintainability.

gemini-code-assist · 2026-03-12T16:28:57Z

+#[allow(dead_code)]
+pub const SURFACE_PANEL: Color = Color::Rgb(21, 33, 52); // #152134
+#[allow(dead_code)]
+pub const SURFACE_ELEVATED: Color = Color::Rgb(28, 42, 64); // #1C2A40
+#[allow(dead_code)]
+pub const SURFACE_REASONING: Color = Color::Rgb(54, 44, 26); // #362C1A
+#[allow(dead_code)]
+pub const SURFACE_REASONING_ACTIVE: Color = Color::Rgb(68, 53, 28); // #44351C
+#[allow(dead_code)]
+pub const SURFACE_TOOL: Color = Color::Rgb(24, 39, 60); // #18273C
+#[allow(dead_code)]
+pub const SURFACE_TOOL_ACTIVE: Color = Color::Rgb(29, 48, 73); // #1D3049
+#[allow(dead_code)]
+pub const SURFACE_SUCCESS: Color = Color::Rgb(22, 56, 63); // #16383F
+#[allow(dead_code)]
+pub const SURFACE_ERROR: Color = Color::Rgb(63, 27, 36); // #3F1B24


There are several new SURFACE_* color constants that are marked with #[allow(dead_code)] and do not appear to be used within this pull request. To keep the codebase clean and focused on the current changes, it would be best to remove these unused constants. They can be re-introduced in a future PR when they are actually used.

gemini-code-assist · 2026-03-12T16:28:57Z

+fn history_has_live_motion(history: &[HistoryCell]) -> bool {
+    history.iter().any(|cell| match cell {
+        HistoryCell::Thinking { streaming, .. } => *streaming,
+        HistoryCell::Tool(tool) => match tool {
+            ToolCell::Exec(cell) => cell.status == ToolStatus::Running,
+            ToolCell::Exploring(cell) => cell
+                .entries
+                .iter()
+                .any(|entry| entry.status == ToolStatus::Running),
+            ToolCell::PlanUpdate(cell) => cell.status == ToolStatus::Running,
+            ToolCell::PatchSummary(cell) => cell.status == ToolStatus::Running,
+            ToolCell::Review(cell) => cell.status == ToolStatus::Running,
+            ToolCell::DiffPreview(_) => false,
+            ToolCell::Mcp(cell) => cell.status == ToolStatus::Running,
+            ToolCell::ViewImage(_) => false,
+            ToolCell::WebSearch(cell) => cell.status == ToolStatus::Running,
+            ToolCell::Generic(cell) => cell.status == ToolStatus::Running,
+        },
+        _ => false,
+    })
 }


The logic inside history_has_live_motion is quite detailed and couples this function to the internal structure of HistoryCell and ToolCell. To improve encapsulation and maintainability, consider moving this logic into methods on HistoryCell and ToolCell themselves.

For example, you could add a has_live_motion(&self) -> bool method to both enums. This would make history_has_live_motion much simpler:

    fn history_has_live_motion(history: &[HistoryCell]) -> bool {
        history.iter().any(HistoryCell::has_live_motion)
    }

This change would make the code more object-oriented and easier to reason about.
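
The suggestion can be sketched on a cut-down version of the enums. Only a few variants are reproduced, unit variants replace the payload-carrying ones where the payload is irrelevant, and `ToolStatus::Success`, `ExploringEntry`, and `HistoryCell::User` are hypothetical names added to make the example compile.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToolStatus {
    Running,
    Success, // hypothetical non-running state
}

pub struct ExecCell {
    pub status: ToolStatus,
}

pub struct ExploringEntry {
    pub status: ToolStatus,
}

pub struct ExploringCell {
    pub entries: Vec<ExploringEntry>,
}

pub enum ToolCell {
    Exec(ExecCell),
    Exploring(ExploringCell),
    DiffPreview,
}

impl ToolCell {
    /// Each tool variant decides for itself whether it is still animating.
    pub fn has_live_motion(&self) -> bool {
        match self {
            ToolCell::Exec(cell) => cell.status == ToolStatus::Running,
            ToolCell::Exploring(cell) => cell
                .entries
                .iter()
                .any(|entry| entry.status == ToolStatus::Running),
            ToolCell::DiffPreview => false,
        }
    }
}

pub enum HistoryCell {
    Thinking { streaming: bool },
    Tool(ToolCell),
    User(String),
}

impl HistoryCell {
    pub fn has_live_motion(&self) -> bool {
        match self {
            HistoryCell::Thinking { streaming } => *streaming,
            HistoryCell::Tool(tool) => tool.has_live_motion(),
            HistoryCell::User(_) => false,
        }
    }
}

/// The free function reduces to a one-line delegation.
pub fn history_has_live_motion(history: &[HistoryCell]) -> bool {
    history.iter().any(HistoryCell::has_live_motion)
}
```

Spelling out the variants in `ToolCell::has_live_motion` without a wildcard arm means a newly added variant fails to compile until its motion behavior is decided.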

Honor low_motion in the default tool transcript path and align composer cursor padding with the rendered placeholder. Add focused regression tests for both behaviors. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address PR #4 follow-ups

Honor low_motion in the default tool transcript path and align composer cursor padding with the rendered placeholder. Add focused regression tests for both behaviors.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* lint: remove redundant imports in empty_state test, reuse create_test_app

The test had inner `use` statements for Config, App, TuiOptions, and PathBuf that duplicated the module-level test imports. It also manually constructed App instead of calling the existing create_test_app() helper.

* fix: replace useless format!("{text}") with text.to_string() in details_affordance_line

* test: pin composer_density in cursor test to avoid sensitivity to loaded settings

Settings::load() may return a non-default composer_density on some CI environments. Explicitly set ComposerDensity::Comfortable so the expected cursor position is deterministic across all platforms.

* fix: make tool low_motion test robust against coarse Windows timers

Use a 2× cycle offset so the animated frame index is 2 (maximally distant from 0), giving 1800 ms of headroom before the animation could wrap back to index 0. The previous 1× offset left only ~15 ms of margin, causing flaky failures on Windows where Instant resolution is approximately 15.6 ms.

* fix: correct headroom comment in tool animation test (3600ms, not 1800ms)

* fix: resolve lint, parity, and Windows test failures

- Fix rustfmt line-length issue in history.rs tool animation test
- Settings::path() now respects DEEPSEEK_CONFIG_PATH for Windows test compat
- doctor_check_mcp_server recognizes Unix-style absolute paths on Windows
- Use checked_sub for Instant arithmetic in web_run tests to prevent underflow on freshly-booted Windows CI runners

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: expand ~ in DEEPSEEK_CONFIG_PATH when resolving settings path

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
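
The `checked_sub` change mentioned in the commit list above can be illustrated with a small helper. `instant_ago` is a hypothetical name, and falling back to "now" when the subtraction cannot be represented is one possible policy, not necessarily what the `web_run` tests do.

```rust
use std::time::{Duration, Instant};

/// Hypothetical test helper: an `Instant` that lies `age` in the past.
///
/// `Instant - Duration` panics when the result cannot be represented,
/// which can happen on a machine that booted moments ago. `checked_sub`
/// returns `None` in that case, so the caller can pick a fallback.
pub fn instant_ago(age: Duration) -> Instant {
    let now = Instant::now();
    now.checked_sub(age).unwrap_or(now)
}
```

A test that needs "an entry older than N seconds" can call `instant_ago` and stay panic-free regardless of how long the runner has been up.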


- Wire 120 FPS FrameRateLimiter into run_event_loop via time_until_next_draw + mark_emitted
- Add low_motion support: 30 FPS cap via LOW_MOTION_MIN_FRAME_INTERVAL
- Add AdaptiveChunkingPolicy::set_low_motion() to force Smooth mode
- Add StreamingState::set_low_motion() to propagate to all block policies
- Tool spinner already freezes on first frame when low_motion is set

TODO_BACKEND.md §3, TODO_FIXES.md #4
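
A rough sketch of how a limiter with a 120 FPS default and a 30 FPS low-motion cap could work. The names `FrameRateLimiter`, `time_until_next_draw`, `mark_emitted`, and `LOW_MOTION_MIN_FRAME_INTERVAL` come from the commit message; the signatures, fields, `MIN_FRAME_INTERVAL`, and passing `now` explicitly are assumptions made so the example is deterministic.

```rust
use std::time::{Duration, Instant};

/// Assumed constant: minimum gap between frames at 120 FPS.
pub const MIN_FRAME_INTERVAL: Duration = Duration::from_nanos(1_000_000_000 / 120);
/// Minimum gap between frames at the 30 FPS low-motion cap.
pub const LOW_MOTION_MIN_FRAME_INTERVAL: Duration = Duration::from_nanos(1_000_000_000 / 30);

pub struct FrameRateLimiter {
    last_emitted: Option<Instant>,
    low_motion: bool,
}

impl FrameRateLimiter {
    pub fn new(low_motion: bool) -> Self {
        Self { last_emitted: None, low_motion }
    }

    fn min_interval(&self) -> Duration {
        if self.low_motion {
            LOW_MOTION_MIN_FRAME_INTERVAL
        } else {
            MIN_FRAME_INTERVAL
        }
    }

    /// How long the event loop should wait before drawing again.
    /// Returns zero when no frame has been emitted yet or the interval
    /// has already elapsed.
    pub fn time_until_next_draw(&self, now: Instant) -> Duration {
        match self.last_emitted {
            None => Duration::ZERO,
            Some(last) => self
                .min_interval()
                .saturating_sub(now.saturating_duration_since(last)),
        }
    }

    /// Record that a frame was drawn at `now`.
    pub fn mark_emitted(&mut self, now: Instant) {
        self.last_emitted = Some(now);
    }
}
```

An event loop would typically use the returned duration as its poll timeout and call `mark_emitted` right after each draw.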

* fix(pricing): extend V4 Pro 75% discount expiry to 2026-05-31 15:59 UTC

DeepSeek extended the promotional discount past the original 2026-05-05 cutoff. Without this update the TUI would have started showing 4× the actual billed cost on May 6.

Source: https://api-docs.deepseek.com/quick_start/pricing — "extended until 2026/05/31 15:59 UTC".

Adds a regression test pinning the new active window so a future revert to the May 5 date trips the suite immediately.

Closes #267

* chore: remove stale TODO(integrate) markers from already-integrated modules

Five `// TODO(integrate)` comments and one matching "Not yet integrated" note were misleading anyone grepping for integration work. Each module is in fact wired up:

- execpolicy/mod.rs → tools/shell.rs:1322 (load_default_policy)
- sandbox/mod.rs → tools/shell.rs:28, main.rs:2647, tui/approval.rs:30
- sandbox/policy.rs → main.rs:2752, tui/approval.rs:30 (SandboxPolicy)
- command_safety.rs → tools/shell.rs:1321, tools/tasks.rs:13, tools/approval_cache.rs:26
- tui/streaming/mod.rs → tui/app.rs:38 (StreamingState)

The remaining TODO at mcp.rs:1771 covers a separate "wire legacy sync API into CLI subcommands or remove" decision and is left in place.

Closes #266

* docs(release): add install + dual-binary template to GitHub Release page

Closes #265.

The Release page used the auto-generated commit-title body. New users hitting the Release page from Twitter / npm-search had no on-page guidance that the dispatcher (`deepseek`) and the TUI runtime (`deepseek-tui`) ship as two binaries that must coexist; #258 was an external user spending 11 minutes figuring this out and #272 was the follow-on confusion.
The new body covers:

- npm wrapper as the recommended install
- `cargo install deepseek-tui-cli deepseek-tui --locked` (both crates)
- Manual download with a per-platform table showing both artifacts
- sha256 verify using the existing `deepseek-artifacts-sha256.txt`
- Changelog link

* feat(debug): add /cache command surfacing per-turn DeepSeek cache hit/miss

Step 1 of #263. Without per-turn telemetry the prefix-cache audit is unfounded speculation; the rest of the issue's investigation steps depend on this surface.

The DeepSeek API already returns `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens` per turn, and we already store the *latest* on App. This adds a 50-turn ring (`turn_cache_history`) populated at the same site as `last_prompt_cache_*_tokens`, plus a `/cache [count]` slash command that renders a fixed-width table of the last N turns with per-turn ratios and a session aggregate. Default count is 10; larger values clamp to the ring size.

Edge cases the formatter handles:

- No telemetry yet → friendly "no turns recorded" message
- `cache_hit_tokens = None` (provider didn't report) → row renders all em-dashes and is excluded from session aggregates so one missing-telemetry turn can't make the average ratio look broken.
- `cache_hit_tokens = Some, cache_miss_tokens = None` → infer miss as `input − hit` and mark the cell with `*`. Footer documents the asterisk.
- Ring at cap (50) → push evicts oldest.

Tests cover all four paths plus the cap.

* test(prompts): add cache-prefix stability harness for #263 step 2

The DeepSeek prefix-cache only hits while the byte prefix of each request matches the prior call. Anything in the cached prefix that varies turn-to-turn for unchanged inputs is a cache buster.

Adds a focused harness next to the production surface so the property is regression-guarded:

1. `first_divergence(a, b)` helper that returns the first divergent byte position with a `±32 byte` window of context, used by the custom assertion `assert_byte_identical`. Future suspect tests can reuse this to surface "where" rather than just "fail".
2. `compose_prompt_is_byte_stable_across_calls` — sweeps every (mode, personality) pair and pins that two consecutive calls produce identical bytes. Rules out suspect #4 (mode-prompt churn).
3. `system_prompt_for_mode_with_context_is_byte_stable_for_unchanged_workspace` — the call site `engine.rs::build_tool_context` actually invokes, pinned for an empty workspace across all three modes.
4. `system_prompt_with_working_set_summary_is_byte_stable_for_constant_summary` — pins that the surrounding prompt construction faithfully embeds the working_set summary it's given without injecting extra non-determinism. (The actual working_set summary stability lives in `working_set.rs` and is the next investigation target — see issue note in PR description.)

Foundation for the suspect-by-suspect bisection in the rest of #263.

* fix(secrets): never overwrite the secrets file when load_unlocked errors

`FileKeyringStore::set` and `delete` did `self.load_unlocked().unwrap_or_default()`, which wiped every existing secret if the read failed for any reason other than "file is missing":

- file mode != 0600 (`InsecurePermissions`) — easy on headless / CI environments where a permissive umask got applied
- corrupt JSON
- transient I/O error

In all of those, the next `store_unlocked` overwrote the file with an empty-or-single-entry blob and reset perms to 0600, silently losing every other provider's key.

Switch both call sites to `?`. `load_unlocked` already returns `Ok(default)` for a missing file, so the first-write-creates-the-file ergonomic is preserved (covered by the new `file_store_set_still_creates_file_when_missing` test).

Adds four regression tests:

- set: insecure perms surface InsecurePermissions and leave the file byte-identical.
- delete: same.
- set: corrupt JSON surfaces the parse error and leaves the file byte-identical.
- set: missing file path still works (idempotence guard).

Closes #281

* fix(cache): make tool catalog byte-stable across calls and sessions

DeepSeek's KV prefix cache hits on the longest matching byte prefix of the request. Two places in the tool-array path were silently introducing divergence:

1. `ToolRegistry::to_api_tools()` iterated `self.tools.values()` directly. Rust's default `HashMap` is seeded with `RandomState` per process, so every `deepseek` launch produced a different tool order — the cross-session resume case (the one with the biggest cache wins) never hit.
2. `active_tool_list_from_catalog()` filtered the catalog `Vec` by the active set in catalog order. When ToolSearch activated a previously-deferred tool mid-conversation, the new tool appeared at its catalog index, shifting every later tool's byte offset and busting the cached prefix from there onwards.

Fixes:

- `to_api_tools()` now sorts by tool name before emitting the API tool array. Stable across calls AND across launches.
- `build_model_tool_catalog()` sorts each partition (built-ins first, contiguous; MCP tools after, also alphabetical). Mirrors Claude Code's `assembleToolPool` strategy where they explicitly call out cache stability as the reason: "a flat sort would interleave MCP tools into built-ins and invalidate all downstream cache keys whenever an MCP tool sorts between existing built-ins."
- `active_tool_list_from_catalog()` puts always-loaded tools in catalog order at the head and deferred-but-now-active tools at the tail. A deferred-tool activation during ToolSearch no longer shifts earlier tools' positions.

Adds three regression tests:

- `to_api_tools_emits_alphabetical_order_regardless_of_registration_order`
- `model_tool_catalog_sorts_each_partition_for_prefix_cache_stability`
- `active_tool_list_pushes_deferred_activations_to_the_tail`

Refs #263.
Findings produced by reading reference Claude Code source side-by-side with our request-building flow; full delta analysis in the PR description.

* fix(sandbox): elevate Agent-mode shell sandbox to allow network access

The seatbelt-default policy is `WorkspaceWrite { network_access: false }`, which on macOS emits `(deny default)` with no `(allow network-outbound)` / `(allow system-socket)`. Every outbound socket call from a sandboxed shell command — including `getaddrinfo` for DNS — gets denied by the kernel. Symptom: "DNS resolution failed" for any URL the model tries to reach via curl, yt-dlp, package managers, etc.

Engine.build_tool_context only elevated the policy in Yolo mode, leaving Agent mode (the default) stuck on the strict default. That's tighter than competitors (Claude Code, Codex) without buying any safety the application-level NetworkPolicy or the approval flow doesn't already provide.

Switch the elevation to a `match` so:

- Plan → no elevation (read-only investigation; shell tool not registered)
- Agent → WorkspaceWrite { network_access: true, … }
- Yolo → WorkspaceWrite { network_access: true, … } (unchanged)

Adds `agent_and_yolo_modes_elevate_shell_sandbox_to_allow_network` so a future revert to the no-network default trips CI immediately.

Closes #273

* fix(skills): treat bare github.com/<owner>/<repo> URLs as GitHubRepo

Closes #269.

`/skill install https://github.com/obra/superpowers` failed on every platform with `invalid gzip header`. Root cause: `InstallSource::parse` matched any `https://`-prefixed spec as `DirectUrl`, so the installer downloaded the HTML repo page (200 OK, `text/html`) and tried to gzip-decode HTML. The user reported it from Win11 + PowerShell but the parse path is platform-independent.
Recognize bare GitHub repo URLs in `InstallSource::parse`:

- `https://github.com/<owner>/<repo>`
- `https://github.com/<owner>/<repo>/`
- `https://github.com/<owner>/<repo>.git`
- `https://github.com/<owner>/<repo>.git/`
- `https://www.github.com/<owner>/<repo>`
- `http://github.com/<owner>/<repo>` (legacy)

…all route to the existing `GitHubRepo` source, which already produces `https://github.com/<repo>/archive/refs/heads/{main,master}.tar.gz` candidates with proper fallback.

URLs with a third path segment (`/archive/...`, `/blob/...`, `/tree/...`) keep going through `DirectUrl` because the user picked that exact path.

Adds two regression tests: one asserting the seven recognised forms all canonicalize to `github:obra/superpowers`, and one pinning the sub-resource paths to `DirectUrl`.

* fix(cache): drop volatile fields from working_set summary block (#280) (#287)

The working-set summary lands inside the system prompt before the historical conversation, so any byte that drifts there cache-misses everything that follows in DeepSeek's KV prefix cache. Two sources of turn-over-turn drift are removed:

1. The rendered line is now `- {path} ({kind})`. The previous form interpolated `entry.touches` and `self.turn - entry.last_turn`, both of which advance on every user message even when no new paths are observed.
2. A new `sorted_for_prompt` helper sorts by (touches DESC, path ASC) instead of the turn-aware `sorted_entries`. The recency bonus in `score_entry` crosses bucket boundaries as turns advance, so even without rendering `last seen` the order — and which entries cross the `max_prompt_entries` cutoff — drifted. Compaction pinning still uses `sorted_entries` because it genuinely wants recency.

Adds a regression test that observes a fixed message set, calls `summary_block` before and after `next_turn()`, and asserts the two outputs are byte-identical.
The shared `first_divergence` / `assert_byte_identical` helpers (from #279) move from `prompts::tests` into `test_support` so working_set tests can reuse them.

Closes #280.

* fix(cache): memoise tool catalog so descriptions stay byte-stable (#289)

`to_api_tools` previously re-sampled `tool.description()` and `tool.input_schema()` on every call. Native tools return `&'static str` and a `json!` literal, so the bytes were stable in practice — but the `McpToolAdapter` returns `self.tool.description.as_deref()`, which can drift when the upstream MCP server reconnects with a different description string. Any drift mid-session rewrites the tool catalog that lands in the cached prefix and busts every byte that follows.

Adds an `api_cache: OnceLock<Vec<Tool>>` field on `ToolRegistry`. The first `to_api_tools` call materialises the catalog; subsequent calls return a clone of the cached vector. Mutations (`register`, `remove`, `clear`) reset the field so the next read rebuilds. Mirrors reference-cc's `getToolSchemaCache` (`utils/api.ts:119–208`).

Tests:

- `to_api_tools_pins_description_bytes_across_calls` registers a tool whose `description()` advances through a script of pre-built strings on each call. After the cache is populated, the second `to_api_tools` read returns the original description because `description()` is no longer invoked. Without the cache the second read would return the next script entry.
- `register_invalidates_api_tools_cache` registers a tool, snapshots, registers another, snapshots again, and asserts the second snapshot reflects both tools (cache rebuilt) and that the varying tool's description advanced (proving the rebuild actually re-sampled).
- `remove_and_clear_invalidate_api_tools_cache` covers the other two invalidation paths.
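
The memoisation described above (an `OnceLock` cache that is reset on mutation, combined with name-sorted output) can be sketched roughly as follows. This is a reduced model, not the project's `ToolRegistry`: `ApiTool` and the boxed description closure are hypothetical stand-ins for the real tool trait, and only `register` is shown as a mutation path.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Hypothetical, reduced API representation of a tool.
#[derive(Clone, Debug, PartialEq)]
pub struct ApiTool {
    pub name: String,
    pub description: String,
}

pub struct ToolRegistry {
    /// Tool name → description provider (stand-in for a tool trait object).
    tools: HashMap<String, Box<dyn Fn() -> String>>,
    /// Materialised catalog; empty until the first read.
    api_cache: OnceLock<Vec<ApiTool>>,
}

impl ToolRegistry {
    pub fn new() -> Self {
        Self { tools: HashMap::new(), api_cache: OnceLock::new() }
    }

    pub fn register(&mut self, name: &str, describe: Box<dyn Fn() -> String>) {
        self.tools.insert(name.to_string(), describe);
        // Any mutation drops the cached catalog so the next read rebuilds it.
        self.api_cache = OnceLock::new();
    }

    pub fn to_api_tools(&self) -> Vec<ApiTool> {
        self.api_cache
            .get_or_init(|| {
                let mut tools: Vec<ApiTool> = self
                    .tools
                    .iter()
                    .map(|(name, describe)| ApiTool {
                        name: name.clone(),
                        description: describe(),
                    })
                    .collect();
                // HashMap iteration order varies per process; sorting by
                // name keeps the emitted order stable across launches.
                tools.sort_by(|a, b| a.name.cmp(&b.name));
                tools
            })
            .clone()
    }
}
```

The description closure runs once per cache fill, so a provider whose text drifts between calls cannot change the bytes until something is registered again.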
* fix(cache): sort project_tree and summarize_project output (#290) Both helpers walked the workspace via `ignore::WalkBuilder::build()` and emitted entries in the OS readdir order — non-deterministic across filesystems (htree-hash on ext4, insertion-order on APFS, etc.). Their output lands in the fallback branch of the system prompt's project context (when the workspace has no AGENTS.md / CLAUDE.md) and inside the `project_map` tool surface, both of which feed the cached prefix. `summarize_project` now sorts the collected key-files list before the type-detection logic and the fallback `Project with key files: …` join. `project_tree` collects `(rel_path, is_dir)` tuples, sorts by full path, and only then formats the indented tree. Sorting by full path preserves the visual tree shape — `"src" < "src/lib.rs"` because the shorter string compares less — while making siblings deterministic. Tests cover sibling order, parent-before-children invariant, byte stability across two consecutive calls, and the fallback `Project with key files:` branch (the only branch where the joined order escapes into output without further sorting downstream). * fix(client): unique fallback id for parallel streaming tool calls (#291) When a streamed tool_call delta omits the `id` field, the chat-completion decoder used to fall back to the literal string `"tool_call"` for every call. With the V4 API's native parallel tool calls (multiple tool_calls in one delta), every parallel call ended up with the same fallback id — downstream tool-result routing then matched the first call's result twice and the second call hung waiting for an answer that never arrived. The fallback now indexes by the assigned `content_block` position, producing `"call_0"`, `"call_1"`, … within a single response. Upstream- supplied ids are still forwarded verbatim; only the fallback path changes. 
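As a rough illustration of the fallback rule just described, assuming a hypothetical `ToolCallDelta` struct and a `resolve_tool_call_id` helper that do not appear in the source:

```rust
/// Hypothetical shape of a streamed tool-call delta; only `id` matters here.
struct ToolCallDelta {
    id: Option<String>,
}

/// Forward an upstream id verbatim; otherwise derive one from the
/// content-block position so parallel calls never share an id.
fn resolve_tool_call_id(delta: &ToolCallDelta, content_block_index: usize) -> String {
    match &delta.id {
        Some(id) => id.clone(),
        None => format!("call_{content_block_index}"),
    }
}
```

Keying the fallback on the block position makes it unique within one response without needing any shared counter.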
Tests pin both invariants: - `decoder_assigns_unique_fallback_ids_to_parallel_tool_calls_missing_id` feeds two tool calls without `id` in one delta and asserts they get distinct ids. - `decoder_preserves_upstream_tool_call_id_when_present` keeps the forward-as-is path honest. * fix(cache): place handoff and working_set after static prompt blocks (#292) * fix(cache): drop volatile fields from working_set summary block (#280) The working-set summary lands inside the system prompt before the historical conversation, so any byte that drifts there cache-misses everything that follows in DeepSeek's KV prefix cache. Two sources of turn-over-turn drift are removed: 1. The rendered line is now `- {path} ({kind})`. The previous form interpolated `entry.touches` and `self.turn - entry.last_turn`, both of which advance on every user message even when no new paths are observed. 2. A new `sorted_for_prompt` helper sorts by (touches DESC, path ASC) instead of the turn-aware `sorted_entries`. The recency bonus in `score_entry` crosses bucket boundaries as turns advance, so even without rendering `last seen` the order — and which entries cross the `max_prompt_entries` cutoff — drifted. Compaction pinning still uses `sorted_entries` because it genuinely wants recency. Adds a regression test that observes a fixed message set, calls `summary_block` before and after `next_turn()`, and asserts the two outputs are byte-identical. The shared `first_divergence` / `assert_byte_identical` helpers (from #279) move from `prompts::tests` into `test_support` so working_set tests can reuse them. Closes #280. * fix(cache): place handoff and working_set after static prompt blocks `system_prompt_for_mode_with_context_and_skills` previously interleaved volatile content into the static prefix: 1. mode prompt static 2. project context static 3. working_set_summary ← volatile 4. skills_block static 5. handoff_block ← volatile 6. ## Context Management static 7. 
COMPACT_TEMPLATE static Anything past byte (3) cache-missed every time the working-set drifted or `/compact` rewrote `.deepseek/handoff.md` — including the static `## Context Management` and `## Compaction Handoff` blocks behind them. New order keeps every static block in the cached prefix and pushes the two volatile blocks to the end: 1. mode prompt 2. project context (or fallback automap) 3. skills block 4. ## Context Management (Agent / Yolo only) 5. COMPACT_TEMPLATE ── volatile boundary ── 6. handoff block 7. working-set summary Adds a doc comment on the function describing the volatile-content-last invariant so future contributors don't reintroduce churn into the prefix. Adds two regression tests: - `system_prompt_with_handoff_file_is_byte_stable_when_file_is_unchanged` pins the handoff path with a fixture file. - `handoff_and_working_set_appear_after_static_blocks` asserts the ordering invariant directly so a future reorder fails loudly. Reference: Claude Code's own prompt builder marks this same boundary with a `SYSTEM_PROMPT_DYNAMIC_BOUNDARY` constant; we don't introduce the abstraction yet but match the principle. * feat(i18n): localize slash command help (Phase 1a, #285) (#294) Adds 44 new MessageIds, one per slash command, and translations to all four shipped locales (en/ja/zh-Hans/pt-BR). Refactors CommandInfo so the English description now lives in localization.rs (single source of truth) instead of being duplicated on the struct, and threads the active Locale through the three render surfaces: - crates/tui/src/tui/views/help.rs (the ?/F1/Ctrl+/ help overlay) - crates/tui/src/tui/command_palette.rs (Ctrl+K palette) - crates/tui/src/commands/core.rs (the /help text command) Usage strings (e.g. /cache [count]) stay English by design — they're placeholder syntax, not natural language. 
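Returning to the prompt-ordering fix for `system_prompt_for_mode_with_context_and_skills` described earlier in this message: a minimal sketch of the volatile-content-last layout. `PromptParts`, `build_system_prompt`, and the `"\n\n"` separator are assumptions for illustration; the real function takes different inputs.

```rust
/// Hypothetical bundle of already-rendered prompt blocks.
#[derive(Clone, Copy)]
struct PromptParts<'a> {
    mode_prompt: &'a str,
    project_context: &'a str,
    skills_block: &'a str,
    /// Present in Agent / Yolo only.
    context_management: Option<&'a str>,
    compact_template: &'a str,
    handoff_block: Option<&'a str>,
    working_set_summary: Option<&'a str>,
}

fn build_system_prompt(p: &PromptParts<'_>) -> String {
    // Static blocks first, so they stay inside the cached prefix.
    let static_blocks = [
        Some(p.mode_prompt),
        Some(p.project_context),
        Some(p.skills_block),
        p.context_management,
        Some(p.compact_template),
    ];
    // Volatile blocks last: a change here only invalidates the tail.
    let volatile_blocks = [p.handoff_block, p.working_set_summary];
    static_blocks
        .iter()
        .chain(volatile_blocks.iter())
        .flatten()
        .copied()
        .collect::<Vec<&str>>()
        .join("\n\n")
}
```

With this layout, two prompts that differ only in the working-set summary share every byte up to the start of that summary.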
The existing locale-coverage test (`shipped_first_pack_has_no_missing_core_messages`) already iterates ALL_MESSAGE_IDS across Locale::shipped(), so the 44 new IDs are automatically required to be present in all four locale arms or CI fails. This is the first of several incremental Phase 1 PRs. Phase 1b covers the debug commands (/tokens /cost /cache), 1c the footer hints, and 1d doctor output. Phases 2–3 cover onboarding and error surfaces. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(i18n): localize /tokens /cost /cache debug output (Phase 1b, #285) (#295) Adds 13 new MessageIds covering the report templates and the sub-strings shared across them, with translations for all four shipped locales (en/ja/zh-Hans/pt-BR): - CmdTokensReport, CmdTokensContextWithWindow, CmdTokensContextUnknownWindow - CmdTokensCacheBoth, CmdTokensCacheHitOnly, CmdTokensCacheMissOnly - CmdTokensNotReported - CmdCostReport - CmdCacheNoData, CmdCacheHeader, CmdCacheTotals, CmdCacheFootnote, CmdCacheAdvice Each template uses {placeholder} substitution via String::replace rather than format!, since format! requires a literal — the locale-resolved &'static str isn't one. The placeholder convention ({active}, {hit}, {miss}, …) means a translator can re-order or restructure a sentence freely without changing the call site. Helpers `token_count`, `active_context_summary`, `cache_summary`, and `format_cache_history` now take `Locale` so each can resolve their templates from the same source of truth. The English templates byte-match the previous hardcoded format strings so the existing 16 debug-command tests pass unchanged. Column headers in the cache table (`turn in out hit miss …`) are intentionally NOT localized — the body rows are formatted with fixed column widths and translating the header words would break alignment. Numbers, ratios, and the model id stay in English form. 
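The `{placeholder}` substitution above can be sketched as below. The `fill` helper and the example templates are hypothetical; the point is only that `String::replace` accepts a runtime template where `format!` needs a literal.

```rust
/// Substitute each `{key}` in a runtime template with its value.
/// `fill` is an illustrative helper, not the project's function.
fn fill(template: &str, vars: &[(&str, &str)]) -> String {
    let mut out = template.to_string();
    for (key, value) in vars {
        // `{{` and `}}` are escaped braces, so this builds the literal "{key}".
        out = out.replace(&format!("{{{key}}}"), value);
    }
    out
}
```

Because substitution is by name rather than by position, a translated template can place `{miss}` before `{hit}` and the call site stays the same.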
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(i18n): localize footer state + help section labels (Phase 1c, #285) (#296) Adds 11 new MessageIds covering visible footer chrome and the help-overlay section headings, with translations for all four shipped locales: Footer: - FooterWorking — animated `working` / `working.` / … pulse - FooterAgentSingular / FooterAgentsPlural — the sub-agent count chip - FooterPressCtrlCAgain — the quit-confirmation toast Help overlay sections (`?` / `F1` / `Ctrl+/`): - HelpSectionNavigation, HelpSectionEditing, HelpSectionActions, HelpSectionModes, HelpSectionSessions, HelpSectionClipboard, HelpSectionHelp `KeybindingSection::label` now takes Locale and returns tr(locale, …). `footer_working_label` and `footer_agents_chip` likewise take Locale; the two production callsites in tui/ui.rs pass `app.ui_locale`. The mode chip itself (agent / yolo / plan) intentionally stays English — those are brand/acronym labels, and translating them would mean explaining to maintainers what `代理` means in a bug report. The keybinding catalog DESCRIPTIONS (41 entries) are not translated in this PR — those are technical prose that would dwarf the rest of i18n work and can ship in v0.8.5. Section labels are translated so the help overlay groups read as expected in any locale. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(commands): smoke-test that every / command dispatches to a handler (#299) Adds two parallel-safe smoke tests in `crates/tui/src/commands/mod.rs` that iterate the COMMANDS registry and verify every command — and every declared alias — dispatches to a real handler. A dispatch miss surfaces as the fall-through `Unknown command:` error message in `execute`, which used to be invisible until a user typed the command and saw the "did you mean" suggestion fire on a registered command. 
The tests build a workspace-isolated app via `tempfile::TempDir` so side-effecting handlers (`/init` writing AGENTS.md, `/save` and `/export` writing files) do not pollute `crates/tui/` when CI runs from there. `/save` and `/export` get an explicit tempdir-relative path because their no-arg defaults still resolve relative to `cwd`. `/restore` is skipped — it shells out to git for the snapshot repo and its own dedicated tests in `commands/restore.rs` already serialize on the global env mutex via `scoped_home`. The existing coverage there is sufficient. Closes a gap surfaced when verifying that the v0.8.4 i18n refactor (#294, #295, #296) did not silently break any slash-command dispatch. All 44 commands and their aliases pass (16 aliases on top of the 44 names; `/restore` is the only skip). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(release): bump version to 0.8.4 (#297) CHANGELOG entry covers the v0.8.4 work landed since 0.8.3: - Localization Phase 1 (#285) — slash command help (#294), debug command output (#295), footer state and help-overlay section labels (#296). Adds 68 new MessageIds across all four shipped locales (en/ja/zh-Hans/pt-BR). - Cache-prefix stability (#263) — five companion fixes (#287, #288→#292, #289, #290, #291) that keep the DeepSeek prefix cache stable across turns. - Plus the items already in [Unreleased]: agent-mode network exec (#272), /skill GitHub URL parsing (#269), and the V4 Pro discount expiry extension (#267). Bumps: - Cargo.toml workspace version 0.8.3 → 0.8.4 - npm/deepseek-tui/package.json version + deepseekBinaryVersion 0.8.3 → 0.8.4 - Cargo.lock regenerated from the new workspace version. Phase 1d (doctor output), Phase 2 (onboarding/init/missing-companion), and Phase 3 (tool errors / sandbox denials / approvals) deferred to v0.8.5. The shipped Phase 1 surfaces (slash commands, debug telemetry, footer chrome) cover the highest-traffic UI paths Chinese users see first. 
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(release): bump internal path-dep versions + repair doc link (#301)

CI on PR #300 (release feat/v0.8.4 → main) flagged two regressions introduced by the 0.8.4 version bump:

1. Version drift: path-dependency `version = "0.8.3"` references inside the workspace crates (10 crates: agent, app-server, cli, config, core, execpolicy, hooks, mcp, tools, tui) did not move with the workspace `[workspace.package] version = "0.8.4"`. The CI guard `scripts/release/check-versions.sh` requires they match.
2. Broken intra-doc-link `[crate::localization::english]` in the CommandInfo doc comment: `english` is private. Replaced with a reference to the public `description_for` accessor and the public `tr()` function.

Verified with:
- scripts/release/check-versions.sh: Version state OK.
- RUSTDOCFLAGS=-Dwarnings cargo doc --workspace --no-deps: green.
- cargo fmt + clippy + test all green.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The DeepSeek prefix-cache only hits while the byte prefix of each request matches the prior call. Anything in the cached prefix that varies turn-to-turn for unchanged inputs is a cache buster.

Adds a focused harness next to the production surface so the property is regression-guarded:

1. `first_divergence(a, b)` helper that returns the first divergent byte position with a `±32 byte` window of context, used by the custom assertion `assert_byte_identical`. Future suspect tests can reuse this to surface "where" rather than just "fail".
2. `compose_prompt_is_byte_stable_across_calls`: sweeps every (mode, personality) pair and pins that two consecutive calls produce identical bytes. Rules out suspect Hmbown#4 (mode-prompt churn).
3. `system_prompt_for_mode_with_context_is_byte_stable_for_unchanged_workspace`: the call site `engine.rs::build_tool_context` actually invokes, pinned for an empty workspace across all three modes.
4. `system_prompt_with_working_set_summary_is_byte_stable_for_constant_summary`: pins that the surrounding prompt construction faithfully embeds the working_set summary it's given without injecting extra non-determinism. (The actual working_set summary stability lives in `working_set.rs` and is the next investigation target; see issue note in PR description.)

Foundation for the suspect-by-suspect bisection in the rest of Hmbown#263.
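A minimal sketch of what a `first_divergence` / `assert_byte_identical` pair could look like. The return type and the panic message are assumptions; the source only states that the helper reports the first divergent byte position with a ±32-byte window.

```rust
/// Return the first differing byte offset plus a +/-32-byte window from each
/// input, or `None` when the inputs are byte-identical.
fn first_divergence(a: &str, b: &str) -> Option<(usize, String, String)> {
    let (ab, bb) = (a.as_bytes(), b.as_bytes());
    let pos = ab
        .iter()
        .zip(bb.iter())
        .position(|(x, y)| x != y)
        // Same bytes up to the shorter length: they diverge where the shorter one ends.
        .or_else(|| {
            if ab.len() != bb.len() {
                Some(ab.len().min(bb.len()))
            } else {
                None
            }
        })?;
    let window = |bytes: &[u8]| {
        let start = pos.saturating_sub(32);
        let end = (pos + 32).min(bytes.len());
        String::from_utf8_lossy(&bytes[start..end]).into_owned()
    };
    Some((pos, window(ab), window(bb)))
}

/// Panic with the divergence position and context instead of a bare inequality.
fn assert_byte_identical(a: &str, b: &str) {
    if let Some((pos, left, right)) = first_divergence(a, b) {
        panic!("outputs diverge at byte {pos}:\n  left:  {left:?}\n  right: {right:?}");
    }
}
```

Lossy UTF-8 decoding is used for the window because a byte offset may fall inside a multi-byte character.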

- Wire 120 FPS FrameRateLimiter into run_event_loop via time_until_next_draw + mark_emitted
- Add low_motion support: 30 FPS cap via LOW_MOTION_MIN_FRAME_INTERVAL
- Add AdaptiveChunkingPolicy::set_low_motion() to force Smooth mode
- Add StreamingState::set_low_motion() to propagate to all block policies
- Tool spinner already freezes on first frame when low_motion is set

TODO_BACKEND.md §3, TODO_FIXES.md Hmbown#4
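A rough sketch of how a limiter with `time_until_next_draw` and `mark_emitted` might behave. The method signatures, the `MIN_FRAME_INTERVAL` constant, and the struct fields are assumptions; only the method names, `LOW_MOTION_MIN_FRAME_INTERVAL`, and the 120 / 30 FPS caps come from the list above.

```rust
use std::time::{Duration, Instant};

/// 120 FPS cap (assumed constant name).
const MIN_FRAME_INTERVAL: Duration = Duration::from_nanos(1_000_000_000 / 120);
/// 30 FPS cap used when low_motion is set.
const LOW_MOTION_MIN_FRAME_INTERVAL: Duration = Duration::from_nanos(1_000_000_000 / 30);

struct FrameRateLimiter {
    last_emitted: Option<Instant>,
    low_motion: bool,
}

impl FrameRateLimiter {
    fn new(low_motion: bool) -> Self {
        Self { last_emitted: None, low_motion }
    }

    fn min_interval(&self) -> Duration {
        if self.low_motion {
            LOW_MOTION_MIN_FRAME_INTERVAL
        } else {
            MIN_FRAME_INTERVAL
        }
    }

    /// How long the event loop should wait before drawing again.
    fn time_until_next_draw(&self, now: Instant) -> Duration {
        match self.last_emitted {
            None => Duration::ZERO,
            Some(last) => self
                .min_interval()
                .saturating_sub(now.saturating_duration_since(last)),
        }
    }

    /// Record that a frame was just drawn.
    fn mark_emitted(&mut self, now: Instant) {
        self.last_emitted = Some(now);
    }
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside, keeps the limiter deterministic under test.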

* fix(pricing): extend V4 Pro 75% discount expiry to 2026-05-31 15:59 UTC

DeepSeek extended the promotional discount past the original 2026-05-05 cutoff. Without this update the TUI would have started showing 4× the actual billed cost on May 6.

Source: https://api-docs.deepseek.com/quick_start/pricing ("extended until 2026/05/31 15:59 UTC").

Adds a regression test pinning the new active window so a future revert to the May 5 date trips the suite immediately.

Closes Hmbown#267

* chore: remove stale TODO(integrate) markers from already-integrated modules

Five `// TODO(integrate)` comments and one matching "Not yet integrated" note were misleading anyone grepping for integration work. Each module is in fact wired up:

- execpolicy/mod.rs → tools/shell.rs:1322 (load_default_policy)
- sandbox/mod.rs → tools/shell.rs:28, main.rs:2647, tui/approval.rs:30
- sandbox/policy.rs → main.rs:2752, tui/approval.rs:30 (SandboxPolicy)
- command_safety.rs → tools/shell.rs:1321, tools/tasks.rs:13, tools/approval_cache.rs:26
- tui/streaming/mod.rs → tui/app.rs:38 (StreamingState)

The remaining TODO at mcp.rs:1771 covers a separate "wire legacy sync API into CLI subcommands or remove" decision and is left in place.

Closes Hmbown#266

* docs(release): add install + dual-binary template to GitHub Release page

Closes Hmbown#265.

The Release page used the auto-generated commit-title body. New users hitting the Release page from Twitter / npm-search had no on-page guidance that the dispatcher (`deepseek`) and the TUI runtime (`deepseek-tui`) ship as two binaries that must coexist; Hmbown#258 was an external user spending 11 minutes figuring this out and Hmbown#272 was the follow-on confusion.
The new body covers: - npm wrapper as the recommended install - `cargo install deepseek-tui-cli deepseek-tui --locked` (both crates) - Manual download with a per-platform table showing both artifacts - sha256 verify using the existing `deepseek-artifacts-sha256.txt` - Changelog link * feat(debug): add /cache command surfacing per-turn DeepSeek cache hit/miss Step 1 of Hmbown#263. Without per-turn telemetry the prefix-cache audit is unfounded speculation; the rest of the issue's investigation steps depend on this surface. The DeepSeek API already returns `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens` per turn, and we already store the *latest* on App. This adds a 50-turn ring (`turn_cache_history`) populated at the same site as `last_prompt_cache_*_tokens`, plus a `/cache [count]` slash command that renders a fixed-width table of the last N turns with per-turn ratios and a session aggregate. Default count is 10; larger values clamp to the ring size. Edge cases the formatter handles: - No telemetry yet → friendly "no turns recorded" message - `cache_hit_tokens = None` (provider didn't report) → row renders all em-dashes and is excluded from session aggregates so one missing- telemetry turn can't make the average ratio look broken. - `cache_hit_tokens = Some, cache_miss_tokens = None` → infer miss as `input − hit` and mark the cell with `*`. Footer documents the asterisk. - Ring at cap (50) → push evicts oldest. Tests cover all four paths plus the cap. * test(prompts): add cache-prefix stability harness for Hmbown#263 step 2 The DeepSeek prefix-cache only hits while the byte prefix of each request matches the prior call. Anything in the cached prefix that varies turn-to-turn for unchanged inputs is a cache buster. Adds a focused harness next to the production surface so the property is regression-guarded: 1. 
`first_divergence(a, b)` helper that returns the first divergent byte position with a `±32 byte` window of context, used by the custom assertion `assert_byte_identical`. Future suspect tests can reuse this to surface "where" rather than just "fail". 2. `compose_prompt_is_byte_stable_across_calls` — sweeps every (mode, personality) pair and pins that two consecutive calls produce identical bytes. Rules out suspect Hmbown#4 (mode-prompt churn). 3. `system_prompt_for_mode_with_context_is_byte_stable_for_unchanged_workspace` — the call site `engine.rs::build_tool_context` actually invokes, pinned for an empty workspace across all three modes. 4. `system_prompt_with_working_set_summary_is_byte_stable_for_constant_summary` — pins that the surrounding prompt construction faithfully embeds the working_set summary it's given without injecting extra non-determinism. (The actual working_set summary stability lives in `working_set.rs` and is the next investigation target — see issue note in PR description.) Foundation for the suspect-by-suspect bisection in the rest of Hmbown#263. * fix(secrets): never overwrite the secrets file when load_unlocked errors `FileKeyringStore::set` and `delete` did `self.load_unlocked().unwrap_or_default()`, which wiped every existing secret if the read failed for any reason other than \"file is missing\": - file mode != 0600 (`InsecurePermissions`) — easy on headless / CI environments where a permissive umask got applied - corrupt JSON - transient I/O error In all of those, the next `store_unlocked` overwrote the file with an empty-or-single-entry blob and reset perms to 0600, silently losing every other provider's key. Switch both call sites to `?`. `load_unlocked` already returns `Ok(default)` for a missing file, so the first-write-creates-the-file ergonomic is preserved (covered by the new `file_store_set_still_creates_file_when_missing` test). 
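The difference between `unwrap_or_default()` and `?` in the secrets fix can be sketched as follows. This is a simplified stand-in: the real store is JSON with permission checks, whereas this sketch uses a hypothetical `key=value` line format and free functions `load` / `set` so it needs only the standard library.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::Path;

type Secrets = BTreeMap<String, String>;

/// A missing file is an empty store; every other failure is a real error.
fn load(path: &Path) -> io::Result<Secrets> {
    let text = match fs::read_to_string(path) {
        Ok(text) => text,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(Secrets::new()),
        Err(e) => return Err(e),
    };
    text.lines()
        .map(|line| {
            line.split_once('=')
                .map(|(k, v)| (k.to_string(), v.to_string()))
                .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "corrupt secrets file"))
        })
        .collect()
}

/// Propagate load errors with `?` instead of falling back to an empty map,
/// so a failed read can never lead to an overwrite.
fn set(path: &Path, key: &str, value: &str) -> io::Result<()> {
    let mut secrets = load(path)?;
    secrets.insert(key.to_string(), value.to_string());
    let body: String = secrets.iter().map(|(k, v)| format!("{k}={v}\n")).collect();
    fs::write(path, body)
}
```

The first-write case still works because `load` itself maps "file not found" to an empty store before `set` ever sees an error.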
Adds four regression tests: - set: insecure perms surface InsecurePermissions and leave the file byte-identical. - delete: same. - set: corrupt JSON surfaces the parse error and leaves the file byte-identical. - set: missing file path still works (idempotence guard). Closes Hmbown#281 * fix(cache): make tool catalog byte-stable across calls and sessions DeepSeek's KV prefix cache hits on the longest matching byte prefix of the request. Two places in the tool-array path were silently introducing divergence: 1. `ToolRegistry::to_api_tools()` iterated `self.tools.values()` directly. Rust's default `HashMap` is seeded with `RandomState` per process, so every `deepseek` launch produced a different tool order — the cross- session resume case (the one with the biggest cache wins) never hit. 2. `active_tool_list_from_catalog()` filtered the catalog `Vec` by the active set in catalog order. When ToolSearch activated a previously- deferred tool mid-conversation, the new tool appeared at its catalog index, shifting every later tool's byte offset and busting the cached prefix from there onwards. Fixes: - `to_api_tools()` now sorts by tool name before emitting the API tool array. Stable across calls AND across launches. - `build_model_tool_catalog()` sorts each partition (built-ins first, contiguous; MCP tools after, also alphabetical). Mirrors Claude Code's `assembleToolPool` strategy where they explicitly call out cache stability as the reason: "a flat sort would interleave MCP tools into built-ins and invalidate all downstream cache keys whenever an MCP tool sorts between existing built-ins." - `active_tool_list_from_catalog()` puts always-loaded tools in catalog order at the head and deferred-but-now-active tools at the tail. A deferred-tool activation during ToolSearch no longer shifts earlier tools' positions. 
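A minimal sketch of the head-and-tail ordering described for `active_tool_list_from_catalog()`. The `CatalogTool` struct, the function signature, and the choice to append deferred tools in activation order are assumptions; the source only says deferred-but-now-active tools go at the tail.

```rust
use std::collections::HashSet;

/// Hypothetical catalog entry; `deferred` marks tools sent only once activated.
struct CatalogTool {
    name: &'static str,
    deferred: bool,
}

fn active_tool_list(catalog: &[CatalogTool], activated_in_order: &[&str]) -> Vec<String> {
    // Head: always-loaded tools, in catalog order.
    let mut out: Vec<String> = catalog
        .iter()
        .filter(|t| !t.deferred)
        .map(|t| t.name.to_string())
        .collect();
    let deferred: HashSet<&str> = catalog
        .iter()
        .filter(|t| t.deferred)
        .map(|t| t.name)
        .collect();
    // Tail: deferred tools, appended as they are activated (assumed order).
    for name in activated_in_order {
        if deferred.contains(name) && !out.iter().any(|n| n.as_str() == *name) {
            out.push((*name).to_string());
        }
    }
    out
}
```

Appending at the tail means the list before an activation is always a prefix of the list after it, which is the property the prefix cache needs.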
Adds three regression tests: - `to_api_tools_emits_alphabetical_order_regardless_of_registration_order` - `model_tool_catalog_sorts_each_partition_for_prefix_cache_stability` - `active_tool_list_pushes_deferred_activations_to_the_tail` Refs Hmbown#263. Findings produced by reading reference Claude Code source side-by-side with our request-building flow; full delta analysis in the PR description. * fix(sandbox): elevate Agent-mode shell sandbox to allow network access The seatbelt-default policy is `WorkspaceWrite { network_access: false }`, which on macOS emits `(deny default)` with no `(allow network-outbound)` / `(allow system-socket)`. Every outbound socket call from a sandboxed shell command — including `getaddrinfo` for DNS — gets denied by the kernel. Symptom: "DNS resolution failed" for any URL the model tries to reach via curl, yt-dlp, package managers, etc. Engine.build_tool_context only elevated the policy in Yolo mode, leaving Agent mode (the default) stuck on the strict default. That's tighter than competitors (Claude Code, Codex) without buying any safety the application-level NetworkPolicy or the approval flow doesn't already provide. Switch the elevation to a `match` so: - Plan → no elevation (read-only investigation; shell tool not registered) - Agent → WorkspaceWrite { network_access: true, … } - Yolo → WorkspaceWrite { network_access: true, … } (unchanged) Adds `agent_and_yolo_modes_elevate_shell_sandbox_to_allow_network` so a future revert to the no-network default trips CI immediately. Closes Hmbown#273 * fix(skills): treat bare github.com/<owner>/<repo> URLs as GitHubRepo Closes Hmbown#269. `/skill install https://github.com/obra/superpowers` failed on every platform with `invalid gzip header`. Root cause: `InstallSource::parse` matched any `https://`-prefixed spec as `DirectUrl`, so the installer downloaded the HTML repo page (200 OK, `text/html`) and tried to gzip-decode HTML. 
The user reported it from Win11 + PowerShell but the parse path is platform-independent. Recognize bare GitHub repo URLs in `InstallSource::parse`: - `https://github.com/<owner>/<repo>` - `https://github.com/<owner>/<repo>/` - `https://github.com/<owner>/<repo>.git` - `https://github.com/<owner>/<repo>.git/` - `https://www.github.com/<owner>/<repo>` - `http://github.com/<owner>/<repo>` (legacy) …all route to the existing `GitHubRepo` source, which already produces `https://github.com/<repo>/archive/refs/heads/{main,master}.tar.gz` candidates with proper fallback. URLs with a third path segment (`/archive/...`, `/blob/...`, `/tree/...`) keep going through `DirectUrl` because the user picked that exact path. Adds two regression tests: one asserting the seven recognised forms all canonicalize to `github:obra/superpowers`, and one pinning the sub-resource paths to `DirectUrl`. * fix(cache): drop volatile fields from working_set summary block (Hmbown#280) (Hmbown#287) The working-set summary lands inside the system prompt before the historical conversation, so any byte that drifts there cache-misses everything that follows in DeepSeek's KV prefix cache. Two sources of turn-over-turn drift are removed: 1. The rendered line is now `- {path} ({kind})`. The previous form interpolated `entry.touches` and `self.turn - entry.last_turn`, both of which advance on every user message even when no new paths are observed. 2. A new `sorted_for_prompt` helper sorts by (touches DESC, path ASC) instead of the turn-aware `sorted_entries`. The recency bonus in `score_entry` crosses bucket boundaries as turns advance, so even without rendering `last seen` the order — and which entries cross the `max_prompt_entries` cutoff — drifted. Compaction pinning still uses `sorted_entries` because it genuinely wants recency. Adds a regression test that observes a fixed message set, calls `summary_block` before and after `next_turn()`, and asserts the two outputs are byte-identical. 
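Returning to the `InstallSource::parse` change described at the start of this passage: a minimal sketch of the bare-URL recognition, covering only the six URL forms listed. The enum payloads and the free `parse` function are assumptions for illustration; the real parser handles other spec kinds as well.

```rust
/// Simplified stand-in for the real enum; payload types are assumed.
#[derive(Debug, PartialEq)]
enum InstallSource {
    GitHubRepo(String),
    DirectUrl(String),
}

fn parse(spec: &str) -> InstallSource {
    let prefixes = [
        "https://github.com/",
        "https://www.github.com/",
        "http://github.com/",
    ];
    let rest = prefixes.iter().find_map(|prefix| spec.strip_prefix(*prefix));
    if let Some(rest) = rest {
        // Tolerate a trailing slash and a `.git` suffix, in that order.
        let trimmed = rest.strip_suffix('/').unwrap_or(rest);
        let trimmed = trimmed.strip_suffix(".git").unwrap_or(trimmed);
        let segments: Vec<&str> = trimmed.split('/').collect();
        // Exactly <owner>/<repo>: a third segment means the user picked a specific path.
        if let [owner, repo] = segments.as_slice() {
            if !owner.is_empty() && !repo.is_empty() {
                return InstallSource::GitHubRepo(format!("{owner}/{repo}"));
            }
        }
    }
    InstallSource::DirectUrl(spec.to_string())
}
```

Requiring exactly two path segments is what keeps `/archive/...`, `/blob/...`, and `/tree/...` URLs on the `DirectUrl` path.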
The shared `first_divergence` / `assert_byte_identical` helpers (from Hmbown#279) move from `prompts::tests` into `test_support` so working_set tests can reuse them. Closes Hmbown#280. * fix(cache): memoise tool catalog so descriptions stay byte-stable (Hmbown#289) `to_api_tools` previously re-sampled `tool.description()` and `tool.input_schema()` on every call. Native tools return `&'static str` and a `json!` literal, so the bytes were stable in practice — but the `McpToolAdapter` returns `self.tool.description.as_deref()`, which can drift when the upstream MCP server reconnects with a different description string. Any drift mid-session rewrites the tool catalog that lands in the cached prefix and busts every byte that follows. Adds an `api_cache: OnceLock<Vec<Tool>>` field on `ToolRegistry`. The first `to_api_tools` call materialises the catalog; subsequent calls return a clone of the cached vector. Mutations (`register`, `remove`, `clear`) reset the field so the next read rebuilds. Mirrors reference-cc's `getToolSchemaCache` (`utils/api.ts:119–208`). Tests: - `to_api_tools_pins_description_bytes_across_calls` registers a tool whose `description()` advances through a script of pre-built strings on each call. After the cache is populated, the second `to_api_tools` read returns the original description because `description()` is no longer invoked. Without the cache the second read would return the next script entry. - `register_invalidates_api_tools_cache` registers a tool, snapshots, registers another, snapshots again, and asserts the second snapshot reflects both tools (cache rebuilt) and that the varying tool's description advanced (proving the rebuild actually re-sampled). - `remove_and_clear_invalidate_api_tools_cache` covers the other two invalidation paths. 
* fix(cache): sort project_tree and summarize_project output (Hmbown#290) Both helpers walked the workspace via `ignore::WalkBuilder::build()` and emitted entries in the OS readdir order — non-deterministic across filesystems (htree-hash on ext4, insertion-order on APFS, etc.). Their output lands in the fallback branch of the system prompt's project context (when the workspace has no AGENTS.md / CLAUDE.md) and inside the `project_map` tool surface, both of which feed the cached prefix. `summarize_project` now sorts the collected key-files list before the type-detection logic and the fallback `Project with key files: …` join. `project_tree` collects `(rel_path, is_dir)` tuples, sorts by full path, and only then formats the indented tree. Sorting by full path preserves the visual tree shape — `"src" < "src/lib.rs"` because the shorter string compares less — while making siblings deterministic. Tests cover sibling order, parent-before-children invariant, byte stability across two consecutive calls, and the fallback `Project with key files:` branch (the only branch where the joined order escapes into output without further sorting downstream). * fix(client): unique fallback id for parallel streaming tool calls (Hmbown#291) When a streamed tool_call delta omits the `id` field, the chat-completion decoder used to fall back to the literal string `"tool_call"` for every call. With the V4 API's native parallel tool calls (multiple tool_calls in one delta), every parallel call ended up with the same fallback id — downstream tool-result routing then matched the first call's result twice and the second call hung waiting for an answer that never arrived. The fallback now indexes by the assigned `content_block` position, producing `"call_0"`, `"call_1"`, … within a single response. Upstream- supplied ids are still forwarded verbatim; only the fallback path changes. 
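The full-path sort described above for `project_tree` can be sketched like this. The function signature (taking pre-collected tuples) and the two-space indent are assumptions; the real helper walks the workspace with `ignore::WalkBuilder` first.

```rust
/// Format an indented tree from `(rel_path, is_dir)` tuples.
/// Sorting by full path puts a parent before its children ("src" < "src/lib.rs")
/// and makes sibling order independent of readdir order.
fn project_tree(mut entries: Vec<(String, bool)>) -> String {
    entries.sort_by(|a, b| a.0.cmp(&b.0));
    let mut out = String::new();
    for (rel_path, is_dir) in &entries {
        let depth = rel_path.matches('/').count();
        let name = rel_path.rsplit('/').next().unwrap_or(rel_path.as_str());
        out.push_str(&"  ".repeat(depth));
        out.push_str(name);
        if *is_dir {
            out.push('/');
        }
        out.push('\n');
    }
    out
}
```

Feeding the same entries in any order yields the same bytes, which is the property the byte-stability tests pin.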
Tests pin both invariants: - `decoder_assigns_unique_fallback_ids_to_parallel_tool_calls_missing_id` feeds two tool calls without `id` in one delta and asserts they get distinct ids. - `decoder_preserves_upstream_tool_call_id_when_present` keeps the forward-as-is path honest. * fix(cache): place handoff and working_set after static prompt blocks (Hmbown#292) * fix(cache): drop volatile fields from working_set summary block (Hmbown#280) The working-set summary lands inside the system prompt before the historical conversation, so any byte that drifts there cache-misses everything that follows in DeepSeek's KV prefix cache. Two sources of turn-over-turn drift are removed: 1. The rendered line is now `- {path} ({kind})`. The previous form interpolated `entry.touches` and `self.turn - entry.last_turn`, both of which advance on every user message even when no new paths are observed. 2. A new `sorted_for_prompt` helper sorts by (touches DESC, path ASC) instead of the turn-aware `sorted_entries`. The recency bonus in `score_entry` crosses bucket boundaries as turns advance, so even without rendering `last seen` the order — and which entries cross the `max_prompt_entries` cutoff — drifted. Compaction pinning still uses `sorted_entries` because it genuinely wants recency. Adds a regression test that observes a fixed message set, calls `summary_block` before and after `next_turn()`, and asserts the two outputs are byte-identical. The shared `first_divergence` / `assert_byte_identical` helpers (from Hmbown#279) move from `prompts::tests` into `test_support` so working_set tests can reuse them. Closes Hmbown#280. * fix(cache): place handoff and working_set after static prompt blocks `system_prompt_for_mode_with_context_and_skills` previously interleaved volatile content into the static prefix: 1. mode prompt static 2. project context static 3. working_set_summary ← volatile 4. skills_block static 5. handoff_block ← volatile 6. ## Context Management static 7. 
COMPACT_TEMPLATE static Anything past byte (3) cache-missed every time the working-set drifted or `/compact` rewrote `.deepseek/handoff.md` — including the static `## Context Management` and `## Compaction Handoff` blocks behind them. New order keeps every static block in the cached prefix and pushes the two volatile blocks to the end: 1. mode prompt 2. project context (or fallback automap) 3. skills block 4. ## Context Management (Agent / Yolo only) 5. COMPACT_TEMPLATE ── volatile boundary ── 6. handoff block 7. working-set summary Adds a doc comment on the function describing the volatile-content-last invariant so future contributors don't reintroduce churn into the prefix. Adds two regression tests: - `system_prompt_with_handoff_file_is_byte_stable_when_file_is_unchanged` pins the handoff path with a fixture file. - `handoff_and_working_set_appear_after_static_blocks` asserts the ordering invariant directly so a future reorder fails loudly. Reference: Claude Code's own prompt builder marks this same boundary with a `SYSTEM_PROMPT_DYNAMIC_BOUNDARY` constant; we don't introduce the abstraction yet but match the principle. * feat(i18n): localize slash command help (Phase 1a, Hmbown#285) (Hmbown#294) Adds 44 new MessageIds, one per slash command, and translations to all four shipped locales (en/ja/zh-Hans/pt-BR). Refactors CommandInfo so the English description now lives in localization.rs (single source of truth) instead of being duplicated on the struct, and threads the active Locale through the three render surfaces: - crates/tui/src/tui/views/help.rs (the ?/F1/Ctrl+/ help overlay) - crates/tui/src/tui/command_palette.rs (Ctrl+K palette) - crates/tui/src/commands/core.rs (the /help text command) Usage strings (e.g. /cache [count]) stay English by design — they're placeholder syntax, not natural language. 
The existing locale-coverage test (`shipped_first_pack_has_no_missing_core_messages`) already iterates ALL_MESSAGE_IDS across Locale::shipped(), so the 44 new IDs are automatically required to be present in all four locale arms or CI fails.

This is the first of several incremental Phase 1 PRs. Phase 1b covers the debug commands (/tokens /cost /cache), 1c the footer hints, and 1d doctor output. Phases 2–3 cover onboarding and error surfaces.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(i18n): localize /tokens /cost /cache debug output (Phase 1b, Hmbown#285) (Hmbown#295)

Adds 13 new MessageIds covering the report templates and the sub-strings shared across them, with translations for all four shipped locales (en/ja/zh-Hans/pt-BR):

- CmdTokensReport, CmdTokensContextWithWindow, CmdTokensContextUnknownWindow
- CmdTokensCacheBoth, CmdTokensCacheHitOnly, CmdTokensCacheMissOnly
- CmdTokensNotReported
- CmdCostReport
- CmdCacheNoData, CmdCacheHeader, CmdCacheTotals, CmdCacheFootnote, CmdCacheAdvice

Each template uses {placeholder} substitution via String::replace rather than format!, since format! requires a literal and the locale-resolved &'static str isn't one. The placeholder convention ({active}, {hit}, {miss}, …) means a translator can re-order or restructure a sentence freely without changing the call site.

Helpers `token_count`, `active_context_summary`, `cache_summary`, and `format_cache_history` now take `Locale` so each can resolve its templates from the same source of truth. The English templates byte-match the previous hardcoded format strings so the existing 16 debug-command tests pass unchanged.

Column headers in the cache table (`turn in out hit miss …`) are intentionally NOT localized: the body rows are formatted with fixed column widths and translating the header words would break alignment. Numbers, ratios, and the model id stay in English form.
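The `{placeholder}` substitution described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `render` helper and both template strings (including the pt-BR wording) are hypothetical stand-ins, and only the use of `String::replace` on a locale-resolved `&'static str` comes from the commit message.

```rust
/// Hypothetical helper: fill `{name}` placeholders in a runtime-selected template.
/// `format!` cannot be used here because it needs a string literal at compile time.
fn render(template: &'static str, values: &[(&str, &str)]) -> String {
    let mut out = template.to_string();
    for (name, value) in values {
        // `{{` and `}}` are escaped braces, so this builds the token "{name}".
        let token = format!("{{{name}}}");
        out = out.replace(&token, value);
    }
    out
}

// Illustrative templates only; the real strings live in localization.rs.
const EN_CACHE: &str = "Cache: {hit} hit / {miss} miss";
const PT_BR_CACHE: &str = "Cache: {miss} falhas / {hit} acertos";

fn main() {
    let values = [("hit", "120"), ("miss", "30")];
    // The call site is identical for both locales even though the
    // translated sentence puts the placeholders in a different order.
    println!("{}", render(EN_CACHE, &values));
    println!("{}", render(PT_BR_CACHE, &values));
}
```

One caveat of sequential replacement: if a substituted value itself contains text shaped like a later placeholder, it would be replaced again. That is harmless for numeric values like the ones here, but worth keeping in mind for free-form strings.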
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(i18n): localize footer state + help section labels (Phase 1c, Hmbown#285) (Hmbown#296)

Adds 11 new MessageIds covering visible footer chrome and the help-overlay section headings, with translations for all four shipped locales:

Footer:

- FooterWorking: animated `working` / `working.` / … pulse
- FooterAgentSingular / FooterAgentsPlural: the sub-agent count chip
- FooterPressCtrlCAgain: the quit-confirmation toast

Help overlay sections (`?` / `F1` / `Ctrl+/`):

- HelpSectionNavigation, HelpSectionEditing, HelpSectionActions, HelpSectionModes, HelpSectionSessions, HelpSectionClipboard, HelpSectionHelp

`KeybindingSection::label` now takes Locale and returns tr(locale, …). `footer_working_label` and `footer_agents_chip` likewise take Locale; the two production callsites in tui/ui.rs pass `app.ui_locale`.

The mode chip itself (agent / yolo / plan) intentionally stays English: those are brand/acronym labels, and translating them would mean explaining to maintainers what `代理` means in a bug report.

The keybinding catalog DESCRIPTIONS (41 entries) are not translated in this PR. Those are technical prose that would dwarf the rest of i18n work and can ship in v0.8.5. Section labels are translated so the help overlay groups read as expected in any locale.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(commands): smoke-test that every / command dispatches to a handler (Hmbown#299)

Adds two parallel-safe smoke tests in `crates/tui/src/commands/mod.rs` that iterate the COMMANDS registry and verify every command, and every declared alias, dispatches to a real handler. A dispatch miss surfaces as the fall-through `Unknown command:` error message in `execute`, which used to be invisible until a user typed the command and saw the "did you mean" suggestion fire on a registered command.
The tests build a workspace-isolated app via `tempfile::TempDir` so side-effecting handlers (`/init` writing AGENTS.md, `/save` and `/export` writing files) do not pollute `crates/tui/` when CI runs from there. `/save` and `/export` get an explicit tempdir-relative path because their no-arg defaults still resolve relative to `cwd`.

`/restore` is skipped: it shells out to git for the snapshot repo and its own dedicated tests in `commands/restore.rs` already serialize on the global env mutex via `scoped_home`. The existing coverage there is sufficient.

Closes a gap surfaced when verifying that the v0.8.4 i18n refactor (Hmbown#294, Hmbown#295, Hmbown#296) did not silently break any slash-command dispatch. All 44 commands and their aliases pass (16 aliases on top of the 44 names; `/restore` is the only skip).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): bump version to 0.8.4 (Hmbown#297)

CHANGELOG entry covers the v0.8.4 work landed since 0.8.3:

- Localization Phase 1 (Hmbown#285): slash command help (Hmbown#294), debug command output (Hmbown#295), footer state and help-overlay section labels (Hmbown#296). Adds 68 new MessageIds across all four shipped locales (en/ja/zh-Hans/pt-BR).
- Cache-prefix stability (Hmbown#263): five companion fixes (Hmbown#287, Hmbown#288→Hmbown#292, Hmbown#289, Hmbown#290, Hmbown#291) that keep the DeepSeek prefix cache stable across turns.
- Plus the items already in [Unreleased]: agent-mode network exec (Hmbown#272), /skill GitHub URL parsing (Hmbown#269), and the V4 Pro discount expiry extension (Hmbown#267).

Bumps:

- Cargo.toml workspace version 0.8.3 → 0.8.4
- npm/deepseek-tui/package.json version + deepseekBinaryVersion 0.8.3 → 0.8.4
- Cargo.lock regenerated from the new workspace version.

Phase 1d (doctor output), Phase 2 (onboarding/init/missing-companion), and Phase 3 (tool errors / sandbox denials / approvals) deferred to v0.8.5.
The shipped Phase 1 surfaces (slash commands, debug telemetry, footer chrome) cover the highest-traffic UI paths Chinese users see first.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(release): bump internal path-dep versions + repair doc link (Hmbown#301)

CI on PR Hmbown#300 (release feat/v0.8.4 → main) flagged two regressions introduced by the 0.8.4 version bump:

1. Version drift: path-dependency `version = "0.8.3"` references inside the workspace crates (10 crates: agent, app-server, cli, config, core, execpolicy, hooks, mcp, tools, tui) did not move with the workspace `[workspace.package] version = "0.8.4"`. The CI guard `scripts/release/check-versions.sh` requires they match.
2. Broken intra-doc-link `[crate::localization::english]` in the CommandInfo doc comment: `english` is private. Replaced with a reference to the public `description_for` accessor and the public `tr()` function.

Verified with:

- scripts/release/check-versions.sh: Version state OK.
- RUSTDOCFLAGS=-Dwarnings cargo doc --workspace --no-deps: green.
- cargo fmt + clippy + test all green.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
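The working-set fix in the cache commits above (render only `- {path} ({kind})`, order by touches DESC then path ASC) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Entry` and `WorkingSet` shapes, the `## Working set` header, and the sample data are hypothetical, and only the line format, the sort key, and the names `sorted_for_prompt`, `summary_block`, `next_turn`, and `max_prompt_entries` come from the commit message.

```rust
// Hypothetical types; the real working_set module is not shown in this PR page.
#[allow(dead_code)]
struct Entry {
    path: String,
    kind: &'static str,
    touches: u32,
    last_turn: u32, // kept for compaction pinning, deliberately not rendered
}

struct WorkingSet {
    turn: u32,
    entries: Vec<Entry>,
    max_prompt_entries: usize,
}

impl WorkingSet {
    fn next_turn(&mut self) {
        self.turn += 1;
    }

    /// Order by (touches DESC, path ASC). Nothing here reads `self.turn`,
    /// so the order cannot change just because a turn passed.
    fn sorted_for_prompt(&self) -> Vec<&Entry> {
        let mut sorted: Vec<&Entry> = self.entries.iter().collect();
        sorted.sort_by(|a, b| b.touches.cmp(&a.touches).then_with(|| a.path.cmp(&b.path)));
        sorted.truncate(self.max_prompt_entries);
        sorted
    }

    /// Render one `- {path} ({kind})` line per entry, with no counters or ages.
    fn summary_block(&self) -> String {
        let mut out = String::from("## Working set\n"); // header text is an assumption
        for entry in self.sorted_for_prompt() {
            out.push_str(&format!("- {} ({})\n", entry.path, entry.kind));
        }
        out
    }
}

fn sample_working_set() -> WorkingSet {
    let entry = |path: &str, touches: u32| Entry {
        path: path.to_string(),
        kind: "file",
        touches,
        last_turn: 0,
    };
    WorkingSet {
        turn: 1,
        entries: vec![entry("README.md", 2), entry("src/main.rs", 5), entry("Cargo.toml", 2)],
        max_prompt_entries: 2,
    }
}

fn main() {
    let mut ws = sample_working_set();
    let before = ws.summary_block();
    ws.next_turn();
    // Mirrors the regression test described above: identical bytes across turns.
    assert_eq!(before, ws.summary_block());
    print!("{before}");
}
```

The tie-break on path matters as much as the primary key: without it, entries with equal `touches` could swap places between runs and change which ones survive the `max_prompt_entries` cutoff.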

🔴 Fix Hmbown#1 (Embedder blocks async runtime):
- Pre-warm embedder in init_vector_db() via warmup_embedder()
- Model downloaded during startup, not first retrieval
- Eliminates 5-10s UI freeze on first embed call

🔴 Fix Hmbown#2 (Compaction summary pollution):
- store_compaction_summary_to_vector_db() now extracts only the summary section from the SystemPrompt, stripping boilerplate headers and workflow context
- Uses regex extraction between ## Summary and ## Workflow markers

🔴 Fix Hmbown#3 (No vector index → full table scan):
- New ensure_vector_index() in LanceDbBackend
- Checks list_indices() for existing IVF-PQ index on embedding column
- Creates missing index via table.create_index(["embedding"], Index::Auto)
- Called from ensure_tables() for all three tables

🔴 Fix Hmbown#4 (Compaction threshold regression for non-vector users):
- Added MINIMUM_AUTO_COMPACTION_TOKENS_WITHOUT_VECTOR = 500K
- should_compact() now accepts vector_db_enabled: bool
- Non-vector users keep 500K floor; vector users use 50K floor
- auto_floor_tokens=0 (explicit disable by tests) bypasses both floors
- token_threshold default restored to 800K (fallback only; real values come from compaction_threshold_for_model_and_effort)

🟡 Fix Hmbown#7 (No score threshold → noise injection):
- search_memories() and search_summaries() now filter results where score < 0.4 before returning
- Prevents low-quality retrieval from polluting system prompt

Tests: 62/62 compaction ✓, 17/17 vector_db ✓

🔴 Hmbown#1 Embedder blocks async runtime:
- Pre-warm embedder in init_vector_db() via warmup_embedder()
- Model downloaded at startup, eliminating 5-10s UI freeze

🔴 Hmbown#2 Compaction summary boilerplate pollution:
- store_compaction_summary_to_vector_db() now extracts only the actual summary section using regex, stripping headers and workflow context from embeddings

🔴 Hmbown#3 No vector index → full table scan:
- New ensure_vector_index() checks list_indices() for IVF-PQ
- Creates missing index via create_index(["embedding"], Auto)
- Called from ensure_tables() for all 3 tables

🔴 Hmbown#4 Compaction threshold regression:
- Added MINIMUM_AUTO_COMPACTION_TOKENS_WITHOUT_VECTOR = 500K
- should_compact() accepts vector_db_enabled: bool
- Non-vector users keep 500K floor; vector users get 50K
- auto_floor_tokens=0 bypasses both (test compatibility)
- token_threshold default restored to 800K

🟡 Hmbown#7 Score threshold filtering added:
- search_memories() and search_summaries() filter score < 0.4

Tests: 62/62 compaction ✓, 17/17 vector_db ✓, 99/100 engine ✓
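The score-threshold item above amounts to dropping weak hits before they reach the system prompt. A minimal sketch, assuming a higher score means a closer match: the `Hit` struct, the `MIN_SCORE` constant name, and `filter_by_score` are hypothetical, and only the 0.4 cutoff comes from the commit message.

```rust
/// Hypothetical search hit; the real result type returned by
/// search_memories() / search_summaries() is not shown in this PR page.
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    text: String,
    score: f32,
}

/// Cutoff taken from the commit message; the constant name is an assumption.
const MIN_SCORE: f32 = 0.4;

/// Keep only hits at or above the cutoff, preserving their original order.
fn filter_by_score(hits: Vec<Hit>) -> Vec<Hit> {
    hits.into_iter().filter(|hit| hit.score >= MIN_SCORE).collect()
}

fn main() {
    let hits = vec![
        Hit { text: "relevant memory".into(), score: 0.9 },
        Hit { text: "borderline memory".into(), score: 0.4 },
        Hit { text: "noise".into(), score: 0.39 },
    ];
    for hit in filter_by_score(hits) {
        println!("{:.2} {}", hit.score, hit.text);
    }
}
```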

… precision

- resolve_path: normalise .. components; reject workspace escapes (Hmbown#1, Hmbown#5)
- tool_read_file: clamp start_line >= 1 to prevent underflow panic (Hmbown#4)
- tool_edit_file: use replacen(.., 1) to avoid mass-replacement bugs (Hmbown#6, Hmbown#8)
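The three fixes listed above can be sketched with standard-library calls only. The signatures below are hypothetical, not the ones in the repository, and the sketch makes two labelled assumptions: absolute request paths are rejected outright, and normalisation is purely lexical (it does not resolve symlinks).

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically normalise `requested` under `workspace`, rejecting any path
/// whose `..` components would climb above the workspace root.
/// Assumption: absolute paths are refused rather than re-checked.
fn resolve_path(workspace: &Path, requested: &str) -> Result<PathBuf, String> {
    let floor = workspace.components().count();
    let mut resolved = workspace.to_path_buf();
    for component in Path::new(requested).components() {
        match component {
            Component::CurDir => {}
            Component::ParentDir => {
                if resolved.components().count() <= floor {
                    return Err(format!("path escapes workspace: {requested}"));
                }
                resolved.pop();
            }
            Component::Normal(part) => resolved.push(part),
            Component::RootDir | Component::Prefix(_) => {
                return Err(format!("absolute paths are not allowed: {requested}"));
            }
        }
    }
    Ok(resolved)
}

/// Convert a 1-based `start_line` to a 0-based index without underflow:
/// a caller passing 0 is treated as asking for line 1.
fn start_index(start_line: usize) -> usize {
    start_line.max(1) - 1
}

/// Replace only the first occurrence of `old`, leaving later matches alone.
fn edit_once(content: &str, old: &str, new: &str) -> String {
    content.replacen(old, new, 1)
}

fn main() {
    let workspace = Path::new("/ws");
    println!("{:?}", resolve_path(workspace, "src/../Cargo.toml"));
    println!("{:?}", resolve_path(workspace, "../etc/passwd"));
    println!("{}", start_index(0));
    println!("{}", edit_once("let x = 1; let x = 2;", "let x", "let y"));
}
```

A lexical check like this is cheap and needs no filesystem access, but a symlink inside the workspace can still point outside it; canonicalising the final path would be the stricter option.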

…trol, verify fast-path, naming

- combined_hash now hashes full tool JSON (name + description + schema), not just tool names, so schema/description changes are detected (Hmbown#1).
- build_messages preserves cache_control from SystemPrompt::Blocks so DeepSeek context-cache breakpoints are not lost (Hmbown#2).
- FrozenPrefix::verify() compares raw text before hashing to avoid redundant SHA-256 computation (Hmbown#3).
- build_messages uses self.message_count() for vector capacity (Hmbown#4).
- Strict-mode error message no longer references the nonexistent /prefix-unfreeze command; it suggests a restart or config toggle instead (Hmbown#5).
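The first item above, hashing the full tool definition rather than only its name, can be illustrated with a short sketch. This is not the repository's implementation: the commit mentions SHA-256, while this example substitutes the standard library's `DefaultHasher` purely to stay dependency-free, and the `ToolDef` struct and its field names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical tool definition; the real one is serialized tool JSON.
struct ToolDef {
    name: String,
    description: String,
    schema_json: String,
}

/// Hash name + description + schema for every tool, in order.
/// Stand-in hasher: the PR uses SHA-256, this sketch uses DefaultHasher.
fn combined_hash(tools: &[ToolDef]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for tool in tools {
        // `String::hash` feeds the bytes plus a terminator, so adjacent
        // fields cannot blur into each other.
        tool.name.hash(&mut hasher);
        tool.description.hash(&mut hasher);
        tool.schema_json.hash(&mut hasher);
    }
    hasher.finish()
}

fn tool(name: &str, description: &str, schema_json: &str) -> ToolDef {
    ToolDef {
        name: name.to_string(),
        description: description.to_string(),
        schema_json: schema_json.to_string(),
    }
}

fn main() {
    let before = [tool("read_file", "Read a file", r#"{"type":"object"}"#)];
    let after = [tool("read_file", "Read a file from the workspace", r#"{"type":"object"}"#)];
    // Same tool name, different description: a name-only hash would miss this.
    println!("changed: {}", combined_hash(&before) != combined_hash(&after));
}
```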

Copilot AI review requested due to automatic review settings March 12, 2026 16:17

Copilot started reviewing on behalf of Hmbown March 12, 2026 16:18 View session

Copilot AI reviewed Mar 12, 2026

View reviewed changes

devin-ai-integration Bot reviewed Mar 12, 2026

View reviewed changes

gemini-code-assist Bot reviewed Mar 12, 2026

View reviewed changes

Hmbown merged commit 1a75c71 into main Mar 12, 2026
12 of 16 checks passed

Hmbown mentioned this pull request Mar 12, 2026

fix: address PR #4 follow-ups #5

Merged

Hmbown deleted the feat/remove-normal-mode-consolidate-to-agent branch April 24, 2026 21:41

This was referenced Apr 28, 2026

UI redesign: sub-agent in-transcript cards (deferred from #121) #128

Closed

test(prompts): cache-prefix stability harness for #263 step 2 #279

Merged

working_set::summary_block churns the cache prefix every turn ( / interpolation) #280

Closed

Hmbown mentioned this pull request May 5, 2026

feat(client): --anthropic-wire flag for DeepSeek's Anthropic-compat endpoint; remove dead responses_api_proxy #723

Closed

7 tasks

douglarek mentioned this pull request May 6, 2026

Pending inputs not consumed until all todos complete in Agent mode — no mid-turn intervention granularity #874

Open

Hmbown mentioned this pull request May 10, 2026

chore(release): prepare v0.8.27 #1375

Merged

8 tasks

laoye2020 mentioned this pull request May 12, 2026

Add Catppuccin/Tokyo Night/Dracula/Gruvbox themes + /theme picker #1534

Merged

6 tasks

aboimpinto mentioned this pull request May 21, 2026

feat: configurable auto-compact threshold with Ctrl+L keybinding #1722

Closed

Hmbown merged 1 commit into main from feat/remove-normal-mode-consolidate-to-agent

Hmbown commented Mar 12, 2026 • edited by devin-ai-integration Bot

Copilot AI left a comment

gemini-code-assist Bot commented Mar 12, 2026

devin-ai-integration Bot left a comment

gemini-code-assist Bot left a comment

2 participants
