Defer low-value native tools by default by Hmbown · Pull Request #2102 · Hmbown/CodeWhale

Hmbown · 2026-05-25T05:49:41Z

Summary

Defer-load native tools outside the core catalog by default so ToolSearch materializes low-frequency tools on demand.
Add [tools] always_load = ["..."] config support to opt native tools back into the active catalog.
Add scripts/measure-tool-catalog.py for deterministic serialized catalog byte/token estimates.
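
As a rough illustration of the `always_load` option, the sketch below normalises raw config entries into a lookup set, in the spirit of the `tools_always_load_parses_and_trims_names` test named under Tests. The helper name `normalize_always_load` and its exact rules (trim whitespace, drop empty entries) are assumptions for illustration, not the PR's implementation.

```rust
use std::collections::HashSet;

/// Hypothetical helper: turn raw `[tools] always_load` entries into a set.
/// Surrounding whitespace is trimmed and empty entries are dropped.
fn normalize_always_load(raw: &[&str]) -> HashSet<String> {
    raw.iter()
        .map(|name| name.trim())
        .filter(|name| !name.is_empty())
        .map(str::to_owned)
        .collect()
}

fn main() {
    // Duplicates collapse because the result is a set.
    let set = normalize_always_load(&[" update_plan ", "notify", "", "notify"]);
    println!("{} tools opted back in", set.len());
}
```

A set is a natural fit here because the deferral check only asks whether a name is present.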

Closes #2076.
Partially addresses #1253.
Partially addresses #1177.
Partially addresses #1747.
Partially addresses #1120.

Measurement

CARGO_TARGET_DIR=/Volumes/VIXinSSD/whalebro/target-codewhale-issue-2076 scripts/measure-tool-catalog.py

baseline: 79 tools, 55,985 bytes, ~13,997 tokens
active default: 16 tools, 14,795 bytes, ~3,699 tokens
reduction: 73.57%
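
The reported reduction can be checked by hand from the two byte counts. The sketch below does that arithmetic; the token heuristic (one token per four bytes, rounded up) is an inference from the printed figures, not something the PR states, and both function names are hypothetical.

```rust
/// Percentage reduction from `before` to `after`.
fn reduction_percent(before: u64, after: u64) -> f64 {
    (before as f64 - after as f64) / before as f64 * 100.0
}

/// Assumed heuristic: roughly one token per four bytes, rounded up.
/// This matches the figures quoted above but is not confirmed by the PR.
fn estimate_tokens(bytes: u64) -> u64 {
    (bytes + 3) / 4
}

fn main() {
    println!("reduction: {:.2}%", reduction_percent(55_985, 14_795));
    println!("baseline tokens: ~{}", estimate_tokens(55_985));
    println!("active tokens: ~{}", estimate_tokens(14_795));
}
```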

Tests

CARGO_TARGET_DIR=/Volumes/VIXinSSD/whalebro/target-codewhale-issue-2076 cargo test -p codewhale-tui tool_catalog -- --test-threads=1
CARGO_TARGET_DIR=/Volumes/VIXinSSD/whalebro/target-codewhale-issue-2076 cargo test -p codewhale-tui non_yolo_mode_retains_default_defer_policy -- --test-threads=1
CARGO_TARGET_DIR=/Volumes/VIXinSSD/whalebro/target-codewhale-issue-2076 cargo test -p codewhale-tui --bin codewhale-tui tools_always_load_parses_and_trims_names -- --test-threads=1
CARGO_TARGET_DIR=/Volumes/VIXinSSD/whalebro/target-codewhale-issue-2076 cargo test -p codewhale-config tools -- --test-threads=1

Greptile Summary

This PR reduces the model-visible native tool catalog from 79 tools to 16 by deferring low-frequency tools behind a ToolSearch gateway, cutting the serialized catalog by 73.57% in both bytes and estimated tokens. It also adds a [tools] always_load config option to opt specific tools back into the active catalog and ships a Python measurement script.

A fixed DEFAULT_ACTIVE_NATIVE_TOOLS list replaces the previous per-mode, per-name allowlist; apply_native_tool_deferral and ensure_advanced_tooling now accept and respect an always_load set threaded from EngineConfig.
crates/config/src/lib.rs introduces a compile error at line 391: a new provider-override guard references has_api_key, a variable that does not exist anywhere in the config crate.
update_plan is no longer in the active catalog; when force_update_plan_first triggers on "quick plan" phrases in Plan mode, the forced first step now sends a deferred tool, causing a schema-hydration round-trip before update_plan can actually execute.

Confidence Score: 1/5

Not safe to merge — the config crate will not compile, and Plan mode quick-plan optimization silently degrades to an extra round-trip.

The provider-override block added to merge_project_overrides references has_api_key, a name that does not exist anywhere in the config crate. Every crate that depends on codewhale-config will fail to build. Separately, update_plan was moved out of the active catalog without accounting for the force_update_plan first-step path, so quick-plan requests in Plan mode now pay an extra LLM round-trip before update_plan can execute.

crates/config/src/lib.rs (undefined variable at line 391) and crates/tui/src/core/engine/tool_catalog.rs (DEFAULT_ACTIVE_NATIVE_TOOLS missing update_plan)

Important Files Changed

| Filename | Overview |
| --- | --- |
| crates/config/src/lib.rs | Adds ToolsToml struct and merges [tools] from project config, but introduces a compile error at line 391 by referencing the undefined variable `has_api_key` in the new provider-override guard. |
| crates/tui/src/core/engine/tool_catalog.rs | Replaces the old per-mode deferral logic with a fixed DEFAULT_ACTIVE_NATIVE_TOOLS list; `update_plan` and several workflow tools are now deferred, breaking the force_update_plan first-step optimization and removing special Yolo-mode native-tool handling. |
| crates/tui/src/config.rs | Adds ToolsConfig struct, tools_always_load() helper, and merge_config support; implementation is clean and well-tested. |
| crates/tui/src/core/engine/tests.rs | Updates all test call-sites to pass always_load; adds tools_always_load_overrides_default_native_deferral test and the ignored print_agent_tool_catalog_metrics benchmark. |
| crates/tui/src/core/engine/turn_loop.rs | One-line change to thread tools_always_load into ensure_advanced_tooling; no logic changes. |
| scripts/measure-tool-catalog.py | New measurement script that delegates to an ignored Rust test; correctly parses the TOOL_CATALOG_METRICS marker and falls back gracefully. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Config loaded] --> B[tools_always_load set]
    B --> C[build_model_tool_catalog]
    C --> D[apply_native_tool_deferral]
    C --> E[apply_mcp_tool_deferral]
    D --> D1{tool in always_load?}
    D1 -- Yes --> D2[defer_loading = false]
    D1 -- No --> D3{is ToolSearch?}
    D3 -- Yes --> D2
    D3 -- No --> D4{in DEFAULT_ACTIVE_NATIVE_TOOLS?}
    D4 -- Yes --> D2
    D4 -- No --> D5[defer_loading = true]
    E --> E1{mode == Yolo?}
    E1 -- Yes --> E2[defer_loading = false ALL MCP]
    E1 -- No --> E3{in keep_mcp_loaded list?}
    E3 -- Yes --> E2
    E3 -- No --> E4[defer_loading = true]
    D2 & D5 --> F[initial_active_tools]
    E2 & E4 --> F
    F --> G[active_tool_names HashSet]
    G --> H[active_tools_for_step]
    H --> I{force_update_plan?}
    I -- Yes --> J[return update_plan from catalog]
    I -- No --> K[active_tool_list_from_catalog]
    J --> L[Model calls update_plan]
    L --> M{in active_tools_at_batch_start?}
    M -- No --> N[Schema hydration - extra round-trip]
    M -- Yes --> O[Execute update_plan]

Reviews (1): Last reviewed commit: "Merge main into codex/issue-2076-defer-n..."

Greptile also left 4 inline comments on this PR.

gemini-code-assist · 2026-05-25T05:49:45Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

greptile-apps · 2026-05-26T15:18:12Z

+        if project.provider != ProviderKind::Deepseek || has_api_key {
+            self.provider = project.provider;
+        }


Undefined variable has_api_key — compile error

has_api_key is not defined anywhere in this crate; it doesn't exist as a local variable, a module-level function, or an imported symbol in crates/config/src/lib.rs. This will fail to compile. The closest candidates are self.api_key.is_some() (user-level credential already present) or project.api_key.is_some() (project config has an explicit key), but neither is what's written here. Additionally, the docstring for merge_project_overrides explicitly states the function "intentionally ignores … provider selection", so any change to that contract should be deliberate and clearly explained.
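
To make the compile error concrete, the sketch below shows the same guard with `has_api_key` bound explicitly. It uses one of the two candidates named above, `project.api_key.is_some()`; which expression the author intended is unknown. The `Config` struct, the `Other` variant, and the method name `merge_provider_override` are simplified stand-ins, not the crate's real types.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum ProviderKind {
    Deepseek,
    Other, // placeholder for any non-Deepseek provider
}

// Simplified stand-in for the real config struct.
struct Config {
    provider: ProviderKind,
    api_key: Option<String>,
}

impl Config {
    /// Hypothetical extraction of the guard from merge_project_overrides.
    fn merge_provider_override(&mut self, project: &Config) {
        // Assumption: the guard meant "the project config carries a key".
        // Without a binding like this, the guard does not compile.
        let has_api_key = project.api_key.is_some();
        if project.provider != ProviderKind::Deepseek || has_api_key {
            self.provider = project.provider;
        }
    }
}

fn main() {
    let mut user = Config { provider: ProviderKind::Other, api_key: None };
    let project = Config { provider: ProviderKind::Deepseek, api_key: None };
    user.merge_provider_override(&project);
    // Deepseek without a key does not override the user's provider.
    println!("{:?}", user.provider);
}
```

Binding the name makes the guard compile, but it does not settle the separate question of whether provider selection should be merged at all, given the docstring quoted above.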

greptile-apps · 2026-05-26T15:18:13Z

+pub(super) const DEFAULT_ACTIVE_NATIVE_TOOLS: &[&str] = &[
+    "agent_open",
+    "apply_patch",
+    "edit_file",
+    "exec_shell",
+    "fetch_url",
+    "file_search",
+    "git_diff",
+    "git_status",
+    "grep_files",
+    "list_dir",
+    "read_file",
+    "run_tests",
+    "web_search",
+    "write_file",
+];
+
+pub(super) fn should_default_defer_tool(
+    name: &str,
+    _mode: AppMode,
+    always_load: &HashSet<String>,
+) -> bool {
+    if always_load.contains(name) {
        return false;
    }

-    // Shell exec tools are kept active in Agent so the model can run
-    // verification commands (build/test/git/cargo) without first having to
-    // discover them through ToolSearch. Plan mode does not register shell
-    // execution tools.
-    let always_loaded_in_action_modes = matches!(mode, AppMode::Agent)
-        && matches!(
-            name,
-            "exec_shell"
-                | "exec_shell_wait"
-                | "exec_shell_interact"
-                | "exec_wait"
-                | "exec_interact"
-        );
-    if always_loaded_in_action_modes {
+    if is_tool_search_tool(name) {
        return false;
    }

-    !matches!(
-        name,
-        "read_file"
-            | "write_file"
-            | "list_dir"
-            | "grep_files"
-            | "file_search"
-            | "diagnostics"
-            | "rlm_open"
-            | "rlm_eval"
-            | "rlm_configure"
-            | "rlm_close"
-            | "rlm_session_objects"
-            | "handle_read"
-            | "recall_archive"
-            | "notify"
-            | MULTI_TOOL_PARALLEL_NAME
-            | "update_plan"
-            | "checklist_write"
-            | "todo_write"
-            | "task_create"
-            | "task_list"
-            | "task_read"
-            | "task_gate_run"
-            | "task_shell_start"
-            | "task_shell_wait"
-            | "github_issue_context"
-            | "github_pr_context"
-            | REQUEST_USER_INPUT_NAME
-    )
+    !DEFAULT_ACTIVE_NATIVE_TOOLS
+        .iter()
+        .any(|core_tool| core_tool == &name)
 }


update_plan deferral breaks the force_update_plan first-step optimization

update_plan is not in DEFAULT_ACTIVE_NATIVE_TOOLS, so apply_native_tool_deferral marks it with defer_loading = Some(true). When should_force_update_plan_first returns true (Plan mode + "quick plan" phrases), active_tools_for_step returns only the update_plan tool straight from the catalog — still with defer_loading = Some(true). The model's first call to update_plan hits maybe_hydrate_requested_deferred_tool, which intercepts it (the tool is not yet in active_tools_at_batch_start) and returns a schema-hydration response instead of executing. This uses an extra LLM round-trip before update_plan can actually run, and force_update_plan_this_step is false for all subsequent steps, so the forced-first-step guarantee is gone entirely.
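
The deferral of `update_plan` follows directly from the new rule in the diff. The sketch below reproduces that rule in reduced form (the `_mode` parameter and the ToolSearch check are omitted for brevity) to show that `update_plan` is deferred by default and that listing it in `always_load` keeps it active. Whether that opt-in also restores the forced-first-step path depends on engine code not shown here.

```rust
use std::collections::HashSet;

// The 14 names listed in the diff.
const DEFAULT_ACTIVE_NATIVE_TOOLS: &[&str] = &[
    "agent_open", "apply_patch", "edit_file", "exec_shell", "fetch_url",
    "file_search", "git_diff", "git_status", "grep_files", "list_dir",
    "read_file", "run_tests", "web_search", "write_file",
];

/// Reduced form of the diff's rule: defer unless opted in or in the core list.
fn should_default_defer_tool(name: &str, always_load: &HashSet<String>) -> bool {
    if always_load.contains(name) {
        return false;
    }
    !DEFAULT_ACTIVE_NATIVE_TOOLS
        .iter()
        .any(|core_tool| core_tool == &name)
}

fn main() {
    let empty = HashSet::new();
    let opted_in: HashSet<String> = ["update_plan".to_string()].into_iter().collect();
    println!("default: deferred = {}", should_default_defer_tool("update_plan", &empty));
    println!("opted in: deferred = {}", should_default_defer_tool("update_plan", &opted_in));
}
```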

greptile-apps · 2026-05-26T15:18:15Z

+pub(super) fn should_default_defer_tool(
+    name: &str,
+    _mode: AppMode,
+    always_load: &HashSet<String>,
+) -> bool {
+    if always_load.contains(name) {
        return false;
    }

-    // Shell exec tools are kept active in Agent so the model can run
-    // verification commands (build/test/git/cargo) without first having to
-    // discover them through ToolSearch. Plan mode does not register shell
-    // execution tools.
-    let always_loaded_in_action_modes = matches!(mode, AppMode::Agent)
-        && matches!(
-            name,
-            "exec_shell"
-                | "exec_shell_wait"
-                | "exec_shell_interact"
-                | "exec_wait"
-                | "exec_interact"
-        );
-    if always_loaded_in_action_modes {
+    if is_tool_search_tool(name) {
        return false;
    }

-    !matches!(
-        name,
-        "read_file"
-            | "write_file"
-            | "list_dir"
-            | "grep_files"
-            | "file_search"
-            | "diagnostics"
-            | "rlm_open"
-            | "rlm_eval"
-            | "rlm_configure"
-            | "rlm_close"
-            | "rlm_session_objects"
-            | "handle_read"
-            | "recall_archive"
-            | "notify"
-            | MULTI_TOOL_PARALLEL_NAME
-            | "update_plan"
-            | "checklist_write"
-            | "todo_write"
-            | "task_create"
-            | "task_list"
-            | "task_read"
-            | "task_gate_run"
-            | "task_shell_start"
-            | "task_shell_wait"
-            | "github_issue_context"
-            | "github_pr_context"
-            | REQUEST_USER_INPUT_NAME
-    )
+    !DEFAULT_ACTIVE_NATIVE_TOOLS
+        .iter()
+        .any(|core_tool| core_tool == &name)
 }


_mode parameter is now completely unused — Yolo mode and Plan mode get identical native-tool deferral

Previously, Yolo mode loaded all native tools and Agent mode kept exec tools always loaded. Now _mode is silently ignored. Meanwhile apply_mcp_tool_deferral still uses mode to give Yolo mode special treatment — MCP tools are not deferred in Yolo mode while non-core native tools are. This asymmetry means a user in Yolo mode gets full access to MCP tools but must still discover git_show, notify, diagnostics, etc. via ToolSearch, contradicting the "unrestricted" semantics of Yolo mode.
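
The asymmetry can be seen by putting the two rules side by side. The sketch below models the native rule as written in the diff (mode ignored) and the MCP rule as drawn in the flowchart (Yolo loads everything). The function names, the `Plan` and `Yolo` variant names, the abbreviated core list, and `some_mcp_tool` are assumptions for illustration.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, Debug, PartialEq)]
enum AppMode {
    Agent,
    Plan,
    Yolo,
}

// Abbreviated stand-in for DEFAULT_ACTIVE_NATIVE_TOOLS.
const CORE_NATIVE_TOOLS: &[&str] = &["read_file", "exec_shell"];

/// Native rule as in the diff: `_mode` is accepted but never consulted.
fn defers_native(name: &str, _mode: AppMode, always_load: &HashSet<String>) -> bool {
    if always_load.contains(name) {
        return false;
    }
    !CORE_NATIVE_TOOLS.iter().any(|core_tool| core_tool == &name)
}

/// MCP rule as in the flowchart: Yolo mode defers nothing.
fn defers_mcp(name: &str, mode: AppMode, keep_loaded: &[&str]) -> bool {
    if mode == AppMode::Yolo {
        return false;
    }
    !keep_loaded.iter().any(|kept| kept == &name)
}

fn main() {
    let none = HashSet::new();
    for mode in [AppMode::Agent, AppMode::Plan, AppMode::Yolo] {
        println!(
            "{:?}: native git_show deferred = {}, MCP tool deferred = {}",
            mode,
            defers_native("git_show", mode, &none),
            defers_mcp("some_mcp_tool", mode, &[]),
        );
    }
}
```

In Yolo mode the MCP tool is loaded while `git_show` is still deferred, which is the mismatch described above.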

greptile-apps · 2026-05-26T15:18:15Z

+pub(super) const DEFAULT_ACTIVE_NATIVE_TOOLS: &[&str] = &[
+    "agent_open",
+    "apply_patch",
+    "edit_file",
+    "exec_shell",
+    "fetch_url",
+    "file_search",
+    "git_diff",
+    "git_status",
+    "grep_files",
+    "list_dir",
+    "read_file",
+    "run_tests",
+    "web_search",
+    "write_file",
+];


Several previously always-loaded workflow tools are silently deferred with no documentation

request_user_input, diagnostics, recall_archive, todo_write, checklist_write, and the task_* family were all in the old always-loaded list and are absent from DEFAULT_ACTIVE_NATIVE_TOOLS. They now require a ToolSearch call before first use. CONFIGURATION.md documents always_load but doesn't mention which tools moved to deferred, so users relying on these tools will silently hit the schema-hydration detour without any config change on their part.
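
A list of newly deferred tools, of the kind CONFIGURATION.md could carry, can be derived from the two lists in the diff. The sketch below takes the string literals from the removed `matches!` arm and subtracts the new core list. The two constants in the old list (MULTI_TOOL_PARALLEL_NAME and REQUEST_USER_INPUT_NAME) are left out because their values do not appear in the diff; the function name `newly_deferred` is hypothetical.

```rust
// String literals from the removed matches! arm (two named constants omitted).
const OLD_ALWAYS_LOADED: &[&str] = &[
    "read_file", "write_file", "list_dir", "grep_files", "file_search",
    "diagnostics", "rlm_open", "rlm_eval", "rlm_configure", "rlm_close",
    "rlm_session_objects", "handle_read", "recall_archive", "notify",
    "update_plan", "checklist_write", "todo_write", "task_create",
    "task_list", "task_read", "task_gate_run", "task_shell_start",
    "task_shell_wait", "github_issue_context", "github_pr_context",
];

// The new core list from the diff.
const DEFAULT_ACTIVE_NATIVE_TOOLS: &[&str] = &[
    "agent_open", "apply_patch", "edit_file", "exec_shell", "fetch_url",
    "file_search", "git_diff", "git_status", "grep_files", "list_dir",
    "read_file", "run_tests", "web_search", "write_file",
];

/// Tools that were always loaded before and are deferred by default now.
fn newly_deferred() -> Vec<&'static str> {
    OLD_ALWAYS_LOADED
        .iter()
        .copied()
        .filter(|name| !DEFAULT_ACTIVE_NATIVE_TOOLS.contains(name))
        .collect()
}

fn main() {
    for name in newly_deferred() {
        println!("{name}");
    }
}
```

Any name printed here is a candidate for `always_load` if a workflow depends on it being active from the first step.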

Copilot AI review requested due to automatic review settings May 25, 2026 05:49

Copilot started reviewing on behalf of Hmbown May 25, 2026 05:49

Copilot AI reviewed May 25, 2026

Defer low-value native tools by default

6e64881

Hmbown force-pushed the codex/issue-2076-defer-native-tools branch from 6c9e6f3 to 6e64881 Compare May 25, 2026 05:52

Merge main into codex/issue-2076-defer-native-tools (resolve conflicts)

aff4050

Hmbown merged commit 2a41102 into main May 26, 2026
1 check passed

greptile-apps Bot reviewed May 26, 2026


Hmbown deleted the codex/issue-2076-defer-native-tools branch June 14, 2026 19:35

Hmbown merged 2 commits into main from codex/issue-2076-defer-native-tools
