fix(hooks): reword WebFetch deny so agents don't read it as network block by kerneltoast · Pull Request #654 · mksglu/context-mode

kerneltoast · 2026-05-20T23:38:25Z

Summary

Reword the WebFetch / curl-wget / inline-HTTP redirect messages so the leading word is "redirected" (not "blocked"), and explicitly state the redirect is a context-window optimization, NOT a network restriction. Newer models tolerate "blocked" fine, but on Opus 4.6 the wording trips the network-error-handling pattern and the agent falls back to training knowledge instead of trying ctx_fetch_and_index.
Append a sandbox hint to ssrfGuard's DNS error when the failure code looks like the resolver itself is unreachable (ETIMEOUT/ETIMEDOUT/EAI_AGAIN/ENETUNREACH/EPERM) so users learn that the MCP host inherits its sandbox state at spawn and runtime sandbox toggles on the host won't reach it -- the actionable fix is to restart the host with network access enabled.
The bug is most lethal when both fire in sequence (ctx_fetch_and_index ETIMEOUT, then WebFetch "blocked"). On the same parallelize-skill session, here's the before/after — same model class (Opus 4.6), same compound-failure conditions:

| | Pre-fix verdict (old "blocked") | Post-fix verdict (new "redirected") |
| --- | --- | --- |
| Agent's verbatim decision | "Network access to arxiv.org is blocked by the sandbox. Let me work with what I can access and rely on my training knowledge of these papers." | "WebFetch is being redirected to context-mode. Let me use the context-mode fetch tool for all of these, plus run more targeted searches." |
| Outcome | Capitulated to training | Pivoted to `ctx_fetch_and_index`, hit the (still-failing) DNS issue, fell back to WebSearch, produced a 5,172-char synthesis with cited findings |
| Fallback phrases (`"Network access blocked"`, `"fall back to training"`) | Present | Absent everywhere in the transcript |
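
To make the wording change concrete, here is a minimal sketch of the old and new deny reasons side by side. The function names, the "Use ... instead" tail of the old message, and the exact sentence layout are assumptions for illustration; only the quoted key phrases ("context-mode: WebFetch blocked.", "redirected", "context-window optimization, NOT a network restriction", "ctx_fetch_and_index has full network access", "Call ... now") come from this PR. The real strings live in hooks/core/routing.mjs and may differ.

```typescript
// Hypothetical sketch: not the actual text in hooks/core/routing.mjs.
// Only the key phrases are taken from the PR description.

/** Old-style reason: leads with "blocked", which Opus 4.6 read as a network error. */
function webFetchBlockedReason(): string {
  // The "Use ... instead" tail is an assumed stand-in for the soft "use X" phrasing.
  return "context-mode: WebFetch blocked. Use ctx_fetch_and_index instead.";
}

/** New-style reason: leads with "redirected" and names the next action. */
function webFetchRedirectReason(url: string): string {
  return [
    "context-mode: WebFetch redirected (context-window optimization, NOT a network restriction).",
    "ctx_fetch_and_index has full network access.",
    `Call ctx_fetch_and_index now with url "${url}".`,
  ].join(" ");
}

/** Regression guard in the spirit of the updated hook tests. */
function looksLikeRedirect(reason: string): boolean {
  return reason.includes("redirected") && !reason.includes("blocked");
}
```

A guard like `looksLikeRedirect` is roughly what a substring-based hook test would assert: the new reason passes, the old one does not.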

Test plan

Update hook tests for the new substrings: tests/hooks/core-routing.test.ts, tests/hooks/cursor-hooks.test.ts, tests/hooks/tool-naming.test.ts, tests/hooks/integration.test.ts.
Hook + session-DB suites green: bun x vitest run tests/hooks/ tests/session/session-db.test.ts -> 521 passed, 1 skipped.
Pre-existing failures across tests/statusline*, tests/util/project-dir.test.ts, tests/security.test.ts, tests/core/server.test.ts reproduce on bare main (10 failed / 3307 passed) and are unrelated to this change.
bun run assert-bundle and bun run assert-asymmetric-drift clean. Bundles rebuilt so the DNS hint ships via both server.bundle.mjs and cli.bundle.mjs.
End-to-end verified on the original failure scenario. Post-fix Opus 4.6 worker on the same parallelize-skill session hit the WebFetch redirect on four URLs and pivoted as quoted in the table above; reasoning stayed constructive through the subsequent DNS failure on ctx_fetch_and_index instead of capitulating.
Synthetic A/B with Sonnet and Opus 4.7 subagents calling WebFetch directly under both wordings: both models quote the new wording verbatim and classify it as a tool redirect.

Known follow-up

The DNS hint in ssrfGuard only lands on the single-URL error return path. ctx_fetch_and_index's batch wrapper formats each DNS ETIMEOUT through its own per-request error line, which bypasses ssrfGuard's return — so in the post-fix Opus 4.6 transcript the hint didn't actually surface (the agent saw the bare DNS lookup failed for "arxiv.org": getaddrinfo ETIMEOUT line). A follow-up needs to plumb the sandbox hint through the batch formatter so the common multi-URL case gets the same diagnostic. The wording-only fix still resolves the agent-behavior part of the bug independently, as the table above shows.

Notes

Drafted with the help of Claude (Opus 4.7) under direct human review (bug report, JSONL transcripts, and test scenarios supplied by the author; PR text reviewed before submission).

… bundles

…lock

The PreToolUse hook denies WebFetch and points the agent at ctx_fetch_and_index. The reason text used to lead with "context-mode: WebFetch blocked." For most models this reads fine, but the word "blocked" trips the network-error-handling pattern: the agent classifies the message as a network restriction rather than a tool redirect and falls back to training knowledge instead of trying ctx_fetch_and_index.

The bug is most lethal when paired with a DNS failure on the ctx_fetch_and_index side. A recent Opus 4.6 transcript shows it: the agent calls ctx_fetch_and_index first, gets `DNS lookup failed for "arxiv.org": getaddrinfo ETIMEOUT` (the MCP host inherits the editor's sandbox state at spawn and can't reach a resolver), falls back to WebFetch, sees "WebFetch blocked", and concludes "Network access to arxiv.org is blocked by the sandbox. Let me work with what I can access and rely on my training knowledge of these papers." Two consecutive failures that both look like network errors are enough to trip the fallback even when the agent has a working alternative in the toolbox.

Fix it by rewording all three redirect/deny messages (WebFetch, curl/wget, inline HTTP) to lead with "redirected" instead of "blocked" and to explicitly state "(context-window optimization, NOT a network restriction)" plus "ctx_fetch_and_index has full network access". Swap the soft "use X" phrasing for the imperative "Call X now" so the agent reads it as a directive, not a deferrable suggestion. The build-tool redirect at routing.mjs:751 already used "redirected" -- same precedent.

Also surface a sandbox hint on the DNS side. When ssrfGuard's dnsPromises.lookup() throws with a libuv error code that typically indicates the resolver itself is unreachable (ETIMEOUT, ETIMEDOUT, EAI_AGAIN, ENETUNREACH, EPERM), append a note explaining that the MCP host process inherits its sandbox state at spawn and a runtime sandbox toggle on the host won't reach it -- the fix is to restart the host with network access enabled. The underlying lifecycle issue isn't ours to solve, but the diagnostic message can at least point the user at the real cause instead of looking like a bad URL.

Caveat on the DNS hint's coverage: the post-fix Opus 4.6 transcript showed ctx_fetch_and_index's batch wrapper printing each DNS ETIMEOUT through its own per-request error line, which bypasses ssrfGuard's return path. The hint lands on the single-URL shape but not yet on the batch shape -- a follow-up needs to plumb it through the batch formatter so multi-URL fetches (the common case) get the same diagnostic.

Tests updated for the new substrings (core-routing, cursor-hooks, tool-naming, integration). 521 hook+session-db tests green; the 10 pre-existing failures across statusline/project-dir/server tests reproduce on bare HEAD and are unrelated. Bundles rebuilt so the DNS hint ships via server.bundle.mjs and cli.bundle.mjs; assert-bundle and assert-asymmetric-drift both clean.

Verified end-to-end on the original failure scenario. A post-fix Opus 4.6 worker on the same parallelize-skill session hit the WebFetch redirect on four URLs and pivoted with "WebFetch is being redirected to context-mode. Let me use the context-mode fetch tool for all of these, plus run more targeted searches." Even with ctx_fetch_and_index then failing on DNS (the host-side sandbox isolation issue, untouched by this change), the agent kept reasoning constructively, fell back to WebSearch, and produced a 5,172-char synthesis with specific cited findings -- the "Network access blocked" / "fall back to training" phrases that triggered the capitulation in the pre-fix run on the same session don't appear anywhere in the post-fix transcript. Synthetic subagent A/B with Sonnet and Opus 4.7 corroborated: both quote the new wording verbatim and classify it as a tool redirect.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mksglu · 2026-05-21T06:44:58Z

Did you test that manually?

kerneltoast · 2026-05-21T06:50:33Z

Yes. I had an Opus 4.6 session that hit this issue 100% of the time when trying to fetch an arxiv.org page.

I started a separate Opus 4.7 session on the context mode tree to diagnose the problem, and then restarted the MCP in the 4.6 session. The fix worked and my 4.6 session could finally fetch from arxiv.org.

Anecdotally, I have run into this issue several times over the past couple of months, but I only debugged it today because the failure happened 100% of the time with arxiv in my 4.6 session.

ousamabenyounes · 2026-05-21T17:15:02Z

Hi @kerneltoast — thanks for this fix, the wording change itself is solid. I reproduced the bug locally and confirmed the fix works (red/green proof below), then took a closer look at the ssrfGuard half. A few things I'd like to flag, with a patch you're welcome to take, adapt, or drop.

Red/green verification

Checked out the PR, reverted only src/server.ts + hooks/core/routing.mjs to next while keeping the new tests → 14 failures (Expected: "WebFetch redirected" / Received: "...WebFetch blocked..."). Restored the PR sources → all hook-routing tests pass. The 2 pre-existing failures in tests/core/server.test.ts (tsc binary + projectRoot resolution) reproduce identically on next — not caused by this PR.

Findings on the `ssrfGuard` half

No test coverage for the new code. The existing SSRF suite (tests/core/server.test.ts:3528) never exercises a dns.lookup that throws ETIMEOUT/ETIMEDOUT/EAI_AGAIN/ENETUNREACH/EPERM, so looksLikeSandbox + the hint message land untested. Repo policy is "every code change ships with a test that fails when the patch is reverted".
The hint doesn't surface on the path most users will hit. You called this out yourself in the "Known follow-up" — ctx_fetch_and_index runs each URL in a worker subprocess (server.ts:2721, reason: "exit"), so DNS failures arrive as captured stderr from the child and bypass ssrfGuard's catch entirely. As-is, the hint is wired but inert for the batch flow.
Hardcoded literals. The five DNS codes + the multi-line hint string live inline in the catch. Repo's "no hardcoded values" rule + the "same literal 2+ times → constant" rule both apply once we want this in two paths.
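
For the coverage gap in the first finding, one way to exercise the catch path without a real resolver is to fabricate errno-shaped errors and feed them to the hint selection. The sketch below is a reduced, synchronous stand-in, not the code in src/server.ts: the helper names and the shortened hint text are hypothetical, while the five codes and the `DNS lookup failed for "...": ...` line shape are taken from this PR.

```typescript
// Hypothetical stand-in for the hint selection in ssrfGuard's catch block.
const RESOLVER_UNREACHABLE_CODES: ReadonlySet<string> = new Set([
  "ETIMEOUT",
  "ETIMEDOUT",
  "EAI_AGAIN",
  "ENETUNREACH",
  "EPERM",
]);

// Shortened placeholder; the real hint in the PR is longer.
const RESTART_HINT =
  " (resolver unreachable: restart the MCP host with network access enabled)";

/** Build an Error carrying a `code` property, the way Node's DNS errors do. */
function makeErrnoError(code: string, message: string): Error & { code: string } {
  return Object.assign(new Error(message), { code });
}

/** Format a lookup failure, appending the hint only for resolver-unreachable codes. */
function describeLookupFailure(host: string, err: unknown): string {
  const code =
    typeof err === "object" && err !== null && "code" in err
      ? String((err as { code: unknown }).code)
      : "";
  const base = err instanceof Error ? err.message : String(err);
  const hint = RESOLVER_UNREACHABLE_CODES.has(code) ? RESTART_HINT : "";
  return `DNS lookup failed for "${host}": ${base}${hint}`;
}
```

A test built this way fails if the hint branch is reverted, which is the property the repo policy asks for. It reads the errno only, so it does not cover the subprocess stderr case from the second finding.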

Proposed patch

Hoists the codes and the hint to module-level exports, adds a small looksLikeSandboxError(message) helper that matches against the codes in either an errno or the formatted message, and plumbs the hint through the two fetchOneUrl error branches (subprocess exit and throw). 13 new tests cover each code, two negative cases (ENOTFOUND, ECONNREFUSED), the constants surface, and a source-grep check that both fetchOneUrl branches reference the helper + constant.

diff — src/server.ts

diff --git a/src/server.ts b/src/server.ts
index ef39dc7..0ebe127 100644
--- a/src/server.ts
+++ b/src/server.ts
@@ -2525,6 +2525,34 @@ main();
 const FETCH_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours
 const FETCH_PREVIEW_LIMIT = 3072;
 
+// libuv / getaddrinfo error codes that typically indicate the resolver itself
+// can't reach a nameserver — common when the MCP host process is running under
+// a sandbox that blocks outbound network. Surfaced as a hint so the user knows
+// the failure isn't a bad URL; the MCP server inherits its sandbox state at
+// spawn time and can't be re-permissioned at runtime by toggling the host's
+// sandbox (e.g. Claude Code's /sandbox).
+export const SANDBOX_DNS_CODES: ReadonlySet<string> = new Set([
+  "ETIMEOUT",
+  "ETIMEDOUT",
+  "EAI_AGAIN",
+  "ENETUNREACH",
+  "EPERM",
+]);
+
+export const SANDBOX_HINT =
+  " — looks like the MCP host process can't reach a DNS resolver. If your" +
+  " editor/CLI runs MCP servers under a network sandbox, the server inherits" +
+  " that state at spawn and a runtime sandbox toggle won't reach it. Restart" +
+  " the MCP host with network access enabled.";
+
+/** True if `s` mentions a libuv DNS code that suggests a sandboxed resolver. */
+export function looksLikeSandboxError(s: string): boolean {
+  for (const code of SANDBOX_DNS_CODES) {
+    if (s.includes(code)) return true;
+  }
+  return false;
+}
+
 type FetchOneResult =
   | { kind: "cached"; label: string; chunkCount: number; estimatedBytes: number; ageStr: string }
   | { kind: "fetched"; url: string; source?: string; markdown: string; header: string }
@@ -2605,18 +2633,10 @@ async function ssrfGuard(rawUrl: string): Promise<FetchOneResult | null> {
       }
     }
   } catch (err) {
-    // libuv DNS error codes that typically indicate the resolver itself can't
-    // reach a nameserver — common when the MCP host process is running under
-    // a sandbox that blocks outbound network. Surface a sandbox hint so the
-    // user knows the failure isn't a bad URL; the MCP server inherits its
-    // sandbox state at spawn time and can't be re-permissioned at runtime by
-    // toggling the host's sandbox (e.g. Claude Code's /sandbox).
     const errCode = (err as NodeJS.ErrnoException | undefined)?.code ?? "";
-    const looksLikeSandbox = errCode === "ETIMEOUT" || errCode === "ETIMEDOUT" ||
-      errCode === "EAI_AGAIN" || errCode === "ENETUNREACH" || errCode === "EPERM";
     const baseMsg = err instanceof Error ? err.message : String(err);
-    const hint = looksLikeSandbox
-      ? " — looks like the MCP host process can't reach a DNS resolver. If your editor/CLI runs MCP servers under a network sandbox, the server inherits that state at spawn and a runtime sandbox toggle won't reach it. Restart the MCP host with network access enabled."
+    const hint = SANDBOX_DNS_CODES.has(errCode) || looksLikeSandboxError(baseMsg)
+      ? SANDBOX_HINT
       : "";
     return {
       kind: "fetch_error",
@@ -2718,7 +2738,9 @@ async function fetchOneUrl(url: string, source: string | undefined, force: boole
       timeout: 30_000,
     });
     if (result.exitCode !== 0) {
-      return { kind: "fetch_error", url, error: result.stderr || result.stdout || "unknown error", reason: "exit" };
+      const baseErr = result.stderr || result.stdout || "unknown error";
+      const hint = looksLikeSandboxError(baseErr) ? SANDBOX_HINT : "";
+      return { kind: "fetch_error", url, error: `${baseErr}${hint}`, reason: "exit" };
     }
     const header = (result.stdout || "").trim();
     let markdown: string;
@@ -2732,10 +2754,15 @@ async function fetchOneUrl(url: string, source: string | undefined, force: boole
     }
     return { kind: "fetched", url, source, markdown, header };
   } catch (err: unknown) {
+    const errCode = (err as NodeJS.ErrnoException | undefined)?.code ?? "";
+    const baseErr = err instanceof Error ? err.message : String(err);
+    const hint = SANDBOX_DNS_CODES.has(errCode) || looksLikeSandboxError(baseErr)
+      ? SANDBOX_HINT
+      : "";
     return {
       kind: "fetch_error",
       url,
-      error: err instanceof Error ? err.message : String(err),
+      error: `${baseErr}${hint}`,
       reason: "throw",
     };
   } finally {

new file — tests/core/sandbox-hint.test.ts

import { describe, test, expect } from "vitest";
import { readFileSync } from "node:fs";
import { resolve, dirname } from "node:path";
import { fileURLToPath } from "node:url";
import {
  SANDBOX_DNS_CODES,
  SANDBOX_HINT,
  looksLikeSandboxError,
} from "../../src/server.js";

const __dirname = dirname(fileURLToPath(import.meta.url));

describe("sandbox-hint: DNS code detection", () => {
  test.each([
    ["getaddrinfo ETIMEOUT arxiv.org"],
    ["connect ETIMEDOUT 1.2.3.4:443"],
    ["getaddrinfo EAI_AGAIN example.com"],
    ["connect ENETUNREACH 1.2.3.4"],
    ["EPERM: operation not permitted"],
  ])("flags %s as sandbox-style failure", (msg) => {
    expect(looksLikeSandboxError(msg)).toBe(true);
  });

  test("does NOT flag a normal NXDOMAIN (ENOTFOUND)", () => {
    expect(looksLikeSandboxError("getaddrinfo ENOTFOUND no-such-domain.example"))
      .toBe(false);
  });

  test("does NOT flag a refused connection (ECONNREFUSED)", () => {
    expect(looksLikeSandboxError("connect ECONNREFUSED 1.2.3.4")).toBe(false);
  });

  test("does NOT flag empty / unrelated strings", () => {
    expect(looksLikeSandboxError("")).toBe(false);
    expect(looksLikeSandboxError("Some other error")).toBe(false);
  });
});

describe("sandbox-hint: constants surface", () => {
  test("SANDBOX_DNS_CODES exposes the documented libuv codes", () => {
    expect(SANDBOX_DNS_CODES.size).toBe(5);
    for (const c of ["ETIMEOUT", "ETIMEDOUT", "EAI_AGAIN", "ENETUNREACH", "EPERM"]) {
      expect(SANDBOX_DNS_CODES.has(c)).toBe(true);
    }
  });

  test("SANDBOX_HINT mentions sandbox + restart guidance", () => {
    expect(SANDBOX_HINT).toMatch(/sandbox/i);
    expect(SANDBOX_HINT).toMatch(/restart/i);
    expect(SANDBOX_HINT.startsWith(" — ")).toBe(true);
  });
});

describe("sandbox-hint: fetchOneUrl plumbs hint through every error path", () => {
  const serverSrc = readFileSync(
    resolve(__dirname, "../../src/server.ts"),
    "utf-8",
  );
  const fetchOneSrc = serverSrc.match(/async function fetchOneUrl\([\s\S]+?^}/m);
  if (!fetchOneSrc) throw new Error("fetchOneUrl source not found");
  const block = fetchOneSrc[0];

  test("subprocess stderr branch (exitCode !== 0) appends SANDBOX_HINT", () => {
    const exitBranch = block.match(/if \(result\.exitCode !== 0\)[\s\S]+?\}/);
    expect(exitBranch).not.toBeNull();
    expect(exitBranch![0]).toContain("looksLikeSandboxError");
    expect(exitBranch![0]).toContain("SANDBOX_HINT");
  });

  test("throw catch branch appends SANDBOX_HINT", () => {
    const catchBranch = block.match(/catch \(err: unknown\)[\s\S]+?reason: "throw",[\s\S]+?\}/);
    expect(catchBranch).not.toBeNull();
    expect(catchBranch![0]).toContain("looksLikeSandboxError");
    expect(catchBranch![0]).toContain("SANDBOX_HINT");
  });

  test("ssrfGuard catch uses the shared SANDBOX_HINT constant (no inline literal)", () => {
    const ssrfSrc = serverSrc.match(/async function ssrfGuard\([\s\S]+?^}/m);
    expect(ssrfSrc).not.toBeNull();
    expect(ssrfSrc![0]).toContain("SANDBOX_HINT");
    expect(ssrfSrc![0]).not.toContain('"ETIMEOUT"');
  });
});

Local test logs

Same workspace as the red/green check above, against fix-webfetch-redirect-wording with the patch + new test file applied.

# RED — stash the src/server.ts patch, keep the new test file
$ git stash push -- src/server.ts && bun x vitest run tests/core/sandbox-hint.test.ts
 Test Files  1 failed (1)
      Tests  13 failed (13)
   Duration  1.18s

# GREEN — restore patch
$ git stash pop && bun x vitest run tests/core/sandbox-hint.test.ts
 Test Files  1 passed (1)
      Tests  13 passed (13)
   Duration  1.06s

# Full hooks + sandbox-hint regression with patch applied
$ bun x vitest run tests/core/sandbox-hint.test.ts tests/hooks/
 Test Files  22 passed (22)
      Tests  451 passed | 1 skipped (452)
   Duration  10.92s

After applying, server.bundle.mjs / cli.bundle.mjs / stats.json need regeneration (bun run build + bun run assert-bundle).

Happy to open a follow-up PR on top of this branch if that's easier — whatever you prefer. The wording fix itself stands on its own either way.

murataslan1 · 2026-05-21T17:21:47Z

Manual behavior test verdict: mergeable.

I tested this as a model-behavior change, not as a normal string/code review. Full evidence and raw probe outputs are here:

https://gist.github.com/murataslan1/befe0628e4caa4ebff8fbc97e2b5e2b7

The core issue is real: the old WebFetch blocked framing can be interpreted by the model as a network restriction after a preceding ctx_fetch_and_index DNS failure. In a compound-failure probe, the old wording produced network_is_blocked: true and the next action was to stop network attempts / search existing context. The PR wording produced network_is_blocked: false and kept the next action on ctx_fetch_and_index, with ctx_execute as fallback.

I also verified the PR hook path directly for WebFetch, mcp_web_fetch, mcp_fetch_tool, curl, inline HTTP, and the silent file-output curl passthrough. Focused tests passed: 521 passed | 1 skipped, plus assert-bundle and assert-asymmetric-drift.

Limitations: I could not run the exact Opus 4.6 reproduction locally because my local Claude CLI is not logged in. The behavior probe still reproduced the critical WebFetch compound-failure distinction under Codex. A parallel Claude Code pass reported current Sonnet handled both old and new wording correctly, which matches the PR's claim that newer models tolerate blocked better.

One optional tightening: bind the full network access claim even more explicitly to the context-mode tool, e.g. ctx_fetch_and_index has full network access - it can reach the same URL. WebFetch/curl/wget are disabled here regardless; do not retry them. That would reduce the chance that a weaker model reads full network access as permission to retry curl/WebFetch.

mksglu · 2026-05-21T20:02:12Z

Hi @kerneltoast! It's Mert. Firstly, thanks a lot. A straightforward code review is not enough here, because even a single-word change in the tool instructions can alter model behavior.

This is a critical PR, so it needs to be manually tested, and the actual problem must be clearly understood before moving forward.

Please give me time to test that manually. I'd actually like to understand the root cause here.

Please share some "Try it" prompts for real-world testing.

kerneltoast · 2026-05-21T20:05:06Z

Gotcha, no problem. I'll work on creating a minimal reproducer prompt today.

mksglu · 2026-05-21T20:08:02Z

That's nice. Let us know when you're ready. We're going to test that case on different LLMs. First, we should reproduce the issue. @murataslan1 @ousamabenyounes So please give us instructions for the repro.

kerneltoast · 2026-05-22T03:24:59Z

@mksglu Here is a minimal reproducer prompt: webfetch-deny-misread-repro.txt

To run it:

claude -p --model "claude-opus-4-6[1m]" --output-format json < webfetch-deny-misread-repro.txt

The bug reproduces intermittently on Opus 4.6, in roughly 60% of the several runs I did, so run it ~10 times to be certain. Look for "concludes_no_network": true|false in the result field of the returned JSON (true = bug hit).
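
Since the check has to be repeated across runs, a small tally helper can save some eyeballing. This is a sketch under two assumptions: each run's stdout is a single JSON object whose `result` field is a string, and that string contains `"concludes_no_network": true` or `false` somewhere in it. The function and type names are made up for illustration.

```typescript
// Hypothetical helper for tallying reproducer runs; not part of the PR.
type RunVerdict = "bug" | "ok" | "unknown";

/** Classify one raw stdout capture from a reproducer run. */
function classifyRun(rawStdout: string): RunVerdict {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawStdout);
  } catch {
    return "unknown"; // output was not JSON at all
  }
  if (typeof parsed !== "object" || parsed === null) return "unknown";
  const result = (parsed as { result?: unknown }).result;
  if (typeof result !== "string") return "unknown";
  // The verdict is embedded in the model's text, so search rather than parse.
  const match = /"concludes_no_network"\s*:\s*(true|false)/.exec(result);
  if (match === null) return "unknown";
  return match[1] === "true" ? "bug" : "ok";
}

/** Count verdicts across several runs. */
function tallyRuns(outputs: readonly string[]): Record<RunVerdict, number> {
  const tally: Record<RunVerdict, number> = { bug: 0, ok: 0, unknown: 0 };
  for (const out of outputs) tally[classifyRun(out)] += 1;
  return tally;
}
```

If the true hit rate is near 60%, ten consecutive clean runs would happen by chance with probability about 0.4^10 ≈ 0.0001, so ten runs is a reasonable sample for telling "fixed" from "lucky".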

And here is Claude's analysis for the reproducer: webfetch-deny-misread-analysis.md

The analysis explains what this is actually testing and why it's an artificial reproducer instead of a natural prompt (because creating a "natural" way to consistently reproduce this is tricky).

murataslan1 · 2026-05-22T15:45:15Z

Manual behavior validation from my side:

I tested the old and new hook wording with the same prompt/task.

Old wording:
WebFetch blocked

Observed behavior:
The agent tended to interpret this as a real network/DNS failure and stopped instead of choosing the next context-mode route.

New wording:
WebFetch redirected ... NOT a network restriction

Observed behavior:
The agent correctly treated this as a context-window routing redirect and selected ctx_fetch_and_index / ctx_execute as the next step.

Detailed notes:
https://gist.github.com/murataslan1/800422c44f9df4da2abf6029518fc2e3

Screen recording:
https://x.com/iammurataslan/status/2057844261303783846?s=20

This is not a deterministic unit test, but it is a useful manual behavior check. From an agent-behavior perspective, this PR looks directionally correct to me.

…ative retry hint (substitutes #654)

Three redirect messages in hooks/core/routing.mjs (curl/wget, Inline HTTP, WebFetch) reframed from the negation-heavy "blocked" voice to imperative-positive "redirected (NOT a network restriction)", and crucially append "— retry if it fails with a transient DNS error" so the next action is explicit across all model tiers.

PR #654 (contributor) correctly identified Opus 4.6's "blocked → capitulate to training" failure mode under the EAI_AGAIN cascade. Our internal A/B audit (Probe 3, 6 trials Haiku) confirmed the fix on Opus but uncovered a 2/6 Haiku regression: the parenthetical "(NOT a network restriction)" landed as information without a paired action, and 2 trials concluded "since the redirect isn't a restriction either, I can just use training data." Audit recommended appending the imperative retry clause; this substitute ships exactly that.

Sibling-tool consistency on ctx_fetch_and_index:
- ssrfGuard pre-flight DNS path: classify EAI_AGAIN / ETIMEDOUT / ETIMEOUT / ENETUNREACH / EPERM as transient and append the same retry hint. Non-transient codes (ENOTFOUND) stay silent: retry won't help on a genuinely bad domain.
- Subprocess fetch stderr path: closes the contributor's flagged "Known follow-up" (batch wrapper bypassed the single-URL hint). Same code regex on result.stderr, so the same retry hint surfaces in the common multi-URL batch case the original PR couldn't reach.

Tests updated, no new test files (CONTRIBUTING L275):
- tests/hooks/core-routing.test.ts: assert "redirected" + retry hint, explicit negative-assert .not.toContain("blocked") regression guard.
- tests/hooks/cursor-hooks.test.ts, tool-naming.test.ts: wording sync.
- tests/hooks/integration.test.ts: curl-warning regex relaxed to survive both old "Do NOT use curl" and new "Do NOT retry with curl".

Bundles untouched: CI rebuilds on main push (project_ci_bundles).
Targeted: npx vitest run tests/hooks/ → 438/438 pass.
TypeScript: npx tsc --noEmit clean.

Closes work on #654 (PR closed in favor of this direct-to-next commit).
Audit doc: .cw/ctx-analytics/TOOL-DESCRIPTIONS-AUDIT.md §6.1
Substitute log: .cw/ctx-analytics/PR-654-SUBSTITUTE-LOG.md
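
The "same code regex on result.stderr" idea can be sketched as below. The code list and the hint wording are quoted from the commit message above; the constant and function names, and the choice of word-boundary matching, are assumptions of this sketch and may not match the shipped code.

```typescript
// Hypothetical sketch of transient-DNS classification on captured error text.
const TRANSIENT_DNS_CODE_RE =
  /\b(?:EAI_AGAIN|ETIMEDOUT|ETIMEOUT|ENETUNREACH|EPERM)\b/;

// Hint wording as quoted in the commit message.
const RETRY_HINT = " — retry if it fails with a transient DNS error";

/** Append the retry hint when the text mentions a transient resolver failure. */
function withRetryHint(errorText: string): string {
  return TRANSIENT_DNS_CODE_RE.test(errorText)
    ? errorText + RETRY_HINT
    : errorText;
}
```

Because the function works on plain text, it applies equally to an Error message and to a subprocess's stderr. The word boundaries are a choice of this sketch: they keep a token such as SOMEPERMISSIONS in a URL or path from matching EPERM, while ENOTFOUND stays silent as the commit describes.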

…ADR-0003)

ADR-0002 formalizes the structure every ctx_* tool description must follow (1-line role / WHEN: / WHEN NOT: / RETURNS: / EXAMPLE:), the forbidden-token list (MANDATORY:, BLOCKED, PREFER X OVER Y, Do NOT, Never use, SESSION STATE, emoji bullets), and the MUST/SHOULD/MAY hierarchy reserved for post-call obligations only. Grounded in 38 trials x 6 probes A/B evidence: heavy framing helps ctx_purge on Haiku (5/5 vs 3/5 parameter fidelity) but hurts ctx_execute selection. One size does not fit all, so rewrites are probe-gated.

ADR-0003 splits routing deny reasons into CASE A (redirect: supported via alternative tool) and CASE B (true policy restriction). PR #654's finding: the bare word "blocked" in WebFetch's CASE A denial was misread by Opus 4.6 as a network restriction, triggering training-data capitulation. CASE A MUST use "redirected", state "this is NOT a network/security restriction", and end with a transient-error retry hint. CASE B keeps "denied"/"blocked by security policy".

PR #683 (substitutes #654).
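
The CASE A / CASE B split maps naturally onto a discriminated union, so a renderer cannot mix the two voices by accident. The sketch below is an illustration only: the type, field names, and sentence templates are hypothetical, and only the required phrases ("redirected", "NOT a network/security restriction", a closing retry hint, "denied"/"blocked by security policy") are taken from the ADR summary above.

```typescript
// Hypothetical model of ADR-0003's two deny-reason cases.
type DenyReason =
  | { kind: "redirect"; tool: string; alternative: string } // CASE A
  | { kind: "policy"; tool: string; policy: string }; // CASE B

/** Render a deny reason in the voice required for its case. */
function renderDenyReason(reason: DenyReason): string {
  switch (reason.kind) {
    case "redirect":
      // CASE A: "redirected", explicit non-restriction, retry hint last.
      return (
        `${reason.tool} redirected to ${reason.alternative}. ` +
        "This is NOT a network/security restriction. " +
        `Call ${reason.alternative} now, and retry if it fails with a transient error.`
      );
    case "policy":
      // CASE B: a true restriction keeps the "denied"/"blocked" vocabulary.
      return `${reason.tool} denied: blocked by security policy (${reason.policy}).`;
  }
}
```

With this shape, adding a third case later forces the switch to be revisited, since the compiler flags a missing return for an unhandled `kind`.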

…ract test

Comprehensive audit of all 11 ctx_* MCP tool descriptions (see TOOL-DESCRIPTIONS-AUDIT.md). Six tools rewritten per ADR-0002 verbatim templates; the remaining 5 are unchanged (3 minimal-description exemptions, 1 MUST-allowed post-call obligation, 1 deferred to a probe-gated follow-up PR).

HIGH severity (audit §3): voice consistency on the ctx_execute family.
- ctx_execute (src/server.ts:1419): drop "MANDATORY:" opener, "PREFER THIS OVER BASH", "THINK IN CODE" voice-of-trainer paragraph, "Do NOT read raw data". Replace with role definition + WHEN: / WHEN NOT: / RETURNS: / EXAMPLE: sections. ~1200 -> ~700 chars.
- ctx_execute_file (src/server.ts:1755): same shape; drop "PREFER THIS OVER Read/cat" and "Don't read files into context to analyze mentally". Probe 2 evidence: disambiguation was already strong; this is a voice-consistency pass.
- ctx_batch_execute (src/server.ts:3109): drop "THIS IS THE PRIMARY TOOL", "THINK IN CODE — NON-NEGOTIABLE", and the emoji-bulleted PARALLELIZE I/O block. Replace with WHEN: / CONCURRENCY: prose. ~1700 -> ~900 chars.

MEDIUM severity (audit §3):
- ctx_search (src/server.ts:2072): drop the SESSION STATE clause (it is a routing-block.mjs concern; semantic-equivalence proof in GRILL-Q1-VERDICT.md Round 5). Add explicit WHEN: structure and a one-line EXAMPLE with batched queries.
- ctx_index (src/server.ts:1900): rewrite "Do NOT use for: log files..." as a positive WHEN NOT: clause pointing at ctx_execute_file. Keep the existing WHEN TO USE: header (transitional alias permitted by ADR-0002).
- ctx_fetch_and_index (src/server.ts:2865): replace "PARALLELIZE I/O" banner + ✅/❌ emoji bullets with a positive CONCURRENCY: prose block. ✅/❌ tokenize inconsistently across Llama/Gemini and act as negative-example leakage (rubric #4 + Probe 3 evidence).

Regression guard (audit §10.1, folded into existing test file per CONTRIBUTING.md L282 "Do NOT create new test files"):
- tests/core/server.test.ts: new describe block "tool description style contract (#683 ADR-0002)" parses every server.registerTool() block and asserts:
  * MUST NOT contain forbidden tokens (MANDATORY:, BLOCKED, PREFER X OVER Y, Do NOT read/use/pull, Never use, SESSION STATE, ✅, ❌)
  * MUST contain a WHEN: section (WHEN TO USE: accepted)
  Exemptions: ctx_stats/ctx_doctor/ctx_insight (minimal by design), ctx_upgrade (MUST is appropriate for post-call obligation), ctx_purge (deferred entirely, see below).
- Updated two existing tests that asserted the old wording: concurrency-field guidance now checks prose form; PARALLELIZE I/O test now checks CONCURRENCY: section.

Explicitly deferred (ctx_purge): Probe 4 (5 trials x 2 variants, Haiku) showed the proposed soft rewrite REGRESSES parameter fidelity 5/5 -> 3/5. Counter-intuitive: the heavy negative framing (DESTRUCTIVE, REFUSAL RULES, NEVER call with bare {confirm:true}) actually anchors small models to the required scope discipline. A follow-up PR must run a tri-LLM probe (Haiku/Sonnet/Opus) and gate merge on that probe before changing this tool. Documented inline in tests/core/server.test.ts via EXEMPT_FROM_FORBIDDEN_TOKENS with rationale.

Verification:
- npx tsc --noEmit: clean
- tests/core/server.test.ts: 361/364 pass (3 pre-existing failures unrelated to this PR, confirmed via git stash diff)
- tests/hooks/: 438/439 pass (1 skipped, unchanged)
- tests/adapters/: 787/787 pass

PR #683 (substitutes #654). See docs/adr/0002 and docs/adr/0003.
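
The style contract described above could be approximated by a checker like the following. It is a sketch, not the test in tests/core/server.test.ts: it takes a tool name and description directly instead of parsing server.registerTool() blocks, the pattern list is my reading of the forbidden tokens, matching is case-sensitive by assumption, and exempt tools are skipped for both checks, which may be broader than the real exemptions.

```typescript
// Hypothetical approximation of the ADR-0002 description style contract.
const FORBIDDEN_PATTERNS: ReadonlyArray<{ label: string; re: RegExp }> = [
  { label: "MANDATORY:", re: /MANDATORY:/ },
  { label: "BLOCKED", re: /BLOCKED/ },
  { label: "PREFER X OVER Y", re: /PREFER\s+.+?\s+OVER\s+/ },
  { label: "Do NOT read/use/pull", re: /Do NOT (?:read|use|pull)/ },
  { label: "Never use", re: /Never use/ },
  { label: "SESSION STATE", re: /SESSION STATE/ },
  { label: "emoji bullet", re: /[✅❌]/ },
];

// Tools listed as exemptions in the commit message.
const EXEMPT_TOOLS: ReadonlySet<string> = new Set([
  "ctx_stats",
  "ctx_doctor",
  "ctx_insight",
  "ctx_upgrade",
  "ctx_purge",
]);

/** Return a list of style violations for one tool description. */
function styleViolations(toolName: string, description: string): string[] {
  if (EXEMPT_TOOLS.has(toolName)) return [];
  const violations: string[] = [];
  for (const { label, re } of FORBIDDEN_PATTERNS) {
    if (re.test(description)) violations.push(`forbidden token: ${label}`);
  }
  // "WHEN TO USE:" is accepted as a transitional alias for "WHEN:".
  if (!/WHEN(?: TO USE)?:/.test(description)) {
    violations.push("missing WHEN: section");
  }
  return violations;
}
```

Returning a list rather than a boolean keeps the failure message useful when a description trips several rules at once.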

…fusal (substitutes #654) (#683)

* fix(server): replace "blocked" wording in WebFetch refusal with imperative retry hint (substitutes #654)
* docs(adr): tool description style (ADR-0002) + routing deny reasons (ADR-0003)
* fix(server): apply ADR-0002 voice to 6 ctx_* tool descriptions + contract test
Regression guard (audit §10.1, folded into existing test file per CONTRIBUTING.md L282 "Do NOT create new test files"): - tests/core/server.test.ts: new describe block "tool description style contract (#683 ADR-0002)" parses every server.registerTool() block and asserts: * MUST NOT contain forbidden tokens (MANDATORY:, BLOCKED, PREFER X OVER Y, Do NOT read/use/pull, Never use, SESSION STATE, ✅, ❌) * MUST contain a WHEN: section (WHEN TO USE: accepted) Exemptions: ctx_stats/ctx_doctor/ctx_insight (minimal by design), ctx_upgrade (MUST is appropriate for post-call obligation), ctx_purge (deferred entirely — see below). - Updated two existing tests that asserted the old wording: concurrency-field guidance now checks prose form; PARALLELIZE I/O test now checks CONCURRENCY: section. Explicitly deferred — ctx_purge: Probe 4 (5 trials x 2 variants, Haiku) showed the proposed soft rewrite REGRESSES parameter fidelity 5/5 -> 3/5. Counter-intuitive: the heavy negative framing (DESTRUCTIVE, REFUSAL RULES, NEVER call with bare {confirm:true}) actually anchors small models to the required scope discipline. A follow-up PR must run a tri-LLM probe (Haiku/Sonnet/Opus) and gate merge on that probe before changing this tool. Documented inline in tests/core/server.test.ts via EXEMPT_FROM_FORBIDDEN_TOKENS with rationale. Verification: - npx tsc --noEmit: clean - tests/core/server.test.ts: 361/364 pass (3 pre-existing failures unrelated to this PR — confirmed via git stash diff) - tests/hooks/: 438/439 pass (1 skipped, unchanged) - tests/adapters/: 787/787 pass PR #683 (substitutes #654). See docs/adr/0002 and docs/adr/0003. * fix(hooks): apply ADR-0002 + ADR-0003 contract to routing-block.mjs + lock with regression test Extends PR #683 to the highest-blast-radius prompt surface in the project: hooks/routing-block.mjs ships into the system prompt of every session, while src/server.ts tool descriptions only fire at tool-selection time. 
The original PR #683 scope cleaned up the per-tool surface and the routing.mjs deny reasons but missed the system-prompt surface itself. Three forbidden-token violations rewritten per ADR-0002 rubric #2 (affirmative > negative) + #9 (cross-LLM Constitutional AI safety bias): - <forbidden_actions> XML container -> <when_not_to_use>; the container name itself is a Constitutional AI trigger on Anthropic-tier models. - "NEVER use ctx_execute ... for file writes" -> descriptive form "File writes use the native Write or Edit tool -- ctx_execute, ctx_execute_file, and Bash subprocesses do not persist edits to the host filesystem." Same operational intent, no forbidding voice. - "Write artifacts ... NEVER inline" -> "Write artifacts ... to files. Return only: file path + 1-line description." Semantic-equivalence verified by enumerating all 16 directives in the current block and mapping each to its rewrite (0 orphans, 0 additions). Net character delta: -76 chars. RFC 2119 MUST kept in <priority_instructions> per the ADR-0002 post-call-obligation carve-out. Adds a sibling contract describe block to tests/core/server.test.ts: "hook routing prompt-surface contract (#683 ADR-0002 + ADR-0003)". Folded into the same file per CONTRIBUTING.md L282 (no new test files). Scans: - hooks/routing-block.mjs and hooks/core/routing.mjs for forbidden tokens (<forbidden_actions>, NEVER, FORBIDDEN, "NO X for Y" bullets). - Every "redirected"-bearing template literal in routing.mjs for ADR-0003 CASE A compliance: MUST open with "redirected", MUST NOT contain bare uppercase BLOCKED, MUST name at least one ctx_* alternative tool. 15 assertions total. CASE B strings (Blocked by security policy: ...) correctly excluded by the extractor. This is the contract test ADR-0003 Consequences L79-82 invited as follow-up. 
Three tests/hooks/core-routing.test.ts assertions and one tests/core/ server.test.ts hook-injection assertion updated to match the new positive wording (same semantic coverage, new container name). Full regression sweep: 841 passing / 1 skipped / 3 pre-existing storage- roots failures (verified by git stash on the branch HEAD; out of scope per PR #683 body). * fix(server): apply ADR-0002 canonical structure to ctx_purge + 4 ctx_* tools (PR #683 WS2/WS3) WS2 — ctx_purge rewrite (audit §6.5, Probe 4 evidence preserved): - Replace negative flat framing (DESTRUCTIVE/REFUSAL RULES/NEVER) with the canonical WHEN/WHEN NOT/SCOPES/CONTRACT/RETURNS/EXAMPLE structure. - Preserve all four refusal rules verbatim under CONTRACT (confirm:false, sessionId+scope ambiguity, scope:'session' without sessionId, deprecated bare {confirm:true}) so Probe 4 parameter-fidelity discipline holds on Haiku (5/5 baseline must not regress). - Keep DESTRUCTIVE headline as accurate user-facing signaling (distinct from the cross-LLM-bias negative framing the ADR-0002 rubric forbids). - Add two EXAMPLE lines for the two valid input shapes (per-session + per-project) so the LLM has explicit parameter templates. - Add WHEN NOT clause covering the ambiguous-scope handler ("User says 'reset'/'clear'/'wipe' without naming a scope -> ask first"). WS3 — corpus-wide canonical structure pass: - ctx_index: add EXAMPLE: line (was missing); fold the path-hash sentence into the RETURNS block so the canonical four-section shape holds. - ctx_search: drop the non-canonical TIPS: header (fold into RETURNS prose); add explicit WHEN NOT clauses (empty-index redirect, single one-off question -> ctx_execute). - ctx_fetch_and_index: drop the non-canonical CONCURRENCY: header (fold the I/O-bound split into the WHEN clause; fold the SQLite single-writer note into RETURNS); add WHEN NOT clauses (local content -> ctx_index, SPA-rendered content -> headless browser). 
- ctx_batch_execute: drop the non-canonical CONCURRENCY: header (fold the I/O-bound guidance into WHEN; fold the CPU-bound + stateful guidance into WHEN NOT). Section order on all six routing-target tools is now strictly WHEN -> WHEN NOT -> RETURNS -> EXAMPLE per ADR-0002 §Canonical structure. Bullets are markdown '- ' only. ctx_stats/ctx_doctor/ctx_upgrade/ctx_insight remain minimal one-line diagnostic descriptions (exempt). Empirical reference: TOOL-DESCRIPTIONS-AUDIT.md §6.1 (ctx_purge Probe 4), audit §3 row-by-row standardization verdicts. * test(server): lock canonical-structure contract + amend ADR-0002 (PR #683 WS3) ADR-0002 amendment (docs/adr/0002-tool-description-style.md): - Add ### Canonical structure (locked rubric — PR #683 WS3) subsection with seven numbered rules (section order, bullet uniformity, header casing, indent, blank-line spacing, single canonical EXAMPLE per tool, per-tool carve-out allow-list). - Add ### Cross-LLM rationale subsection citing the tokenizer-uniformity argument across Claude / GPT / Gemini / Llama as the empirical basis for the UPPERCASE+colon header shape. - Update ### Exemptions and ## Consequences to reflect that ctx_purge is no longer deferred — the WS2 rewrite ships with audit-approved DESTRUCTIVE/SCOPES/CONTRACT carve-outs allow-listed in the contract test, while still meeting the canonical four-section shape. Contract test extensions (tests/core/server.test.ts): - Remove ctx_purge from EXEMPT_FROM_FORBIDDEN_TOKENS and EXEMPT_FROM_WHEN (the WS2 rewrite passes the canonical structure with the carve-outs). - Add ALLOWED_EXTRA_SECTIONS map carving out DESTRUCTIVE/SCOPES/CONTRACT on ctx_purge only, with inline rationale citing Probe 4. - Add four new per-tool assertions (run on every non-exempt ctx_* tool): 1. MUST contain RETURNS: and EXAMPLE: (mandatory presence). 2. Section order WHEN -> WHEN NOT -> RETURNS -> EXAMPLE (strictly increasing flat.indexOf() positions for canonical sections). 3. 
UPPERCASE+colon headers must be in the canonical set OR the per-tool carve-out list (rejects off-spec sections like CONCURRENCY: and TIPS:). 4. Bullets must be markdown '- ' only (rejects 1./1-/* /•). - Add flattenDescription() helper that collapses the literal '\n' escapes and joins the "+ \n " concat continuation so the assertions run against the shape the host LLM actually sees at tool-selection time. Two stale-test updates (folded CONCURRENCY: into WHEN: prose): - "tool description documents the concurrency field with positive guidance" — expect 'parallelize I/O-bound calls' + 'concurrency 4-8' + 'CPU-bound or stateful' + 'keep concurrency at 1' (inline now). - "PARALLELIZE I/O guidance + locked requests:[] schema in description" — expect 'requests: [{url' + 'concurrency 4-8' + 'FTS5 indexing then serializes writes' (inline now). CONTRIBUTING.md L282 compliance: all assertions folded into the existing tests/core/server.test.ts file; no new test files. Result: tests/core/server.test.ts goes from 88 to 124 contract assertions across 7 non-exempt ctx_* tools; all 124 pass. Three pre-existing baseline failures (ctx_index storage-error + 2 ctx_doctor settings.json) are environment-specific and not introduced by this PR. Empirical reference: PR-683-FINALIZE-LOG.md (WS1 verdict table, WS2 probe design, WS3 before/after section structure). * fix: skip context-mode redirect echoes in isToolError + rename forbidden_actions test anchor PR #683 CI failed across all 3 OS on two tests, both downstream of this PR's own intentional changes: 1. tests/session/continuity.test.ts:79 'outputs additionalContext with XML routing block' — Expected <forbidden_actions> tag. The PR renamed <forbidden_actions> → <when_not_to_use> in hooks/ routing-block.mjs (ADR-0002, affirmative framing — describe when NOT to reach for a tool instead of declaring it forbidden). The continuity test still asserted on the old name. 
Update the assertion to match the new tag and cross-reference ADR-0002 in the failure message, so a future maintainer who runs `npm test` sees the rename instead of a bare diff.

2. tests/opencode-plugin.test.ts:1241 'blocked tool command is replaced before execution': expected snapshot=="", got <session_resume events="1"> containing a fake <errors count="1"> with our own echo text. The PR rewrote the curl/wget/inline-HTTP/WebFetch redirect echo from "context-mode: curl/wget blocked. …" to the friendlier "context-mode: curl/wget redirected … retry if it fails with a transient DNS error. …". The new copy legitimately mentions failure modes ("fails", "transient DNS error"), but `isToolError` at src/session/extract.ts:63 keyword-matches `/FAIL/i` and `/failed/i` (case-insensitive, no word boundary), so "fails" inside "if it fails" produced a substring match. As a result our own guidance echo was captured as a session error, and the next chat would show a fake error in <session_resume>.

Fix: gate isToolError on the unique `context-mode:` prefix. The check is defensive at the source, so a future copy change to the guidance text cannot reintroduce the bug. Match both forms of the response, because real shell runs report `response = "context-mode: …"` (the echo stdout), while the OpenCode plugin test path captures `response = 'echo "context-mode: …"'` (the raw command itself, never executed).

Verified locally on Node 20:
npx vitest run tests/session/continuity.test.ts -t "outputs additionalContext" → 1 passed
npx vitest run tests/opencode-plugin.test.ts -t "blocked tool command" → 1 passed
npx vitest run tests/session/ (all 28 files) → 594 passed | 4 skipped, no regressions

* fix(routing): drop negation framing from CASE A deny reasons (#683) All four redirect-style deny reasons in hooks/core/routing.mjs (curl/wget, inline HTTP, build tools, WebFetch) rewritten to fully positive imperative voice per ADR-0002 + ADR-0003.
- Removed "(context-window optimization, NOT a network restriction)" - Removed "Do NOT retry with curl/wget|Bash|WebFetch" - Replaced "retry if it fails" hedge with imperative "Retry the same call on a transient DNS error (EAI_AGAIN, ETIMEDOUT, ENETUNREACH)" Cross-LLM rationale: negation framing primes LLM attention on the forbidden item (ironic process theory). Positive routing intent + explicit capability affirmation + imperative next-action work uniformly across Claude/GPT/Gemini/Llama. ADR-0003 amended with §Amendment noting the empirical rationale. Contract tests in tests/core/server.test.ts gain two guards (PR #683 follow-up) that fail loud if "NOT a network" or "Do NOT retry" reappear. * test(hooks): update WebFetch + curl deny assertions for affirmative voice (#683) 77f4ec6 rewrote the CASE A deny reasons in hooks/core/routing.mjs to positive imperative voice. The two existing assertions that pinned the old "Do NOT retry" / "Think in Code" wording now need to pin the surviving affirmative anchors instead. - tests/hooks/integration.test.ts: WebFetch path asserts on the "Retry the same call on a transient DNS error" hint and the explicit "Call ctx_fetch_and_index" instruction. - tests/hooks/tool-naming.test.ts: curl redirect path asserts on "Call ctx_execute" — the imperative call instruction that absorbed the "Think in Code" trainer voice. Both keep the integration coverage of the redirect path intact while matching the new wording. * refactor(server): standardize MCP tool description source format (#683) Tool descriptions in src/server.ts used two competing source styles — template literals with embedded "\n\n" escape sequences, and multi-line string concatenation with "+". Both render identically at runtime but the inconsistency made the source hard to scan and made the canonical WHEN/WHEN NOT/RETURNS/EXAMPLE rubric (ADR-0002) harder to enforce by sight during review. 
All multi-section tool descriptions now use a single style: template literals with real newlines inside the literal. The runtime payload is unchanged. The forbidden-word + canonical-structure contract tests in tests/core/server.test.ts (PR #683 WS3) continue to pass against the normalized source. * fix(routing+desc): drop org-rationale + normalize RETURNS form (#683) Second-pass review of PR #683. Two visual / prompt-economy regressions the first pass missed. (1) "for context-window efficiency" — org-rationale, not action input. The first amendment replaced the bare-NOT parenthetical with the affirmative opener "redirected to <ctx_tool> for context-window efficiency". The "for X reason" preface is still post-hoc justification the agent does not need to act on. Compare HTTP 301: the response carries Location: <new-url> and the client uses it — the server never appends "for SEO efficiency". The capability affirmation "<ctx_tool> has full network access" already carries the substantive signal the rationale was double-encoding. All four CASE A sites in hooks/core/routing.mjs (curl/wget, inline HTTP, build tools, WebFetch) now open with "redirected to <ctx_tool>." flat — affirmative routing intent, no rationale preface. ADR-0003 gains §Second amendment documenting the rule and rationale. A new contract test in tests/core/server.test.ts asserts the phrase "for context-window efficiency" never reappears in any CASE A site. (2) RETURNS form inconsistency — three tools used inline form. ADR-0002 L56-57 specifies RETURNS as a header on its own line with the body indented below (matching WHEN: / WHEN NOT: shape). Three tools (ctx_execute, ctx_execute_file, ctx_purge) used inline form ("RETURNS: only your printed output.") while four tools used canonical header+body form. Mert flagged the visual inconsistency on review. All three inline tools rewritten to header+body form. 
EXAMPLE: stays inline per ADR-0002 L59 — the asymmetry is intentional (RETURNS prose is multi-line capable; EXAMPLE values are one-call-per-line). A new contract test asserts RETURNS: never appears in inline form on any multi-section ctx_* tool. Both new guards run alongside the existing ADR-0002 forbidden-token + canonical-structure suites. * feat(desc): surface auto-captured session memory in ctx_search + ctx_purge (#683) context-mode captures 23 categories of structured events at hook time (decisions, errors, blockers, plans, user prompts, rejected approaches, file ops, git ops, tasks, latency, MCP tool counts, etc.) and persists them across compaction. The mechanism is documented in README and the project CLAUDE.md routing block. The MCP tool descriptions themselves do not surface this — so at tool-selection time the LLM only sees "search indexed content" and misses the much larger search-session-memory capability. ctx_search description gains: - A 4th WHEN: bullet calling out session-memory queries as a valid use case alongside indexed content. - A RETURNS: note listing the common session-memory source labels (decision, error, error-resolution, blocker, plan, user-prompt, rejected-approach, compaction) and a pointer to ctx_stats for live category counts. - A second EXAMPLE: showing a timeline-sorted decision lookup — permitted under ADR-0002 L103-106 (tools with two valid input shapes MAY include two EXAMPLE lines). ctx_purge SCOPES block gains: - Per-session: clarifies "events" means auto-captured session memory so the agent knows what is being deleted. - Per-project: adds a "use ctx_stats first to preview category counts" hint to prevent destructive surprises. No structural changes: WHEN/WHEN NOT/RETURNS/EXAMPLE canonical order preserved, header-on-own-line RETURNS form preserved, no forbidden tokens introduced, no org-rationale prefaces. 131 ADR-0002 forbidden- token tests + 7 canonical-structure tests + 12 PR #683 contract tests all pass. 
* feat(desc): principle-first rewrite of ctx_execute/_file + truth-fix ctx_index/fetch (#683) Five maintainer critiques on tool descriptions, addressed together. Two overarching principles emerged from the review: 1. PRINCIPLE OVER HEURISTIC. The LLM picks a tool BEFORE seeing the output. "Use when output >= 20 lines" is unactionable — the LLM can't predict that. Replace with the INTENT principle: "use when you intend to derive an answer FROM the data". The LLM knows its own intent at tool-selection time. 2. TRUTHFUL RETURN DESCRIPTIONS. ctx_index handler at L2022 returns chunk count + source label + ctx_search call hint — NOT a summary. The headline "Only a brief summary is returned" was false. Same false-claim pattern existed in ctx_fetch_and_index RETURNS ("plus an indexing summary"). Reframe as "indexing metadata" and make explicit that raw content is NOT echoed back. ctx_execute (largest rewrite): - Headline + philosophy paragraph: Think-in-Code is now taught explicitly with the concrete 47-files / 700 KB → 3.6 KB example (700 KB into conversation vs 3.6 KB summary printed). The principle is the load-bearing concept, not the implementation detail. - WHEN bullets reframed around INTENT (derive / parse / aggregate) rather than predicted output size. - WHEN NOT bullets reframed around INTENT (observe vs process) rather than enumerated command shapes. - Examples replaced: 'npm test | tail -40' (naive truncate) → smart grep for failure-relevant lines; awkward 'aws' example → gh JSON query with filter+count in code. ctx_execute_file: - Same Think-in-Code framing scoped to single-file analysis. - "raw contents would flood context" reframed in LLM-native terms ("every byte you Read enters your conversation memory and costs reasoning capacity for the rest of the session"). The LLM does not necessarily know which host it is running in or what "context" means in technical terms — describe the actual cost in cognitive terms it reasons about. 
- Better examples: error filtering with count + tail; CSV row count with header read. ctx_index: - Headline corrected: stores raw content, returns indexing metadata + retrieval hint. Nothing is summarized. - RETURNS body lists what is actually returned: chunk counts (total, code-bearing), source label, ctx_search call shape. Makes explicit that the raw content is NOT echoed back — it lives in storage and is retrievable via ctx_search. - Added second EXAMPLE showing file-backed path (auto-refresh hash). ctx_fetch_and_index: - "raw HTML entering context" → "raw page bytes should NOT enter your conversation memory" (same LLM-native phrasing principle). - RETURNS "plus an indexing summary" → "plus indexing metadata" + explicit "Raw content is NOT echoed back" — eradicates the same false-claim pattern. 131 ADR-0002 forbidden-token tests + 7 canonical-structure tests + 12 PR #683 contract tests all pass. No structural changes — WHEN / WHEN NOT / RETURNS / EXAMPLE canonical order preserved, header-on-own-line RETURNS preserved, no forbidden tokens introduced. * feat(desc): comprehensive description batch — ranking, TTL custom, capabilities, technical depth (#683) Maintainer's third-pass review covered ten distinct critiques. Addressed together because they all stem from the same direction: surface the load-bearing mechanism the LLM needs to act correctly, in terms the LLM understands at tool-selection time. ctx_search — full rewrite. The previous headline ("BM25 over FTS5") sold the tool short. The actual ranking pipeline is BM25 + Reciprocal Rank Fusion over two parallel tokenizers (Porter stemming + trigram substring), plus a proximity rerank pass for multi-term queries, plus Levenshtein typo correction, plus window-extracted smart snippets. The knowledge base is unified: ctx_search reaches both content the user indexed AND auto-captured session memory (26 event categories). 
WHEN NOT bullets reframed to intent ("you have an ad-hoc question") so the tool-name references don't get lost across long conversations. contentType code|prose filter surfaced. Four EXAMPLE lines cover the four most common shapes (source-scoped batch, timeline-sorted memory, contentType-filtered, multi-source recall). ctx_fetch_and_index — custom TTL (PR #666) now first-class. Default 24h, override per-call with ttl: <ms>, ttl: 0 bypasses like force:true. Removed the "~3KB" hard preview claim — replaced with mechanism prose. RETURNS explains the FTS5 single-writer reality and net-latency math (parallel-fetch + serial-index). Concurrency guardrails kept generic (I/O-bound 4-8, lower for rate-limited hosts) — no third-party API specifics (Mert flag: not our policy to editorialize on someone else's API contract). ctx_batch_execute — Think-in-Code restored as the load-bearing concept ("concurrency parallelizes FETCH; derivation belongs in code"). Same generic concurrency guardrails as ctx_fetch_and_index. EXAMPLE shows the pattern with a summarize-step command at the end. ctx_execute — background and intent capabilities now surfaced in description (previously only in the schema field describe). background covers server/daemon detach. intent triggers auto-index of large output into the knowledge base, with title+preview return instead of raw stdout. ctx_execute_file — same intent surfacing for file-derivation output. ctx_insight — port (default 4747) and sessionDir/contentDir overrides surfaced. Useful for diagnosing multi-install setups or pointing at a sibling project's data. README.md — TTL Cache section rewritten for custom TTL. The "Fresh (<24h)" hard claim is now "Cache hit (within TTL)" with the default-and-override mechanism explained. Tools table row for ctx_fetch_and_index updated to match. 
Two contract tests (tests/core/server.test.ts) that pinned the old exact phrasing for batch_execute / fetch_and_index concurrency guidance now pin the LOAD-BEARING CONCEPTS via regex (4-8 window, CPU/stateful keep at 1, FTS5 serial-write) — same coverage, immune to future copy-edits. Three pre-existing PR #617 failures (ctx_doctor + ctx_index storage path e2e under CONTEXT_MODE_DIR) remain — out of scope, separate work. * feat(hooks): apply principle-first pattern to routing-block + 4 CASE A deny reasons (#683) The MCP tool descriptions in src/server.ts were rewritten over the previous PR #683 commits to follow a consistent pattern: principle over heuristic, intent over threshold, LLM-native phrasing over jargon, Think-in-Code as the load-bearing concept. The hook layer (routing-block.mjs system-prompt injection + routing.mjs CASE A deny reasons) had not yet been brought to the same standard. Two maintainer flags drove the work: - "bash with output >20 lines → use ..." — the LLM cannot predict output size BEFORE running. Threshold-based heuristic, same anti- pattern as the original ctx_execute description. - The curl/wget deny reason doesn't follow the same pattern as the description rewrites — overloaded with implementation prescription ("Write pure JS with try/catch, no npm deps"), missing principle. routing-block.mjs (system-prompt injection on every session): - <priority_instructions> reframed in cognitive-cost terms: "every byte enters your conversation memory and costs reasoning capacity for the rest of the session". Think-in-Code surfaced explicitly ("program the analysis, do not compute it by reading raw data"). - <tool_selection_hierarchy> entries enriched: ctx_search now describes the auto-captured session memory it reaches; ctx_batch_- execute explains why batching matters (round-trip cost paid once); PROCESSING entry frames around derivation intent rather than the old "API calls, log analysis, data processing" enumeration. 
- <when_not_to_use> intent-based, no thresholds: "intend to PROCESS the output" vs "intend to OBSERVE a short fixed output". Same shape for Read (analyze vs Edit), WebFetch, ctx_execute/_file file writes. createReadGuidance / createGrepGuidance / createBashGuidance / createExternalMcpGuidance: - All four rewritten with the same intent-based / principle-first framing. "May flood context" / "May produce large output" / "with output >20 lines" all replaced with intent triggers ("when you intend to PROCESS / count / filter / aggregate"). - LLM-native phrasing throughout: "your conversation" not "context", "your derived answer" not "your printed summary" (the latter implies a pre-shaped narrative; the former matches what code actually produces). - Bash carve-out preserved (mutating state / observational commands) but expressed as intent shapes, not enumerated commands. hooks/core/routing.mjs CASE A deny reasons (curl/wget, inline HTTP, build tool, WebFetch): - All four open with the bare "redirected" verb (no preamble), then go straight into the imperative call with the principle embedded ("derive your answer in code, and print only the result — the raw HTTP body stays in the sandbox instead of entering your conversation"). - Selection criteria added where multiple paths exist (curl: inline derivation via ctx_execute vs persist-for-later-query via ctx_fetch_and_index; WebFetch: same). - "Write pure JS with try/catch, no npm deps" preachiness removed — the LLM gets that info from the ctx_execute schema when it actually calls the tool. The deny reason stays focused on routing. - Build tool deny reason now also nudges toward smarter filtering (grep over ERROR|warning|FAIL patterns) instead of leaving the agent with naive `tail -30` as the only suggested shape. All four CASE A sites still satisfy the ADR-0003 contract: open with "redirected", name at least one ctx_* alternative, contain no bare "BLOCKED" / "NOT a network" / "Do NOT retry" / "for context-window efficiency". 
635 tests pass (was 579 before this batch), zero new failures. The 3 remaining failures are pre-existing PR #617 ctx_doctor / ctx_index storage-path e2e tests — out of scope.
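
As a rough illustration of the isToolError fix described in the commit log above, the sketch below gates a keyword-based error check on the `context-mode:` prefix. The constant names, the keyword regex, and the helper `isGuidanceEcho` are assumptions made for illustration; the actual code in src/session/extract.ts may look different.

```typescript
// Hypothetical sketch, not the code in src/session/extract.ts.
const GUIDANCE_PREFIX = "context-mode:";

// Assumed keyword check: case-insensitive, no word boundary,
// so "fails" inside "if it fails" also matches.
const ERROR_KEYWORDS = /fail/i;

// True when the response is context-mode's own guidance echo.
function isGuidanceEcho(response: string): boolean {
  const text = response.trimStart();
  // Form 1: the echo was executed, so stdout begins with the prefix.
  if (text.startsWith(GUIDANCE_PREFIX)) return true;
  // Form 2: the raw command was captured without being executed.
  return /^echo\s+["']context-mode:/.test(text);
}

function isToolError(response: string): boolean {
  // Guidance echoes are never errors, whatever words they contain.
  if (isGuidanceEcho(response)) return false;
  return ERROR_KEYWORDS.test(response);
}
```

Checking the prefix before the keyword match means later copy edits to the guidance text cannot turn the echo into a false session error, which is the property the commit message calls out.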

mksglu · 2026-05-24T11:29:16Z

@kerneltoast, first and most importantly: thank you. Your reproduction on Opus 4.6 and your precise diagnosis (the "blocked" token tripping the network-error-handling pattern and the agent falling back to training knowledge) were what turned a one-line wording tweak into a corpus-wide audit of how every prompt surface we ship interacts with cross-LLM safety priors. The rewrite landed in #683 and shipped in v1.0.147, but the credit for noticing the problem and giving us a reliable repro is yours.

I'm closing this PR in favor of #683 because the surface area outgrew a single deny-reason fix — but everything you pointed at is now codified as policy, not just patched. Closing carries no "your PR was wrong" signal; the opposite. You can ignore the section below if you just want the summary, but I want to be transparent about what landed so this isn't a black-box close.

What landed in v1.0.147 (substitutes #654)

1. The original WebFetch / curl / wget / inline-HTTP fix. All four CASE A redirect deny reasons in hooks/core/routing.mjs now open with the affirmative verb "redirected", name the alternative ctx_* tool by an imperative call, affirm capability ("<tool> has full network access"), and end with a positive transient-DNS retry hint (EAI_AGAIN | ETIMEDOUT | ENETUNREACH). The "blocked" token is gone from CASE A entirely.
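
The four-part shape of a CASE A deny reason can be sketched roughly as follows. This is a minimal illustration under stated assumptions: `redirectReason`, `isTransientDnsError`, and the exact sentence wording are hypothetical and are not copied from hooks/core/routing.mjs; only the error codes and the quoted phrases come from the description above.

```typescript
// Hypothetical sketch of a CASE A redirect reason; wording is illustrative.
const TRANSIENT_DNS_CODES = ["EAI_AGAIN", "ETIMEDOUT", "ENETUNREACH"] as const;

// Transient codes are worth retrying; anything else (e.g. ENOTFOUND) is not.
function isTransientDnsError(code: string | undefined): boolean {
  return (
    code !== undefined &&
    (TRANSIENT_DNS_CODES as readonly string[]).includes(code)
  );
}

// Builds: affirmative verb, imperative call, capability affirmation, retry hint.
function redirectReason(tool: string): string {
  return [
    `redirected to ${tool}.`,
    `Call ${tool} with the same URL.`,
    `${tool} has full network access.`,
    `Retry the same call on a transient DNS error (${TRANSIENT_DNS_CODES.join(", ")}).`,
  ].join(" ");
}

console.log(redirectReason("ctx_fetch_and_index"));
```

Keeping the retryable codes in one list lets the deny reason and any DNS pre-flight check share the same definition of "transient".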

2. ADR-0002 (tool description style), new, canonical and contract-tested. Documents the cross-LLM bias rubric (Constitutional AI priors, bare-NOT negation, emoji-bullet leakage, off-spec UPPERCASE sections, RFC 2119 imperative hierarchy). Locks a canonical structure for every routing-target ctx_* tool: WHEN: → WHEN NOT: → RETURNS: → EXAMPLE:, markdown "- " bullets only, header-on-own-line RETURNS form. Enforced by a contract test that scans src/server.ts on every commit (131 forbidden-token assertions + 7 canonical-structure assertions + per-tool guards). Future contributors either adhere to it or amend the ADR.
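
A canonical-structure check of this kind might look like the sketch below. It assumes the description is already a plain string with real newlines; `checkCanonicalStructure` and its messages are hypothetical and do not reproduce the assertions in tests/core/server.test.ts. The required headers and rejected bullet styles follow the commit log; everything else is an assumption.

```typescript
// Hypothetical sketch of an ADR-0002 style structure check.
const CANONICAL_ORDER = ["WHEN:", "WHEN NOT:", "RETURNS:", "EXAMPLE:"];
const REQUIRED = ["WHEN:", "RETURNS:", "EXAMPLE:"];

function checkCanonicalStructure(description: string): string[] {
  const problems: string[] = [];

  // Mandatory presence of the required headers.
  for (const header of REQUIRED) {
    if (!description.includes(header)) problems.push(`missing ${header}`);
  }

  // Section order: positions of the headers that are present must increase.
  const positions = CANONICAL_ORDER
    .map((header) => description.indexOf(header))
    .filter((pos) => pos !== -1);
  for (let i = 1; i < positions.length; i++) {
    if (positions[i] <= positions[i - 1]) {
      problems.push("sections out of order");
      break;
    }
  }

  // Bullets: only "- " is accepted; "1.", "1-", "* " and "•" are rejected.
  for (const line of description.split("\n")) {
    if (/^\s*(?:\d+[.-]|\*|•)\s/.test(line)) {
      problems.push(`non-canonical bullet: ${line.trim()}`);
    }
  }
  return problems;
}
```

Returning a list of problems rather than throwing on the first one gives a contributor the full set of violations in a single test run.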

3. ADR-0003 (routing deny reasons: redirect ≠ restriction) — also new. Distinguishes CASE A (routing redirect to an alternative tool) from CASE B (true security/policy denial). CASE A MUST open with "redirected", MUST NOT contain bare BLOCKED, MUST NOT contain bare-NOT negations (NOT a network restriction, Do NOT retry with curl), MUST NOT contain org-rationale prefaces (for context-window efficiency). Three contract-test guards lock each rule. The §Amendment + §Second amendment sections document the empirical reasoning chain that led from your original observation to the final policy.
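The CASE A rules lend themselves to simple string guards. The following is a hypothetical sketch, not the project's contract tests: the function names, rule labels, and regular expressions are assumptions, and the org-rationale check matches only the one example phrase quoted above. The two sample strings are made up, assembled from fragments quoted in this thread.

```typescript
// Hypothetical CASE A guard, sketched from the ADR-0003 rules summarized above.
// Rule labels and regexes are illustrative assumptions, not the project's contract tests.
interface GuardResult {
  rule: string;
  ok: boolean;
}

function checkCaseAReason(reason: string): GuardResult[] {
  const text = reason.trim();
  return [
    // MUST open with "redirected" (case-insensitive here, which is an assumption).
    { rule: "opens with 'redirected'", ok: /^redirected\b/i.test(text) },
    // MUST NOT contain the "blocked" token in any casing.
    { rule: "no 'blocked' token", ok: !/\bblocked\b/i.test(text) },
    // MUST NOT contain an uppercase bare NOT, as in "Do NOT retry with curl".
    { rule: "no bare-NOT negation", ok: !/\bNOT\b/.test(text) },
    // MUST NOT contain an org-rationale preface (only the quoted example is matched).
    { rule: "no org-rationale preface", ok: !/for context-window efficiency/i.test(text) },
  ];
}

// Returns the labels of every rule the reason violates.
function failedRules(reason: string): string[] {
  return checkCaseAReason(reason)
    .filter((result) => !result.ok)
    .map((result) => result.rule);
}

// Made-up strings for illustration only.
const oldStyleReason = "WebFetch is BLOCKED for context-window efficiency. Do NOT retry with curl.";
const newStyleReason =
  "Redirected: call ctx_fetch_and_index with the same URL. ctx_fetch_and_index has full network access.";

console.log(failedRules(oldStyleReason)); // lists all four rules
console.log(failedRules(newStyleReason)); // empty list
```

A CASE B message (a true security or policy denial) would not be run through these guards, since the rules apply to routing redirects only.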

4. Two LLM-behavioral findings the rewrite was calibrated against.

- Negation framing primes the forbidden frame (ironic process theory). Tri-LLM probe measured Haiku capitulation regressing from 0/6 → 2/6 when the original "blocked" was replaced with "(context-window optimization, NOT a network restriction)" — the bare-NOT construct anchored attention on the very frame it tried to deny. Resolution: affirmative "<tool> has full network access" carries the same signal positively.
- Threshold-based heuristics are unactionable at tool-selection time. Phrases like "Bash with output >20 lines → use ctx_execute" were rewritten around INTENT ("when you intend to PROCESS the output (filter, count, parse, aggregate)") because the LLM cannot predict output size before running — only its own intent.

5. Description corpus rewrite — every ctx_* tool. Eleven tool descriptions in src/server.ts rewritten to surface load-bearing mechanism in LLM-actionable terms (the actual ranking pipeline, custom TTL cache, Think-in-Code principle, unified knowledge base reaching auto-captured session memory, etc.). The hooks/routing-block.mjs system-prompt injection got the same treatment — "context window floods" → "every byte enters your conversation memory and costs reasoning capacity". Five-language-aware tokenizer thinking throughout (Claude / GPT / Gemini / Llama / Qwen).

6. Behavior preserved. No tool surface area changed, no schema breaking change, no behavior regression. Description-and-deny-reason rewrite only. 635 tests pass.

Please test v1.0.147

If you have the repro from your original report (Opus 4.6 + WebFetch fetch attempt), run `npm i -g @mksglu/context-mode@1.0.147` or `/context-mode:ctx-upgrade` and verify the agent now follows the redirect to ctx_fetch_and_index cleanly. If it still falls back to training knowledge, that's a real regression and I want to know about it — open an issue and tag this PR.

Invitation

Your diagnostic discipline (reproduction → token-level hypothesis → repro on a specific model version) is the exact bar I want collaborators at. If you'd like to be part of the private engineering channel where the deeper architectural reviews happen (cross-LLM probe design, ADR drafting, contract-test extensions), reply here or open a discussion and I'll send you the invite. No pressure, no commitment expected — the door is open.

Thanks again. This release exists in this shape because of your report.

— Mert

mksglu · 2026-05-24T11:29:24Z

Closing in favor of #683 (now merged into next and shipped in v1.0.147). The credit + technical breakdown is in the comment above — thank you again @kerneltoast.

kerneltoast · 2026-05-24T17:34:33Z

> If you have the repro from your original report (Opus 4.6 + WebFetch fetch attempt), run `npm i -g @mksglu/context-mode@1.0.147` or `/context-mode:ctx-upgrade` and verify the agent now follows the redirect to ctx_fetch_and_index cleanly. If it still falls back to training knowledge, that's a real regression and I want to know about it — open an issue and tag this PR.

@mksglu Just tested 1.0.150 on the original session where I ran into this and it is indeed fixed. Thanks for the thorough audit and fixing this pattern across context-mode! I didn't catch the negative framing issue in my PR with smaller models like Haiku fixating on "NOT", and I hadn't tested Haiku. The positive framing is indeed the correct solution for better compliance.

> If you'd like to be part of the private engineering channel where the deeper architectural reviews happen (cross-LLM probe design, ADR drafting, contract-test extensions), reply here or open a discussion and I'll send you the invite. No pressure, no commitment expected — the door is open.

Sure, I'd love to. I rely on context-mode quite heavily 🙂

mksglu · 2026-05-24T17:48:15Z

Reach out to me via DM.

github-actions Bot and others added 3 commits May 20, 2026 08:17

ci: update server.bundle.mjs, cli.bundle.mjs, session hook & security bundles

55b51d3

ci: update install stats

4dcbd45

kerneltoast force-pushed the fix-webfetch-redirect-wording branch from abd6488 to 455788a Compare May 20, 2026 23:49

mksglu changed the base branch from main to next May 21, 2026 06:43

Merge branch 'next' into fix-webfetch-redirect-wording

e53c4bb

mksglu closed this May 24, 2026

mksglu mentioned this pull request May 24, 2026

fix(server): comprehensive ctx_* tool description audit + WebFetch refusal (substitutes #654) #683

Merged

mksglu reopened this May 24, 2026

mksglu closed this May 24, 2026

mksglu mentioned this pull request May 24, 2026

Beta testers wanted — 15 platforms × 3 operating systems #45

Open

kerneltoast commented May 20, 2026 (edited)

Findings on the `ssrfGuard` half