This is a bundled report covering 3 distinct issues discovered during the
same cross-platform audit that produced #687 (resolved) and the
ctx_batch_execute scope issue (opened separately). Each issue below has
its own reproducer prompt and was verified across 4 client/model
combinations on v1.0.146, v1.0.148, and v1.0.151 (current).
────────────────────────────────────────────────────────────────────────
ISSUE A — ctx_insight ignores the `query` parameter (dashboard launcher only)
────────────────────────────────────────────────────────────────────────
Prompt that triggered:
ctx_insight(query: "What is the practical difference between skill A
and skill B in this codebase?")
────────────────────────────────────────────────────────────────────────
ISSUE B — ctx_stats omits cache hit rate, chunk count, last_indexed_at
────────────────────────────────────────────────────────────────────────
Prompt that triggered:
ctx_stats()
(run twice — once cold, once after ~6 ctx_search calls — to observe
what the second call reports vs what's available to compute.)
────────────────────────────────────────────────────────────────────────
ISSUE C — ctx_search applies undocumented progressive throttling (3→2→1)
────────────────────────────────────────────────────────────────────────
Prompt that triggered:
Issue ctx_search sequentially 5+ times within ~60 seconds, each call
passing `limit: 3`. Observe the response `limit` shrinking after the
2nd call.
Full error output (REQUIRED)
═══════════════════════════════════════════════════════════════════════
CONTEXT: This bundles 3 issues from an audit that produced 5 findings.
#687 (ctx_index directory) — RESOLVED in v1.0.149, thank you
#<bug2 number> (ctx_batch_execute) — opened separately yesterday
THIS issue — the 3 remaining items below
Bundling these instead of opening 3 separate issues to respect your
time as a solo maintainer. Happy to split into 3 if you prefer — just
say the word.
═══════════════════════════════════════════════════════════════════════
──────────────────────────────────────────────
ISSUE A — ctx_insight ignores `query` parameter
──────────────────────────────────────────────
Tool: ctx_insight
Input: { "query": "<any natural language question>" }
Output:
- Copying source files...
- Source files copied.
- Installing dependencies (first run, ~30s)...
- Dependencies installed.
- Building dashboard...
- Dashboard running at http://localhost:4747
PID: <pid>
The `query` parameter is accepted by the JSON-Schema but completely
ignored at runtime. The tool only launches a React analytics dashboard
on port 4747. Naming + parameter shape strongly imply Q&A / RAG-style
synthesis over the indexed corpus — that expectation is wrong.
Cross-platform verification (4/4 ignored the parameter identically):
Claude Code (Opus 4.7) — dashboard launched, query ignored
Codex CLI (GPT-5) — dashboard launched, query ignored
Antigravity (Gemini 3.5 Flash) — dashboard launched, query ignored
Antigravity (Gemini 3.1 Pro) — dashboard launched, query ignored
All 4 LLM agents reached for ctx_insight when asked to "synthesize
an answer about X from the indexed corpus". All 4 had to fall back
to ctx_search after seeing the dashboard URL.
──────────────────────────────────────────────
ISSUE B — ctx_stats observability gap
──────────────────────────────────────────────
Tool: ctx_stats
Input: {} (no args)
Output (current):
- events count
- sessions count
- tokens saved (lifetime + session)
- % reduction
- $ saved estimate
What's NOT exposed:
- cache_hit_rate (nominal hit ratio per query)
- cache_hits_count / cache_miss_count breakdown
- total_chunks_indexed (chunks, not MB)
- last_indexed_at timestamp
- per-query cache state on subsequent identical queries
Effect: optimizing usage is guesswork. We can only INFER cache
behavior by diffing `tokens saved` between two ctx_stats calls,
which is indirect and noisy. During our v1.0.151 verification of
#687, all 3 clients (Opus, GPT-5, Gemini) independently noted: when
calling ctx_index on the same path twice, the response does not
indicate whether the second call hit cache or re-indexed (Codex
measured a ~3% time delta, Opus and Gemini saw none).
Cross-platform verification: all 4 clients had to invent the same
workaround (subtract Phase 1 stats from Phase 6 stats to estimate
session savings, with no way to know hit rate per query).
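The snapshot-diff workaround can be sketched roughly as follows. This is a minimal illustration, not context-mode code: the `StatsSnapshot` field names and the numbers are made up, and the real ctx_stats output would need to be mapped onto them.

```typescript
// Hypothetical snapshot shape; these are NOT the real ctx_stats keys.
interface StatsSnapshot {
  events: number;
  tokensSaved: number;
}

interface StatsDelta {
  events: number;
  tokensSaved: number;
  tokensSavedPerEvent: number | null;
}

// Subtract an early snapshot from a later one to estimate session savings.
function diffStats(before: StatsSnapshot, after: StatsSnapshot): StatsDelta {
  const events = after.events - before.events;
  const tokensSaved = after.tokensSaved - before.tokensSaved;
  return {
    events,
    tokensSaved,
    // An average only: it cannot say which individual queries hit the cache.
    tokensSavedPerEvent: events > 0 ? tokensSaved / events : null,
  };
}

// Made-up numbers standing in for the Phase 1 and Phase 6 snapshots.
const phase1: StatsSnapshot = { events: 10, tokensSaved: 4000 };
const phase6: StatsSnapshot = { events: 16, tokensSaved: 7000 };
const delta = diffStats(phase1, phase6);
console.log(delta);
```

The diff yields only an aggregate, which is why a per-query hit rate cannot be recovered this way.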
──────────────────────────────────────────────
ISSUE C — ctx_search progressive throttling (undocumented)
──────────────────────────────────────────────
Tool: ctx_search
Pattern: 5+ sequential calls within a short window (~60s), each
with `limit: 3`.
Observed warning (literal, surfaces in response from call #3 onwards):
⚠ search call #N/8 in this window. Results limited to X/query.
Progression we measured:
Call 1: returns up to 3 results (limit honored)
Call 2: returns up to 3 results (limit honored)
Call 3: returns up to 2 results (limit silently cut)
Call 4: returns up to 1 result (limit silently cut)
Call 5+: returns up to 1 result (limit silently cut)
This is reasonable rate-limiting behavior, but:
1. It's nowhere in the tool's JSON-Schema description.
2. Agents have no signal until the warning appears in a response
(already too late — the truncated result already shipped).
3. The exact window size (8 calls) and reset period are not
documented anywhere we could find.
4. Workaround (ctx_search(queries: [...])) is excellent but only
discoverable by reading the warning text mid-session.
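For agents that want to anticipate the truncation, the measured progression can be written as a small lookup. This models only what we observed with `limit: 3`; the server's actual rule is unknown to us, and `observedEffectiveLimit` is a hypothetical helper name.

```typescript
// Model of the OBSERVED progression only; not the server's implementation.
function observedEffectiveLimit(callNumber: number, requestedLimit: number): number {
  if (!Number.isInteger(callNumber) || callNumber < 1) {
    throw new RangeError("callNumber must be a positive integer");
  }
  if (callNumber <= 2) return requestedLimit; // calls 1-2: limit honored
  if (callNumber === 3) return Math.min(requestedLimit, 2); // call 3: cut to 2
  return Math.min(requestedLimit, 1); // call 4 onwards: cut to 1
}

const progression = [1, 2, 3, 4, 5].map((n) => observedEffectiveLimit(n, 3));
console.log(progression); // [ 3, 3, 2, 1, 1 ]
```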
Cross-platform verification (3/4 hit the throttle on the same
test):
Claude Code (Opus 4.7) — throttle observed at call #3
Antigravity (Gemini 3.1 Pro) — throttle observed at call #3
Codex CLI (GPT-5) — did NOT hit throttle (suspected
internal cache absorbed repeats)
Antigravity (Gemini 3.5 Flash) — warning surfaced but reported
less explicitly than the others
The v1.0.149 → v1.0.151 changelog shows none of these 3 code
paths were touched, so all three behaviors persist on the latest release.
Steps to reproduce (REQUIRED)
Prerequisites (same for all 3 issues):
npm install -g context-mode@1.0.151
Configure MCP in any client (Claude Code example):
claude mcp add --transport stdio --scope user context-mode -- bun x context-mode
──── ISSUE A — ctx_insight ignores query ────
1. From an agent session, call:
   ctx_insight(query: "any natural language question")
2. Observe: dashboard URL is returned; query is ignored.
3. Confirm by re-running with no query param (identical output):
   ctx_insight()
──── ISSUE B — ctx_stats missing fields ────
1. Call ctx_stats() and record the output JSON shape.
2. Run any 5 ctx_search calls.
3. Call ctx_stats() again and note that the only things that changed
   are events, tokens saved, % reduction, $ saved. There is
   no cache_hit_rate, total_chunks, or last_indexed_at field
   in either snapshot.
──── ISSUE C — ctx_search throttling ────
1. Call ctx_search 5 times in quick succession, each with
   a different query, each passing limit: 3:
   ctx_search(query: "alpha", limit: 3)
   ctx_search(query: "beta", limit: 3)
   ctx_search(query: "gamma", limit: 3)
   ctx_search(query: "delta", limit: 3)
   ctx_search(query: "epsilon", limit: 3)
2. Observe the warning in the response from call #3 onwards:
   ⚠ search call #N/8 in this window. Results limited to X/query.
3. Confirm the array form bypasses the throttle:
   ctx_search(queries: ["alpha", "beta", "gamma", "delta", "epsilon"], limit: 3)
   → returns 3 per query, no throttle warning.
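On the client side, preferring the array form could be as simple as the sketch below. The two argument shapes (`query` + `limit`, `queries` + `limit`) are the ones used in the steps above; `buildSearchArgs` and the `SearchArgs` type are hypothetical names for illustration.

```typescript
// Argument shapes taken from the repro calls above; the type name is illustrative.
type SearchArgs =
  | { query: string; limit: number }
  | { queries: string[]; limit: number };

// Collapse N pending queries into a single ctx_search call where possible.
function buildSearchArgs(queries: readonly string[], limit: number): SearchArgs {
  const [first, ...rest] = queries;
  if (first === undefined) throw new Error("at least one query is required");
  if (rest.length === 0) return { query: first, limit };
  return { queries: [first, ...rest], limit };
}

console.log(JSON.stringify(buildSearchArgs(["alpha", "beta", "gamma"], 3)));
// prints {"queries":["alpha","beta","gamma"],"limit":3}
```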
What have you tried to fix it?
Context: these 3 issues were uncovered during the same cross-platform
audit that produced #687 (ctx_index directory, RESOLVED in v1.0.149)
and the ctx_batch_execute scope issue opened yesterday. The full audit
ran the same 6-phase test prompt across Claude Code (Opus 4.7), Codex
CLI (GPT-5), and Antigravity (Gemini 3.5 Flash + Gemini 3.1 Pro) on
v1.0.146 and v1.0.148, then re-verified bug #687 on v1.0.151. Total
of 5 structural findings; the first 2 are already filed, and this issue
covers the remaining 3. We bundled them to avoid spamming the queue.
──── ISSUE A — ctx_insight ────
Investigation:
Tool surface (context-mode --help) shows no --mode flag or
subcommand for Q&A vs dashboard distinction.
Output text suggests this tool was designed exclusively as a
dashboard launcher; the query slot may be an artifact of an
earlier design or a placeholder.
The dashboard at :4747 is genuinely useful and works perfectly —
the bug is purely about the MCP contract.
Proposed fixes (in order of effort):
Option A1 — Tool description fix (5 LOC):
Update the JSON-Schema description of ctx_insight.query to:
"(Currently unused — reserved for future Q&A synthesis.) Calling
ctx_insight launches the analytics dashboard at localhost:4747.
For Q&A over the indexed corpus, use ctx_search."
Option A2 — Rename for clarity (deprecate gracefully):
Add ctx_dashboard() as an alias, keep ctx_insight as a deprecated
alias for one minor version, then remove the query param.
Option A3 — Implement Q&A (larger scope):
Wire query to a RAG-over-indexed-content path. Probably too
ambitious for a single PR; not recommended unless you already
have plans here.
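To make Option A2 concrete, here is a rough sketch of a deprecated alias. It does not reflect how context-mode actually registers tools: the `tools` map, `ToolEntry`, and `callTool` are all hypothetical, and only the tool names and the dashboard message come from this report.

```typescript
type ToolHandler = (args: Record<string, unknown>) => string;

interface ToolEntry {
  handler: ToolHandler;
  deprecatedAliasOf?: string; // set only on the legacy name
}

// Stand-in for the real launcher; it just returns the status line.
const launchDashboard: ToolHandler = () => "Dashboard running at http://localhost:4747";

// Hypothetical registry: both names share one handler during the deprecation window.
const tools = new Map<string, ToolEntry>([
  ["ctx_dashboard", { handler: launchDashboard }],
  ["ctx_insight", { handler: launchDashboard, deprecatedAliasOf: "ctx_dashboard" }],
]);

function callTool(name: string, args: Record<string, unknown> = {}): string {
  const entry = tools.get(name);
  if (!entry) throw new Error(`unknown tool: ${name}`);
  const output = entry.handler(args);
  // Surface the deprecation in the response so agents see it without reading docs.
  return entry.deprecatedAliasOf
    ? `${output}\nNote: ${name} is deprecated; use ${entry.deprecatedAliasOf}.`
    : output;
}

console.log(callTool("ctx_insight", { query: "ignored" }));
```

Putting the notice in the response body mirrors how the throttle warning already reaches agents mid-session.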
──── ISSUE B — ctx_stats observability ────
Investigation:
The data needed to compute cache_hit_rate presumably exists internally
(tokens saved and % reduction are accurate end-to-end, which suggests
hits and misses are already tracked). The chunk count and
last_indexed_at presumably live in the SQLite store already.
Proposed fix (one PR, additive — no breaking changes):
Extend ctx_stats() output with new fields:
{
// ... existing fields ...
"cache_hit_rate": 0.74, // nominal: hits / (hits + miss)
"cache_hits_count": 14,
"cache_miss_count": 5,
"total_chunks_indexed": 4216,
"last_indexed_at": "2026-05-24T14:16:40Z",
"throttle_window_remaining": 5 // ← also covers Issue C below
}
All existing fields preserved. Clients can fall back gracefully if
fields are absent (the new fields are purely additive observability).
Bonus: surface the same data on ctx_doctor for one-shot inspection
without a dedicated stats call.
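The graceful fallback for absent fields might look like this on the client. The field names are the ones proposed above and do not exist in any released version; `CtxStats` and `cacheHitRate` are hypothetical names.

```typescript
// Proposed fields only; all optional so older servers still type-check.
interface CtxStats {
  cache_hit_rate?: number;
  cache_hits_count?: number;
  cache_miss_count?: number;
  total_chunks_indexed?: number;
  last_indexed_at?: string;
}

// Prefer the server-reported rate, derive it from counts if needed, else give up.
function cacheHitRate(stats: CtxStats): number | null {
  if (typeof stats.cache_hit_rate === "number") return stats.cache_hit_rate;
  const hits = stats.cache_hits_count;
  const misses = stats.cache_miss_count;
  if (typeof hits === "number" && typeof misses === "number" && hits + misses > 0) {
    return hits / (hits + misses);
  }
  return null; // older server: fields absent, nothing to report
}

console.log(cacheHitRate({ cache_hits_count: 14, cache_miss_count: 5 }));
console.log(cacheHitRate({})); // null
```

With the example counts above (14 hits, 5 misses) the derived rate is 14/19, which rounds to the 0.74 shown in the proposed payload.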
──── ISSUE C — ctx_search throttling ────
Investigation:
The throttle is clearly intentional (rate-limit / abuse protection).
The warning string proves the server already knows: "call #N/8 in
this window". That counter could be exposed to clients upfront.
The workaround (ctx_search with array of queries) is excellent and
should be the documented happy path for multi-query workloads.
Proposed fixes (in order of effort):
Option C1 — Schema description fix (5 LOC):
Update ctx_search JSON-Schema description to:
"Searches the global indexed corpus. Note: this server throttles
sequential calls — after the 2nd call in a short window, the
effective limit is progressively reduced (3 → 2 → 1 per query).
For multiple searches, prefer passing queries: [...] as an
array, which bypasses the throttle counter."
Option C2 — Expose counter in every response (low effort):
Add throttle_remaining: N to every ctx_search response so the
agent can pace itself. Pairs naturally with Issue B's exposed
fields.
Option C3 — Configurable throttle (later):
Env var CONTEXT_MODE_THROTTLE_WINDOW=8 or similar. Lowest
priority — most users will be fine with the default once it's
documented.
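Options C2 and C3 could be sketched together as below. Everything here is an assumption for illustration: `throttle_remaining` is read as "un-throttled calls left in the current window", the env var name and default of 8 are taken from the proposal and the warning text, and both helper names are hypothetical.

```typescript
const DEFAULT_THROTTLE_WINDOW = 8; // from "call #N/8" in the warning text

// Option C3 (server side): read the window size, falling back to the default.
function readThrottleWindow(env: Record<string, string | undefined>): number {
  const raw = env["CONTEXT_MODE_THROTTLE_WINDOW"];
  if (raw === undefined || raw.trim() === "") return DEFAULT_THROTTLE_WINDOW;
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_THROTTLE_WINDOW;
}

// Option C2 (client side): decide whether to switch to the array form.
interface SearchResponseMeta {
  throttle_remaining?: number; // proposed field, absent on current servers
}

function shouldBatch(meta: SearchResponseMeta, pendingQueries: number): boolean {
  if (pendingQueries <= 1) return false; // nothing to batch
  // Field absent: stay conservative and batch.
  if (meta.throttle_remaining === undefined) return true;
  return pendingQueries > meta.throttle_remaining;
}

console.log(readThrottleWindow({ CONTEXT_MODE_THROTTLE_WINDOW: "12" })); // 12
console.log(shouldBatch({ throttle_remaining: 1 }, 3)); // true
```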
──── Combined PR offer ────
If the approach is approved, I can PR all 3 fixes in a single branch.
Total: ~65 LOC + tests, single PR against next, fully
backwards-compatible (everything is additive or descriptive).
Apologies for the volume of issues this week — these all surfaced
in the same audit and we batched them as much as we could (one
PR, three logical fixes). Let me know if you'd prefer them split.
Pre-submission checklist
I have run the debug script and pasted the output above
Platform
Claude Code
context-mode version
1.0.151
Debug script output (REQUIRED)
{
  "context_mode": {
    "version": "1.0.151",
    "binary_path": "C:\\Users\\<user>\\AppData\\Roaming\\npm\\node_modules\\context-mode\\",
    "available_commands": ["doctor", "upgrade", "hook", "statusline"]
  },
  "runtime": { "node": "v25.9.0", "bun": "1.3.14", "npm": "11.12.1" },
  "os": { "platform": "Windows 11 Pro", "version": "10.0.26200", "arch": "x64" },
  "doctor_output": {
    "platform_detection": "Claude Code (high confidence)",
    "performance": "FAST (Bun detected)",
    "fts5_sqlite": "PASS",
    "plugin_cache_integrity": "PASS"
  }
}
Operating System
macOS (Apple Silicon)
JS Runtime
Bun 1.3.14