[Bug]: 3 remaining issues from cross-platform audit — ctx_insight Q&A, ctx_stats observability, ctx_search throttling (companion to #687) #697

@matiasduartee

Description

Platform

Claude Code

context-mode version

1.0.151

Debug script output (REQUIRED)

{
  "context_mode": {
    "version": "1.0.151",
    "binary_path": "C:\\Users\\<user>\\AppData\\Roaming\\npm\\node_modules\\context-mode\\",
    "available_commands": ["doctor", "upgrade", "hook", "statusline"]
  },
  "runtime": {
    "node": "v25.9.0",
    "bun": "1.3.14",
    "npm": "11.12.1"
  },
  "os": {
    "platform": "Windows 11 Pro",
    "version": "10.0.26200",
    "arch": "x64"
  },
  "doctor_output": {
    "platform_detection": "Claude Code (high confidence)",
    "performance": "FAST (Bun detected)",
    "fts5_sqlite": "PASS",
    "plugin_cache_integrity": "PASS"
  }
}

Exact prompt that triggered the bug (REQUIRED)

This is a bundled report covering 3 distinct issues discovered during the
same cross-platform audit that produced #687 (resolved) and the
ctx_batch_execute scope issue (opened separately). Each issue below has
its own reproducer prompt and was verified across 4 client/model
combinations on v1.0.146, v1.0.148, and v1.0.151 (current).

────────────────────────────────────────────────────────────────────────
ISSUE A — ctx_insight ignores the `query` parameter (dashboard launcher only)
────────────────────────────────────────────────────────────────────────

Prompt that triggered:
   ctx_insight(query: "What is the practical difference between skill A
                       and skill B in this codebase?")

────────────────────────────────────────────────────────────────────────
ISSUE B — ctx_stats omits cache hit rate, chunk count, last_indexed_at
────────────────────────────────────────────────────────────────────────

Prompt that triggered:
   ctx_stats()
   (run twice — once cold, once after ~6 ctx_search calls — to observe
    what the second call reports vs what's available to compute.)

────────────────────────────────────────────────────────────────────────
ISSUE C — ctx_search applies undocumented progressive throttling (3→2→1)
────────────────────────────────────────────────────────────────────────

Prompt that triggered:
   Issue ctx_search sequentially 5+ times within ~60 seconds, each call
   passing `limit: 3`. Observe the response `limit` shrinking after the
   2nd call.

Full error output (REQUIRED)

═══════════════════════════════════════════════════════════════════════
CONTEXT: This bundles 3 issues from an audit that produced 5 findings.
#687 (ctx_index directory)        — RESOLVED in v1.0.149, thank you
#<bug2 number> (ctx_batch_execute) — opened separately yesterday
THIS issue                         — the 3 remaining items below

Bundling these instead of opening 3 separate issues to respect your
time as a solo maintainer. Happy to split into 3 if you prefer — just
say the word.
═══════════════════════════════════════════════════════════════════════

──────────────────────────────────────────────
ISSUE A — ctx_insight ignores `query` parameter
──────────────────────────────────────────────

Tool: ctx_insight
Input: { "query": "<any natural language question>" }
Output:
  - Copying source files...
  - Source files copied.
  - Installing dependencies (first run, ~30s)...
  - Dependencies installed.
  - Building dashboard...
  - Dashboard running at http://localhost:4747
  PID: <pid>

The `query` parameter is accepted by the JSON-Schema but completely
ignored at runtime. The tool only launches a React analytics dashboard
on port 4747. The name and parameter shape strongly imply Q&A / RAG-style
synthesis over the indexed corpus, which the tool does not provide.

Cross-platform verification (4/4 ignored the parameter identically):
  Claude Code (Opus 4.7)         — dashboard launched, query ignored
  Codex CLI (GPT-5)              — dashboard launched, query ignored
  Antigravity (Gemini 3.5 Flash) — dashboard launched, query ignored
  Antigravity (Gemini 3.1 Pro)   — dashboard launched, query ignored

All 4 LLM agents reached for ctx_insight when asked to "synthesize
an answer about X from the indexed corpus". All 4 had to fall back
to ctx_search after seeing the dashboard URL.

──────────────────────────────────────────────
ISSUE B — ctx_stats observability gap
──────────────────────────────────────────────

Tool: ctx_stats
Input: {} (no args)
Output (current):
  - events count
  - sessions count
  - tokens saved (lifetime + session)
  - % reduction
  - $ saved estimate

What's NOT exposed:
  - cache_hit_rate (nominal hit ratio per query)
  - cache_hits_count / cache_miss_count breakdown
  - total_chunks_indexed (chunks, not MB)
  - last_indexed_at timestamp
  - per-query cache state on subsequent identical queries

Effect: optimizing usage is guesswork. We can only INFER cache
behavior by diffing `tokens saved` between two ctx_stats calls,
which is indirect and noisy. During our v1.0.151 verification of
#687, all 3 model families (Opus, GPT-5, Gemini) independently noted
that when ctx_index is called on the same path twice, the response
does not indicate whether the second call hit cache or re-indexed
(Codex measured a ~3% time delta; Opus and Gemini saw none).

Cross-platform verification: all 4 clients had to invent the same
workaround (subtract Phase 1 stats from Phase 6 stats to estimate
session savings, with no way to know hit rate per query).
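
The snapshot-diff workaround can be sketched in a few lines of TypeScript. This is a minimal illustration, assuming the ctx_stats output has already been parsed into an object; the field names `events` and `tokensSaved` are hypothetical stand-ins, since the exact output shape is not specified here, and the numbers are illustrative rather than audit measurements.

```typescript
// Hypothetical shape: the real ctx_stats field names may differ.
interface StatsSnapshot {
  events: number;
  tokensSaved: number;
}

// Subtract the earlier snapshot from the later one to estimate what the
// session contributed. This says nothing about per-query cache hits.
function sessionDelta(before: StatsSnapshot, after: StatsSnapshot): StatsSnapshot {
  return {
    events: after.events - before.events,
    tokensSaved: after.tokensSaved - before.tokensSaved,
  };
}

// Illustrative numbers only, not measurements from the audit.
const phase1: StatsSnapshot = { events: 120, tokensSaved: 50000 };
const phase6: StatsSnapshot = { events: 131, tokensSaved: 58500 };
const delta = sessionDelta(phase1, phase6); // { events: 11, tokensSaved: 8500 }
```

The delta is an aggregate, so it cannot distinguish a cache hit from a cheap miss, which is the gap this issue describes.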

──────────────────────────────────────────────
ISSUE C — ctx_search progressive throttling (undocumented)
──────────────────────────────────────────────

Tool: ctx_search
Pattern: 5+ sequential calls within a short window (~60s), each
with `limit: 3`.

Observed warning (literal, surfaces in response from call #3 onwards):
  ⚠ search call #N/8 in this window. Results limited to X/query.

Progression we measured:
  Call 1: returns up to 3 results (limit honored)
  Call 2: returns up to 3 results (limit honored)
  Call 3: returns up to 2 results (limit silently cut)
  Call 4: returns up to 1 result (limit silently cut)
  Call 5+: returns up to 1 result (limit silently cut)

This is reasonable rate-limiting behavior, but:
  1. It's nowhere in the tool's JSON-Schema description.
  2. Agents have no signal until the warning appears in a response
     (already too late — the truncated result already shipped).
  3. The exact window size (8 calls) and reset period are not
     documented anywhere we could find.
  4. Workaround (ctx_search(queries: [...])) is excellent but only
     discoverable by reading the warning text mid-session.
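
For reference, the measured progression can be written as a small descriptive model. This is a sketch fitted to the observations above for `limit: 3` only; it is not the server's implementation, and behavior for other limits or after the window resets is unknown.

```typescript
// Descriptive model of the measured progression, NOT the server's code.
// Calls 1-2 honor the requested limit; each later call loses one result,
// never dropping below 1.
function observedEffectiveLimit(callNumber: number, requestedLimit: number): number {
  const reduction = Math.max(0, callNumber - 2);
  return Math.max(1, requestedLimit - reduction);
}

// Effective limits for calls 1 through 5 with limit: 3.
const progression = [1, 2, 3, 4, 5].map((n) => observedEffectiveLimit(n, 3));
// progression is [3, 3, 2, 1, 1]
```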

Cross-platform verification (3/4 hit the throttle on the same
test):
  Claude Code (Opus 4.7)         — throttle observed at call #3
  Antigravity (Gemini 3.1 Pro)   — throttle observed at call #3
  Codex CLI (GPT-5)              — did NOT hit throttle (suspected
                                   internal cache absorbed repeats)
  Antigravity (Gemini 3.5 Flash) — warning surfaced but reported
                                   less explicitly than the others

The v1.0.149 → v1.0.151 changelog confirms none of these 3 code
paths were touched, so all behaviors persist on current latest.

Steps to reproduce (REQUIRED)

Prerequisites (same for all 3 issues):
  npm install -g context-mode@1.0.151
Configure MCP in any client (Claude Code example):
  claude mcp add --transport stdio --scope user context-mode -- bun x context-mode

──── ISSUE A — ctx_insight ignores query ────

  1. From an agent session, call:
    ctx_insight(query: "any natural language question")
  2. Observe: dashboard URL is returned; query is ignored.
  3. Confirm by re-running with no query param — identical output:
    ctx_insight()

──── ISSUE B — ctx_stats missing fields ────

  1. Call ctx_stats() — record output JSON shape.
  2. Run any 5 ctx_search calls.
  3. Call ctx_stats() again and note that the only values that changed
    are events, tokens saved, % reduction, and $ saved. There is
    no cache_hit_rate, total_chunks_indexed, or last_indexed_at field
    in either snapshot.
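
A quick way to check step 3 mechanically, assuming the ctx_stats output is available as a parsed object. The snapshot keys below (`events`, `sessions`, `tokens_saved`) are hypothetical; only the three proposed field names come from this issue.

```typescript
// Field names proposed in this issue; none exist in current output.
const PROPOSED_FIELDS = ["cache_hit_rate", "total_chunks_indexed", "last_indexed_at"];

// Return the proposed fields that a parsed ctx_stats snapshot lacks.
function missingFields(snapshot: Record<string, unknown>): string[] {
  return PROPOSED_FIELDS.filter((field) => !(field in snapshot));
}

// Hypothetical snapshot shaped like the current output (key names assumed).
const current = { events: 42, sessions: 3, tokens_saved: 12000 };
const missing = missingFields(current);
// missing contains all three proposed field names
```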

──── ISSUE C — ctx_search throttling ────

  1. Call ctx_search 5 times in quick succession, each with
    different queries, each passing limit: 3:
    ctx_search(query: "alpha", limit: 3)
    ctx_search(query: "beta", limit: 3)
    ctx_search(query: "gamma", limit: 3)
    ctx_search(query: "delta", limit: 3)
    ctx_search(query: "epsilon", limit: 3)
  2. Observe the response from call #3 onwards:
    ⚠ search call #N/8 in this window. Results limited to X/query.
  3. Confirm the array-form bypasses the throttle:
    ctx_search(queries: ["alpha", "beta", "gamma", "delta", "epsilon"], limit: 3)
    → returns 3 per query, no throttle warning.
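
On the client side, collapsing sequential calls into the array form could look like the sketch below. The argument shapes mirror the two call forms shown in the steps above; the helper name `toBatched` is hypothetical and the sketch assumes all calls share one limit.

```typescript
// Argument shapes mirroring the single-query and array-form calls above.
interface SingleSearchArgs {
  query: string;
  limit: number;
}
interface BatchedSearchArgs {
  queries: string[];
  limit: number;
}

// Collapse several single-query calls into one array-form call.
// Throws if the calls do not share the same limit.
function toBatched(calls: SingleSearchArgs[]): BatchedSearchArgs {
  if (calls.length === 0) throw new Error("no calls to batch");
  const limit = calls[0].limit;
  if (calls.some((c) => c.limit !== limit)) throw new Error("mixed limits");
  return { queries: calls.map((c) => c.query), limit };
}

const batched = toBatched(
  ["alpha", "beta", "gamma", "delta", "epsilon"].map((query) => ({ query, limit: 3 })),
);
// batched.queries holds the five queries in order; batched.limit is 3
```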

What have you tried to fix it?

Context: these 3 issues were uncovered during the same cross-platform
audit that produced #687 (ctx_index directory, RESOLVED in v1.0.149)
and the ctx_batch_execute scope issue opened yesterday. The full audit
ran the same 6-phase test prompt across Claude Code (Opus 4.7), Codex
CLI (GPT-5), and Antigravity (Gemini 3.5 Flash + Gemini 3.1 Pro) on
v1.0.146 and v1.0.148, then re-verified bug #687 on v1.0.151. Total
of 5 structural findings; the first 2 are already filed, and this issue
covers the remaining 3. We bundled them to avoid spamming the queue.

──── ISSUE A — ctx_insight ────

Investigation:

  • Tool surface (context-mode --help) shows no --mode flag or
    subcommand for Q&A vs dashboard distinction.
  • Output text suggests this tool was designed exclusively as a
    dashboard launcher; the query slot may be an artifact of an
    earlier design or a placeholder.
  • The dashboard at :4747 is genuinely useful and works perfectly —
    the bug is purely about the MCP contract.

Proposed fixes (in order of effort):

Option A1 — Tool description fix (5 LOC):
Update the JSON-Schema description of ctx_insight.query to:
"(Currently unused — reserved for future Q&A synthesis.) Calling
ctx_insight launches the analytics dashboard at localhost:4747.
For Q&A over the indexed corpus, use ctx_search."
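
As an illustration of Option A1, a tool definition carrying the proposed wording might look like this. The `name` / `description` / `inputSchema` layout follows the usual MCP tool-listing shape; the actual definition inside context-mode has not been inspected and may be structured differently.

```typescript
// Sketch only: how context-mode actually declares ctx_insight is an assumption.
const ctxInsightTool = {
  name: "ctx_insight",
  description:
    "Launches the analytics dashboard at localhost:4747. " +
    "For Q&A over the indexed corpus, use ctx_search.",
  inputSchema: {
    type: "object",
    properties: {
      query: {
        type: "string",
        // Proposed wording that tells agents the parameter has no effect.
        description: "(Currently unused; reserved for future Q&A synthesis.)",
      },
    },
  },
};
```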

Option A2 — Rename for clarity (deprecate gracefully):
Add ctx_dashboard() as an alias, keep ctx_insight as a deprecated
alias for one minor version, then remove the query param.
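
One possible shape for the deprecation path in Option A2, sketched with a plain handler map. The `ToolHandler` type, the `withDeprecationNotice` helper, and the registry object are all hypothetical; they stand in for whatever registration mechanism context-mode actually uses.

```typescript
// Hypothetical handler signature; the real one is not known.
type ToolHandler = (args: Record<string, unknown>) => string;

// Wrap a handler so the old tool name still works but announces its replacement.
function withDeprecationNotice(oldName: string, newName: string, handler: ToolHandler): ToolHandler {
  return (args) => `Note: ${oldName} is deprecated; call ${newName} instead.\n${handler(args)}`;
}

// Stand-in for the real dashboard launcher, which this sketch does not reproduce.
const launchDashboard: ToolHandler = () => "Dashboard running at http://localhost:4747";

const tools: Record<string, ToolHandler> = {
  ctx_dashboard: launchDashboard,
  ctx_insight: withDeprecationNotice("ctx_insight", "ctx_dashboard", launchDashboard),
};
```

Agents that still call ctx_insight would see the notice in the response text, which gives them a migration hint during the deprecation window.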

Option A3 — Implement Q&A (larger scope):
Wire query to a RAG-over-indexed-content path. Probably too
ambitious for a single PR; not recommended unless you already
have plans here.

──── ISSUE B — ctx_stats observability ────

Investigation:

  • The data needed to compute cache_hit_rate likely exists internally
    already (tokens saved and % reduction are reported end-to-end, which
    suggests hits are tracked somewhere).
  • The chunk count and last_indexed_at presumably live in the SQLite
    store already.

Proposed fix (one PR, additive — no breaking changes):

Extend ctx_stats() output with new fields:
{
  // ... existing fields ...
  "cache_hit_rate": 0.74,                  // nominal: hits / (hits + miss)
  "cache_hits_count": 14,
  "cache_miss_count": 5,
  "total_chunks_indexed": 4216,
  "last_indexed_at": "2026-05-24T14:16:40Z",
  "throttle_window_remaining": 5           // ← also covers Issue C below
}

All existing fields preserved. Clients can fall back gracefully if
fields are absent (the new fields are purely additive observability).
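
A client-side sketch of the graceful fallback, assuming the proposed field names are adopted as written. Every field is optional, so the same code works against servers with and without the additions.

```typescript
// Proposed additive fields; none of them exist in current ctx_stats output.
interface ExtendedStats {
  cache_hit_rate?: number;
  cache_hits_count?: number;
  cache_miss_count?: number;
  total_chunks_indexed?: number;
  last_indexed_at?: string;
}

// Prefer the server-reported rate, derive it from the counts if possible,
// and return null when neither is available so callers can fall back.
function hitRate(stats: ExtendedStats): number | null {
  if (typeof stats.cache_hit_rate === "number") return stats.cache_hit_rate;
  const hits = stats.cache_hits_count;
  const misses = stats.cache_miss_count;
  if (typeof hits !== "number" || typeof misses !== "number") return null;
  const total = hits + misses;
  return total === 0 ? null : hits / total;
}

const derived = hitRate({ cache_hits_count: 14, cache_miss_count: 5 }); // 14 / 19, about 0.74
const absent = hitRate({}); // null on servers without the new fields
```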

Bonus: surface the same data on ctx_doctor for one-shot inspection
without a dedicated stats call.

──── ISSUE C — ctx_search throttling ────

Investigation:

  • The throttle is clearly intentional (rate-limit / abuse protection).
  • The warning string proves the server already knows: "call #N/8 in
    this window". That counter could be exposed to clients upfront.
  • The workaround (ctx_search with array of queries) is excellent and
    should be the documented happy path for multi-query workloads.
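
Until the counter is exposed, a client could recover it by parsing the warning text. The sketch below assumes N and X appear as plain integers in the real string and that the wording stays stable, neither of which is guaranteed; the sample input is constructed from the template and the measured progression.

```typescript
interface ThrottleWarning {
  call: number;
  windowSize: number;
  limitPerQuery: number;
}

// Matches the literal warning template quoted in this issue.
const WARNING_PATTERN = /search call #(\d+)\/(\d+) in this window\. Results limited to (\d+)\/query/;

// Extract the counter values, or return null if no warning is present.
function parseThrottleWarning(text: string): ThrottleWarning | null {
  const match = WARNING_PATTERN.exec(text);
  if (match === null) return null;
  return {
    call: Number(match[1]),
    windowSize: Number(match[2]),
    limitPerQuery: Number(match[3]),
  };
}

const parsed = parseThrottleWarning("⚠ search call #3/8 in this window. Results limited to 2/query.");
// parsed is { call: 3, windowSize: 8, limitPerQuery: 2 }
```

Parsing human-readable text is brittle, which is part of the case for exposing the counter as a structured field.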

Proposed fixes (in order of effort):

Option C1 — Schema description fix (5 LOC):
Update ctx_search JSON-Schema description to:
"Searches the global indexed corpus. Note: this server throttles
sequential calls — after the 2nd call in a short window, the
effective limit is progressively reduced (3 → 2 → 1 per query).
For multiple searches, prefer passing queries: [...] as an
array, which bypasses the throttle counter."

Option C2 — Expose counter in every response (low effort):
Add throttle_remaining: N to every ctx_search response so the
agent can pace itself. Pairs naturally with Issue B's exposed
fields.
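
With such a field in place, an agent-side pacing rule could be as small as the sketch below. `throttle_remaining` is the field proposed in Option C2 and does not exist today; the `shouldBatch` helper and its policy are hypothetical.

```typescript
// `throttle_remaining` is proposed in Option C2; it is not in current responses.
interface SearchResponseMeta {
  throttle_remaining?: number;
}

// Decide whether the pending queries should go out as one array-form call.
function shouldBatch(meta: SearchResponseMeta, pendingQueries: number): boolean {
  if (pendingQueries <= 1) return false;
  // Without the field there is no signal, so batch conservatively.
  if (typeof meta.throttle_remaining !== "number") return true;
  return pendingQueries > meta.throttle_remaining;
}
```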

Option C3 — Configurable throttle (later):
Env var CONTEXT_MODE_THROTTLE_WINDOW=8 or similar. Lowest
priority — most users will be fine with the default once it's
documented.
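
If Option C3 were pursued, reading the setting could follow the sketch below. The variable name comes from the suggestion above; the default of 8 is taken from the "#N/8" warning text, and the validation rules are assumptions. The environment is passed in as a plain record so the function stays easy to test.

```typescript
// Window size implied by the "#N/8" warning; assumed to be the default.
const DEFAULT_THROTTLE_WINDOW = 8;

// Read the window size from an environment-like record, falling back to the
// default when the variable is unset, non-numeric, or not positive.
function readThrottleWindow(env: Record<string, string | undefined>): number {
  const raw = env["CONTEXT_MODE_THROTTLE_WINDOW"];
  if (raw === undefined || !/^\d+$/.test(raw)) return DEFAULT_THROTTLE_WINDOW;
  const value = Number(raw);
  return value > 0 ? value : DEFAULT_THROTTLE_WINDOW;
}
```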

──── Combined PR offer ────

If the approach is approved, I can PR all 3 fixes in a single branch:

  • Option A1 (insight description fix): ~5 LOC
  • Issue B observability fields: ~40 LOC + tests
  • Option C1 + C2 (search description + throttle counter): ~20 LOC

Total: ~65 LOC + tests, single PR against next, fully backwards-
compatible (everything is additive or descriptive). Happy to
implement if you confirm the approach.

Apologies for the volume of issues this week — these all surfaced
in the same audit and we batched them as much as we could (one
PR, three logical fixes). Let me know if you'd prefer them split.

Pre-submission checklist

  • I have run the debug script and pasted the output above
  • I am using the latest version of context-mode
  • I have searched existing issues for duplicates
  • I have included steps to reproduce the issue

Operating System

macOS (Apple Silicon)

JS Runtime

Bun 1.3.14

Metadata

Labels

bug: Something isn't working
