[Bug]: 3 remaining issues from cross-platform audit — ctx_insight Q&A, ctx_stats observability, ctx_search throttling (companion to #687)

### Platform

Claude Code

### context-mode version

1.0.151

### Debug script output (REQUIRED)

```json
{
  "context_mode": {
    "version": "1.0.151",
    "binary_path": "C:\\Users\\<user>\\AppData\\Roaming\\npm\\node_modules\\context-mode\\",
    "available_commands": ["doctor", "upgrade", "hook", "statusline"]
  },
  "runtime": {
    "node": "v25.9.0",
    "bun": "1.3.14",
    "npm": "11.12.1"
  },
  "os": {
    "platform": "Windows 11 Pro",
    "version": "10.0.26200",
    "arch": "x64"
  },
  "doctor_output": {
    "platform_detection": "Claude Code (high confidence)",
    "performance": "FAST (Bun detected)",
    "fts5_sqlite": "PASS",
    "plugin_cache_integrity": "PASS"
  }
}
```

### Exact prompt that triggered the bug (REQUIRED)

```text
This is a bundled report covering 3 distinct issues discovered during the
same cross-platform audit that produced #687 (resolved) and the
ctx_batch_execute scope issue (opened separately). Each issue below has
its own reproducer prompt and was verified across 4 client/model
combinations on v1.0.146, v1.0.148, and v1.0.151 (current).

────────────────────────────────────────────────────────────────────────
ISSUE A — ctx_insight ignores the `query` parameter (dashboard launcher only)
────────────────────────────────────────────────────────────────────────

Prompt that triggered:
   ctx_insight(query: "What is the practical difference between skill A
                       and skill B in this codebase?")

────────────────────────────────────────────────────────────────────────
ISSUE B — ctx_stats omits cache hit rate, chunk count, last_indexed_at
────────────────────────────────────────────────────────────────────────

Prompt that triggered:
   ctx_stats()
   (run twice — once cold, once after ~6 ctx_search calls — to observe
    what the second call reports vs what's available to compute.)

────────────────────────────────────────────────────────────────────────
ISSUE C — ctx_search applies undocumented progressive throttling (3→2→1)
────────────────────────────────────────────────────────────────────────

Prompt that triggered:
   Issue ctx_search sequentially 5+ times within ~60 seconds, each call
   passing `limit: 3`. Observe the response `limit` shrinking after the
   2nd call.
```

### Full error output (REQUIRED)

```text
═══════════════════════════════════════════════════════════════════════
CONTEXT: This bundles 3 issues from an audit that produced 5 findings.
#687 (ctx_index directory)        — RESOLVED in v1.0.149, thank you
#<bug2 number> (ctx_batch_execute) — opened separately yesterday
THIS issue                         — the 3 remaining items below

Bundling these instead of opening 3 separate issues to respect your
time as a solo maintainer. Happy to split into 3 if you prefer — just
say the word.
═══════════════════════════════════════════════════════════════════════

──────────────────────────────────────────────
ISSUE A — ctx_insight ignores `query` parameter
──────────────────────────────────────────────

Tool: ctx_insight
Input: { "query": "<any natural language question>" }
Output:
  - Copying source files...
  - Source files copied.
  - Installing dependencies (first run, ~30s)...
  - Dependencies installed.
  - Building dashboard...
  - Dashboard running at http://localhost:4747
  PID: <pid>

The `query` parameter is accepted by the JSON-Schema but completely
ignored at runtime. The tool only launches a React analytics dashboard
on port 4747. Naming + parameter shape strongly imply Q&A / RAG-style
synthesis over the indexed corpus — that expectation is wrong.

Cross-platform verification (4/4 ignored the parameter identically):
  Claude Code (Opus 4.7)         — dashboard launched, query ignored
  Codex CLI (GPT-5)              — dashboard launched, query ignored
  Antigravity (Gemini 3.5 Flash) — dashboard launched, query ignored
  Antigravity (Gemini 3.1 Pro)   — dashboard launched, query ignored

All 4 LLM agents reached for ctx_insight when asked to "synthesize
an answer about X from the indexed corpus". All 4 had to fall back
to ctx_search after seeing the dashboard URL.

──────────────────────────────────────────────
ISSUE B — ctx_stats observability gap
──────────────────────────────────────────────

Tool: ctx_stats
Input: {} (no args)
Output (current):
  - events count
  - sessions count
  - tokens saved (lifetime + session)
  - % reduction
  - $ saved estimate

What's NOT exposed:
  - cache_hit_rate (nominal hit ratio per query)
  - cache_hits_count / cache_miss_count breakdown
  - total_chunks_indexed (chunks, not MB)
  - last_indexed_at timestamp
  - per-query cache state on subsequent identical queries

Effect: optimizing usage is guesswork. We can only INFER cache
behavior by diffing `tokens saved` between two ctx_stats calls,
which is indirect and noisy. During our v1.0.151 verification of
#687, all 3 IDEs (Opus, GPT-5, Gemini) independently noted: when
calling ctx_index on the same path twice, the response does not
indicate whether the second call hit cache or re-indexed (Codex
measured a ~3% time delta, Opus and Gemini saw none).

Cross-platform verification — all 4 clients had to invent the same
workaround (subtract Fase 1 stats from Fase 6 stats to estimate
session savings, with no way to know hit rate per query).

──────────────────────────────────────────────
ISSUE C — ctx_search progressive throttling (undocumented)
──────────────────────────────────────────────

Tool: ctx_search
Pattern: 5+ sequential calls within a short window (~60s), each
with `limit: 3`.

Observed warning (literal, surfaces in response from call #3 onwards):
  ⚠ search call #N/8 in this window. Results limited to X/query.

Progression we measured:
  Call 1: returns up to 3 results (limit honored)
  Call 2: returns up to 3 results (limit honored)
  Call 3: returns up to 2 results (limit silently cut)
  Call 4: returns up to 1 result (limit silently cut)
  Call 5+: returns up to 1 result (limit silently cut)

This is reasonable rate-limiting behavior, but:
  1. It's nowhere in the tool's JSON-Schema description.
  2. Agents have no signal until the warning appears in a response
     (already too late — the truncated result already shipped).
  3. The exact window size (8 calls) and reset period are not
     documented anywhere we could find.
  4. Workaround (ctx_search(queries: [...])) is excellent but only
     discoverable by reading the warning text mid-session.

Cross-platform verification (3/4 hit the throttle on the same
test):
  Claude Code (Opus 4.7)         — throttle observed at call #3
  Antigravity (Gemini 3.1 Pro)   — throttle observed at call #3
  Codex CLI (GPT-5)              — did NOT hit throttle (suspected
                                   internal cache absorbed repeats)
  Antigravity (Gemini 3.5 Flash) — warning surfaced but reported
                                   less explicitly than the others

The v1.0.149 → v1.0.151 changelog confirms none of these 3 code
paths were touched, so all behaviors persist on current latest.
```

### Steps to reproduce (REQUIRED)

Prerequisites (same for all 3 issues):
  npm install -g context-mode@1.0.151
  Configure MCP in any client (Claude Code example):
    claude mcp add --transport stdio --scope user context-mode -- bun x context-mode

──── ISSUE A — ctx_insight ignores query ────
1. From an agent session, call:
     ctx_insight(query: "any natural language question")
2. Observe: dashboard URL is returned; `query` is ignored.
3. Confirm by re-running with no `query` param — identical output:
     ctx_insight()

──── ISSUE B — ctx_stats missing fields ────
1. Call ctx_stats() — record output JSON shape.
2. Run any 5 ctx_search calls.
3. Call ctx_stats() again — note that the only things that changed
   are `events`, `tokens saved`, `% reduction`, `$ saved`. There is
   no `cache_hit_rate`, `total_chunks`, or `last_indexed_at` field
   in either snapshot.

──── ISSUE C — ctx_search throttling ────
1. Call ctx_search 5 times in quick succession, each with
   different queries, each passing `limit: 3`:
     ctx_search(query: "alpha", limit: 3)
     ctx_search(query: "beta",  limit: 3)
     ctx_search(query: "gamma", limit: 3)
     ctx_search(query: "delta", limit: 3)
     ctx_search(query: "epsilon", limit: 3)
2. Observe response from call #3 onwards:
     ⚠ search call #N/8 in this window. Results limited to X/query.
3. Confirm the array-form bypasses the throttle:
     ctx_search(queries: ["alpha", "beta", "gamma", "delta", "epsilon"], limit: 3)
   → returns 3 per query, no throttle warning.


### What have you tried to fix it?

Context: these 3 issues were uncovered during the same cross-platform
audit that produced #687 (ctx_index directory, RESOLVED in v1.0.149)
and the ctx_batch_execute scope issue opened yesterday. The full audit
ran the same 6-phase test prompt across Claude Code (Opus 4.7), Codex
CLI (GPT-5), and Antigravity (Gemini 3.5 Flash + Gemini 3.1 Pro) on
v1.0.146 and v1.0.148, then re-verified bug #687 on v1.0.151. Total
of 5 structural findings; first 4 in flight, this issue covers the
remaining 3. We bundled them to avoid spamming the queue.

──── ISSUE A — ctx_insight ────

Investigation:
- Tool surface (`context-mode --help`) shows no `--mode` flag or
  subcommand for Q&A vs dashboard distinction.
- Output text suggests this tool was designed exclusively as a
  dashboard launcher; the `query` slot may be an artifact of an
  earlier design or a placeholder.
- The dashboard at :4747 is genuinely useful and works perfectly —
  the bug is purely about the MCP contract.

Proposed fixes (in order of effort):

  Option A1 — Tool description fix (5 LOC):
    Update the JSON-Schema description of `ctx_insight.query` to:
    "(Currently unused — reserved for future Q&A synthesis.) Calling
     ctx_insight launches the analytics dashboard at localhost:4747.
     For Q&A over the indexed corpus, use ctx_search."

  Option A2 — Rename for clarity (deprecate gracefully):
    Add ctx_dashboard() as an alias, keep ctx_insight as a deprecated
    alias for one minor version, then remove the `query` param.

  Option A3 — Implement Q&A (larger scope):
    Wire `query` to a RAG-over-indexed-content path. Probably too
    ambitious for a single PR; not recommended unless you already
    have plans here.

──── ISSUE B — ctx_stats observability ────

Investigation:
- The data needed to compute cache_hit_rate clearly exists internally
  (since `tokens saved` and `% reduction` are accurate end-to-end).
- The chunk count and last_indexed_at presumably live in the SQLite
  store already.

Proposed fix (one PR, additive — no breaking changes):

  Extend ctx_stats() output with new fields:
    {
      // ... existing fields ...
      "cache_hit_rate": 0.74,           // nominal: hits / (hits + miss)
      "cache_hits_count": 14,
      "cache_miss_count": 5,
      "total_chunks_indexed": 4216,
      "last_indexed_at": "2026-05-24T14:16:40Z",
      "throttle_window_remaining": 5    // ← also covers Issue C below
    }

  All existing fields preserved. Clients can fall back gracefully if
  fields are absent (the new fields are purely additive observability).

  Bonus: surface the same data on `ctx_doctor` for one-shot inspection
  without a dedicated stats call.

──── ISSUE C — ctx_search throttling ────

Investigation:
- The throttle is clearly intentional (rate-limit / abuse protection).
- The warning string proves the server already knows: "call #N/8 in
  this window". That counter could be exposed to clients upfront.
- The workaround (ctx_search with array of queries) is excellent and
  should be the documented happy path for multi-query workloads.

Proposed fixes (in order of effort):

  Option C1 — Schema description fix (5 LOC):
    Update `ctx_search` JSON-Schema description to:
    "Searches the global indexed corpus. Note: this server throttles
     sequential calls — after the 2nd call in a short window, the
     effective limit is progressively reduced (3 → 2 → 1 per query).
     For multiple searches, prefer passing `queries: [...]` as an
     array, which bypasses the throttle counter."

  Option C2 — Expose counter in every response (low effort):
    Add `throttle_remaining: N` to every ctx_search response so the
    agent can pace itself. Pairs naturally with Issue B's exposed
    fields.

  Option C3 — Configurable throttle (later):
    Env var `CONTEXT_MODE_THROTTLE_WINDOW=8` or similar. Lowest
    priority — most users will be fine with the default once it's
    documented.

──── Combined PR offer ────

If approach is approved, I can PR all 3 fixes in a single branch:
  - Option A1 (insight description fix): ~5 LOC
  - Issue B observability fields:        ~40 LOC + tests
  - Option C1 + C2 (search description + throttle counter): ~20 LOC

Total: ~65 LOC + tests, single PR against `next`, fully backwards-
compatible (everything is additive or descriptive). Happy to
implement if you confirm the approach.

Apologies for the volume of issues this week — these all surfaced
in the same audit and we batched them as much as we could (one
PR, three logical fixes). Let me know if you'd prefer them split.


### Pre-submission checklist

- [x] I have run the debug script and pasted the output above
- [x] I am using the latest version of context-mode
- [x] I have searched existing issues for duplicates
- [x] I have included steps to reproduce the issue

### Operating System

macOS (Apple Silicon)

### JS Runtime

Bun 1.3.14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Bug]: 3 remaining issues from cross-platform audit — ctx_insight Q&A, ctx_stats observability, ctx_search throttling (companion to #687) #697

Platform

context-mode version

Debug script output (REQUIRED)

Exact prompt that triggered the bug (REQUIRED)

Full error output (REQUIRED)

Steps to reproduce (REQUIRED)

What have you tried to fix it?

Pre-submission checklist

Operating System

JS Runtime

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Uh oh!

[Bug]: 3 remaining issues from cross-platform audit — ctx_insight Q&A, ctx_stats observability, ctx_search throttling (companion to #687) #697

Description

Platform

context-mode version

Debug script output (REQUIRED)

Exact prompt that triggered the bug (REQUIRED)

Full error output (REQUIRED)

Steps to reproduce (REQUIRED)

What have you tried to fix it?

Pre-submission checklist

Operating System

JS Runtime

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions