feat(mcp): eager tools/list drift verification + real-world-cache pricing fix #6
loadMcpToolCacheEager awaits verifyDriftAsync before building the prefix so a changed server's latest tools land in the current session instead of the next. On drift it deletes the cache and returns null, reusing the existing bridgeMcpTools live-listTools rebuild + re-save path (zero new rebuild logic). Per-server timeout (REASONIX_MCP_EAGER_DRIFT_TIMEOUT_MS, default 3000ms) falls back to fire-and-forget; never blocks the first user message. Default on; set REASONIX_MCP_EAGER_DRIFT=0 to opt out. Implements scheme 10 of the token-cache plan (Slice 2).
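The timeout fallback described above can be sketched as a race between the drift check and a timer. This is a minimal illustration, not the PR's implementation: `checkDriftEagerly`, `eagerDriftTimeoutMs`, and the result types are hypothetical names, and only the environment variable and its 3000ms default come from the commit message.

```typescript
// Hypothetical types: the real verifyDriftAsync signature is not shown in this PR text.
type DriftResult = "unchanged" | "drifted";
type EagerOutcome = "use-cache" | "rebuild" | "timed-out";

// Reads the per-server timeout, falling back to the documented 3000ms default.
function eagerDriftTimeoutMs(env: Record<string, string | undefined>): number {
  const raw = Number(env.REASONIX_MCP_EAGER_DRIFT_TIMEOUT_MS);
  return Number.isFinite(raw) && raw > 0 ? raw : 3000;
}

// Races the drift check against a timer so startup never waits longer than timeoutMs.
async function checkDriftEagerly(
  verifyDrift: () => Promise<DriftResult>,
  timeoutMs: number,
): Promise<EagerOutcome> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timed-out">((resolve) => {
    timer = setTimeout(() => resolve("timed-out"), timeoutMs);
  });
  const check = verifyDrift();
  // After a timeout the check keeps running in the background (fire-and-forget);
  // this handler only prevents a late rejection from going unhandled.
  check.catch(() => undefined);
  try {
    const winner = await Promise.race([check, timeout]);
    if (winner === "timed-out") return "timed-out";
    // "rebuild" stands in for "delete the cache and return null".
    return winner === "drifted" ? "rebuild" : "use-cache";
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

In this sketch a rejection that arrives before the timeout propagates to the caller, which would have to decide whether to treat it as drift or to keep the cached tools.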
README flash cache-hit was 0.028, 10x the real 0.0028 in src/telemetry/stats.ts, understating savings. Real figures: $59.69 / ~97.75% off (not $48.29 / ~80%). Corrects both models' prices, the cost table, the savings line, and the ImmutablePrefix path (src/memory.ts -> src/memory/runtime.ts). Also fixes the long-doc-qa probe question (four -> five pillars).
Pure-CPU microbenchmark: local tokenize p50 ~2.8ms on a 23K-token session, ~0.14-1.4% of an API round-trip, confirming token-estimate caching would be premature optimization (scheme 4 of the token-cache plan).
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1df6e24c65
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
  const DOC_BYTES = 8192;
  const QUESTIONS = [
-   "Summarize the four Reasonix architecture pillars in three bullets.",
+   "Summarize the five Reasonix architecture pillars in three bullets.",
Keep the probe question within the supplied slice
With the default DOC_BYTES = 8192, this benchmark sends only the first 8 KiB of docs/ARCHITECTURE.md; that slice includes Pillars 1–3 but stops before the Pillar 4/Pillar 5 sections. When the first turn now asks for all five pillars, the model has to rely on prior/contextual knowledge rather than the provided document, so this real-world cache probe no longer validates doc-grounded QA for its default run. Either include enough bytes to cover all five sections or ask only about content present in the slice.
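One way to guard against this is to check, before the probe runs, which pillar sections actually fall inside the slice. The sketch below is illustrative only: `missingPillars` is a hypothetical helper, and it assumes the document marks each section with the text "Pillar N" and that the benchmark slices by UTF-8 bytes.

```typescript
// Returns the pillar numbers (1..pillarCount) whose "Pillar N" marker does not
// appear within the first docBytes bytes of the document.
function missingPillars(doc: string, docBytes: number, pillarCount: number): number[] {
  // Slice by bytes, not characters, to mirror a byte-limited read.
  const bytes = new TextEncoder().encode(doc).slice(0, docBytes);
  const slice = new TextDecoder().decode(bytes);
  const missing: number[] = [];
  for (let n = 1; n <= pillarCount; n++) {
    if (!new RegExp(`Pillar ${n}\\b`).test(slice)) missing.push(n);
  }
  return missing;
}
```

A non-empty result for `missingPillars(doc, DOC_BYTES, 5)` would indicate that either the slice needs to grow or the question should be narrowed to the sections present.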
WebFetchCache keeps validators (ETag / Last-Modified) past the fresh TTL in a stale window (soft ttlMs + hard staleTtlMs via LRU allowStale + noDeleteOnStaleGet). webFetch sends If-None-Match / If-Modified-Since for a stale-but-kept entry; a 304 reuses the cached body (no re-download) and refreshes it, a 200 replaces it. Tunable via REASONIX_WEB_FETCH_CACHE_STALE_MS (default 60min). Implements scheme 6 / Open Q-5 of the token-cache plan.
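The fresh / stale / expired split and the conditional request headers can be sketched as follows. The names `classifyEntry`, `conditionalHeaders`, and `CachedValidators` are hypothetical, and the sketch assumes the stale window starts when the fresh TTL ends; the actual WebFetchCache may measure `staleTtlMs` differently.

```typescript
// Validators remembered from the original response (hypothetical shape).
interface CachedValidators {
  etag?: string;
  lastModified?: string;
}

type Freshness = "fresh" | "stale" | "expired";

// Assumption: the stale window is counted from the end of the fresh TTL.
function classifyEntry(ageMs: number, ttlMs: number, staleWindowMs: number): Freshness {
  if (ageMs < ttlMs) return "fresh";
  if (ageMs < ttlMs + staleWindowMs) return "stale";
  return "expired";
}

// Builds the conditional request headers for a stale-but-kept entry.
function conditionalHeaders(v: CachedValidators): Record<string, string> {
  const headers: Record<string, string> = {};
  if (v.etag) headers["If-None-Match"] = v.etag;
  if (v.lastModified) headers["If-Modified-Since"] = v.lastModified;
  return headers;
}
```

Only a "stale" entry would be revalidated: a "fresh" one is served directly and an "expired" one is refetched without validators.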
REASONIX_FOLD_CACHE=1 (default off) enables a content-addressed LRU around the flash fold summarizer, keyed by sha256 of the head's role+content. A same-process repeat fold of byte-identical turns reuses the summary instead of re-calling. Memory-only (no SQLite, honours C-001); off by default since a live session's fold is single-directional. Implements scheme 7 / Open Q-2 of the token-cache plan.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 50dfdcfbe9
if (process.env.REASONIX_FOLD_CACHE !== "1") {
  return this.summarizeForFold(messagesToSummarize);
}
const basis = JSON.stringify(messagesToSummarize.map((m) => [m.role, m.content ?? ""]));
Key fold summaries on the full messages
When REASONIX_FOLD_CACHE=1, this key only includes each message's role and content, so two fold heads with the same visible text but different tool_calls, tool_call_id, name, prefix, or reasoning_content collide. In tool-heavy sessions, assistant tool-call arguments can be the only place a command/path or MCP call is recorded; a later fold can then reuse a summary generated for a different tool sequence and persist a misleading conversation recap. Include the full fields that summarizeForFold() passes through healLoadedMessages() in the cache basis.
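A cache basis that covers the fields named above might look like the sketch below. `FoldMessage` and `foldCacheKey` are hypothetical names, and the field list is taken from this comment rather than from the real message type, so it may not match what `healLoadedMessages()` actually passes through.

```typescript
import { createHash } from "node:crypto";

// Hypothetical message shape covering the fields listed in the review comment.
interface FoldMessage {
  role: string;
  content?: string | null;
  name?: string;
  tool_call_id?: string;
  tool_calls?: unknown;
  prefix?: boolean;
  reasoning_content?: string | null;
}

// Hashes every field that can change what the summarizer sees, not only role + content.
function foldCacheKey(messages: FoldMessage[]): string {
  const basis = JSON.stringify(
    messages.map((m) => [
      m.role,
      m.content ?? "",
      m.name ?? null,
      m.tool_call_id ?? null,
      m.tool_calls ?? null,
      m.prefix ?? null,
      m.reasoning_content ?? null,
    ]),
  );
  return createHash("sha256").update(basis).digest("hex");
}
```

Because `JSON.stringify` preserves property insertion order, two `tool_calls` values with the same properties in a different order would hash differently; that costs a cache miss but never a wrong hit.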
if (resp.status === 304 && revalidation) {
  opts.cache?.markRevalidated(url, maxChars);
  return revalidation.page;
Honor no-store on 304 revalidation responses
When a stale entry is revalidated and the server replies 304 with Cache-Control: no-store or private, this path marks the old body fresh and returns it before applying shouldCacheWebFetchResponse(). That means a page that has changed from cacheable to non-cacheable can keep serving the previously cached body for another TTL window instead of being evicted/refetched. Check the 304 response's cache directives before reusing the stale entry.
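The suggested check could be sketched as a small decision function that runs on the 304 response before the stale body is reused. `forbidsReuse` and `decideOn304` are hypothetical names; the real `shouldCacheWebFetchResponse()` is not shown in this PR text and may apply additional rules.

```typescript
// True when the Cache-Control value contains a no-store or private directive.
function forbidsReuse(cacheControl: string | null): boolean {
  if (!cacheControl) return false;
  const directives = cacheControl
    .toLowerCase()
    .split(",")
    .map((d) => d.trim().split("=")[0]);
  return directives.includes("no-store") || directives.includes("private");
}

type RevalidationAction = "reuse-cached" | "evict-and-refetch";

// A 304 carries no body, so a now-uncacheable page has to be evicted and
// fetched again unconditionally instead of being marked fresh.
function decideOn304(cacheControl: string | null): RevalidationAction {
  return forbidsReuse(cacheControl) ? "evict-and-refetch" : "reuse-cached";
}
```

With the Fetch API, the header value would come from `resp.headers.get("cache-control")`, which returns `null` when the header is absent.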
Summary
Lands the token-cache optimization plan: Slice 2 (MCP eager drift) plus all six follow-up Open Questions (Q-1…Q-6) — three confirmed fixes and two opt-in features explicitly requested.
feat(mcp): eager tools/list drift check (scheme 10)
loadMcpToolCacheEager awaits the async drift check before building the prefix, so a changed MCP server's latest tools land in the current session instead of the next. On drift it returns null, reusing the existing bridgeMcpTools live-listTools rebuild + re-save path, with zero new rebuild logic. Per-server timeout falls back to fire-and-forget. Default on; REASONIX_MCP_EAGER_DRIFT=0 to opt out.
feat(web): web-fetch conditional revalidation (Q-5)
WebFetchCache keeps validators (ETag / Last-Modified) past the fresh TTL in a stale window (soft ttlMs + hard staleTtlMs, LRU allowStale + noDeleteOnStaleGet). webFetch sends If-None-Match / If-Modified-Since for a stale-but-kept entry; a 304 reuses the cached body (no re-download) and refreshes it, a 200 replaces it. Tunable via REASONIX_WEB_FETCH_CACHE_STALE_MS (default 60min).
feat(context): opt-in fold-summary cache (Q-2)
REASONIX_FOLD_CACHE=1 (default off) enables a content-addressed memory LRU around the flash fold summarizer, keyed by sha256 of the head's role+content. A same-process repeat fold of byte-identical turns reuses the summary. Memory-only (honours C-001); off by default since a live session's fold is single-directional, so it only helps repeated in-process folds.
fix(bench): real-world-cache pricing table
README flash cache-hit was 0.028, 10× the real 0.0028 in src/telemetry/stats.ts, which understated savings. Corrected to $59.69 / ~97.75% (not $48.29 / ~80%), plus pro pricing, the cost table, and the ImmutablePrefix path (src/memory.ts → src/memory/runtime.ts).
chore(bench): estimateRequestTokens profiling probe
Pure-CPU microbenchmark: local tokenize ~2.8ms p50 on a 23K-token session, which confirms token-estimate caching would be premature.
Test
tsc --noEmit + biome check green; pre-push full suite green