feat(mcp): eager tools/list drift verification + real-world-cache pricing fix #6

Merged: ChasLui merged 5 commits into main from feat/mcp-eager-drift on Jun 1, 2026
Conversation

ChasLui (Owner) commented May 31, 2026

Summary

Lands the token-cache optimization plan: Slice 2 (MCP eager drift) plus all six follow-up Open Questions (Q-1…Q-6) — three confirmed fixes and two opt-in features explicitly requested.

feat(mcp): eager tools/list drift check (scheme 10)

loadMcpToolCacheEager awaits the async drift check before building the prefix, so a changed MCP server's latest tools land in the current session instead of the next. On drift it returns null, reusing the existing bridgeMcpTools live-listTools rebuild + re-save path — zero new rebuild logic. Per-server timeout falls back to fire-and-forget. Default on; REASONIX_MCP_EAGER_DRIFT=0 to opt out.
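The eager check described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `loadToolCacheEager`, `withTimeout`, and the `CachedTools` / `DriftResult` shapes are hypothetical stand-ins, and treating a failed check like a timeout is an assumption the PR does not confirm.

```typescript
// Hypothetical shapes; the real Reasonix types are not shown in this PR.
type ToolSpec = { name: string; description?: string };
type CachedTools = { server: string; tools: ToolSpec[] };
type DriftResult = "same" | "drifted";

// Resolve to "timeout" if `p` has not settled within `ms` milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T | "timeout"> {
  return new Promise<T | "timeout">((resolve, reject) => {
    const timer = setTimeout(() => resolve("timeout"), ms);
    p.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Returns the cache when it can be trusted, or null to signal
// "rebuild from a live listTools call" to the caller.
async function loadToolCacheEager(
  cached: CachedTools | null,
  verifyDrift: (c: CachedTools) => Promise<DriftResult>,
  opts: { eager: boolean; timeoutMs: number },
): Promise<CachedTools | null> {
  if (!cached) return null; // nothing cached: caller rebuilds live
  if (!opts.eager) return cached; // opt-out: skip the awaited check
  try {
    const outcome = await withTimeout(verifyDrift(cached), opts.timeoutMs);
    // On timeout the check keeps running in the background (fire-and-forget)
    // and the current session proceeds with the cached tools.
    if (outcome === "timeout") return cached;
    return outcome === "drifted" ? null : cached;
  } catch {
    // Assumption: a failed check should not block the session.
    return cached;
  }
}
```

Returning `null` on drift keeps the sketch in line with the description: the caller's existing rebuild path does the work, so no new rebuild logic is needed here.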

feat(web): web-fetch conditional revalidation (Q-5)

WebFetchCache keeps validators (ETag / Last-Modified) past the fresh TTL in a stale window (soft ttlMs + hard staleTtlMs, LRU allowStale + noDeleteOnStaleGet). webFetch sends If-None-Match / If-Modified-Since for a stale-but-kept entry; a 304 reuses the cached body (no re-download) and refreshes it, a 200 replaces it. Tunable via REASONIX_WEB_FETCH_CACHE_STALE_MS (default 60min).
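A rough sketch of the revalidation flow described above, using hypothetical names (`CachedPage`, `classify`, `conditionalHeaders`, `applyRevalidation`). It assumes `staleTtlMs` is a hard limit measured from the time the entry was stored; the PR does not say whether the real cache measures it that way.

```typescript
// Hypothetical cache entry; the real WebFetchCache shape is not shown in the PR.
interface CachedPage {
  body: string;
  etag?: string;
  lastModified?: string;
  storedAt: number; // epoch milliseconds
}

type Freshness = "fresh" | "stale" | "expired";

// Soft TTL: serve directly. Between soft and hard TTL: keep for revalidation.
function classify(entry: CachedPage, now: number, ttlMs: number, staleTtlMs: number): Freshness {
  const age = now - entry.storedAt;
  if (age <= ttlMs) return "fresh";
  if (age <= staleTtlMs) return "stale";
  return "expired";
}

// Build the conditional request headers from whichever validators were kept.
function conditionalHeaders(entry: CachedPage): Record<string, string> {
  const headers: Record<string, string> = {};
  if (entry.etag) headers["If-None-Match"] = entry.etag;
  if (entry.lastModified) headers["If-Modified-Since"] = entry.lastModified;
  return headers;
}

interface RevalidationResponse {
  status: number;
  body?: string;
  etag?: string;
  lastModified?: string;
}

// 304: reuse the cached body and refresh its timestamp. Otherwise: replace it.
function applyRevalidation(entry: CachedPage, resp: RevalidationResponse, now: number): CachedPage {
  if (resp.status === 304) {
    return { ...entry, storedAt: now };
  }
  return {
    body: resp.body ?? "",
    etag: resp.etag,
    lastModified: resp.lastModified,
    storedAt: now,
  };
}
```

The headers are only worth sending for a `"stale"` entry; a `"fresh"` entry is served without a request, and an `"expired"` one has no body left to reuse.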

feat(context): opt-in fold-summary cache (Q-2)

REASONIX_FOLD_CACHE=1 (default off) enables a content-addressed memory LRU around the flash fold summarizer, keyed by sha256 of the head's role+content. A same-process repeat fold of byte-identical turns reuses the summary. Memory-only (honours C-001); off by default since a live session's fold is single-directional, so it only helps repeated in-process folds.
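As an illustration of the content-addressed lookup, here is a minimal sketch. `foldKey`, `TinyLru`, and `foldWithCache` are hypothetical names, the Map-based LRU is a stand-in for whatever LRU the project actually uses, and the `enabled` flag stands in for the `REASONIX_FOLD_CACHE=1` check.

```typescript
import { createHash } from "node:crypto";

// Hypothetical message shape: only the two fields the key is built from.
type Msg = { role: string; content?: string | null };

// sha256 over the head's role + content, as the description states.
function foldKey(head: Msg[]): string {
  const basis = JSON.stringify(head.map((m) => [m.role, m.content ?? ""]));
  return createHash("sha256").update(basis).digest("hex");
}

// Minimal in-memory LRU built on Map insertion order (illustrative only).
class TinyLru<V> {
  private readonly map = new Map<string, V>();
  private readonly max: number;

  constructor(max: number) {
    this.max = max;
  }

  get(key: string): V | undefined {
    const value = this.map.get(key);
    if (value === undefined) return undefined;
    this.map.delete(key); // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.max) {
      const oldest = this.map.keys().next().value;
      if (oldest !== undefined) this.map.delete(oldest);
    }
  }
}

// Reuse a summary for byte-identical heads; otherwise call the summarizer.
async function foldWithCache(
  head: Msg[],
  summarize: (h: Msg[]) => Promise<string>,
  cache: TinyLru<string>,
  enabled: boolean,
): Promise<string> {
  if (!enabled) return summarize(head);
  const key = foldKey(head);
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const summary = await summarize(head);
  cache.set(key, summary);
  return summary;
}
```

Because the cache lives only in process memory, it disappears on restart, which matches the memory-only constraint stated above.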

fix(bench): real-world-cache pricing table

The README listed the flash cache-hit price as 0.028, 10× the real 0.0028 in src/telemetry/stats.ts, which understated the savings. Corrected to $59.69 / ~97.75% (not $48.29 / ~80%), along with the pro pricing, the cost table, and the ImmutablePrefix path (src/memory.ts → src/memory/runtime.ts).
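To see why a tenfold error in the cache-hit price understates savings, consider the sketch below. Only the two cache-hit prices (0.028 and 0.0028) come from this PR. The cache-miss price of 0.14, the per-million-token unit, and the token counts are placeholders chosen for illustration, and `costUsd` / `savingsPct` are hypothetical helpers rather than the benchmark's code.

```typescript
interface Usage {
  hitTokens: number; // prompt tokens served from the cache
  missTokens: number; // prompt tokens billed at the full price
}

// Cost in dollars, assuming prices are quoted per million tokens.
function costUsd(u: Usage, hitPrice: number, missPrice: number): number {
  return (u.hitTokens * hitPrice + u.missTokens * missPrice) / 1e6;
}

// Savings relative to paying the miss price for every token.
function savingsPct(u: Usage, hitPrice: number, missPrice: number): number {
  const allMiss: Usage = { hitTokens: 0, missTokens: u.hitTokens + u.missTokens };
  const baseline = costUsd(allMiss, hitPrice, missPrice);
  return 100 * (1 - costUsd(u, hitPrice, missPrice) / baseline);
}

// Placeholder workload: one million tokens, all cache hits.
const exampleUsage: Usage = { hitTokens: 1e6, missTokens: 0 };
const PLACEHOLDER_MISS_PRICE = 0.14; // assumption, not taken from the PR

const withWrongPrice = savingsPct(exampleUsage, 0.028, PLACEHOLDER_MISS_PRICE);
const withRealPrice = savingsPct(exampleUsage, 0.0028, PLACEHOLDER_MISS_PRICE);
console.log(withWrongPrice.toFixed(2), withRealPrice.toFixed(2));
```

With these placeholder inputs the overstated hit price gives roughly 80% savings and the corrected one roughly 98%. The figures are illustrative and are not the benchmark's actual token counts.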

chore(bench): estimateRequestTokens profiling probe

Pure-CPU microbenchmark: local tokenize ~2.8ms p50 on a 23K-token session — confirms token-estimate caching would be premature.
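A probe of this kind can be sketched as below. This is an assumption-laden illustration: `percentile`, `bench`, and the character-count `estimateTokensStandIn` are hypothetical, and the real estimateRequestTokens runs an actual tokenizer rather than this heuristic.

```typescript
// Nearest-rank percentile over an ascending-sorted sample array.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) throw new Error("no samples");
  const rank = Math.ceil((p / 100) * sorted.length);
  const idx = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[idx];
}

// Time `fn` repeatedly and report p50 / p99 in milliseconds.
function bench(fn: () => void, iterations: number): { p50: number; p99: number } {
  const samples: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    fn();
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  return { p50: percentile(samples, 50), p99: percentile(samples, 99) };
}

// Stand-in estimator (roughly 4 characters per token); NOT the real tokenizer.
function estimateTokensStandIn(text: string): number {
  return Math.ceil(text.length / 4);
}

const sessionText = "lorem ipsum ".repeat(8000); // synthetic session payload
const timing = bench(() => {
  estimateTokensStandIn(sessionText);
}, 200);
console.log(`p50=${timing.p50.toFixed(3)}ms p99=${timing.p99.toFixed(3)}ms`);
```

Reporting the median rather than the mean keeps a few slow iterations (GC pauses, JIT warm-up) from distorting the figure.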

Test

  • MCP eager drift: 5 unit + 69 MCP regression
  • web-fetch ETag: 3 unit + 23 cache + 59 web-tools
  • fold-cache: 2 unit + 24 context-manager regression
  • tsc --noEmit + biome check green; pre-push full suite green

ChasLui added 3 commits June 1, 2026 02:07
loadMcpToolCacheEager awaits verifyDriftAsync before building the prefix so a changed server's latest tools land in the current session instead of the next. On drift it deletes the cache and returns null, reusing the existing bridgeMcpTools live-listTools rebuild + re-save path (zero new rebuild logic). Per-server timeout (REASONIX_MCP_EAGER_DRIFT_TIMEOUT_MS, default 3000ms) falls back to fire-and-forget; never blocks the first user message. Default on; set REASONIX_MCP_EAGER_DRIFT=0 to opt out. Implements scheme 10 of the token-cache plan (Slice 2).
README flash cache-hit was 0.028, 10x the real 0.0028 in src/telemetry/stats.ts, understating savings. Real figures: $59.69 / ~97.75% off (not $48.29 / ~80%). Corrects both models' prices, the cost table, the savings line, and the ImmutablePrefix path (src/memory.ts -> src/memory/runtime.ts). Also fixes the long-doc-qa probe question (four -> five pillars).
Pure-CPU microbenchmark: local tokenize p50 ~2.8ms on a 23K-token session, ~0.14-1.4% of an API round-trip, confirming token-estimate caching would be premature optimization (scheme 4 of the token-cache plan).

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1df6e24c65

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

  const DOC_BYTES = 8192;
  const QUESTIONS = [
-   "Summarize the four Reasonix architecture pillars in three bullets.",
+   "Summarize the five Reasonix architecture pillars in three bullets.",

P2: Keep the probe question within the supplied slice

With the default DOC_BYTES = 8192, this benchmark sends only the first 8 KiB of docs/ARCHITECTURE.md; that slice includes Pillars 1–3 but stops before the Pillar 4/Pillar 5 sections. When the first turn now asks for all five pillars, the model has to rely on prior/contextual knowledge rather than the provided document, so this real-world cache probe no longer validates doc-grounded QA for its default run. Either include enough bytes to cover all five sections or ask only about content present in the slice.
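One way to guard against this is to check, before running the probe, that every section the question refers to falls inside the slice. The sketch below is hypothetical: `sliceBytes` and `missingMarkers` are illustrative names, the `## Pillar N` heading format is an assumption about the document, and the benchmark may slice by characters rather than bytes.

```typescript
// Take the first `maxBytes` bytes of a UTF-8 document.
function sliceBytes(doc: string, maxBytes: number): string {
  const bytes = new TextEncoder().encode(doc);
  return new TextDecoder().decode(bytes.slice(0, maxBytes));
}

// Return the markers that do NOT appear in the slice sent to the model.
function missingMarkers(doc: string, maxBytes: number, markers: string[]): string[] {
  const slice = sliceBytes(doc, maxBytes);
  return markers.filter((marker) => !slice.includes(marker));
}

// Synthetic five-section document standing in for docs/ARCHITECTURE.md.
const syntheticDoc = [1, 2, 3, 4, 5]
  .map((i) => `## Pillar ${i}\n` + "x".repeat(100))
  .join("\n");
const pillarMarkers = [1, 2, 3, 4, 5].map((i) => `## Pillar ${i}`);

console.log(missingMarkers(syntheticDoc, 300, pillarMarkers));
console.log(missingMarkers(syntheticDoc, 1000, pillarMarkers));
```

A non-empty result means the question asks about content the model was never given, so either the byte budget or the question needs to change.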

ChasLui added 2 commits June 1, 2026 02:39
WebFetchCache keeps validators (ETag / Last-Modified) past the fresh TTL in a stale window (soft ttlMs + hard staleTtlMs via LRU allowStale + noDeleteOnStaleGet). webFetch sends If-None-Match / If-Modified-Since for a stale-but-kept entry; a 304 reuses the cached body (no re-download) and refreshes it, a 200 replaces it. Tunable via REASONIX_WEB_FETCH_CACHE_STALE_MS (default 60min). Implements scheme 6 / Open Q-5 of the token-cache plan.
REASONIX_FOLD_CACHE=1 (default off) enables a content-addressed LRU around the flash fold summarizer, keyed by sha256 of the head's role+content. A same-process repeat fold of byte-identical turns reuses the summary instead of re-calling. Memory-only (no SQLite, honours C-001); off by default since a live session's fold is single-directional. Implements scheme 7 / Open Q-2 of the token-cache plan.

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 50dfdcfbe9

Comment thread src/context-manager.ts
if (process.env.REASONIX_FOLD_CACHE !== "1") {
  return this.summarizeForFold(messagesToSummarize);
}
const basis = JSON.stringify(messagesToSummarize.map((m) => [m.role, m.content ?? ""]));

P2: Key fold summaries on the full messages

When REASONIX_FOLD_CACHE=1, this key only includes each message's role and content, so two fold heads with the same visible text but different tool_calls, tool_call_id, name, prefix, or reasoning_content collide. In tool-heavy sessions, assistant tool-call arguments can be the only place a command/path or MCP call is recorded; a later fold can then reuse a summary generated for a different tool sequence and persist a misleading conversation recap. Include the full fields that summarizeForFold() passes through healLoadedMessages() in the cache basis.
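The collision is easy to reproduce in isolation. In the sketch below, `FoldMsg`, `narrowBasis`, and `fullBasis` are hypothetical; the field list is taken from the comment above, and the exact set that summarizeForFold() passes through healLoadedMessages() is not visible in this PR.

```typescript
// Hypothetical message shape covering the fields named in the review comment.
type FoldMsg = {
  role: string;
  content?: string | null;
  name?: string;
  tool_call_id?: string;
  tool_calls?: unknown[];
  prefix?: boolean;
  reasoning_content?: string | null;
};

// Key basis as in the reviewed code: role and content only.
function narrowBasis(msgs: FoldMsg[]): string {
  return JSON.stringify(msgs.map((m) => [m.role, m.content ?? ""]));
}

// Key basis including every field that can change what the summary says.
function fullBasis(msgs: FoldMsg[]): string {
  return JSON.stringify(
    msgs.map((m) => [
      m.role,
      m.content ?? "",
      m.name ?? null,
      m.tool_call_id ?? null,
      m.tool_calls ?? null,
      m.prefix ?? null,
      m.reasoning_content ?? null,
    ]),
  );
}

// Two assistant turns with identical visible text but different tool calls.
const headA: FoldMsg[] = [
  { role: "assistant", content: "", tool_calls: [{ name: "run", arguments: '{"cmd":"ls"}' }] },
];
const headB: FoldMsg[] = [
  { role: "assistant", content: "", tool_calls: [{ name: "run", arguments: '{"cmd":"rm -rf build"}' }] },
];

console.log(narrowBasis(headA) === narrowBasis(headB)); // the two heads collide
console.log(fullBasis(headA) === fullBasis(headB)); // the two heads stay distinct
```

Note that `JSON.stringify` is sensitive to property order inside `tool_calls`, so two semantically equal calls serialized in a different key order would miss the cache; that is a safe failure mode, unlike the collision.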

Comment thread src/tools/web.ts
Comment on lines +704 to +706
if (resp.status === 304 && revalidation) {
  opts.cache?.markRevalidated(url, maxChars);
  return revalidation.page;

P2: Honor no-store on 304 revalidation responses

When a stale entry is revalidated and the server replies 304 with Cache-Control: no-store or private, this path marks the old body fresh and returns it before applying shouldCacheWebFetchResponse(). That means a page that has changed from cacheable to non-cacheable can keep serving the previously cached body for another TTL window instead of being evicted/refetched. Check the 304 response's cache directives before reusing the stale entry.
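A possible shape for that check is sketched below. `parseCacheControl` and `mayReuseOn304` are hypothetical helpers, not the project's shouldCacheWebFetchResponse(), and treating `private` the same as `no-store` follows the comment above rather than any confirmed project policy.

```typescript
// Parse a Cache-Control header into a set of lowercase directive names.
function parseCacheControl(header: string | null): Set<string> {
  const directives = new Set<string>();
  if (!header) return directives;
  for (const part of header.split(",")) {
    const name = part.split("=")[0].trim().toLowerCase();
    if (name) directives.add(name);
  }
  return directives;
}

// A 304 may refresh the stale entry only if the response still allows caching.
function mayReuseOn304(cacheControl: string | null): boolean {
  const directives = parseCacheControl(cacheControl);
  return !directives.has("no-store") && !directives.has("private");
}
```

In the 304 branch, a `false` result would mean evicting the stale entry and refetching without conditional headers instead of marking it revalidated.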

@ChasLui ChasLui merged commit 9aa2dc5 into main Jun 1, 2026