feat(mcp): eager tools/list drift verification + real-world-cache pricing fix by ChasLui · Pull Request #6 · ChasLui/DeepSeek-Reasonix

ChasLui · 2026-05-31T18:12:02Z

Summary

Lands the token-cache optimization plan: Slice 2 (MCP eager drift) plus all six follow-up Open Questions (Q-1…Q-6) — three confirmed fixes and two opt-in features explicitly requested.

`feat(mcp)`: eager tools/list drift check (scheme 10)

loadMcpToolCacheEager awaits the async drift check before building the prefix, so a changed MCP server's latest tools land in the current session instead of the next. On drift it returns null, reusing the existing bridgeMcpTools live-listTools rebuild + re-save path — zero new rebuild logic. Per-server timeout falls back to fire-and-forget. Default on; REASONIX_MCP_EAGER_DRIFT=0 to opt out.
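The await-with-timeout behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the Reasonix implementation: `checkWithDeadline`, `loadToolsEager`, and the `DriftResult` type are hypothetical names, and the 3000 ms default is the value quoted in the commit message for REASONIX_MCP_EAGER_DRIFT_TIMEOUT_MS.

```typescript
// Hypothetical names and shapes; nothing here is copied from the Reasonix source.
type DriftResult = "unchanged" | "drifted";

// Race one server's drift check against a deadline. On timeout the check keeps
// running in the background (fire-and-forget) and the caller moves on.
// A rejection from verify() before the deadline propagates to the caller.
async function checkWithDeadline(
  verify: () => Promise<DriftResult>,
  timeoutMs: number,
): Promise<DriftResult | "timed-out"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<"timed-out">((resolve) => {
    timer = setTimeout(() => resolve("timed-out"), timeoutMs);
  });
  try {
    return await Promise.race([verify(), deadline]);
  } finally {
    clearTimeout(timer);
  }
}

// Returns the cached tools, or null to tell the caller to rebuild from a live listing.
async function loadToolsEager<T>(
  cached: T | null,
  verify: () => Promise<DriftResult>,
  timeoutMs = 3000, // default quoted in the commit message
): Promise<T | null> {
  if (cached === null) return null;
  const outcome = await checkWithDeadline(verify, timeoutMs);
  // Only a confirmed drift discards the cache; a timeout keeps it for this session.
  return outcome === "drifted" ? null : cached;
}
```

Returning null on drift lets the caller fall through to whatever rebuild path it already has, which is the "zero new rebuild logic" property the description claims.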

`feat(web)`: web-fetch conditional revalidation (Q-5)

WebFetchCache keeps validators (ETag / Last-Modified) past the fresh TTL in a stale window (soft ttlMs + hard staleTtlMs, LRU allowStale + noDeleteOnStaleGet). webFetch sends If-None-Match / If-Modified-Since for a stale-but-kept entry; a 304 reuses the cached body (no re-download) and refreshes it, a 200 replaces it. Tunable via REASONIX_WEB_FETCH_CACHE_STALE_MS (default 60min).
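A rough sketch of the fresh / stale / expired split and the conditional request headers, for readers unfamiliar with HTTP revalidation. The `CachedPage` shape and the function names are hypothetical, and the sketch assumes the stale window starts where the fresh TTL ends; the real WebFetchCache may measure it differently.

```typescript
// Hypothetical entry shape; field names are illustrative, not the WebFetchCache API.
interface CachedPage {
  body: string;
  etag?: string;
  lastModified?: string;
  storedAt: number; // epoch milliseconds
}

type Freshness = "fresh" | "stale" | "expired";

// Assumption: the stale window of staleTtlMs begins once ttlMs has elapsed.
function classify(entry: CachedPage, now: number, ttlMs: number, staleTtlMs: number): Freshness {
  const age = now - entry.storedAt;
  if (age < ttlMs) return "fresh";
  if (age < ttlMs + staleTtlMs) return "stale";
  return "expired";
}

// Build the validators to send when revalidating a stale-but-kept entry.
function conditionalHeaders(entry: CachedPage): Record<string, string> {
  const headers: Record<string, string> = {};
  if (entry.etag) headers["If-None-Match"] = entry.etag;
  if (entry.lastModified) headers["If-Modified-Since"] = entry.lastModified;
  return headers;
}

// 304: keep the cached body and restart its TTL. 200: replace the entry.
// Any other status leaves the entry untouched in this sketch.
function applyResponse(
  entry: CachedPage,
  status: number,
  now: number,
  fresh?: Omit<CachedPage, "storedAt">,
): CachedPage {
  if (status === 304) return { ...entry, storedAt: now };
  if (status === 200 && fresh) return { ...fresh, storedAt: now };
  return entry;
}
```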

`feat(context)`: opt-in fold-summary cache (Q-2)

REASONIX_FOLD_CACHE=1 (default off) enables a content-addressed in-memory LRU around the flash fold summarizer, keyed by sha256 of the head's role+content. A repeat fold of byte-identical turns within the same process reuses the summary. Memory-only (honours C-001); off by default because a live session folds in one direction only, so the cache helps only repeated in-process folds.
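A minimal sketch of a content-addressed summary cache of this kind. The key basis mirrors the `JSON.stringify` of role+content pairs visible in the reviewed diff; everything else (`TinyLru`, `foldWithCache`, the `Msg` shape) is a hypothetical stand-in for whatever the context manager actually uses.

```typescript
import { createHash } from "node:crypto";

// Hypothetical message shape, reduced to the two fields that feed the key.
interface Msg {
  role: string;
  content?: string | null;
}

// Content address: sha256 over the role+content pairs of the fold head.
function foldKey(head: Msg[]): string {
  const basis = JSON.stringify(head.map((m) => [m.role, m.content ?? ""]));
  return createHash("sha256").update(basis).digest("hex");
}

// Tiny in-memory LRU built on Map insertion order (illustrative only).
class TinyLru<V> {
  private readonly map = new Map<string, V>();
  private readonly max: number;
  constructor(max: number) {
    this.max = max;
  }
  get(key: string): V | undefined {
    const value = this.map.get(key);
    if (value !== undefined) {
      this.map.delete(key); // re-insert to mark as most recently used
      this.map.set(key, value);
    }
    return value;
  }
  set(key: string, value: V): void {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.max) {
      const oldest = this.map.keys().next().value;
      if (oldest !== undefined) this.map.delete(oldest);
    }
  }
}

// Call the summarizer only on a cache miss; bypass the cache when disabled.
async function foldWithCache(
  head: Msg[],
  summarize: (head: Msg[]) => Promise<string>,
  cache: TinyLru<string>,
  enabled: boolean = process.env.REASONIX_FOLD_CACHE === "1",
): Promise<string> {
  if (!enabled) return summarize(head);
  const key = foldKey(head);
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const summary = await summarize(head);
  cache.set(key, summary);
  return summary;
}
```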

`fix(bench)`: real-world-cache pricing table

README flash cache-hit was 0.028 — 10× the real 0.0028 in src/telemetry/stats.ts — which understated savings. Corrected to $59.69 / ~97.75% (not $48.29 / ~80%), plus pro pricing, the cost table, and the ImmutablePrefix path (src/memory.ts → src/memory/runtime.ts).
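To see why a 10× error in the cache-hit price understates savings, the arithmetic can be sketched as below. Only the two cache-hit prices (0.028 and 0.0028) come from this PR; the cache-miss price and the hit rate are placeholders chosen for illustration, so the outputs will not reproduce the README's dollar figures.

```typescript
// Fraction saved versus paying the cache-miss price for every input token.
// hitRate is the share of input tokens served from cache (0..1).
function savingsFraction(hitPrice: number, missPrice: number, hitRate: number): number {
  const blended = hitRate * hitPrice + (1 - hitRate) * missPrice;
  return 1 - blended / missPrice;
}

// Placeholder inputs, NOT taken from the PR or from src/telemetry/stats.ts.
const HYPOTHETICAL_MISS_PRICE = 0.14;
const HYPOTHETICAL_HIT_RATE = 1.0;

const withWrongPrice = savingsFraction(0.028, HYPOTHETICAL_MISS_PRICE, HYPOTHETICAL_HIT_RATE);
const withRealPrice = savingsFraction(0.0028, HYPOTHETICAL_MISS_PRICE, HYPOTHETICAL_HIT_RATE);

console.log({ withWrongPrice, withRealPrice });
```

Whatever the miss price, the overstated hit price inflates the blended cost, so the computed savings come out lower than they should.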

`chore(bench)`: estimateRequestTokens profiling probe

Pure-CPU microbenchmark: local tokenize ~2.8ms p50 on a 23K-token session — confirms token-estimate caching would be premature.
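A p50 measurement of this kind can be taken with a harness like the one below. This is a sketch, not the probe added in the PR: `fakeEstimate` is a crude stand-in for estimateRequestTokens, and `shareOfRoundTrip` only restates the ratio from the commit message, whose ~0.14-1.4% range corresponds arithmetically to round trips of roughly 2 s down to 200 ms.

```typescript
import { performance } from "node:perf_hooks";

// Median of a non-empty sample set.
function p50(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Time `fn` in milliseconds, discarding a few warm-up runs first.
function timeMs(fn: () => void, iterations: number, warmup = 5): number[] {
  for (let i = 0; i < warmup; i++) fn();
  const samples: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    fn();
    samples.push(performance.now() - start);
  }
  return samples;
}

// Stand-in workload: a whitespace split, NOT the real estimateRequestTokens.
const fakeEstimate = (text: string): number => text.split(/\s+/).length;

// Local cost as a fraction of one API round trip.
function shareOfRoundTrip(localMs: number, roundTripMs: number): number {
  return localMs / roundTripMs;
}

const session = "token ".repeat(23_000);
const medianMs = p50(timeMs(() => void fakeEstimate(session), 50));
console.log(`p50 ${medianMs.toFixed(3)} ms`);
```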

Test

MCP eager drift: 5 unit + 69 MCP regression
web-fetch ETag: 3 unit + 23 cache + 59 web-tools
fold-cache: 2 unit + 24 context-manager regression
tsc --noEmit + biome check green; pre-push full suite green

loadMcpToolCacheEager awaits verifyDriftAsync before building the prefix so a changed server's latest tools land in the current session instead of the next. On drift it deletes the cache and returns null, reusing the existing bridgeMcpTools live-listTools rebuild + re-save path (zero new rebuild logic). Per-server timeout (REASONIX_MCP_EAGER_DRIFT_TIMEOUT_MS, default 3000ms) falls back to fire-and-forget; never blocks the first user message. Default on; set REASONIX_MCP_EAGER_DRIFT=0 to opt out. Implements scheme 10 of the token-cache plan (Slice 2).

README flash cache-hit was 0.028, 10x the real 0.0028 in src/telemetry/stats.ts, understating savings. Real figures: $59.69 / ~97.75% off (not $48.29 / ~80%). Corrects both models' prices, the cost table, the savings line, and the ImmutablePrefix path (src/memory.ts -> src/memory/runtime.ts). Also fixes the long-doc-qa probe question (four -> five pillars).

Pure-CPU microbenchmark: local tokenize p50 ~2.8ms on a 23K-token session, ~0.14-1.4% of an API round-trip, confirming token-estimate caching would be premature optimization (scheme 4 of the token-cache plan).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1df6e24c65

chatgpt-codex-connector · 2026-05-31T18:17:47Z

 const DOC_BYTES = 8192;
 const QUESTIONS = [
-  "Summarize the four Reasonix architecture pillars in three bullets.",
+  "Summarize the five Reasonix architecture pillars in three bullets.",


Keep the probe question within the supplied slice

With the default DOC_BYTES = 8192, this benchmark sends only the first 8 KiB of docs/ARCHITECTURE.md; that slice includes Pillars 1–3 but stops before the Pillar 4/Pillar 5 sections. When the first turn now asks for all five pillars, the model has to rely on prior/contextual knowledge rather than the provided document, so this real-world cache probe no longer validates doc-grounded QA for its default run. Either include enough bytes to cover all five sections or ask only about content present in the slice.
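One way to guard against this kind of mismatch is to check, before running the probe, that everything the question asks about actually falls inside the byte slice. A minimal sketch follows; `missingFromSlice` and the heading strings are hypothetical, since the real section titles in docs/ARCHITECTURE.md are not shown here.

```typescript
// Returns the headings that do NOT appear within the first `bytes` bytes of `doc`.
// A multi-byte character cut at the boundary decodes to U+FFFD, which is harmless here.
function missingFromSlice(doc: string, bytes: number, headings: string[]): string[] {
  const encoded = new TextEncoder().encode(doc);
  const slice = new TextDecoder().decode(encoded.subarray(0, bytes));
  return headings.filter((heading) => !slice.includes(heading));
}

// Hypothetical document: the second heading sits beyond the first 50 bytes.
const sampleDoc = "## Pillar 1\n" + "x".repeat(100) + "\n## Pillar 2\n";
const missing = missingFromSlice(sampleDoc, 50, ["## Pillar 1", "## Pillar 2"]);
if (missing.length > 0) {
  console.warn(`question references sections outside the slice: ${missing.join(", ")}`);
}
```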


WebFetchCache keeps validators (ETag / Last-Modified) past the fresh TTL in a stale window (soft ttlMs + hard staleTtlMs via LRU allowStale + noDeleteOnStaleGet). webFetch sends If-None-Match / If-Modified-Since for a stale-but-kept entry; a 304 reuses the cached body (no re-download) and refreshes it, a 200 replaces it. Tunable via REASONIX_WEB_FETCH_CACHE_STALE_MS (default 60min). Implements scheme 6 / Open Q-5 of the token-cache plan.

REASONIX_FOLD_CACHE=1 (default off) enables a content-addressed LRU around the flash fold summarizer, keyed by sha256 of the head's role+content. A same-process repeat fold of byte-identical turns reuses the summary instead of re-calling. Memory-only (no SQLite, honours C-001); off by default since a live session's fold is single-directional. Implements scheme 7 / Open Q-2 of the token-cache plan.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 50dfdcfbe9

chatgpt-codex-connector · 2026-06-01T02:06:33Z

+    if (process.env.REASONIX_FOLD_CACHE !== "1") {
+      return this.summarizeForFold(messagesToSummarize);
+    }
+    const basis = JSON.stringify(messagesToSummarize.map((m) => [m.role, m.content ?? ""]));


Key fold summaries on the full messages

When REASONIX_FOLD_CACHE=1, this key only includes each message's role and content, so two fold heads with the same visible text but different tool_calls, tool_call_id, name, prefix, or reasoning_content collide. In tool-heavy sessions, assistant tool-call arguments can be the only place a command/path or MCP call is recorded; a later fold can then reuse a summary generated for a different tool sequence and persist a misleading conversation recap. Include the full fields that summarizeForFold() passes through healLoadedMessages() in the cache basis.
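The collision can be reproduced in a few lines. In this sketch `narrowKey` follows the role+content basis from the diff, while `fullKey` adds the fields named in the comment above; the `FoldMsg` shape is an assumption, and which fields healLoadedMessages() really passes through would need to be checked against the source.

```typescript
import { createHash } from "node:crypto";

// Assumed message shape, limited to the fields named in the review comment.
interface FoldMsg {
  role: string;
  content?: string | null;
  tool_calls?: unknown;
  tool_call_id?: string;
  name?: string;
  prefix?: boolean;
  reasoning_content?: string | null;
}

const sha256 = (text: string): string => createHash("sha256").update(text).digest("hex");

// Key basis as in the reviewed diff: role and content only.
function narrowKey(head: FoldMsg[]): string {
  return sha256(JSON.stringify(head.map((m) => [m.role, m.content ?? ""])));
}

// Wider basis: every listed field, in a fixed order so the key stays stable.
// Nested tool_calls objects are serialized as-is, so their key order matters.
function fullKey(head: FoldMsg[]): string {
  return sha256(
    JSON.stringify(
      head.map((m) => [
        m.role,
        m.content ?? "",
        m.tool_calls ?? null,
        m.tool_call_id ?? null,
        m.name ?? null,
        m.prefix ?? null,
        m.reasoning_content ?? null,
      ]),
    ),
  );
}

// Two assistant turns with identical visible text but different tool calls.
const listFiles: FoldMsg[] = [
  { role: "assistant", content: "", tool_calls: [{ function: { name: "shell", arguments: '{"cmd":"ls src"}' } }] },
];
const removeBuild: FoldMsg[] = [
  { role: "assistant", content: "", tool_calls: [{ function: { name: "shell", arguments: '{"cmd":"rm -rf build"}' } }] },
];

console.log(narrowKey(listFiles) === narrowKey(removeBuild)); // true: the narrow keys collide
console.log(fullKey(listFiles) === fullKey(removeBuild)); // false: the full keys differ
```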


chatgpt-codex-connector · 2026-06-01T02:06:33Z

+  if (resp.status === 304 && revalidation) {
+    opts.cache?.markRevalidated(url, maxChars);
+    return revalidation.page;


Honor no-store on 304 revalidation responses

When a stale entry is revalidated and the server replies 304 with Cache-Control: no-store or private, this path marks the old body fresh and returns it before applying shouldCacheWebFetchResponse(). That means a page that has changed from cacheable to non-cacheable can keep serving the previously cached body for another TTL window instead of being evicted/refetched. Check the 304 response's cache directives before reusing the stale entry.
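The suggested check could look roughly like this. It is a sketch of the reviewer's proposal, not the merged fix: `forbidsReuse` and `decide` are hypothetical helpers, and the real code would presumably route through shouldCacheWebFetchResponse() instead.

```typescript
// True when the Cache-Control value carries a directive that rules out reusing
// a stored copy (no-store or private, as named in the review comment).
function forbidsReuse(cacheControl: string | null): boolean {
  if (!cacheControl) return false;
  const directives = cacheControl
    .toLowerCase()
    .split(",")
    .map((directive) => directive.trim().split("=")[0]);
  return directives.includes("no-store") || directives.includes("private");
}

type RevalidationAction = "reuse-cached" | "evict-and-refetch" | "use-response";

// Decide what to do with a response to a conditional request.
function decide(
  status: number,
  hasStaleEntry: boolean,
  cacheControl: string | null,
): RevalidationAction {
  if (status === 304 && hasStaleEntry) {
    // Inspect the 304's own directives before marking the old body fresh.
    return forbidsReuse(cacheControl) ? "evict-and-refetch" : "reuse-cached";
  }
  return "use-response";
}
```

With the Fetch API, the third argument would typically come from `resp.headers.get("cache-control")`, which returns null when the header is absent.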


ChasLui added 3 commits June 1, 2026 02:07

chore(bench): add estimateRequestTokens profiling probe

1df6e24


chatgpt-codex-connector Bot reviewed May 31, 2026


ChasLui added 2 commits June 1, 2026 02:39

chatgpt-codex-connector Bot reviewed Jun 1, 2026


ChasLui merged commit 9aa2dc5 into main Jun 1, 2026

ChasLui merged 5 commits into main from feat/mcp-eager-drift
