perf: eliminate sync I/O from agent loop hot paths#2162
Conversation
Three independent cache layers that together remove redundant disk reads and deep-cloning from the per-iteration fast path: 1. readConfig() mtime cache — 37+ calls per session each did readFileSync + JSON.parse; now statSync (~0.005ms) short-circuits on mtime hit. writeConfig() invalidates so writes are immediately visible. 2. ImmutablePrefix.tools() — every loop iteration deep-cloned the full tool-spec array via structuredClone (~hundreds of KB for 20+ tools). Now returns a frozen shallow copy, invalidated only by addTool/removeTool. 3. AppendOnlyLog.toFullHistory() — buildMessages() calls this 2-3x per iteration; when the log window doesn't cover full history, each call did sync openSync/readSync/closeSync in 64KB chunks. Now caches the parsed result keyed by totalLength, invalidated by append/compactInPlace.
esengine
left a comment
There was a problem hiding this comment.
Reviewed all three caches for invalidation completeness and mutation safety — they're correct. Verified:
-
readConfig mtime cache:
atomicWriteSyncis only ever called fromwriteConfig(config.ts:540), and everysave*/clear*helper persists throughwriteConfig, which_configCache.delete(path)s — so there's no in-process write path that leaves the cache stale.statSyncsits inside the existing try, so a missing config still falls through to{}exactly as before. -
tools() frozen snapshot:
prefix.tools()has a single caller (loop.ts:708) which only reads it — passes it straight to the model-calltoolsfield where client.ts:218 assigns it by reference and JSON-serializes it. Nothing mutates it, so neither the freeze (top-level) nor the sharedfunction.parameters(nested) can bite today. Invalidated correctly on addTool/removeTool. -
toFullHistory cache: every mutator invalidates —
initWindow,append,compactInPlace, andextend(which delegates toappend). The shallow-copy-on-return matches the pre-existing{ ...e }semantics, so no new aliasing exposure versus before.
Good, well-scoped optimization. Two small asks (non-blocking, your call):
- A one-line caveat comment on each cache about the immutability contract. Cache 2's frozen copy is shallow —
function.parametersis shared by reference with_toolSpecs, so a future caller that mutates a schema would silently corrupt it. A note like 'result is deeply immutable; do not mutate' guards the next person, since the type (ToolSpec[]viaas unknown as) hides the readonly-ness. Consider typing itreadonly ToolSpec[]so the compiler enforces it. - readConfig external-edit caveat: the cache trusts
mtimeMs, so an out-of-band edit that preserves mtime (or collides within the FS granularity) would serve stale config. Negligible on modern FS, but worth a one-line comment so it's a known tradeoff.
CI hasn't run (fork PR) — I approved the workflow. Since this touches config.ts (read on basically every frame), I'd like CI green before merging, then it's good to land.
esengine
left a comment
There was a problem hiding this comment.
Following up on my approval — CI came back and CodeQL flagged a high-severity alert in this PR's change, so it's not green after all. Holding until it's resolved.
CodeQL: 'Potential file system race condition' at src/config.ts:465. The new cache introduces a TOCTOU pattern:
const st = statSync(path); // time-of-check (mtime)
const cached = _configCache.get(path);
if (cached && cached.mtimeMs === st.mtimeMs) return cached.cfg;
const raw = readFileSync(path, "utf8")… // time-of-use (content)The old code read the file directly with no preceding statSync, so this pattern is new to the PR. The real-world severity here is low — it's the user's own ~/.reasonix/config.json, not a privilege boundary — but it's a legitimate finding and it's a hard CI gate, so it has to be addressed one way or the other:
- Preferred fix (removes the race): stat the same file descriptor you read, so check and use are atomic —
openSync→fstatSync(fd)formtimeMs→readSync/readFileSync(fd)→closeSync. fstat-on-fd can't race the subsequent read of that same fd. This keeps the cache-hit fast path (you'd stillstatSyncfor the hit check; only the miss path opens the fd) — or restructure so the hit check uses the fd's fstat too. - Alternative: if you'd rather keep
statSync, dismiss the CodeQL alert with an explicit justification ('config file is owned by the user, not a security boundary; stale window is bounded by mtime and self-corrects next read'). That's defensible given the context, but the fstat-on-fd version is cleaner and makes the gate pass on its own.
Everything else in my earlier review still stands — the three caches are correct and invalidation is complete. Just resolve the CodeQL finding (fix or justified dismissal) and re-run, plus the two optional immutability/caveat comments, and this is good to land.
…read Use openSync → fstatSync(fd) → readFileSync(fd) → closeSync so the mtime check and content read operate on the same file descriptor, removing the CodeQL-flagged race window. Also: - Add immutability + mtime caveat comments on _configCache - Type tools() return as readonly ToolSpec[]; widen ChatRequestOptions and StreamModelOptions to accept readonly arrays Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Addressed both review comments: 1. TOCTOU race fix (CodeQL)Replaced
Since 2. Immutability caveats + readonly typing
|
| } finally { | ||
| if (fd !== undefined) { | ||
| try { | ||
| closeSync(fd); |
Replace predictable temp file names (based on process.pid, Date.now, Math.random) with crypto.randomBytes in all 3 atomic-write call sites to prevent symlink-following attacks on temp files. Fixes CodeQL "Insecure temporary file" alert on PR esengine#2162. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Fixed the CodeQL "Insecure temporary file" alert. Root cause: All 3 Fix: Replaced with
The random hex string is 16 characters (64 bits of entropy), making the temp file name unpredictable. All checks pass (tsc + biome). |
esengine
left a comment
There was a problem hiding this comment.
Re-reviewed — the TOCTOU fix is exactly right: openSync → fstatSync(fd) → readFileSync(fd) → closeSync(fd) on all paths (cache-hit, post-read, error), so check and use happen on the same descriptor. CodeQL now passes (the high-severity file-race alert is gone) and full CI is green. The three caches and their invalidation were verified correct on the first pass. Merging.
Rebased onto main post-esengine#2162 merge. Adds _dirty flag alongside _fullHistoryCache — both coexist and are invalidated together in all mutation paths (initWindow/append/compactInPlace). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace per-key `new RegExp(\`\\{${k}\\}\`, "g")` with native
`replaceAll(\`{${k}}\`, String(v))` — avoids regex construction overhead
on every translation lookup. ReadConfig cache already merged in esengine#2162.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace per-key `new RegExp(\`\\{${k}\\}\`, "g")` with native
`replaceAll(\`{${k}}\`, String(v))` — avoids regex construction overhead
on every translation lookup. ReadConfig cache already merged in #2162.
Co-authored-by: HUQIANTAO <HUQIANTAO@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Three independent cache layers that together remove redundant synchronous disk reads and deep-cloning from the per-iteration fast path. Each addresses a distinct bottleneck in the agent loop's critical section.
1.
readConfig()mtime cache (src/config.ts)readConfig()performsreadFileSync+JSON.parseon every call. A grep shows 37+ call sites per session —loadModel,loadLanguage,loadTheme,loadEditMode,loadReasoningEffort,loadDashboardEnabled, and many more all default-argreadConfig().Fix: Module-level
Map<string, { mtimeMs, cfg }>keyed by(path, mtimeMs). On hit,statSync(~0.005ms) replacesreadFileSync+JSON.parse(~0.05ms).writeConfig()deletes the cache entry so writes are immediately visible.Impact: ~37 redundant disk reads per session → ~2 (one per unique config path). On network-mounted home directories or slow disks, this is a measurable latency reduction per UI frame.
2.
ImmutablePrefix.tools()— eliminatestructuredClone(src/memory/runtime.ts)Every loop iteration calls
this.prefix.tools()which deep-clones the entire tool-spec array viastructuredClone. With 20+ registered tools (filesystem, shell, memory, plan, web, MCP tools), each having a JSON schema, this copies hundreds of KB of schema data on every single API round-trip.Fix: Cache a frozen shallow copy (
Object.freeze+ shallow{ ...t, function: { ...t.function } }). Invalidated only byaddTool()/removeTool()— the only mutation paths. Callers inclient.ts(buildPayload) only read specs, never mutate.Impact: Eliminates ~200-500KB of deep-cloning per API call. The frozen reference costs nothing to return on subsequent calls.
3.
AppendOnlyLog.toFullHistory()— cache disk reads (src/memory/runtime.ts)buildMessages()callshealActiveLogBeforeSend()which callstoFullHistory()— 2-3 times per loop iteration (turn-start fold estimate, pre-API call, post-steer injection). When the log window doesn't cover full history, each call does synchronousopenSync/readSync/closeSyncin 64KB chunks, parsing each line withJSON.parse.Fix: Cache the parsed
ChatMessage[]keyed bytotalLength. Invalidated byappend(),compactInPlace(), andinitWindow(). Cache hits return shallow copies to prevent caller mutation.Impact: In a typical 3-buildMessages iteration, 2 of 3 calls now return from cache instead of doing disk I/O. For long sessions where the log window is smaller than total history, this eliminates redundant synchronous file reads that block the event loop.
Test plan
tests/config.test.ts— 95 tests pass (readConfig round-trip, writeConfig atomicity, saveApiKey, loadEndpoint, etc.)tests/loop.test.ts— 69 tests pass (buildMessages, heal pipeline, streaming, tool dispatch)tests/tokenizer.test.ts— 33 tests passbiome checkpasses with no lint issues