feat(bootstrap): cache session's bootstrap files so we don't invalidate prompt cache when modifying MEMORY and friends#22220

Closed
anisoptera wants to merge 11 commits into openclaw:main from anisoptera:bootstrap-caching

Conversation

@anisoptera
Contributor

@anisoptera anisoptera commented Feb 20, 2026

Summary

  • Problem: Bootstrap files (SOUL.md, AGENTS.md, MEMORY.md, etc.) are re-read from disk on every agent turn, and memory flush itself writes new content to memory/YYYY-MM-DD.md. Any mid-session change to these files changes the system prompt and busts the prompt cache.
  • Why it matters: Each prefix cache miss forces full prompt re-processing on every subsequent turn. Local models feel this significantly. Additionally, a lot of the time we modify these files, we're writing out lessons learned from a long context life. Not a great time to invalidate the cache.
  • What changed: New bootstrap-cache.ts module snapshots bootstrap files on first load per session key. All subsequent calls within the same session — including compaction's resolveBootstrapContextForRun() and the memory flush sub-run — return the snapshot instead of re-reading disk.
  • What did NOT change: Memory flush still writes to disk normally (available to next session). Post-compaction context (readPostCompactionContext) still reads AGENTS.md fresh from disk — intentional, it's re-orienting the agent after compaction, and we're already busting the cache pretty hard. No config changes, no new hooks, no changes to the hook system.
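
The snapshot-on-first-load behavior described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `BootstrapFile` shape, the injected `Loader`, and the in-memory `disk` stand-in are all assumptions made so the example is self-contained, and the signature of `getOrLoadBootstrapFiles` here is a guess.

```typescript
// Hypothetical shape of a bootstrap file; the real type in the PR may differ.
interface BootstrapFile {
  name: string;
  content: string;
}

// Assumption: the disk read is injected so the sketch needs no filesystem access.
type Loader = () => BootstrapFile[];

// One snapshot per session key, held for the life of the process.
const snapshots = new Map<string, BootstrapFile[]>();

function getOrLoadBootstrapFiles(sessionKey: string, load: Loader): BootstrapFile[] {
  const cached = snapshots.get(sessionKey);
  if (cached !== undefined) {
    return cached; // later turns reuse whatever the first turn read
  }
  const files = load();
  snapshots.set(sessionKey, files);
  return files;
}

// Stand-in for the workspace on disk.
const disk = new Map<string, string>([["MEMORY.md", "v1"]]);
const readDisk: Loader = () => {
  const files: BootstrapFile[] = [];
  disk.forEach((content, name) => files.push({ name, content }));
  return files;
};

const firstTurn = getOrLoadBootstrapFiles("session-a", readDisk);
disk.set("MEMORY.md", "v2"); // e.g. a memory flush in the middle of the session
const secondTurn = getOrLoadBootstrapFiles("session-a", readDisk);
const otherSession = getOrLoadBootstrapFiles("session-b", readDisk);
```

In this sketch `secondTurn` still carries the `v1` content, while `otherSession`, which starts after the write, sees `v2`.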

This change was mostly planned by Opus and executed by Sonnet, and then the human went and told them to delete 2/3 of what they wrote, so, at this point, it's her problem, not theirs.

Change Type

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

Memory written during a session (via memory flush or tool calls to memory/*.md) will not appear in the system prompt until the next session. Neither will anything written to SOUL, AGENTS, etc.

Additionally, models all seem to think this stuff doesn't update live. For example, just in this section alone I had to delete a line the model wrote about how it "was already true in most cases." It's uncanny.

Anyway, if you think about it, right now (before this change), when we ask a model to write something in one of these files, we're then killing that context and replacing it with one where it always thought that. It must be very weird to read a conversation about something you know the ending of already.

Security Impact

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Locally tested before/after and validated that caching behavior is much nicer when working with memory documents.

Environment

  • OS: Linux (Debian)
  • Runtime/container: Node.js
  • Model/provider: Anthropic (claude-* models with prefix caching)
  • Integration/channel: any
  • Relevant config: default

Steps

  1. Work in a long session.
  2. Ask the agent to modify its memory.
  3. Send another message.

Expected

  • System prompt is unchanged on the next turn of the same session

Actual (before this PR)

  • System prompt changed (updated MEMORY) so cache is invalidated
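
To illustrate why a MEMORY edit is costly, here is a toy model that assumes cached work can only be reused for the longest prefix a new request shares with the previous one. It is a simplification for intuition, not any provider's real implementation; `buildPrompt` and `sharedPrefixLength` are illustrative names.

```typescript
// Simplified model: the bootstrap content comes first in the prompt,
// followed by the conversation so far.
function buildPrompt(memory: string, turns: string[]): string {
  return `# MEMORY.md\n${memory}\n---\n${turns.join("\n")}`;
}

// Length of the longest common prefix of two strings.
function sharedPrefixLength(a: string, b: string): number {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a[i] === b[i]) {
    i++;
  }
  return i;
}

const history = ["user: hi", "assistant: hello", "user: remember I like tea"];
const previous = buildPrompt("likes coffee", history);

// Snapshot behavior: same memory text, one more turn appended.
const withSnapshot = buildPrompt("likes coffee", [...history, "assistant: noted"]);
// Live re-read: the memory text changed near the top of the prompt.
const withLiveRead = buildPrompt("likes tea", [...history, "assistant: noted"]);

const reusedWithSnapshot = sharedPrefixLength(previous, withSnapshot);
const reusedWithLiveRead = sharedPrefixLength(previous, withLiveRead);
```

With the snapshot, the whole previous prompt is still a prefix of the new one. With the live re-read, the shared prefix ends inside the memory section, so everything after it, including the entire conversation, would have to be reprocessed.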

Evidence

  • Perf numbers (if relevant)
    100% cache hit rate after modifying memory; well worth it, especially on local models.

Human Verification

  • Verified scenarios: cache serves stable content across multiple calls with same session key; different session keys are isolated; long haul test doesn't leak memory all over the floor
  • Edge cases checked: restart of gateway with pending change causes cache invalidation (expected result; not a regression from today's behavior anyway)
  • What you did not verify: Massively parallel long sessions. But I have done a couple shorter parallel ones.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

Failure Recovery

  • How to disable/revert this change quickly: revert bootstrap-files.ts to use loadWorkspaceBootstrapFiles() directly (one line)
  • Files/config to restore: src/agents/bootstrap-files.ts
  • Known bad symptoms to watch for: The initial onboarding experience might get weird.

Risks and Mitigations

  • Risk: Cache entries accumulate in long-running processes with many sessions
    • Mitigation: No TTL or size bound currently. A previous version had a TTL system, but it was discarded in favor of simplicity. If the sessions leave memory then so do the cache entries, and the cache entry is also deleted whenever a session is reset or deleted.
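
A rough sketch of that lifecycle, assuming a plain `Map` and a hypothetical `cleanupSession` hook standing in for the session reset/delete path. The function names `getOrLoadBootstrapFiles` and `clearBootstrapSnapshot` come from this PR, but their signatures here are guesses.

```typescript
const snapshots = new Map<string, string[]>();

function getOrLoadBootstrapFiles(sessionKey: string, load: () => string[]): string[] {
  let files = snapshots.get(sessionKey);
  if (files === undefined) {
    files = load();
    snapshots.set(sessionKey, files);
  }
  return files;
}

function clearBootstrapSnapshot(sessionKey: string): void {
  snapshots.delete(sessionKey);
}

// Hypothetical lifecycle hook: assumed to run on both session reset and delete.
function cleanupSession(sessionKey: string): void {
  clearBootstrapSnapshot(sessionKey);
}

// Counting loader so each disk read is distinguishable.
let loads = 0;
const load = (): string[] => {
  loads++;
  return [`snapshot #${loads}`];
};

getOrLoadBootstrapFiles("sk1", load);
getOrLoadBootstrapFiles("sk2", load);

cleanupSession("sk1");
const sizeAfterCleanup = snapshots.size; // only sk2 remains

const sk2After = getOrLoadBootstrapFiles("sk2", load); // still cached
const sk1After = getOrLoadBootstrapFiles("sk1", load); // reloaded after the reset
```

The number of entries tracks the number of live sessions: clearing `sk1` leaves `sk2` untouched, and the next read for `sk1` takes a fresh snapshot.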

Greptile Summary

Implements session-scoped caching for bootstrap files (AGENTS.md, MEMORY.md, SOUL.md, etc.) to preserve prompt cache when these files are modified during a session.

Key changes:

  • New bootstrap-cache.ts module provides simple Map-based caching keyed by sessionKey
  • resolveBootstrapFilesForRun in bootstrap-files.ts uses cached files when sessionKey is provided
  • Cache is cleared in sessions.ts during session reset/delete via clearBootstrapSnapshot
  • Post-compaction context (readPostCompactionContext) continues to read AGENTS.md fresh from disk, as mentioned in PR description
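
The wiring in the second bullet might look something like the sketch below. The parameter object, the injected `loadFromDisk` function, and the return type are assumptions for illustration; only the function name and the "fall back to a direct disk read when sessionKey is absent" behavior come from the PR text.

```typescript
// Assumption: files are represented as plain strings to keep the sketch small.
type DiskLoader = (workspaceDir: string) => string[];

const snapshots = new Map<string, string[]>();

function resolveBootstrapFilesForRun(
  params: { workspaceDir: string; sessionKey?: string },
  loadFromDisk: DiskLoader,
): string[] {
  if (!params.sessionKey) {
    // No session key: nothing to key the cache on, so read fresh every time.
    return loadFromDisk(params.workspaceDir);
  }
  const cached = snapshots.get(params.sessionKey);
  if (cached !== undefined) {
    return cached;
  }
  const files = loadFromDisk(params.workspaceDir);
  snapshots.set(params.sessionKey, files);
  return files;
}

// Counting loader to show how many disk reads each path triggers.
let diskReads = 0;
const loadFromDisk: DiskLoader = (dir) => {
  diskReads++;
  return [`${dir}/AGENTS.md`, `${dir}/MEMORY.md`];
};

resolveBootstrapFilesForRun({ workspaceDir: "/ws" }, loadFromDisk);
resolveBootstrapFilesForRun({ workspaceDir: "/ws" }, loadFromDisk);
const readsWithoutKey = diskReads; // two calls, two reads

resolveBootstrapFilesForRun({ workspaceDir: "/ws", sessionKey: "sk1" }, loadFromDisk);
resolveBootstrapFilesForRun({ workspaceDir: "/ws", sessionKey: "sk1" }, loadFromDisk);
const readsWithKey = diskReads - readsWithoutKey; // two calls, one read
```

Keeping the uncached path intact is also what makes the one-line revert described under Failure Recovery possible.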

Impact:

  • Memory writes during a session (e.g., via memory flush) will not appear in the system prompt until the next session
  • Maintains 100% cache hit rate after memory modifications within a session
  • Memory accumulates in the cache Map for the lifetime of the gateway process, cleared only on session reset/delete

Confidence Score: 4/5

  • Safe to merge with careful monitoring of cache behavior in production
  • Implementation is clean and straightforward with good test coverage. The cache invalidation is properly wired to session lifecycle events (reset/delete). However, the cache uses only sessionKey without workspaceDir, which relies on the assumption that a session key is always bound to a single workspace - this was an intentional simplification per commit de20b01. The PR explicitly documents the behavior change where memory modifications won't appear until next session. One minor concern is long-running processes with many sessions could accumulate cache entries, but the PR acknowledges this and notes that cache entries are cleared when sessions are deleted or reset.
  • No files require special attention - implementation is clean and well-tested

Last reviewed commit: aa82281


anisoptera and others added 5 commits February 20, 2026 01:34
Remove redundant mockLoad setup (beforeEach handles it) and reword
the comment to make explicit that the third getOrLoadBootstrapFiles
call is what proves sk2's cache survived clearing sk1.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Post-compaction context is intentionally fresh: it re-orients the
agent after compaction, so reading the latest AGENTS.md from disk
is the right behavior. Remove the cache lookup, the opts param, and
the related tests. Also unexport resolveBootstrapCacheKey and drop
getBootstrapFileContent which are no longer needed externally.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- clear snapshot in ensureSessionRuntimeCleanup (fires on both reset and delete)
- drop TTL sweep and sessionId fallback — sessionKey is always present at call sites
- simplify getOrLoadBootstrapFiles to require sessionKey; bootstrap-files.ts falls back to direct disk read when absent

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openclaw-barnacle openclaw-barnacle bot added gateway Gateway runtime agents Agent runtime and tooling size: S labels Feb 20, 2026
anisoptera and others added 2 commits February 20, 2026 14:27
A session key is bound to a single workspace, so the cache entry
never needs to track or validate workspaceDir. Remove the wrapper
type, the comparison branch, and its test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@HenryLoenwind
Contributor

A big YES for the idea, but I have one issue with the implementation:

The way you're storing the cache (in memory) doesn't survive gateway restarts, but sessions do. It would be better to store the cache in the session, so it survives. There are still plenty of reasons to restart the gateway, and getting an update every second day is just one of them.

@anisoptera
Contributor Author

> A big YES for the idea, but I have one issue with the implementation:
>
> The way you're storing the cache (in memory) doesn't survive gateway restarts, but sessions do. It would be better to store the cache in the session, so it survives. There are still plenty of reasons to restart the gateway, and getting an update every second day is just one of them.

I dunno. Copying the content of those files into every session json sounds like kind of a lot. All that just to protect against a single cache inval in the corner case where you have a long running session, wrote to your soul, and then restarted the gateway?

@HenryLoenwind
Contributor

It's more the memory than the soul I'm concerned with.

And if it's about file size...I see that, and I have a solution: Put it into a second file and drop that once the session gets closed.

@anisoptera
Contributor Author

> It's more the memory than the soul I'm concerned with.
>
> And if it's about file size...I see that, and I have a solution: Put it into a second file and drop that once the session gets closed.

Again, I don't think sessions are long-lived enough for it to matter. And you'd have to have a separate file for every session and then clean up those files on some regular basis, because the gateway restart won't ask you nicely.

I still think restarting your gateway is rare enough (even once a day is not very often) that it isn't worth introducing this complexity.

@anisoptera
Contributor Author

@HenryLoenwind Thinking about it more, I can see an argument for putting it in the session files, but it’s less of a caching argument and more a consistency argument. It’d be nice to be able to reconstruct the original prompts that the session was run with.

I don’t know if it’s worth blocking this PR over - I’d rather see this get merged than wait for a tweak that stores it in the file. But I might revisit this while waiting and change the behavior.

@anisoptera
Contributor Author

@greptileai review

@steipete
Contributor

Land note: this is now covered on main.

Implemented via 40db3fef4:

  • Added per-session bootstrap snapshot cache in src/agents/bootstrap-cache.ts.
  • Wired bootstrap-files to reuse cached snapshots when sessionKey is present.
  • Added cleanup invalidation (clearBootstrapSnapshot) during session runtime cleanup.
  • Added dedicated tests in src/agents/bootstrap-cache.test.ts.

Behavior/result: editing dynamic files like MEMORY no longer invalidates cached bootstrap payload for the active session, so prompt cache hit rates stay stable.

Changelog credit is included ("Thanks @anisoptera").

@steipete
Contributor

Closing as covered by 40db3fe on main.

Labels

agents Agent runtime and tooling gateway Gateway runtime size: S

3 participants