feat(memory): add Auto Memory inbox flow with canonical-patch contract #26338
SandyTao520 merged 8 commits into main
Adds an experimental background extraction agent (`experimental.autoMemory`,
default off) that scans past conversation sessions on session startup and
proposes durable memory updates as unified-diff `.patch` files in a project-
local inbox. Nothing is applied automatically — the user reviews each entry
in `/memory inbox` and approves or dismisses it.
Memory tiers and storage
- private -> <projectMemoryDir>/.inbox/private/extraction.patch
applied to <projectMemoryDir>/MEMORY.md (and sibling .md
topic files; sibling creations get an auto-bundled MEMORY.md
pointer using the absolute path)
- global -> <projectMemoryDir>/.inbox/global/extraction.patch
applied to ~/.gemini/GEMINI.md (single-file allowlist;
no other files in ~/.gemini/ are reachable)
- skills -> reuses the existing skill inbox flow
Project-shared `<projectRoot>/GEMINI.md` is intentionally excluded from auto-
extraction; the agent prompt forbids it and the rollback safety net enforces
it on top.
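
The tier layout above can be read as a small lookup from kind to inbox path and primary target. The following is a minimal sketch under that reading; `MemoryKind`, `inboxPatchPath`, and `primaryTarget` are hypothetical names for illustration, not identifiers from this PR.

```typescript
import * as path from 'node:path';

// Hypothetical names for illustration; the PR's real types may differ.
type MemoryKind = 'private' | 'global';

/** Canonical inbox patch location for a kind, following the layout above. */
function inboxPatchPath(projectMemoryDir: string, kind: MemoryKind): string {
  return path.posix.join(projectMemoryDir, '.inbox', kind, 'extraction.patch');
}

/**
 * Primary file a kind's patch applies to once approved.
 * Private patches may also touch sibling .md topic files next to MEMORY.md;
 * global patches are limited to the single ~/.gemini/GEMINI.md file.
 */
function primaryTarget(
  kind: MemoryKind,
  projectMemoryDir: string,
  homeDir: string,
): string {
  return kind === 'private'
    ? path.posix.join(projectMemoryDir, 'MEMORY.md')
    : path.posix.join(homeDir, '.gemini', 'GEMINI.md');
}
```

Note that both kinds keep their pending patch under the project memory directory, even though the global patch targets a file in the home directory.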
Inbox UX
- One consolidated entry per kind (Private memory / Global memory) even when
multiple source patches accumulate across sessions.
- Apply runs each underlying source patch atomically in lexical order with
aggregated success/failure reporting; Dismiss removes them all.
- Listing pre-filters patches whose targets escape the kind's allowed root
so only actionable items surface.
- Preview groups hunks by target file (a file updated by N source patches
shows once with N changes, not N separate sections).
- ESC from the apply/dismiss dialog restores the previous list selection
instead of jumping back to row 0.
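
The preview grouping described above (one section per target file, with source patches taken in lexical order) could look roughly like this. It is a sketch only: the `SourceHunk` shape and `groupHunksByTarget` are assumptions made for illustration, not the dialog's actual data model.

```typescript
// Hypothetical shape: one hunk, tagged with the source patch it came from
// and the file it modifies.
interface SourceHunk {
  source: string; // source patch file name
  target: string; // absolute path of the file the hunk modifies
  hunk: string; // raw hunk text
}

/**
 * Groups hunks by target file so a file touched by N source patches is
 * shown once with N changes. Sources are ordered lexically first, matching
 * the order in which Apply is described as running them.
 */
function groupHunksByTarget(hunks: SourceHunk[]): Map<string, SourceHunk[]> {
  const sorted = [...hunks].sort((a, b) =>
    a.source < b.source ? -1 : a.source > b.source ? 1 : 0,
  );
  const grouped = new Map<string, SourceHunk[]>();
  for (const h of sorted) {
    const list = grouped.get(h.target) ?? [];
    list.push(h);
    grouped.set(h.target, list);
  }
  return grouped;
}
```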
Agent contract
- Single canonical filename per kind: `extraction.patch`. Pending inbox
contents are surfaced into the agent's initial context so it rewrites
the existing patch incrementally rather than creating new files each
session.
- All patches use unified diff with absolute paths in headers;
`--- /dev/null` for creations; `--- ` and `+++ ` paths must be identical for
updates. Sibling creations should pair with a MEMORY.md hunk; if the
agent forgets, the inbox apply step auto-bundles a generic pointer
(always with the absolute path) so the new sibling is discoverable.
- Pointer paths in MEMORY.md are absolute so future agents can `read_file`
the sibling directly without resolving relative paths.
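
The header rules in the contract above can be checked mechanically. Below is a minimal sketch, assuming a single-file patch whose headers carry only a path (no timestamps); `classifyHeaders` and `HeaderKind` are hypothetical names, and the PR's real validation may be stricter.

```typescript
import * as path from 'node:path';

type HeaderKind = 'create' | 'update' | 'invalid';

/**
 * Classifies a single-file unified diff by its `--- ` / `+++ ` headers:
 * - `--- /dev/null` with an absolute `+++ ` path is a creation;
 * - identical absolute paths on both headers is an update;
 * - anything else is rejected.
 */
function classifyHeaders(patchText: string): HeaderKind {
  const lines = patchText.split('\n');
  // In a well-formed patch the first `--- ` / `+++ ` lines are the headers.
  const oldLine = lines.find((l) => l.startsWith('--- '));
  const newLine = lines.find((l) => l.startsWith('+++ '));
  if (!oldLine || !newLine) return 'invalid';

  const oldPath = oldLine.slice(4).trim();
  const newPath = newLine.slice(4).trim();
  // POSIX check only, to keep the sketch deterministic across hosts.
  if (!path.posix.isAbsolute(newPath)) return 'invalid';
  if (oldPath === '/dev/null') return 'create';
  return oldPath === newPath ? 'update' : 'invalid';
}
```

For example, a patch with `--- /dev/null` and `+++ /mem/topic.md` classifies as a creation, while mismatched or relative paths are rejected before any further checks run.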
Runtime safety net
- `MemoryService` snapshots active memory before/after the extraction
agent runs and rolls back any direct writes to MEMORY.md / sibling .md
files, plus single-file guards for <projectRoot>/GEMINI.md and
~/.gemini/GEMINI.md (covers the case where the agent ignores the
prompt and writes outside the inbox).
- `isPathAllowed` denies main-agent writes to <projectMemoryDir>/.inbox/
so the model can't bypass review by dropping its own patch files.
- Patch headers must reference paths inside the kind's allowed root after
canonical resolution; bad-target patches are filtered from listing and
rejected on apply.
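
The "inside the kind's allowed root" check can be sketched as a containment test on normalized paths. This example is lexical only: it does not resolve symlinks, which a real canonical resolution would likely need (for instance via `fs.realpath`). `isInsideRoot` is a hypothetical helper name.

```typescript
import * as path from 'node:path';

/**
 * Returns true when `target` resolves to a location strictly inside `root`.
 * Normalization collapses `..` segments, so traversal such as
 * `/mem/../project/GEMINI.md` is rejected. Symlinks are NOT resolved here.
 */
function isInsideRoot(root: string, target: string): boolean {
  const resolvedRoot = path.posix.resolve(root);
  const resolvedTarget = path.posix.resolve(target);
  const rel = path.posix.relative(resolvedRoot, resolvedTarget);
  if (rel === '') return false; // the root itself is not a valid file target
  if (rel === '..' || rel.startsWith('../')) return false; // escapes the root
  return !path.posix.isAbsolute(rel);
}
```

Comparing via `path.relative` rather than a string prefix avoids treating `/memory/x.md` as being inside `/mem`.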
Other
- Drop dead `Storage.getGlobalMemoryFilePath()` static (returned a stale
`~/.gemini/memory.md` path that no live caller used).
- Remove the never-shipped `project-instructions` kind and the autoApply
mode plumbing; auto-memory is review-only.
Tests
- 65 unit tests in `commands/memory.test.ts`, ~58 in
`services/memoryService.test.ts`, 13 in `SkillInboxDialog.test.tsx`,
plus prompt + config tests.
- 1 deterministic eval (`evals/auto_memory_modes.eval.ts`).
- 4 new live-LLM evals in `evals/auto_memory_contract.eval.ts` covering
canonical filename, incremental merge, absolute-path pointers, and
project-root protection.
- `scripts/seed-test-inbox.js` and `scripts/check-inbox.js` for manual
end-to-end testing.
Summary of Changes (Gemini Code Assist)
This pull request introduces an experimental 'Auto Memory' feature designed to capture durable project knowledge and reusable workflows from past conversation sessions. By surfacing these as reviewable patches in a local inbox, it provides a safe, human-in-the-loop mechanism for updating project documentation and global preferences without risking unauthorized or automated file modifications.
Code Review
This pull request implements a review-based 'inbox' workflow for the autoMemory feature, replacing direct memory writes with unified diff .patch files that require user approval. Key changes include updating the extraction agent's prompt, enhancing the /memory inbox UI with focus preservation, and implementing path protection and rollback mechanisms to ensure memory integrity. A critical security concern was raised regarding the buildPendingInboxSummary function, which is vulnerable to indirect prompt injection by including unsanitized patch contents in the agent's system prompt.
Size Change: +40.9 kB (+0.12%) Total Size: 34 MB
Samee24 left a comment
Left some comments.
Please also attach either:
- a recording
- screenshots demonstrating the main CUJs
     *
     * Returns an empty string if the inbox is empty.
     */
    async function buildPendingInboxSummary(memoryDir: string): Promise<string> {
are we persisting dismissals? won't they keep resurfacing/regenerating unless we keep some kind of deny list?
Real gap, not addressed in this PR. Today there is no persisted deny list — dismissed content is gone from disk, so the agent has no signal to skip it.
Mitigations that exist today:
- Session-processing dedup (`processedSessionKeys` in `extraction-state.json`) prevents re-scanning the same transcripts a dismissed extraction came from, so the same fact will not auto-regenerate from the same source.
- The agent prompt encourages no-op runs (`Default to no-op. Prefer 0–5 memory patches per run.`).
- The new 30-min throttle (also in dd25562) reduces frequency.
A long-running project will still see periodic re-extraction of facts the user dismissed (e.g. from a different session that surfaces the same fact). The proper fix is a persisted dismiss store with content fingerprinting, which is non-trivial — deferring to a follow-up. Leaving open as the tracker.
    if (!isSupportedSessionFile(file)) continue;
    const filePath = path.join(chatsDir, file);
    try {
      const stat = await fs.stat(filePath);
we are running fs.stat (which can be slow) on startup for every file. Performance will tank for folks with a lot of history...
We can:
1. persist last-scan time and short-circuit
2. add a min-interval (~30 min) between extraction runs in addition to the lock.
Adopted option (2) in dd25562: added `MIN_EXTRACTION_INTERVAL_MS = 30 * 60 * 1000` and an early-return throttle in `startMemoryService` after `readExtractionState`. If the most recent run's `runAt` is < 30 minutes ago, the function logs `[MemoryService] Skipped: last run was Xm ago` and returns. Pairs with the existing advisory lock (lock prevents concurrent runs; throttle prevents back-to-back runs across short CLI sessions).
Option (1) (persisted last-scan time + short-circuit before stat-ing) is deferred: the `fs.stat` cost is bounded today by `MAX_SESSION_INDEX_SIZE = 50` plus the readdir/stat happening on a background promise, so the throttle alone covers the realistic case. Leaving open if you want option (1) tracked separately, or feel free to resolve.
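
For readers following the thread, the throttle check described in this reply might look roughly like the sketch below. The constant name and value come from the reply; `shouldSkipExtraction` is a hypothetical helper, and storing `runAt` as an ISO-8601 string is an assumption about the state file format.

```typescript
const MIN_EXTRACTION_INTERVAL_MS = 30 * 60 * 1000;

/**
 * Returns true when the previous run is recent enough that a new
 * extraction should be skipped. A missing or unparsable timestamp
 * never blocks a run.
 */
function shouldSkipExtraction(
  lastRunAt: string | undefined, // assumed ISO-8601 timestamp of the last run
  nowMs: number,
): boolean {
  if (!lastRunAt) return false;
  const lastMs = Date.parse(lastRunAt);
  if (Number.isNaN(lastMs)) return false;
  return nowMs - lastMs < MIN_EXTRACTION_INTERVAL_MS;
}
```

Passing `nowMs` in explicitly (rather than calling `Date.now()` inside) keeps the check easy to unit-test without fake timers.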
      );
    });

    it('rolls back direct active memory writes (patches are the only path)', async () => {
We need more robust evals here. This seems too focused on the happy path.
E.g., I would like more coverage around concurrency, such as concurrent applies (during bg run(s)), rollback of a legit user edit racing with the agent, ALS-off path-allowlist denial, shell-based escape, etc.
Triaging the four examples:
1. ALS-off path-allowlist denial: already covered by the test added in 5b2c782: `config/config.test.ts > should NOT allow isPathAllowed to write into the auto-memory inbox` (verifies denial outside the scope) and the companion test that allows the canonical filename only inside `runWithScopedMemoryInboxAccess`. Plus `agents/local-executor.test.ts` asserts the scope is set iff `definition.memoryInboxAccess` is true.
2. Rollback racing a legit user edit: real gap. Added as `it.todo` in dd25562 (`memoryService.test.ts`) so it shows up in test output. The proper fix needs to distinguish agent writes from external edits (mtime windowing or a WriteFile hook), which is an architectural change deferred to a follow-up.
3. Concurrent applies during a bg run: `tryAcquireLock` already covers concurrent extraction runs (existing tests in `memoryService.test.ts > tryAcquireLock`). For concurrent apply (user clicks Apply while a bg run is in-flight): the apply path uses temp-file-stage + atomic rename; an in-progress bg run cannot interleave a half-written file. Worth one explicit test, can add in the deferred follow-up alongside (2).
4. Shell-based escape: could you elaborate? Shell tool calls go through the same `Config.isPathAllowed`, which denies `<projectMemoryDir>/.inbox/` writes for the main agent and is workspace-scoped. Happy to add a concrete test if you have a specific scenario in mind.
Leaving this thread open as the tracker for (2) + (3).
- Default the memory-patch action to Dismiss so a stray Enter cannot apply durable on-disk changes (private MEMORY.md, ~/.gemini/GEMINI.md).
- Aggregate apply now reports success=false on any failure so the dialog keeps the inbox entry visible for retry instead of auto-removing it. Successful sub-patches were already committed and removed from disk; the next listing surfaces only the failures.
- Harden buildPendingInboxSummary against indirect prompt injection: size the markdown fence to one more backtick than the longest run in the patch content so a crafted closing fence cannot break out of the block.
- Throttle startMemoryService: skip the run if the most recent extraction finished less than 30 minutes ago. Pairs with the existing advisory lock (lock prevents concurrent runs; throttle prevents back-to-back runs across short CLI sessions).
- Rename the user-facing inbox group label "Memory Patches" -> "Memory Updates" for consistency with the existing "Skill Updates" label.
- Rename SkillInboxDialog -> InboxDialog (the component now covers skills, skill-update patches, and memory updates).
- Document the rollback-vs-user-edit race as it.todo so the gap stays visible. The proper fix needs to track agent-vs-external writes and is deferred to a follow-up.
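
The fence-sizing hardening mentioned in this commit can be illustrated with a short sketch. `fenceFor` and `wrapInFence` are hypothetical names, not the helpers used inside `buildPendingInboxSummary`; the three-backtick minimum is an assumption based on standard markdown fences.

```typescript
/**
 * Picks a fence one backtick longer than the longest backtick run in
 * `content`, with a minimum of three. Content therefore cannot contain
 * a run long enough to close the block early.
 */
function fenceFor(content: string): string {
  const runs: string[] = content.match(/`+/g) ?? [];
  const longest = runs.reduce((max, run) => Math.max(max, run.length), 0);
  return '`'.repeat(Math.max(3, longest + 1));
}

/** Wraps untrusted patch text in a fence sized by `fenceFor`. */
function wrapInFence(content: string): string {
  const fence = fenceFor(content);
  return `${fence}\n${content}\n${fence}`;
}
```

A patch that embeds its own three-backtick closing fence would be wrapped in four backticks, so the embedded fence stays inert text inside the block.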
The previous regex `/^- See \/.+\/orphan-topic\.md /m` was Unix-only and broke on Windows where the absolute pointer path is e.g. `C:\Users\runneradmin\…\orphan-topic.md`. Capture the path with a separator-agnostic regex and validate it via path.isAbsolute, which works on both POSIX and Windows.
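
A separator-agnostic capture along the lines described here might look like the following sketch. `extractPointerPath` and `isAbsoluteOnAnyPlatform` are hypothetical helpers; the `- See <path> ` line shape is taken from the regex quoted above. The commit validates with `path.isAbsolute` on the host platform, whereas this sketch checks both the POSIX and Windows variants so the example behaves the same on any host.

```typescript
import * as path from 'node:path';

/**
 * Captures the pointer path from a `- See <path>` line, accepting either
 * `/` or `\` as the separator before the file name.
 */
function extractPointerPath(
  memoryMd: string,
  fileName: string,
): string | undefined {
  // Escape regex metacharacters in the file name (e.g. the dot).
  const escaped = fileName.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const re = new RegExp(`^- See (.+[\\\\/]${escaped})`, 'm');
  return re.exec(memoryMd)?.[1];
}

/** True if the path is absolute under POSIX or Windows rules. */
function isAbsoluteOnAnyPlatform(p: string): boolean {
  return path.posix.isAbsolute(p) || path.win32.isAbsolute(p);
}
```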
✅ 69 tests passed successfully on gemini-3-flash-preview.
🧠 Model Steering Guidance
This PR modifies files that affect the model's behavior (prompts, tools, or instructions).
This is an automated guidance message triggered by steering logic signatures.
Summary
Adds an experimental Auto Memory inbox flow (`experimental.autoMemory`, default off). A background extraction agent scans past conversation sessions on session startup and proposes durable memory updates as unified-diff `.patch` files in a project-local inbox. Nothing is applied automatically; the user reviews each entry in `/memory inbox` and approves or dismisses it.
Details
Storage tiers
- `private` → `<projectMemoryDir>/.inbox/private/extraction.patch` → applied to `<projectMemoryDir>/MEMORY.md` (and sibling `.md` topic files; sibling creations get an auto-bundled `MEMORY.md` pointer using the absolute path).
- `global` → `<projectMemoryDir>/.inbox/global/extraction.patch` → applied to `~/.gemini/GEMINI.md` (single-file allowlist; no other files under `~/.gemini/` are reachable).
- `skills` → reuses the existing skill inbox flow.
`<projectRoot>/GEMINI.md` is intentionally excluded from auto-extraction: the agent prompt forbids it and the runtime rollback safety net enforces it on top.
Inbox UX
Unnamed.screencast.webm
Agent contract
- Single canonical filename per kind: `extraction.patch`. Pending inbox contents are surfaced into the agent's initial context so it rewrites the existing patch incrementally rather than creating new files each session.
- All patches use unified diff with absolute paths in headers; `--- /dev/null` for creations; `---` and `+++` paths must be identical for updates.
- Sibling creations should pair with a `MEMORY.md` hunk; if the agent forgets, the inbox apply step auto-bundles a generic pointer (always with the absolute path) so the new sibling is discoverable.
- Pointer paths in `MEMORY.md` are absolute so future agents can `read_file` the sibling directly without resolving relative paths.
Runtime safety net
- `MemoryService` snapshots active memory before/after the extraction agent runs and rolls back any direct writes to `MEMORY.md` / sibling `.md` files, plus single-file guards for `<projectRoot>/GEMINI.md` and `~/.gemini/GEMINI.md` (covers the case where the agent ignores the prompt and writes outside the inbox).
- `isPathAllowed` denies main-agent writes to `<projectMemoryDir>/.inbox/` so the model can't bypass review by dropping its own patch files.
- The extraction agent's inbox access is gated by a `memoryInboxAccess` flag on the agent definition and a `runWithScopedMemoryInboxAccess` AsyncLocalStorage. Only the literal canonical paths `<inboxRoot>/{private,global}/extraction.patch` are reachable, only while the agent is running.
Other
- Drop dead `Storage.getGlobalMemoryFilePath()` static (returned a stale `~/.gemini/memory.md` path that no live caller used).
- Remove the never-shipped `project-instructions` kind and the `autoApply` mode plumbing; auto-memory is review-only.
Related Issues
Related to #18007
How to Validate
- Enable the feature in `~/.gemini/settings.json`: `{ "experimental": { "autoMemory": true } }`
- Open `/memory inbox`. Expected: `Private memory` (2 hunks from 1 source patch) and `Global memory` (1 hunk from 1 source patch).
- `<projectMemoryDir>/MEMORY.md` was updated with the new fact and an absolute-path sibling pointer.
- `<projectMemoryDir>/verify-workflow.md` was created.
- `~/.gemini/GEMINI.md` was created with the seeded content.
Edge cases worth poking
- With `autoMemory` ON but no eligible sessions, the inbox is empty and no errors are logged.
- A patch whose `+++` header points outside the allowed root (e.g. `<projectRoot>/GEMINI.md` from `private`) must NOT appear in the listing.
- Check `[MemoryService]` lines in the debug log for the rollback message.
- Apply reports `Applied N of M ...; 1 failed: ...` instead of failing the whole batch.
Pre-Merge Checklist