Summary
_spawn_background_review() appears vulnerable to stale review writes.
The background review agent is forked from a messages_snapshot and then writes directly to the shared memory/skill stores, but unlike the gateway reset-flush path, it does not inject the current live memory state or any overwrite guard before asking the model to update memory/skills.
This means a delayed background review can operate on an outdated conversation snapshot and potentially overwrite / replace / remove newer memory entries that were created by:
- a later foreground turn
- another concurrent session
- cron / scheduled jobs
- another background review thread
Why I think this is a real race
1) Background review is explicitly forked from a snapshot
In run_agent.py, _spawn_background_review() creates a new AIAgent and runs it against:
review_agent.run_conversation(
user_message=prompt,
conversation_history=messages_snapshot,
)
So the reviewer is not looking at current state at execution time; it's looking at a historical snapshot.
2) The review agent writes directly to shared stores
The same function documents:
Writes directly to the shared memory/skill stores.
and it reuses the same memory store:
review_agent._memory_store = self._memory_store
So we have snapshot-based reasoning + live shared writes.
3) Normal background review prompt has no stale-state guard
Current prompts are essentially:
- "Review the conversation above and consider saving to memory..."
- "...save using the memory tool..."
- "...create a new skill..."
But they do not include the current live memory/skill state, nor a warning like "do not overwrite/remove entries unless the conversation genuinely supersedes them".
4) The gateway reset flush already contains exactly this guard
gateway/run.py builds a flush prompt that includes current live memory and says:
Do NOT overwrite or remove entries unless the conversation above reveals something that genuinely supersedes them. Only add new information that is not already captured below.
There is also a regression test for this behavior in:
tests/gateway/test_flush_memory_stale_guard.py
So the project already acknowledges this stale-write class of bug for reset-flush, but the normal per-turn background review path seems to lack the same protection.
Expected behavior
Background review should be append-safe / merge-safe by default, and should not blindly mutate memory or skills based only on an old conversation snapshot.
Actual risk
A stale reviewer may:
- duplicate memory entries
- re-add data the user already changed later
- overwrite newer preference updates with older ones
- remove/replace entries that were correct at the time of execution but not present in the old snapshot
- produce skill churn from outdated context
Suggested fixes
I think any of the following would help (they can be combined):
- Inject current live memory/skill state into the review prompt before the background review agent runs.
- Reuse the same stale-write guard wording already used by gateway flush.
- Prefer additive / idempotent memory operations in background review unless an explicit supersession condition is met.
- Add a lightweight revision/version check on memory/skill store writes.
- Add a regression test that simulates:
- turn A spawning background review
- turn B updating memory before review executes
- background review from turn A must not overwrite B's newer state
Code pointers
run_agent.py:
_MEMORY_REVIEW_PROMPT
_SKILL_REVIEW_PROMPT
_COMBINED_REVIEW_PROMPT
_spawn_background_review()
gateway/run.py:
- reset flush prompt with current-memory stale guard
tests/gateway/test_flush_memory_stale_guard.py
Notes
This seems different from provider-instantiation / routing issues such as the recent background-review bugs, because this one is about semantic staleness and last-writer-wins corruption risk, not just provider lifecycle.
Summary
_spawn_background_review()appears vulnerable to stale review writes.The background review agent is forked from a
messages_snapshotand then writes directly to the shared memory/skill stores, but unlike the gateway reset-flush path, it does not inject the current live memory state or any overwrite guard before asking the model to update memory/skills.This means a delayed background review can operate on an outdated conversation snapshot and potentially overwrite / replace / remove newer memory entries that were created by:
Why I think this is a real race
1) Background review is explicitly forked from a snapshot
In
run_agent.py,_spawn_background_review()creates a newAIAgentand runs it against:So the reviewer is not looking at current state at execution time; it's looking at a historical snapshot.
2) The review agent writes directly to shared stores
The same function documents:
and it reuses the same memory store:
So we have snapshot-based reasoning + live shared writes.
3) Normal background review prompt has no stale-state guard
Current prompts are essentially:
But they do not include the current live memory/skill state, nor a warning like "do not overwrite/remove entries unless the conversation genuinely supersedes them".
4) The gateway reset flush already contains exactly this guard
gateway/run.pybuilds a flush prompt that includes current live memory and says:There is also a regression test for this behavior in:
tests/gateway/test_flush_memory_stale_guard.pySo the project already acknowledges this stale-write class of bug for reset-flush, but the normal per-turn background review path seems to lack the same protection.
Expected behavior
Background review should be append-safe / merge-safe by default, and should not blindly mutate memory or skills based only on an old conversation snapshot.
Actual risk
A stale reviewer may:
Suggested fixes
I think any of the following would help (they can be combined):
Code pointers
run_agent.py:_MEMORY_REVIEW_PROMPT_SKILL_REVIEW_PROMPT_COMBINED_REVIEW_PROMPT_spawn_background_review()gateway/run.py:tests/gateway/test_flush_memory_stale_guard.pyNotes
This seems different from provider-instantiation / routing issues such as the recent background-review bugs, because this one is about semantic staleness and last-writer-wins corruption risk, not just provider lifecycle.