Skip to content

fix(agent): persist repaired current turn history#24211

Open
qWaitCrypto wants to merge 1 commit into
NousResearch:mainfrom
qWaitCrypto:fix/sessiondb-repair-current-turn
Open

fix(agent): persist repaired current turn history#24211
qWaitCrypto wants to merge 1 commit into
NousResearch:mainfrom
qWaitCrypto:fix/sessiondb-repair-current-turn

Conversation

@qWaitCrypto

Copy link
Copy Markdown
Contributor

What does this PR do?

Fixes a SessionDB persistence bug where the current turn can be silently skipped after message-sequence repair shortens the live conversation history.

AIAgent._repair_message_sequence() mutates the live messages list in place before the API call. That repair can drop orphan tool messages or merge consecutive user messages. _flush_messages_to_session_db() then used the original len(conversation_history) and _last_flushed_db_idx as append-only boundaries. If repair shortened the list, persistence could slice from the old boundary and write nothing.

The important edge case is consecutive user-message repair: the current user turn can be merged into a prior user message that was already persisted in SQLite. In that case, appending from a later index is not enough because the existing DB row needs to be rewritten.

This PR marks the SessionDB transcript for rewrite when _repair_message_sequence() changes the live history. The next flush uses the existing SessionDB.replace_messages() path to atomically resync the stored transcript from the repaired in-memory history, then updates _last_flushed_db_idx.

Related Issue

Fixes #24187

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • updated run_agent.py to track when message-sequence repair requires a SessionDB transcript rewrite
  • updated _repair_message_sequence() to set the rewrite flag after it mutates messages
  • updated _flush_messages_to_session_db() to call SessionDB.replace_messages() when the repaired live transcript must replace the stored transcript
  • added regression coverage in tests/run_agent/test_message_sequence_repair.py for the case where repair merges the current user turn into an already persisted user message

How to Test

  1. Run the focused regression and flush-dedup tests:
    HOME=/tmp/hermes-agent-pytest HERMES_HOME=/tmp/hermes-agent-pytest/.hermes XDG_CACHE_HOME=/tmp/hermes-agent-pytest/.cache PYTHONPATH=. /home/cyt/miniconda3/envs/meta_workflow_py312/bin/python -m pytest --override-ini=addopts='' -p no:cacheprovider tests/run_agent/test_message_sequence_repair.py tests/run_agent/test_860_dedup.py -q
  2. Run adjacent persistence coverage:
    HOME=/tmp/hermes-agent-pytest HERMES_HOME=/tmp/hermes-agent-pytest/.hermes XDG_CACHE_HOME=/tmp/hermes-agent-pytest/.cache PYTHONPATH=. /home/cyt/miniconda3/envs/meta_workflow_py312/bin/python -m pytest --override-ini=addopts='' -p no:cacheprovider tests/run_agent/test_compression_persistence.py -q
    HOME=/tmp/hermes-agent-pytest HERMES_HOME=/tmp/hermes-agent-pytest/.hermes XDG_CACHE_HOME=/tmp/hermes-agent-pytest/.cache PYTHONPATH=. /home/cyt/miniconda3/envs/meta_workflow_py312/bin/python -m pytest --override-ini=addopts='' -p no:cacheprovider tests/run_agent/test_run_agent.py -q -k 'PersistUserMessageOverride or repair'
  3. Reproduce the original failure shape with consecutive user-message repair and confirm the DB row contains the current turn after flush.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: WSL2 Ubuntu / local Linux pytest via /home/cyt/miniconda3/envs/meta_workflow_py312/bin/python

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Screenshots / Logs

Before fix, local minimal reproduction showed that repair merged the current user turn into the live message list, but SessionDB append was skipped:

repairs= 1
len_history= 2
len_messages_after_repair= 2
messages_after_repair= [{'role': 'assistant', 'content': 'prior answer'}, {'role': 'user', 'content': 'stale user tail\n\nCURRENT TURN SHOULD PERSIST'}]
append_calls= 0
last_flushed_idx= 2

After fix, the same reproduction rewrites the stored SessionDB transcript and preserves the current turn:

Could not import tool module tools.browser_dialog_tool: No module named 'websockets'
repairs= 1
len_history= 2
len_messages_after_repair= 2
db_rows= [('assistant', 'prior answer'), ('user', 'stale user tail\n\nCURRENT TURN SHOULD PERSIST')]
last_flushed_idx= 2

Real local pytest results:

$ HOME=/tmp/hermes-agent-pytest HERMES_HOME=/tmp/hermes-agent-pytest/.hermes XDG_CACHE_HOME=/tmp/hermes-agent-pytest/.cache PYTHONPATH=. /home/cyt/miniconda3/envs/meta_workflow_py312/bin/python -m pytest --override-ini=addopts='' -p no:cacheprovider tests/run_agent/test_message_sequence_repair.py tests/run_agent/test_860_dedup.py -q
.....................                                                    [100%]
21 passed in 20.42s

$ HOME=/tmp/hermes-agent-pytest HERMES_HOME=/tmp/hermes-agent-pytest/.hermes XDG_CACHE_HOME=/tmp/hermes-agent-pytest/.cache PYTHONPATH=. /home/cyt/miniconda3/envs/meta_workflow_py312/bin/python -m pytest --override-ini=addopts='' -p no:cacheprovider tests/run_agent/test_compression_persistence.py -q
......                                                                   [100%]
6 passed in 8.87s

$ HOME=/tmp/hermes-agent-pytest HERMES_HOME=/tmp/hermes-agent-pytest/.hermes XDG_CACHE_HOME=/tmp/hermes-agent-pytest/.cache PYTHONPATH=. /home/cyt/miniconda3/envs/meta_workflow_py312/bin/python -m pytest --override-ini=addopts='' -p no:cacheprovider tests/run_agent/test_run_agent.py -q -k 'PersistUserMessageOverride or repair'
.                                                                        [100%]
1 passed, 324 deselected in 7.06s

What does this PR do?

Fixes a SessionDB persistence bug where the current turn can be silently skipped after message-sequence repair shortens the live conversation history.

AIAgent._repair_message_sequence() mutates the live messages list in place before the API call. That repair can drop orphan tool messages or merge consecutive user messages. _flush_messages_to_session_db() then used the original len(conversation_history) and _last_flushed_db_idx as append-only boundaries. If repair shortened the list, persistence could slice from the old boundary and write nothing.

The important edge case is consecutive user-message repair: the current user turn can be merged into a prior user message that was already persisted in SQLite. In that case, appending from a later index is not enough because the existing DB row needs to be rewritten.

This PR marks the SessionDB transcript for rewrite when _repair_message_sequence() changes the live history. The next flush uses the existing SessionDB.replace_messages() path to atomically resync the stored transcript from the repaired in-memory history, then updates _last_flushed_db_idx.

Related Issue

Fixes #24187

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • updated run_agent.py to track when message-sequence repair requires a SessionDB transcript rewrite
  • updated _repair_message_sequence() to set the rewrite flag after it mutates messages
  • updated _flush_messages_to_session_db() to call SessionDB.replace_messages() when the repaired live transcript must replace the stored transcript
  • added regression coverage in tests/run_agent/test_message_sequence_repair.py for the case where repair merges the current user turn into an already persisted user message

How to Test

  1. Run the focused regression and flush-dedup tests:
    HOME=/tmp/hermes-agent-pytest HERMES_HOME=/tmp/hermes-agent-pytest/.hermes XDG_CACHE_HOME=/tmp/hermes-agent-pytest/.cache PYTHONPATH=. /home/cyt/miniconda3/envs/meta_workflow_py312/bin/python -m pytest --override-ini=addopts='' -p no:cacheprovider tests/run_agent/test_message_sequence_repair.py tests/run_agent/test_860_dedup.py -q
  2. Run adjacent persistence coverage:
    HOME=/tmp/hermes-agent-pytest HERMES_HOME=/tmp/hermes-agent-pytest/.hermes XDG_CACHE_HOME=/tmp/hermes-agent-pytest/.cache PYTHONPATH=. /home/cyt/miniconda3/envs/meta_workflow_py312/bin/python -m pytest --override-ini=addopts='' -p no:cacheprovider tests/run_agent/test_compression_persistence.py -q
    HOME=/tmp/hermes-agent-pytest HERMES_HOME=/tmp/hermes-agent-pytest/.hermes XDG_CACHE_HOME=/tmp/hermes-agent-pytest/.cache PYTHONPATH=. /home/cyt/miniconda3/envs/meta_workflow_py312/bin/python -m pytest --override-ini=addopts='' -p no:cacheprovider tests/run_agent/test_run_agent.py -q -k 'PersistUserMessageOverride or repair'
  3. Reproduce the original failure shape with consecutive user-message repair and confirm the DB row contains the current turn after flush.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: WSL2 Ubuntu / local Linux pytest via /home/cyt/miniconda3/envs/meta_workflow_py312/bin/python

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Screenshots / Logs

Before fix, local minimal reproduction showed that repair merged the current user turn into the live message list, but SessionDB append was skipped:

repairs= 1
len_history= 2
len_messages_after_repair= 2
messages_after_repair= [{'role': 'assistant', 'content': 'prior answer'}, {'role': 'user', 'content': 'stale user tail\n\nCURRENT TURN SHOULD PERSIST'}]
append_calls= 0
last_flushed_idx= 2

After fix, the same reproduction rewrites the stored SessionDB transcript and preserves the current turn:

Could not import tool module tools.browser_dialog_tool: No module named 'websockets'
repairs= 1
len_history= 2
len_messages_after_repair= 2
db_rows= [('assistant', 'prior answer'), ('user', 'stale user tail\n\nCURRENT TURN SHOULD PERSIST')]
last_flushed_idx= 2

Real local pytest results:

$ HOME=/tmp/hermes-agent-pytest HERMES_HOME=/tmp/hermes-agent-pytest/.hermes XDG_CACHE_HOME=/tmp/hermes-agent-pytest/.cache PYTHONPATH=. /home/cyt/miniconda3/envs/meta_workflow_py312/bin/python -m pytest --override-ini=addopts='' -p no:cacheprovider tests/run_agent/test_message_sequence_repair.py tests/run_agent/test_860_dedup.py -q
.....................                                                    [100%]
21 passed in 20.42s

$ HOME=/tmp/hermes-agent-pytest HERMES_HOME=/tmp/hermes-agent-pytest/.hermes XDG_CACHE_HOME=/tmp/hermes-agent-pytest/.cache PYTHONPATH=. /home/cyt/miniconda3/envs/meta_workflow_py312/bin/python -m pytest --override-ini=addopts='' -p no:cacheprovider tests/run_agent/test_compression_persistence.py -q
......                                                                   [100%]
6 passed in 8.87s

$ HOME=/tmp/hermes-agent-pytest HERMES_HOME=/tmp/hermes-agent-pytest/.hermes XDG_CACHE_HOME=/tmp/hermes-agent-pytest/.cache PYTHONPATH=. /home/cyt/miniconda3/envs/meta_workflow_py312/bin/python -m pytest --override-ini=addopts='' -p no:cacheprovider tests/run_agent/test_run_agent.py -q -k 'PersistUserMessageOverride or repair'
.                                                                        [100%]
1 passed, 324 deselected in 7.06s

@alt-glitch alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/agent Core agent loop, run_agent.py, prompt builder labels May 12, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Related: Competing fix for #24187. See also #24196 which targets the same bug with a similar approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P1 High — major feature broken, no workaround type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: SessionDB silently skips current turn when message repair shortens conversation history

2 participants