Skip to content

fix(session): prefer longer source in load_transcript to prevent legacy truncation#3221

Closed
Mibayy wants to merge 1 commit into
NousResearch:mainfrom
Mibayy:fix/load-transcript-legacy-jsonl-3212
Closed

fix(session): prefer longer source in load_transcript to prevent legacy truncation#3221
Mibayy wants to merge 1 commit into
NousResearch:mainfrom
Mibayy:fix/load-transcript-legacy-jsonl-3212

Conversation

@Mibayy

@Mibayy Mibayy commented Mar 26, 2026

Copy link
Copy Markdown
Contributor

Summary

Closes #3212

Fixes conversation context silently truncating to 4 messages after a gateway restart for sessions with long JSONL history.

Root Cause

_flush_messages_to_session_db deliberately skips messages already in conversation_history to avoid duplicate writes (the #860 fix). The assumption is that those messages are already persisted in SQLite. That assumption is false for:

  • Sessions created before the SQLite layer was introduced
  • Sessions after a deployment that reset or replaced the DB
  • Any session where the DB was unavailable during earlier turns

Exact failure trace for the reporter's session (994 JSONL messages):

Turn N  (first after restart):
  load_transcript(id)          → SQLite: 0 rows → falls back to JSONL: 994 ✓
  _flush_messages_to_session_db:
    start_idx = len(conversation_history) = 994
    flush_from = max(994, 0) = 994
    writes messages[994:] = [user_msg, assistant_msg]  → SQLite now has 2 rows

Turn N+1:
  load_transcript(id)          → SQLite: 2 rows → returns immediately ✗
  agent sees 2 messages of history
  _save_session_log writes messages=[2 history + 2 new] = 4 to session JSON

This matches the reporter's exact evidence: JSONL 994 ✅, session JSON 4 ❌.

Fix

load_transcript now loads both SQLite and JSONL, then returns whichever has more messages.

if len(jsonl_messages) > len(db_messages):
    # Legacy session not yet fully migrated — use JSONL
    return jsonl_messages
return db_messages

For a fully-migrated session, SQLite will always be ≥ JSONL (JSONL is append-only and stops being written once SQLite takes over). There is no regression for normal sessions.

The JSONL read is a sequential scan; for very large transcripts it adds a small I/O cost, but load_transcript is called once per gateway turn and the file is already in the OS page cache for active sessions. The trade-off is acceptable.

Note on a related bug

This is a separate fix from #3210 / PR #3220 (cached agent session_id mismatch after session reset). That PR covers the case where a session reset causes the agent to write to the wrong session. This PR covers the case where a legacy JSONL session is never bootstrapped into SQLite, causing subsequent turns to see only the most recent 2 messages.

Tests

1453 gateway tests + 456 session tests pass. No regressions.

…cy truncation

When a long-lived session pre-dates SQLite storage (e.g. sessions
created before the DB layer was introduced, or after a clean
deployment that reset the DB), _flush_messages_to_session_db only
writes the *new* messages from the current turn to SQLite — it skips
messages already present in conversation_history, assuming they are
already persisted.

That assumption fails for legacy JSONL-only sessions:

  Turn N (first after DB migration):
    load_transcript(id)       → SQLite: 0  → falls back to JSONL: 994 ✓
    _flush_messages_to_session_db: skip first 994, write 2 new → SQLite: 2

  Turn N+1:
    load_transcript(id)       → SQLite: 2  → returns immediately ✗
    Agent sees 2 messages of history instead of 996

The same pattern causes the reported symptom: session JSON truncated
to 4 messages (_save_session_log writes agent.messages which only has
2 history + 2 new = 4).

Fix: always load both sources and return whichever is longer.  For a
fully-migrated session SQLite will always be ≥ JSONL, so there is no
regression.  For a legacy session that hasn't been bootstrapped yet,
JSONL wins and the full history is restored.

Closes NousResearch#3212
@teknium1

Copy link
Copy Markdown
Contributor

Merged via PR #3249. Your substantive commit (a9466c46) was cherry-picked onto current main with authorship preserved, plus we added 5 new tests covering all edge cases for the load_transcript source preference. Thanks @Mibayy!

@teknium1 teknium1 closed this Mar 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Session context lost mid-conversation - session JSON file truncated to 4 messages

2 participants