Skip to content

[codex] canonicalize gateway transcript storage#10560

Open
g-guthrie wants to merge 11 commits into
NousResearch:mainfrom
g-guthrie:codex/canonicalize-session-mirror
Open

[codex] canonicalize gateway transcript storage#10560
g-guthrie wants to merge 11 commits into
NousResearch:mainfrom
g-guthrie:codex/canonicalize-session-mirror

Conversation

@g-guthrie

@g-guthrie g-guthrie commented Apr 15, 2026

Copy link
Copy Markdown
Contributor

What changed

This PR removes duplicate gateway transcript persistence paths and makes gateway/session.py the single owner of transcript storage behavior.

Specifically:

  • gateway/mirror.py now routes transcript writes through the canonical session helpers instead of maintaining its own JSONL and SQLite write logic.
  • SQLite is now the canonical gateway transcript store.
  • Legacy JSONL transcripts are no longer live-written when SQLite is available.
  • Legacy JSONL files are only used as a one-time migration or fallback path.
  • Legacy transcript migration now preserves SQLite-only tails and JSONL-only tails when their ordering is knowable.
  • Ambiguous transcript divergence is treated conservatively: the code avoids destructive migration and leaves the legacy JSONL file in place as a read-only fallback.
  • session_meta payloads now survive the canonical SQLite transcript path instead of being flattened away.
  • The old runtime "whichever source is longer wins" heuristic is gone.

Why it changed

The gateway had multiple independent ways to persist the same transcript data:

  • normal session traffic through SessionStore
  • mirrored delivery traffic through gateway/mirror.py
  • a permanent SQLite + JSONL dual-write/read model in gateway/session.py

That split ownership increased drift risk and bug surface in one of the most state-sensitive parts of the gateway.

User and developer impact

Gateway session history now has one canonical persistence path when SQLite is available. That makes transcript behavior easier to reason about and reduces the chance that mirrored messages or resumed sessions diverge from normal session behavior.

For older sessions, legacy JSONL history is imported into SQLite only when it can be merged safely. If the histories diverge ambiguously, the code leaves the JSONL transcript in place rather than risking data loss.

The canonical SQLite path also now preserves transcript-only session_meta payloads, so the replacement backend is closer to the behavior of the legacy JSONL transcript.

Root cause

Transcript persistence had grown into a multi-path design where JSONL stopped being just a migration shim and effectively became a second live backend. That forced reconciliation logic, duplicate writes, and sidecar behavior outside the main session store.

Validation

Focused validation passed:

  • source venv/bin/activate && venv/bin/python -m pytest -o addopts='' tests/gateway/test_session.py tests/gateway/test_mirror.py tests/run_agent/test_860_dedup.py -q
  • source venv/bin/activate && venv/bin/python -m pytest -o addopts='' tests/gateway/test_retry_replacement.py tests/gateway/test_session_dm_thread_seeding.py tests/run_agent/test_session_meta_filtering.py -q

@g-guthrie g-guthrie marked this pull request as ready for review April 15, 2026 22:55
@g-guthrie g-guthrie changed the title [codex] canonicalize gateway transcript mirroring [codex] canonicalize gateway transcript storage Apr 16, 2026
@g-guthrie g-guthrie marked this pull request as draft April 16, 2026 02:45
@g-guthrie g-guthrie marked this pull request as ready for review April 16, 2026 11:51
@alt-glitch alt-glitch added type/refactor Code restructuring, no behavior change P2 Medium — degraded but workaround exists comp/gateway Gateway runner, session dispatch, delivery labels Apr 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/gateway Gateway runner, session dispatch, delivery P2 Medium — degraded but workaround exists type/refactor Code restructuring, no behavior change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants