Summary
A gateway session can enter an infinite preflight compression loop after a successful compression-induced session split.
Compression writes a small child transcript, and the turn ends with the child session id. However, the next inbound gateway turn can still receive the parent-sized history. That immediately triggers preflight compression again, creating another child, and the loop repeats.
This is related to #20470, but the observed failure is broader than a stale Telegram topic binding. In the failing trace, the next turn is logged as running under the compressed child session id while still receiving the original parent-sized history.
It is also related to #25242, but this issue is specifically about compression route/session/history publication after a split, not auto-continue tool-tail replay.
Environment shape
Configuration shape:
compression:
  enabled: true
  threshold: 0.5
auxiliary:
  compression:
    provider: openai-codex
    model: gpt-5.5
    timeout: 240
Using an explicit compression provider/model still reproduces the loop, so neither the auxiliary.compression.provider: auto fallback nor GLM is the direct cause.
Anonymized observed trace
Anonymized log shape from a gateway session:
T0 [S0] conversation turn: history=147
T0 [S0] Preflight compression: ~138k tokens >= threshold
T0 [S0] context compression done: session=S1 messages=148->8 tokens=~32k
T0 [S0] Turn ended ... session=S1
T0 gateway.run: Session split detected: S0 -> S1 (compression)
T1 [S1] conversation turn: history=147
T1 [S1] Preflight compression: ~138k tokens >= threshold
T1 [S1] context compression done: session=S2 messages=148->8 tokens=~33k
T1 [S1] Turn ended ... session=S2
T1 gateway.run: Session split detected: S0 -> S2 (compression)
T2 [S2] conversation turn: history=147
T2 [S2] Preflight compression: ~138k tokens >= threshold
On disk, the compressed child transcripts were small, roughly:
S1.jsonl: ~19 messages
S2.jsonl: ~11 messages
The key anomaly is:
[S1] conversation turn: history=147
while the S1 transcript on disk is compact.
The second anomaly is:
Session split detected: S0 -> S2 (compression)
after the previous split was already S0 -> S1. The old side of the second split should be S1, not the original ancestor S0.
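The second anomaly can be checked mechanically in collected logs. The following is a minimal sketch, assuming the log contains a single gateway route and that split lines follow the anonymized shape shown above; the helper name find_stale_split_parents is hypothetical and not part of the gateway.

```python
import re

# Matches the anonymized split line shape from the trace above.
SPLIT_RE = re.compile(r"Session split detected: (\S+) -> (\S+) \(compression\)")


def find_stale_split_parents(log_lines):
    """Return (line_no, logged_old, expected_old) for every split whose old
    side is not the child produced by the previous split.

    Assumption: all lines belong to one gateway route.
    """
    anomalies = []
    tip = None  # new side of the most recent split seen so far
    for line_no, line in enumerate(log_lines, start=1):
        match = SPLIT_RE.search(line)
        if not match:
            continue
        old_id, new_id = match.groups()
        if tip is not None and old_id != tip:
            anomalies.append((line_no, old_id, tip))
        tip = new_id
    return anomalies


trace = [
    "T0 gateway.run: Session split detected: S0 -> S1 (compression)",
    "T1 gateway.run: Session split detected: S0 -> S2 (compression)",
]
print(find_stale_split_parents(trace))  # [(2, 'S0', 'S1')]
```

A healthy lineage (S0 -> S1, then S1 -> S2) yields an empty list.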
Reproduction steps
No personal data is required.
- Enable gateway-managed sessions with preflight compression.
- Use an interrupt-capable gateway input mode; the core reproduction does not depend on interrupts.
- Create a gateway session with transcript size above the compression threshold.
- Send an inbound gateway message so preflight compression creates child session S1.
- Send another inbound message on the same gateway route.
- Observe that the gateway may log the child session id while still passing the parent-sized history into the agent.
- Preflight compression fires again and creates S2.
- Repeat messages; each turn compresses again instead of using the compact child transcript.
A deterministic regression test can stub compression so S0 always compresses to a small S1, then assert the next gateway turn loads S1 history rather than S0 history.
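The shape of such a test can be sketched against a toy model. Everything below is a stand-in: ToyGateway, THRESHOLD, and the publish_child flag are hypothetical and only mimic the route/transcript relationship described in this issue; a real test would drive the actual gateway with a stubbed compressor.

```python
from dataclasses import dataclass, field

THRESHOLD = 100  # hypothetical message-count stand-in for the token threshold


@dataclass
class ToyGateway:
    transcripts: dict = field(default_factory=dict)  # session id -> messages
    route: dict = field(default_factory=dict)        # route key -> canonical session id
    publish_child: bool = True                       # False reproduces the reported loop
    seen_history_sizes: list = field(default_factory=list)
    _next_child: int = 1

    def stub_compress(self, session_id):
        # Deterministic stub: the child keeps only the last 8 messages.
        child_id = f"S{self._next_child}"
        self._next_child += 1
        self.transcripts[child_id] = self.transcripts[session_id][-8:]
        return child_id

    def handle_turn(self, route_key, message):
        session_id = self.route[route_key]
        history = self.transcripts[session_id]
        self.seen_history_sizes.append(len(history))
        if len(history) >= THRESHOLD:  # preflight compression
            child_id = self.stub_compress(session_id)
            if self.publish_child:
                self.route[route_key] = child_id
            session_id = child_id
        self.transcripts[session_id].append(message)
        return session_id


def test_next_turn_loads_compact_child_history():
    gw = ToyGateway(
        transcripts={"S0": [f"m{i}" for i in range(147)]},
        route={"r": "S0"},
    )
    assert gw.handle_turn("r", "hello") == "S1"
    assert gw.handle_turn("r", "again") == "S1"  # no second split
    assert gw.seen_history_sizes == [147, 9]     # second turn saw S1, not S0


test_next_turn_loads_compact_child_history()
```

Constructing the toy with publish_child=False yields seen_history_sizes == [147, 147] and a second child S2, which is the loop reported here.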
Actual behavior
- Compression succeeds and writes a compact child transcript.
- The next gateway turn can still receive parent-sized history.
- The route/session split log can continue to treat the original parent as the old session even after the route should have advanced to the child.
- The gateway repeatedly preflight-compresses already-compressed turns.
Expected behavior
After S0 -> S1 compression:
- the canonical gateway route/session entry points to S1;
- any channel/topic binding that pointed to S0 is advanced to S1 or resolved through the compression tip;
- cached agent state is either consistent with S1 or evicted/rebuilt;
- the next inbound turn loads compact S1 history;
- if S1 later compresses again, the split is logged and applied as S1 -> S2.
Suspected root cause
Compression advances AIAgent.session_id, but the gateway does not publish the compression child as the canonical route/session/history source atomically.
Likely interacting problems:
- gateway route/session entry can lag behind agent.session_id;
- channel-specific bindings can point at an old compression ancestor;
- cached AIAgent reuse can pair one session id with another session's loaded history;
- split handling mutates only part of the routing state after agent.run_conversation(...);
- inbound route resolution does not consistently follow compression descendants before loading transcript history.
Fix direction
Add a single gateway-side compression route publication path.
When a turn starts with canonical route session old_session_id, and after agent.run_conversation(...) the agent has agent.session_id == new_session_id != old_session_id, and the old session ended by compression, the gateway should atomically:
- update the canonical SessionStore entry to new_session_id;
- update channel/topic bindings that still point to old_session_id;
- update or invalidate cached AIAgent state so the next turn cannot combine new_session_id with old history;
- ensure the next transcript load uses new_session_id;
- log the split as old_session_id -> new_session_id, where old_session_id is the actual route session for the completed turn, not the original ancestor.
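The publication steps above could look roughly like the following. This is a sketch under stated assumptions, not the gateway's actual code: RouteState is a hypothetical in-memory stand-in for SessionStore, the binding table, and the agent cache, and publish_compression_split is a hypothetical helper name. Refusing to publish when the route no longer points at old_session_id is an added design assumption, not something taken from the codebase.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class RouteState:
    """Hypothetical stand-in for the gateway's routing state."""
    sessions: dict = field(default_factory=dict)     # route key -> canonical session id
    bindings: dict = field(default_factory=dict)     # channel/topic key -> session id
    agent_cache: dict = field(default_factory=dict)  # route key -> cached agent
    lock: threading.Lock = field(default_factory=threading.Lock)


def publish_compression_split(state, route_key, old_session_id, new_session_id, log=print):
    """Advance every view of the route from old to new under one lock.

    Returns True if the split was published, False if there was nothing to do
    or the route no longer points at old_session_id.
    """
    if new_session_id == old_session_id:
        return False
    with state.lock:
        if state.sessions.get(route_key) != old_session_id:
            return False  # assumption: never clobber a route that already moved
        state.sessions[route_key] = new_session_id
        for key, bound in list(state.bindings.items()):
            if bound == old_session_id:
                state.bindings[key] = new_session_id
        # Evict rather than patch, so the next turn rebuilds from new history.
        state.agent_cache.pop(route_key, None)
        log(f"Session split detected: {old_session_id} -> {new_session_id} (compression)")
    return True
```

Because old_session_id is read from the route at turn start, a second compression is published as S1 -> S2 rather than S0 -> S2. Evicting the cached agent is the simpler of the two options listed above; updating it in place would need the agent to reload its history as well.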
Also harden inbound route resolution:
- if a binding points to a session with a compression child/tip, advance to the tip before loading history;
- do not call explicit session-switch logic merely to follow compression lineage;
- reuse a cached agent only when cached_agent.session_id == canonical_session_id, otherwise evict/rebuild and warn.
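A minimal sketch of the inbound hardening, assuming compression lineage is available as a parent -> child mapping. The names resolve_compression_tip, agent_for_turn, compression_child, and build_agent are hypothetical and used only for illustration.

```python
def resolve_compression_tip(session_id, compression_child):
    """Follow parent -> child compression links to the newest descendant.

    compression_child is assumed to map a session id to the child created when
    that session ended by compression. This is a pure lookup; it does not call
    any session-switch logic.
    """
    seen = {session_id}
    while session_id in compression_child:
        session_id = compression_child[session_id]
        if session_id in seen:
            raise ValueError(f"cycle in compression lineage at {session_id}")
        seen.add(session_id)
    return session_id


def agent_for_turn(cached_agent, canonical_session_id, build_agent, warn=print):
    """Reuse the cached agent only if it is bound to the canonical session."""
    if cached_agent is not None and cached_agent.session_id == canonical_session_id:
        return cached_agent
    if cached_agent is not None:
        warn(
            f"evicting cached agent for session {cached_agent.session_id}; "
            f"route resolves to {canonical_session_id}"
        )
    return build_agent(canonical_session_id)
```

With lineage {"S0": "S1", "S1": "S2"}, a binding that still points at S0 resolves to S2 before any transcript is loaded, and a cached agent bound to S0 is rebuilt rather than reused.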
Regression tests requested
Please add tests for:
- next turn after S0 -> S1 compression receives compact S1 history;
- a later S1 -> S2 split advances from S1, not S0;
- channel/topic binding follows the compression tip before transcript load;
- cached agent/session mismatch cannot pair one session id with another session's history;
- route publication works with a stub compressor and does not depend on a specific auxiliary provider/model.