Description
Problem
When the gateway process is killed (kill -9, OOM, power loss) while the AI is generating a reply, the in-flight message is silently dropped from the conversation. On restart, when the user sends their next message, the orphaned message is actively removed from the session tree — the user's original question is never answered and disappears from the AI's context.
Steps to reproduce
- Send a message via Telegram that triggers a long AI generation (e.g. "write a 1000-word essay about the history of databases")
- kill -9 the gateway process while the AI is generating
- Restart the gateway
- Send a new message (e.g. "hello")
What happens internally
The crash leaves the session transcript with a trailing "orphan" user message — a user turn with no corresponding assistant reply. When the next message arrives, the embedded agent detects this orphan via sessionManager.getLeafEntry() (in pi-embedded-runner/run/attempt.ts). Since consecutive user messages violate LLM role ordering, the agent calls sessionManager.branch(leafEntry.parentId) to rewind the session tree to the state before the orphan. The new message is then appended as a fresh leaf.
The result: the orphaned user message is structurally severed from the session's parentId chain. The AI processes the new message without any knowledge of the orphan. The user's original question is never answered, and no error or notification is sent to the user or operator.
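The rewind described above can be illustrated with a toy model. This is a minimal sketch, not the project's code: `ToySessionTree`, `handleIncoming`, and `activePath` are hypothetical names, and only `getLeafEntry()` and `branch(parentId)` mirror calls named in this report (their real signatures may differ).

```typescript
// Hypothetical, simplified model of a session tree. Not the real sessionManager.
type Role = "user" | "assistant";

interface Entry {
  id: string;
  parentId: string | null;
  role: Role;
  text: string;
}

class ToySessionTree {
  private entries = new Map<string, Entry>();
  private leafId: string | null = null;
  private nextId = 1;

  // Append a new entry as a child of the current leaf and make it the leaf.
  append(role: Role, text: string): Entry {
    const entry: Entry = { id: `e${this.nextId++}`, parentId: this.leafId, role, text };
    this.entries.set(entry.id, entry);
    this.leafId = entry.id;
    return entry;
  }

  getLeafEntry(): Entry | undefined {
    return this.leafId === null ? undefined : this.entries.get(this.leafId);
  }

  // Move the leaf pointer back. The old leaf stays stored but becomes unreachable.
  branch(parentId: string | null): void {
    this.leafId = parentId;
  }

  // Walk the parentId chain from the leaf: the context the model would see.
  activePath(): Entry[] {
    const path: Entry[] = [];
    let cur = this.getLeafEntry();
    while (cur) {
      path.unshift(cur);
      cur = cur.parentId === null ? undefined : this.entries.get(cur.parentId);
    }
    return path;
  }
}

// Assumed shape of the orphan handling: rewind past a trailing user turn,
// then append the new message. Returns the orphan so a caller could log it.
function handleIncoming(tree: ToySessionTree, text: string): Entry | undefined {
  const leaf = tree.getLeafEntry();
  let orphan: Entry | undefined;
  if (leaf && leaf.role === "user") {
    orphan = leaf;
    tree.branch(leaf.parentId);
  }
  tree.append("user", text);
  return orphan;
}

const tree = new ToySessionTree();
tree.append("user", "hi");
tree.append("assistant", "hello!");
tree.append("user", "write a 1000-word essay ..."); // crash happens before any reply
const dropped = handleIncoming(tree, "hello");
const contextTexts = tree.activePath().map((e) => e.text);
```

In this model the essay request is still stored, but it no longer sits on the path from the new leaf, so it never reaches the model's context. Returning the orphan from `handleIncoming` is one way a caller could at least log which message was lost.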
Specific scenarios
- Telegram DM: User asks a question, AI starts generating, gateway crashes → on restart, user sends another message → the original question is silently branched out of the session tree, never answered
- Any channel: The orphan removal is channel-agnostic — any channel's messages are subject to the same silent loss
- No visibility: No log entry warns the operator that a user message was dropped; the log.warn about "removed orphaned user message" only fires when the next message arrives, and doesn't identify what the lost message was
Expected behavior
On restart, the gateway should detect orphaned in-flight turns and either re-process them or notify the user that their message was lost. A persistent turn-tracking layer would enable the gateway to know which turns were in-progress at crash time without relying on transcript tree structure.
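One possible shape for such a turn-tracking layer is an append-only journal of "started" and "completed" records that is replayed on startup. The sketch below is an assumption about how this could look, not an existing API: `TurnJournal`, `JournalStore`, `MemoryJournalStore`, and every field name are hypothetical, and the in-memory store stands in for a durable one (for example an append-only file that is flushed before generation begins).

```typescript
// Hypothetical sketch of a turn journal. None of these names exist in the project.
interface StartedRecord {
  kind: "started";
  turnId: string;
  channel: string;
  chatId: string;
  text: string;
}

interface CompletedRecord {
  kind: "completed";
  turnId: string;
}

type TurnRecord = StartedRecord | CompletedRecord;

// Storage abstraction. A real implementation would need to be durable across crashes.
interface JournalStore {
  append(line: string): void;
  readAll(): string[];
}

// In-memory stand-in used only to keep this example self-contained.
class MemoryJournalStore implements JournalStore {
  private lines: string[] = [];
  append(line: string): void {
    this.lines.push(line);
  }
  readAll(): string[] {
    return this.lines.slice();
  }
}

class TurnJournal {
  private store: JournalStore;

  constructor(store: JournalStore) {
    this.store = store;
  }

  // Record the turn before generation starts.
  start(turnId: string, channel: string, chatId: string, text: string): void {
    const rec: TurnRecord = { kind: "started", turnId, channel, chatId, text };
    this.store.append(JSON.stringify(rec));
  }

  // Record completion once the reply has been delivered.
  complete(turnId: string): void {
    const rec: TurnRecord = { kind: "completed", turnId };
    this.store.append(JSON.stringify(rec));
  }

  // Replay the journal: any turn started but never completed was in flight at crash time.
  findInterrupted(): StartedRecord[] {
    const open = new Map<string, StartedRecord>();
    for (const line of this.store.readAll()) {
      const rec = JSON.parse(line) as TurnRecord;
      if (rec.kind === "started") {
        open.set(rec.turnId, rec);
      } else {
        open.delete(rec.turnId);
      }
    }
    return Array.from(open.values());
  }
}

const store = new MemoryJournalStore();
const journal = new TurnJournal(store);
journal.start("t1", "telegram", "chat-42", "hi");
journal.complete("t1");
journal.start("t2", "telegram", "chat-42", "write a 1000-word essay ...");
// Simulated crash: "t2" never completes. After restart, a new journal replays the same store.
const interrupted = new TurnJournal(store).findInterrupted();
```

On startup the gateway could iterate over `interrupted` and, per turn, either re-run generation or send the user a notice on the recorded channel. Because each record carries the message text, the operator log can also identify exactly what was lost, independent of the transcript tree.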