Session lane starvation: followup drain monopolizes session lane, blocks inbound dispatch for 20-30min #54488
Open
Labels
- `P1`: High-priority user-facing bug, regression, or broken workflow.
- `clawsweeper:fix-shape-clear`: ClawSweeper found a clear likely implementation shape for this issue.
- `clawsweeper:needs-maintainer-review`: ClawSweeper marked this issue as needing maintainer review before automation.
- `clawsweeper:needs-product-decision`: ClawSweeper marked this issue as needing a product or behavior decision.
- `clawsweeper:no-new-fix-pr`: ClawSweeper does not recommend queueing a new automated fix PR for this issue.
- `clawsweeper:source-repro`: ClawSweeper found a high-confidence source-level issue reproduction.
- `impact:message-loss`: Channel message delivery can be lost, duplicated, or misrouted.
- `impact:session-state`: Session, memory, transcript, context, or agent state can drift or corrupt.
- `issue-rating: 🦞 diamond lobster`: Very strong issue quality with high-confidence source-level or clear reproduction.
- `stale`: Marked as stale due to inactivity.
Bug: Followup drain monopolizes session lane — causes indefinite inbound dispatch stall
Version: 2026.3.23-2 (not present in 2026.3.13)
Symptoms
- `sessions_send` does not fix it (queues behind same backlog)
- `SIGUSR1` (gateway restart / `resetAllLanes()`) resolves it immediately
Root Cause
`scheduleFollowupDrain` (in `pi-embedded-CbCYZxIb.js:94509`) starts an unbounded async loop after every turn. Each queued item (system events, subagent announces, WhatsApp reconnects) calls `runEmbeddedPiAgent` → `enqueueSession(() => enqueueGlobal(...))`, holding the session lane (`maxConcurrent: 1`) for the full turn duration, including compaction and context engine maintenance. New user messages queue behind all followup turns with no preemption.
Observed lane wait times
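To illustrate how a single-concurrency FIFO lane can produce waits in the 20-30 minute range, here is a minimal TypeScript sketch that runs on a virtual clock. It is not the project's code: `simulateFifoLane`, `LaneTask`, and `LaneResult` are hypothetical names, and the ten-minute followup durations are made-up inputs chosen only to show the arithmetic.

```typescript
// Hypothetical model of a session lane with maxConcurrent: 1.
// None of these names come from the real codebase.
interface LaneTask {
  label: string;
  enqueuedAtMs: number; // virtual time at which the task entered the lane
  durationMs: number; // how long the task holds the lane once it starts
}

interface LaneResult {
  label: string;
  waitedMs: number; // time spent queued before the task could start
  queueAhead: number; // tasks queued or running ahead at enqueue time
}

// Runs tasks strictly one at a time, in enqueue order, with no preemption.
function simulateFifoLane(tasks: LaneTask[]): LaneResult[] {
  // Array.prototype.sort is stable, so ties keep their input order.
  const ordered = [...tasks].sort((a, b) => a.enqueuedAtMs - b.enqueuedAtMs);
  const finishTimes: number[] = [];
  const results: LaneResult[] = [];
  let laneFreeAtMs = 0;

  for (const task of ordered) {
    const startMs = Math.max(task.enqueuedAtMs, laneFreeAtMs);
    const queueAhead = finishTimes.filter((t) => t > task.enqueuedAtMs).length;
    laneFreeAtMs = startMs + task.durationMs;
    finishTimes.push(laneFreeAtMs);
    results.push({
      label: task.label,
      waitedMs: startMs - task.enqueuedAtMs,
      queueAhead,
    });
  }
  return results;
}

// Made-up scenario: three followup turns of ten minutes each are already
// queued when a user message arrives one second later.
const TEN_MINUTES_MS = 10 * 60 * 1000;
const demoTasks: LaneTask[] = [
  { label: "followup-1", enqueuedAtMs: 0, durationMs: TEN_MINUTES_MS },
  { label: "followup-2", enqueuedAtMs: 0, durationMs: TEN_MINUTES_MS },
  { label: "followup-3", enqueuedAtMs: 0, durationMs: TEN_MINUTES_MS },
  { label: "user-message", enqueuedAtMs: 1000, durationMs: 5000 },
];
const demoResults = simulateFifoLane(demoTasks);
const userResult = demoResults[demoResults.length - 1];
console.log(userResult.waitedMs, userResult.queueAhead); // 1799000 3
```

In this model the user message waits for the sum of every followup turn ahead of it, so the wait grows with each item the drain adds, regardless of how short the user's own turn is.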
From diagnostic logs on `session:agent:main:main`:
Log pattern:
`lane wait exceeded: waitedMs=1814033 queueAhead=1`
Contributing factors
`compaction.memoryFlush.enabled: true`)
Setup context
- `session.dmScope: "main"` (Discord DM + WhatsApp share main session)
- `ollama/qwen2.5-14b-agent`, timeouts blocked global lanes)
- `contextPruning.mode: "cache-ttl"` (custom events at end of turn correlated with the stall, but turning it off did not fix it; lane starvation is the real cause)
Suggested fixes
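One of the fixes suggested in this section, releasing the lane before post-turn maintenance, could look roughly like the TypeScript sketch below. Everything here is an assumption made for illustration: `Lane`, `runTurnHoldingLane`, `runTurnReleasingLane`, and `demo` are hypothetical names, and the promise-chain lane is a stand-in, not the project's `enqueueSession` implementation.

```typescript
// Hypothetical single-concurrency lane built on a promise chain.
// This is a stand-in for illustration, not the real session lane.
class Lane {
  private tail: Promise<void> = Promise.resolve();

  // Runs `task` only after every previously enqueued task has settled.
  enqueue<T>(task: () => Promise<T>): Promise<T> {
    const run = this.tail.then(task);
    // Swallow the outcome so one failed task does not block the lane.
    this.tail = run.then(
      () => undefined,
      () => undefined,
    );
    return run;
  }
}

type Step = () => Promise<void>;

// Current behavior as described in the issue: the lane is held for the
// turn and for post-turn maintenance.
async function runTurnHoldingLane(lane: Lane, turn: Step, maintain: Step): Promise<void> {
  await lane.enqueue(async () => {
    await turn();
    await maintain();
  });
}

// Suggested shape: the lane task ends with the turn, and maintenance
// runs afterwards, outside the lane.
async function runTurnReleasingLane(lane: Lane, turn: Step, maintain: Step): Promise<void> {
  await lane.enqueue(turn);
  await maintain();
}

// Records the order in which a turn, its maintenance, and a queued user
// message complete under either strategy.
async function demo(release: boolean): Promise<string[]> {
  const events: string[] = [];
  const lane = new Lane();
  const turn: Step = async () => {
    events.push("turn");
  };
  // Simulated slow maintenance (20 ms stands in for compaction work).
  const maintain: Step = () =>
    new Promise<void>((resolve) => {
      setTimeout(() => {
        events.push("maintain:end");
        resolve();
      }, 20);
    });

  const turnDone = release
    ? runTurnReleasingLane(lane, turn, maintain)
    : runTurnHoldingLane(lane, turn, maintain);
  const userDone = lane.enqueue(async () => {
    events.push("user");
  });
  await Promise.all([turnDone, userDone]);
  return events;
}
```

Calling `demo(false)` yields the order `turn`, `maintain:end`, `user`, while `demo(true)` yields `turn`, `user`, `maintain:end`. The sketch ignores whether maintenance can safely overlap the next turn's access to session state, which a real change would have to settle first.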
`afterTurn`/`maintain`) OUTSIDE the session lane task: the lane should be released after `clearActiveEmbeddedRun`, not after post-turn cleanup
Workarounds (config-level mitigations)
These reduce lane occupation time but do not fix the root cause:
Reproduction
`dmScope: "main"` and multiple active channels (WhatsApp + Discord)
Environment