Commit facafff

fix(agents): surface user-visible error when embedded session is stuck or overflows context
Fixes #84536.

Two root causes are addressed:

1. Dead-code guard in agent-runner-execution.ts. The hasPayloadText guard on the context-overflow fallback made that branch unreachable in the common terminal-overflow path, because run.ts always includes an error payload when it reaches the terminal overflow return. Execution silently fell through to the success path, where the payload might still be delivered, but the fallback never fired for aborted sessions. Fix: remove the guard so the friendly overflow message is always surfaced when meta.error is a context overflow error.

2. Stuck-session recovery produced no user notification. recoverStuckDiagnosticSession aborts the embedded run and releases the lane, but the abort result had empty payloads, so the user saw nothing. Fix: thread the abort reason (stuck_recovery) from abortAndDrainEmbeddedPiRun through handle.abort(reason) to AbortController.abort(reason), expose it as EmbeddedRunAttemptResult.abortReason, and in run.ts synthesize a user-visible error payload when abortReason is stuck_recovery and no other payload was generated.
1 parent 48a14e4

6 files changed

Lines changed: 63 additions & 7 deletions

src/agents/pi-embedded-runner/run-state.ts

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ export type EmbeddedPiQueueHandle = {
 isCompacting: () => boolean;
 supportsTranscriptCommitWait?: boolean;
 cancel?: (reason?: "user_abort" | "restart" | "superseded") => void;
-abort: () => void;
+abort: (reason?: unknown) => void;
 sourceReplyDeliveryMode?: SourceReplyDeliveryMode;
 };
 

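The run-state.ts change widens abort to accept an optional reason. A minimal sketch of how such a handle can forward that reason to a standard AbortController follows; SketchQueueHandle and createSketchHandle are hypothetical names, not part of the codebase, and the real handle carries more fields. Calling abort() with no argument still yields the platform's default reason, an "AbortError" DOMException, which is not a string.

```typescript
// Hypothetical handle shape: only the fields needed for this sketch.
type SketchQueueHandle = {
  abort: (reason?: unknown) => void;
  signal: AbortSignal;
};

// Wraps a standard AbortController: abort(reason) records the reason on
// signal.reason, and abort() with no reason records an "AbortError" DOMException.
function createSketchHandle(): SketchQueueHandle {
  const controller = new AbortController();
  return {
    abort: (reason?: unknown) => controller.abort(reason),
    signal: controller.signal,
  };
}

const handle = createSketchHandle();
handle.abort("stuck_recovery");
console.log(handle.signal.aborted, handle.signal.reason); // logs: true stuck_recovery
```

Because the reason lives on the signal itself, any code holding the signal can later inspect why the run was aborted without extra plumbing.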
src/agents/pi-embedded-runner/run.ts

Lines changed: 20 additions & 3 deletions
@@ -3157,12 +3157,29 @@ export async function runEmbeddedPiAgent(
 : attempt.yieldDetected
 ? "end_turn"
 : (sessionLastAssistant?.stopReason as string | undefined);
+// When a stuck-session recovery forcibly aborts the run, synthesize
+// a user-visible error payload so the user knows to retry instead of
+// seeing a silent empty response after hours of waiting.
+// See #84536: preemptive context overflow silently kills embedded sessions.
+const isStuckRecoveryAbort = aborted && attempt.abortReason === "stuck_recovery";
+const stuckRecoveryPayload =
+isStuckRecoveryAbort && !payloadsForTerminalPath?.length
+? {
+text: "⚠️ Your session was stuck and has been automatically recovered. Please try again.",
+isError: true as const,
+}
+: undefined;
 const terminalPayloads = emptyAssistantReplyIsSilent
 ? [{ text: SILENT_REPLY_TOKEN }]
-: payloadsForTerminalPath;
+: stuckRecoveryPayload
+? [stuckRecoveryPayload]
+: payloadsForTerminalPath;
+const terminalLivenessState: EmbeddedRunLivenessState = stuckRecoveryPayload
+? "blocked"
+: livenessState;
 attempt.setTerminalLifecycleMeta?.({
 replayInvalid,
-livenessState,
+livenessState: terminalLivenessState,
 stopReason,
 yielded: attempt.yieldDetected === true,
 });
@@ -3180,7 +3197,7 @@ export async function runEmbeddedPiAgent(
 finalAssistantVisibleText,
 finalAssistantRawText,
 replayInvalid,
-livenessState,
+livenessState: terminalLivenessState,
 agentHarnessResultClassification: attempt.agentHarnessResultClassification,
 ...(attempt.yieldDetected ? { yielded: true } : {}),
 ...(emptyAssistantReplyIsSilent

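The payload and liveness selection in this hunk can be read as a pure function of the attempt's outcome. The sketch below restates it with hypothetical, simplified types: SketchPayload, selectTerminalOutcome, and SKETCH_SILENT_TOKEN (a stand-in for SILENT_REPLY_TOKEN) are illustrative names only, the liveness state is typed as a plain string because only the "blocked" value appears in the diff, and "idle" is a placeholder value.

```typescript
// Hypothetical, simplified types; the real payload and liveness types in run.ts are richer.
type SketchPayload = { text: string; isError?: true };

// Stand-in for SILENT_REPLY_TOKEN; the real token value is not shown in the diff.
const SKETCH_SILENT_TOKEN = "<silent>";

const STUCK_RECOVERY_TEXT =
  "⚠️ Your session was stuck and has been automatically recovered. Please try again.";

type TerminalInput = {
  aborted: boolean;
  abortReason?: string;
  emptyAssistantReplyIsSilent: boolean;
  payloadsForTerminalPath?: SketchPayload[];
  livenessState: string;
};

function selectTerminalOutcome(input: TerminalInput): {
  payloads: SketchPayload[] | undefined;
  livenessState: string;
} {
  const isStuckRecoveryAbort = input.aborted && input.abortReason === "stuck_recovery";
  // Synthesize the error only when no other payload would be delivered.
  const stuckRecoveryPayload: SketchPayload | undefined =
    isStuckRecoveryAbort && !input.payloadsForTerminalPath?.length
      ? { text: STUCK_RECOVERY_TEXT, isError: true }
      : undefined;
  // Same precedence as the diff: silent reply, then recovery error, then normal payloads.
  const payloads = input.emptyAssistantReplyIsSilent
    ? [{ text: SKETCH_SILENT_TOKEN }]
    : stuckRecoveryPayload
      ? [stuckRecoveryPayload]
      : input.payloadsForTerminalPath;
  // A synthesized recovery error marks the terminal liveness state as blocked.
  const livenessState = stuckRecoveryPayload ? "blocked" : input.livenessState;
  return { payloads, livenessState };
}

const outcome = selectTerminalOutcome({
  aborted: true,
  abortReason: "stuck_recovery",
  emptyAssistantReplyIsSilent: false,
  payloadsForTerminalPath: [],
  livenessState: "idle",
});
console.log(outcome.livenessState, outcome.payloads?.[0]?.isError); // logs: blocked true
```

As in the diff, the silent-reply branch still takes precedence over the synthesized error for the payloads, while the liveness state becomes "blocked" whenever the recovery payload was built. A recovery abort that already produced payloads keeps them unchanged.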
src/agents/pi-embedded-runner/run/attempt.ts

Lines changed: 7 additions & 0 deletions
@@ -4757,11 +4757,18 @@ export async function runEmbeddedAttempt(
 });
 trajectoryEndRecorded = true;
 
+const abortReason: string | undefined = aborted
+? typeof runAbortController.signal.reason === "string"
+? runAbortController.signal.reason
+: undefined
+: undefined;
+
 return {
 replayMetadata,
 itemLifecycle: getItemLifecycle(),
 setTerminalLifecycleMeta,
 aborted,
+abortReason,
 externalAbort,
 timedOut,
 idleTimedOut,

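The new abortReason extraction only trusts string reasons. The same check as a standalone helper (readStringAbortReason is a hypothetical name) shows why: an abort issued with "stuck_recovery" surfaces that string, while a bare abort() leaves a non-string DOMException on signal.reason, which maps to undefined.

```typescript
// Hypothetical helper restating the ternary added to runEmbeddedAttempt:
// report a reason only for aborted runs, and only when it is a string.
function readStringAbortReason(aborted: boolean, signal: AbortSignal): string | undefined {
  if (!aborted) {
    return undefined;
  }
  // abort() without an argument stores an "AbortError" DOMException, which
  // this typeof check deliberately maps to undefined.
  return typeof signal.reason === "string" ? signal.reason : undefined;
}

const withReason = new AbortController();
withReason.abort("stuck_recovery");
const withoutReason = new AbortController();
withoutReason.abort();

console.log(readStringAbortReason(true, withReason.signal)); // logs: stuck_recovery
console.log(readStringAbortReason(true, withoutReason.signal)); // logs: undefined
```

Restricting the value to strings keeps the result type as string | undefined, matching the abortReason?: string field on EmbeddedRunAttemptResult.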
src/agents/pi-embedded-runner/run/types.ts

Lines changed: 6 additions & 0 deletions
@@ -70,6 +70,12 @@ export type EmbeddedRunAttemptParams = EmbeddedRunAttemptBase & {
 
 export type EmbeddedRunAttemptResult = {
 aborted: boolean;
+/**
+ * When the run was aborted externally, the string reason passed to
+ * `handle.abort(reason)` if one was provided (e.g. "stuck_recovery").
+ * Undefined when the abort had no string reason or the run was not aborted.
+ */
+abortReason?: string;
 /** True when the abort originated from the caller-provided abortSignal. */
 externalAbort: boolean;
 timedOut: boolean;

src/agents/pi-embedded-runner/runs.ts

Lines changed: 22 additions & 1 deletion
@@ -508,7 +508,28 @@ export async function abortAndDrainEmbeddedPiRun(params: {
 reason?: string;
 }): Promise<AbortAndDrainEmbeddedPiRunResult> {
 const settleMs = params.settleMs ?? 15_000;
-const aborted = abortEmbeddedPiRun(params.sessionId);
+// If a reason is provided (e.g. "stuck_recovery"), abort the handle
+// directly so the reason flows through to the abort signal. This allows
+// downstream code (run.ts) to detect why the session was aborted and
+// synthesize a user-visible error message.
+let aborted: boolean;
+if (params.reason) {
+const handle = ACTIVE_EMBEDDED_RUNS.get(params.sessionId);
+if (handle) {
+diag.debug(`aborting run with reason: sessionId=${params.sessionId} reason=${params.reason}`);
+try {
+handle.abort(params.reason);
+aborted = true;
+} catch (err) {
+diag.warn(`abort failed: sessionId=${params.sessionId} err=${String(err)}`);
+aborted = false;
+}
+} else {
+aborted = abortEmbeddedPiRun(params.sessionId);
+}
+} else {
+aborted = abortEmbeddedPiRun(params.sessionId);
+}
 const drained = aborted ? await waitForEmbeddedPiRunEnd(params.sessionId, settleMs) : false;
 const forceCleared =
 params.forceClear === true && (!aborted || !drained)

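The three branches in this hunk reduce to one rule: abort the handle directly only when a reason is given and a handle is registered; otherwise fall back to abortEmbeddedPiRun. The sketch below restates that control flow with a hypothetical activeRuns map and abortWithoutReason function standing in for ACTIVE_EMBEDDED_RUNS and abortEmbeddedPiRun. It assumes abortEmbeddedPiRun returns false when no run is registered, which the diff does not show, and it omits the diag.debug call. It is an illustrative restructuring, not the committed code.

```typescript
// Hypothetical stand-ins: activeRuns plays the role of ACTIVE_EMBEDDED_RUNS,
// and abortWithoutReason plays the role of abortEmbeddedPiRun.
type SketchRunHandle = { abort: (reason?: unknown) => void };
const activeRuns = new Map<string, SketchRunHandle>();

// Assumed behavior: abort the registered run without a reason, or report
// false when no run is registered for the session.
function abortWithoutReason(sessionId: string): boolean {
  const handle = activeRuns.get(sessionId);
  if (!handle) {
    return false;
  }
  handle.abort();
  return true;
}

// Mirrors the diff's branch outcomes, collapsed into a single fallback path.
function abortWithOptionalReason(sessionId: string, reason?: string): boolean {
  const handle = reason ? activeRuns.get(sessionId) : undefined;
  if (!handle) {
    // No reason, or no registered handle: keep the pre-existing abort path.
    return abortWithoutReason(sessionId);
  }
  try {
    // Passing the reason lets it reach the handle's AbortController signal.
    handle.abort(reason);
    return true;
  } catch (err) {
    console.warn(`abort failed: sessionId=${sessionId} err=${String(err)}`);
    return false;
  }
}

const received: unknown[] = [];
activeRuns.set("session-1", { abort: (reason) => { received.push(reason); } });
const ok = abortWithOptionalReason("session-1", "stuck_recovery");
console.log(ok, received[0]); // logs: true stuck_recovery
```

Collapsing the two identical else-branches makes it easier to see that the reason-aware path differs from the old behavior only when a reason is supplied and a handle is found.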
src/auto-reply/reply/agent-runner-execution.ts

Lines changed: 7 additions & 2 deletions
@@ -2423,9 +2423,14 @@ export async function runAgentTurnWithFallback(params: {
 // the error to the user instead of silently returning an empty response.
 // See #26905: Slack DM sessions silently swallowed messages when context
 // overflow errors were returned as embedded error payloads.
+// NOTE: The `!hasPayloadText` guard was intentionally removed. When run.ts
+// reaches the terminal overflow path it always includes an error payload;
+// that `!hasPayloadText` guard made this branch dead code in the common
+// case and silently allowed overflow errors to bypass this user-visible
+// notification when the payload was delivered through the "success" path
+// but the user never saw a useful message (e.g. stuck/aborted sessions).
 const finalEmbeddedError = runResult?.meta?.error;
-const hasPayloadText = runResult?.payloads?.some((p) => normalizeOptionalString(p.text));
-if (finalEmbeddedError && !hasPayloadText) {
+if (finalEmbeddedError) {
 const errorMsg = finalEmbeddedError.message ?? "";
 if (isContextOverflowError(errorMsg)) {
 params.replyOperation?.fail("run_failed", finalEmbeddedError);

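To see why the old guard rarely fired, compare the two predicates on a terminal-overflow result that already carries an error payload. In the sketch below, the types are simplified, looksLikeContextOverflow is a crude regex stand-in for the project's isContextOverflowError, and the text check approximates normalizeOptionalString by trimming whitespace; all of these are assumptions.

```typescript
// Hypothetical, simplified shapes for the run result seen by the reply layer.
type SketchRunResult = {
  payloads?: { text?: string }[];
  meta?: { error?: { message?: string } };
};

// Crude stand-in for isContextOverflowError; the real classifier is not shown.
function looksLikeContextOverflow(message: string): boolean {
  return /context (length|window)|overflow/i.test(message);
}

// Before: the overflow branch ran only when no payload carried text.
function shouldSurfaceOverflowBefore(run: SketchRunResult | undefined): boolean {
  const error = run?.meta?.error;
  // Approximates normalizeOptionalString by trimming whitespace.
  const hasPayloadText = run?.payloads?.some((p) => (p.text ?? "").trim() !== "") ?? false;
  return Boolean(error && !hasPayloadText && looksLikeContextOverflow(error.message ?? ""));
}

// After: any context-overflow error in meta.error reaches the branch.
function shouldSurfaceOverflowAfter(run: SketchRunResult | undefined): boolean {
  const error = run?.meta?.error;
  return Boolean(error && looksLikeContextOverflow(error.message ?? ""));
}

// A terminal-overflow result that already includes an error payload.
const terminalOverflow: SketchRunResult = {
  payloads: [{ text: "Context overflow: prompt too large" }],
  meta: { error: { message: "context length exceeded" } },
};
console.log(shouldSurfaceOverflowBefore(terminalOverflow)); // logs: false
console.log(shouldSurfaceOverflowAfter(terminalOverflow)); // logs: true
```

With the guard, a result whose payloads already contain text never reaches the overflow branch; without it, the branch depends only on whether meta.error is a context-overflow error.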