You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When the terminal tool uses the rich execute strategy, commands can hang indefinitely if shell integration sequences stop arriving mid-command. This caused 19 out of 45 eval failures (42%) in eval run 24966302375 — and 15 of those 19 tasks passed in the CLI with the same model (gpt-5.4).
Root Cause
In trackIdleOnPrompt() (executeStrategy.ts):
Command starts → terminal data events arrive (command output)
If a C/D shell integration sequence is detected, state transitions to Executing
In Executing state, promptFallbackScheduler is also cancelled
Shell integration breaks → no A (prompt) sequence ever arrives
Both fallback schedulers are cancelled → the Promise never resolves → permanent hang
The initialFallbackScheduler from #312387 only helps when no data arrives at all. When data arrives but shell integration sequences are incomplete, the hang persists.
Evidence
Every timed-out task in the eval shows this pattern in terminal.log:
20:59:25 [info] Using `rich` execute strategy for command ...
20:59:35 [warning] Shell integration failed to add capabilities within 10 seconds
[nothing for 60 minutes until timeout]
Metric
Value
Tasks that timed out in VS Code
19
Same tasks that passed in CLI
15
CLI avg completion time
3-20 min
VS Code active time before hang
typically <1 min
Tasks where VS Code never completed a single command
5
Proposed Fix
When shell integration capabilities fail to load within 10 seconds (the warning we already log), downgrade from rich to basic strategy mid-flight. The basic strategy uses data-idle polling rather than shell integration sequences, so it will not hang.
Problem
When the terminal tool uses the
richexecute strategy, commands can hang indefinitely if shell integration sequences stop arriving mid-command. This caused 19 out of 45 eval failures (42%) in eval run 24966302375 — and 15 of those 19 tasks passed in the CLI with the same model (gpt-5.4).Root Cause
In
trackIdleOnPrompt()(executeStrategy.ts):initialFallbackScheduler(the 10s safety net from fix: schedule promptFallbackScheduler immediately in trackIdleOnPrompt #312387)C/Dshell integration sequence is detected, state transitions toExecutingExecutingstate,promptFallbackScheduleris also cancelledA(prompt) sequence ever arrivesThe
initialFallbackSchedulerfrom #312387 only helps when no data arrives at all. When data arrives but shell integration sequences are incomplete, the hang persists.Evidence
Every timed-out task in the eval shows this pattern in
terminal.log:Proposed Fix
When shell integration capabilities fail to load within 10 seconds (the warning we already log), downgrade from
richtobasicstrategy mid-flight. The basic strategy uses data-idle polling rather than shell integration sequences, so it will not hang.Related
initialFallbackScheduler(partial fix, only covers no-data case)onExitrace (fixes process-exit case, but not long-running commands)