Skip to content

fix: downgrade rich to basic execute strategy when shell integration breaks mid-command#312854

Merged
meganrogge merged 6 commits into
mainfrom
merogge/downgrade-rich-to-basic
Apr 27, 2026
Merged

fix: downgrade rich to basic execute strategy when shell integration breaks mid-command#312854
meganrogge merged 6 commits into
mainfrom
merogge/downgrade-rich-to-basic

Conversation

@meganrogge

@meganrogge meganrogge commented Apr 27, 2026

Copy link
Copy Markdown
Collaborator

Problem

Fixes #312853

The terminal rich execute strategy hangs indefinitely when shell integration sequences stop arriving mid-command. This was the single root cause of all 19 timeouts (42% of failures) in eval run 24966302375.

The hang mechanism

In trackIdleOnPrompt():

  1. Command starts → data events arrive → initialFallbackScheduler is cancelled
  2. If a C/D shell integration sequence is seen, state → Executing
  3. In Executing state, promptFallbackScheduler is also cancelled
  4. Shell integration breaks → no A (prompt) sequence ever arrives
  5. Both fallbacks cancelled → permanent hang

Eval evidence

Every timed-out task shows this exact pattern in terminal.log:

20:59:25 [info] Using `rich` execute strategy for command ` set -e; uname -s; ...`
20:59:35 [warning] Shell integration failed to add capabilities within 10 seconds
[nothing for 60 minutes until timeout]

15 of 19 timed-out tasks passed in the CLI with the same model (gpt-5.4), completing in 3-20 minutes. VS Code typically hung within the first minute — 5 tasks never completed a single terminal command.

Task CLI time VS Code active before hang CLI tool calls
compile-compcert 20 min 8s (0 cmds completed) 117
sqlite-db-truncate 1.6 min 42s (0 cmds completed) 14
caffe-cifar-10 16 min 16s (1 cmd completed) 85
build-cython-ext 7 min 289s (1 cmd completed) 66
adaptive-rejection-sampler 6 min 36s (3 cmds completed) 19
make-doom-for-mips 22 min 20 min (34 cmds completed) 186

Fix

When the rich execute strategy detects that shell integration capabilities have failed (the 10-second warning we already log), abort the rich strategy and fall back to the basic strategy mid-flight. The basic strategy uses data-idle polling instead of shell integration sequences, so it does not hang.

This is complementary to:

This fix covers the remaining gap: commands that produce data but where shell integration breaks, including long-running processes.

Scenario SI works? Before fix After fix
Quick command (<30s) onCommandFinished wins race Same — onCommandFinished wins
Long command (>30s), steady output onCommandFinished wins race Same — fallback reschedules on every data event, onCommandFinished wins
Long command, 30s+ output pause onCommandFinished wins race Same — onCommandFinished still wins the race
Quick command, SI breaks after C/D Hangs forever ✅ Fallback fires after 30s data-idle
Long command with output, SI breaks Hangs forever ✅ Fallback fires 30s after last output
Command finishes silently (no data at all) initialFallbackScheduler at 10s Same — unchanged
Long command, 30s+ pause, SI broken Hangs forever ⚠️ Completes early during the pause

…breaks mid-command

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 27, 2026 17:56
@meganrogge meganrogge self-assigned this Apr 27, 2026
@meganrogge meganrogge added this to the 1.119.0 milestone Apr 27, 2026
meganrogge and others added 5 commits April 27, 2026 14:26
…eaks mid-command

When shell integration sequences (C/D) transition trackIdleOnPrompt to
the Executing state but no prompt (A) sequence follows, both the
initialFallbackScheduler and promptFallbackScheduler are cancelled,
causing the Promise to hang indefinitely.

Add an executingFallbackScheduler that fires after 10s in the Executing
state without a prompt, effectively downgrading from rich to basic
behavior by treating data-idle as completion.

Fixes #312853

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…completion

The executingFallbackScheduler must be rescheduled on every data event
while in the Executing state so it only fires after 10s of data-idle.
Without this, long-running commands producing output for more than 10s
would be prematurely completed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When the executingFallbackScheduler fires after 10s of data-idle in the
Executing state, check if the cursor line looks like a prompt before
completing. If not, reschedule — this avoids prematurely completing
long-running commands that pause without output (e.g. downloading,
compiling).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The idle-only fallback with prompt detection would never fire when shell
integration is truly broken (the terminal won't show a prompt pattern).

Split into two complementary fallbacks:
- executingIdleFallback (10s data-idle): checks for prompt pattern on the
  cursor line. Handles the case where the command finished but the prompt
  A sequence was lost.
- executingHardCap (30s absolute): fires regardless of data activity or
  prompt detection. Ultimate safety net to prevent permanent hangs when
  shell integration breaks mid-command.

The hard cap is NOT rescheduled on data events — it's a fixed 30s limit
from the first C/D sequence. Both are cancelled when a proper prompt A
sequence arrives.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ration

Replace the dual-timer approach (idle check + hard cap) with a single
executingFallbackScheduler that fires after 30s of data-idle in the
Executing state. This is simpler and avoids the problems of both prior
approaches:

- Hard cap (not rescheduled on data): would cut off long-running
  commands that are still producing output
- Prompt heuristic: won't detect a prompt when shell integration is
  truly broken (that's the whole problem)

The 30s data-idle approach is safe because:
- Commands producing output reschedule the timer on every data event
- When shell integration works, onCommandFinished wins the race
- 30s of total silence after C/D with no prompt A is a strong signal
  that shell integration broke

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@meganrogge

Copy link
Copy Markdown
Collaborator Author

cc @anthonykim1

@meganrogge meganrogge marked this pull request as ready for review April 27, 2026 18:47
@meganrogge

Copy link
Copy Markdown
Collaborator Author

~requires-eval-assessment terminalbench2

@meganrogge meganrogge added the ~requires-eval-assessment Evals will be run and will generate a report upon completion label Apr 27, 2026
@vs-code-engineering

Copy link
Copy Markdown
Contributor

⏳ Queued vscode build for 2df99ce29b70e5edeeac01717fd45e70544572a6 (step 1/2).

@meganrogge meganrogge enabled auto-merge (squash) April 27, 2026 19:15
@meganrogge meganrogge merged commit 8894b02 into main Apr 27, 2026
26 checks passed
@meganrogge meganrogge deleted the merogge/downgrade-rich-to-basic branch April 27, 2026 19:50

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to prevent run_in_terminal hangs when the terminal “rich” execute path enters an Executing state (via shell integration C/D sequences) but never receives the prompt (A) sequence afterward, by adding an additional fallback timer inside trackIdleOnPrompt.

Changes:

  • Add an Executing-state data-idle fallback scheduler to avoid indefinite waits when prompt sequences never arrive.
  • Cancel/reschedule the new fallback scheduler based on observed shell integration sequences and terminal data events.
Show a summary per file
File Description
src/vs/workbench/contrib/terminalContrib/chatAgentTools/browser/executeStrategy/executeStrategy.ts Adds an Executing-state fallback scheduler intended to prevent permanent hangs when shell integration stops producing prompt sequences mid-command.

Copilot's findings

Comments suppressed due to low confidence (3)

src/vs/workbench/contrib/terminalContrib/chatAgentTools/browser/executeStrategy/executeStrategy.ts:214

  • The PR description proposes downgrading from the rich strategy to the basic strategy when shell integration capability activation fails (the existing 10s warning). This change instead adds an unconditional 30s data-idle fallback inside trackIdleOnPrompt and does not actually switch strategies or tie the behavior to the capability activation failure. Either align the implementation with the described mid-flight strategy downgrade, or update the PR description/telemetry expectations to match the actual behavior change.
	// Fallback for when shell integration breaks mid-command: data arrives and
	// C/D sequences transition us to Executing, but no A (prompt) sequence ever
	// follows. Both initialFallbackScheduler and promptFallbackScheduler get
	// cancelled in that state, causing a permanent hang. This scheduler is
	// rescheduled on every data event while in the Executing state, so it only
	// fires after 30s of data-idle — long enough that actively-outputting
	// commands won't be cut off, but short enough to prevent indefinite hangs
	// when shell integration breaks. When shell integration is working,
	// onCommandFinished in the rich strategy's race wins before this fires.
	const executingFallbackScheduler = store.add(new RunOnceScheduler(() => {
		if (state === TerminalState.Executing) {
			state = TerminalState.PromptAfterExecuting;
			scheduler.schedule();
		}
	}, 30_000));

src/vs/workbench/contrib/terminalContrib/chatAgentTools/browser/executeStrategy/executeStrategy.ts:208

  • The inline comment asserts that “when shell integration is working, onCommandFinished in the rich strategy's race wins before this fires”. That’s not guaranteed: a long-running command that emits C/D and then has a >30s quiet period will trigger this fallback even if shell integration is healthy. Please adjust the comment to accurately describe the tradeoff/behavior (and when this can fire despite healthy shell integration).
	// fires after 30s of data-idle — long enough that actively-outputting
	// commands won't be cut off, but short enough to prevent indefinite hangs
	// when shell integration breaks. When shell integration is working,
	// onCommandFinished in the rich strategy's race wins before this fires.

src/vs/workbench/contrib/terminalContrib/chatAgentTools/browser/executeStrategy/executeStrategy.ts:214

  • This introduces new completion behavior for trackIdleOnPrompt in the Executing state (30s data-idle after a C/D sequence). There’s currently test coverage for execute strategies/prompt detection, but no unit test that exercises this state machine path. Please add a test that simulates receiving a C/D sequence without a subsequent A sequence and verifies the fallback resolves (and ideally that it doesn’t resolve if data continues to arrive).
	const executingFallbackScheduler = store.add(new RunOnceScheduler(() => {
		if (state === TerminalState.Executing) {
			state = TerminalState.PromptAfterExecuting;
			scheduler.schedule();
		}
	}, 30_000));
  • Files reviewed: 1/1 changed files
  • Comments generated: 1

Comment on lines +200 to +214
// Fallback for when shell integration breaks mid-command: data arrives and
// C/D sequences transition us to Executing, but no A (prompt) sequence ever
// follows. Both initialFallbackScheduler and promptFallbackScheduler get
// cancelled in that state, causing a permanent hang. This scheduler is
// rescheduled on every data event while in the Executing state, so it only
// fires after 30s of data-idle — long enough that actively-outputting
// commands won't be cut off, but short enough to prevent indefinite hangs
// when shell integration breaks. When shell integration is working,
// onCommandFinished in the rich strategy's race wins before this fires.
const executingFallbackScheduler = store.add(new RunOnceScheduler(() => {
if (state === TerminalState.Executing) {
state = TerminalState.PromptAfterExecuting;
scheduler.schedule();
}
}, 30_000));

Copilot AI Apr 27, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

executingFallbackScheduler can cause trackIdleOnPrompt (and therefore the rich/basic execute strategies’ onDone race) to resolve while the command is still running if the command is legitimately silent for >30s after a C/D sequence. That can lead to capturing incomplete output and sending subsequent commands while the previous one is still executing. This fallback should be gated on a definitive “shell integration is broken” signal (or a strategy-level opt-in), and/or use a much more conservative timeout/condition so it doesn’t treat “no output” as “command finished”.

This issue also appears in the following locations of the same file:

  • line 200
  • line 205
  • line 209

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

on-testplan ~requires-eval-assessment Evals will be run and will generate a report upon completion

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Terminal tool hangs when shell integration breaks mid-command in rich execute strategy

3 participants