fix(core): allow intentional foreground sleep for backoff#4708
Conversation
| if (separator === null) return null; | ||
|
|
||
| const rest = separator.rest.trim(); | ||
| if (!rest && hasIntentionalSleepComment(comment)) return null; |
There was a problem hiding this comment.
[Suggestion] Two design concerns with the escape hatch:
-
No duration cap —
sleep 86400 # intentional-sleep: deliberate long backoff waitpasses through unbounded. The original guard existed to prevent foreground blocking; the escape hatch removes the ceiling entirely. Consider capping at e.g. 60s and requiringis_background: truefor anything longer:if (!rest && hasIntentionalSleepComment(comment)) { if (secs <= MAX_INTENTIONAL_SLEEP_SECS) return null; // Fall through — too long even with escape hatch }
-
No observability — This return path is completely silent (no log, no telemetry, no span event). When debugging whether a multi-minute pause was intentional or a bug, there's no way to distinguish. A single structured log with the duration and reason would be valuable operationally.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Thanks, agreed on keeping the escape hatch bounded. I added a duration cap, but set it to 10 minutes instead of 60 seconds because rate-limit reset windows are often minute-scale and this matches the existing shell timeout ceiling. Sleeps above the cap now keep falling through to the normal blocked path.
I did not add a new telemetry/span event in this PR. With the 10-minute cap, the risk of silent multi-hour foreground blocking is removed, and the intentional reason remains visible in the shell command/tool invocation. I would rather keep this validator-only fix narrow unless maintainers want a broader observability event for all sleep-interception decisions.
| } | ||
|
|
||
| const INTENTIONAL_SLEEP_COMMENT_PREFIX = 'intentional-sleep:'; | ||
| const MIN_INTENTIONAL_SLEEP_REASON_LENGTH = 8; |
There was a problem hiding this comment.
[Suggestion] The magic number 8 has no rationale comment and no boundary test. The tests cover 0 chars (intentional-sleep:) and 4 chars (intentional-sleep: wait), but not the exact boundary (7 vs 8 chars). A future maintainer cannot determine whether changing to 6 or 12 would break anything.
Suggested additions:
// Require ~2+ words to prevent trivially short opt-outs (e.g. "wait")
const MIN_INTENTIONAL_SLEEP_REASON_LENGTH = 8;And a boundary test:
expect(detectBlockedSleepPattern('sleep 5 # intentional-sleep: 1234567')).toBe('standalone sleep 5');
expect(detectBlockedSleepPattern('sleep 5 # intentional-sleep: 12345678')).toBeNull();— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Done. I added a short rationale comment for the minimum reason length and covered the 7/8 character boundary in the focused shell tests.
wenshao
left a comment
There was a problem hiding this comment.
[Suggestion] The intentional-sleep escape hatch has zero integration/E2E coverage. integration-tests/cli/sleep-interception.test.ts has tests for sleep blocking and < 2s allowed sleeps, but no test for the escape hatch path. The unit tests validate detectBlockedSleepPattern directly, but the end-to-end flow — model receives the error mentioning the escape hatch, then retries with the correct comment format — is unverified. Consider adding an integration test that prompts the model to run sleep 5, expects it to be blocked, then verifies the model retries with an # intentional-sleep: comment and succeeds.
— qwen3.7-max via Qwen Code /review
| secs <= MAX_INTENTIONAL_SLEEP_SECONDS && | ||
| hasIntentionalSleepComment(comment) | ||
| ) { | ||
| return null; |
There was a problem hiding this comment.
[Suggestion] When the escape hatch fires, this returns null silently — no log, no telemetry, no annotation. The file has 41+ uses of debugLogger for notable events. Consider adding debugLogger.debug('intentional sleep allowed', { duration: secs, reason: comment }) before the return. This is the only breadcrumb available when investigating why an agent session was blocked in a foreground sleep for up to 10 minutes.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Done in e95fa62. The allowed escape hatch path now logs debugLogger.debug("intentional sleep allowed", { durationSeconds, reason }) before returning, so a foreground delay up to the 10-minute cap has a session-log breadcrumb.
| ) { | ||
| return null; | ||
| } | ||
| return rest |
There was a problem hiding this comment.
[Suggestion] When the escape hatch is attempted but fails (duration over cap or reason too short), this fallthrough produces the same error string as when no escape hatch was attempted at all. A user who wrote sleep 700 # intentional-sleep: wait for MCP rate limit reset sees "add a trailing comment like # intentional-sleep: ..." — exactly what they already did. This creates retry loops. Consider having detectBlockedSleepPattern distinguish "no escape hatch attempted" from "escape hatch rejected: duration over cap / reason too short" so the error message can be contextual.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Done in e95fa62. The validator now uses internal sleep-pattern details so rejected escape hatch attempts get contextual guidance: too-short reasons ask for a clearer reason, and sleeps over 10 minutes tell the model to use is_background: true or Monitor instead of repeating the same comment hint.
| 'Run blocking commands in the background with is_background: true. ' + | ||
| 'For streaming events (watching logs, polling APIs), use the Monitor tool. ' + | ||
| 'If you genuinely need a delay (rate limiting, deliberate pacing), keep it under 2 seconds.' | ||
| 'If you genuinely need a standalone delay (rate limiting, deliberate pacing), ' + |
There was a problem hiding this comment.
[Suggestion] This user-facing error message — which introduces the # intentional-sleep: escape hatch syntax to the model — has no test coverage anywhere in the codebase. If the hint text is changed or removed, no test will catch the regression. Consider adding a test that invokes the pre-validation check with a blocked sleep command and asserts the returned string contains intentional-sleep:.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Done in e95fa62. Added shell validation coverage asserting the blocked-sleep error introduces intentional-sleep:, plus coverage that rejected intentional comments do not fall back to the generic “add a trailing comment” hint.
|
Thanks, addressed the new review pass in e95fa62. I added the sleep-interception integration case that first asks the model to run |
| expect(overCapError).toContain('foreground sleeps over 10 minutes'); | ||
| expect(overCapError).not.toContain('add a trailing comment like'); | ||
| }); | ||
|
|
There was a problem hiding this comment.
[Suggestion] The validateToolParams tests cover escape hatch rejection (error contains 'intentional-sleep:', short reason, over-cap duration), but there is no test asserting that a valid intentional sleep comment returns null from validateToolParams. The feature's primary happy path — allowing the command through — has no integration-boundary assertion. A regression that always rejects would not be caught at this level.
| it('should explain rejected intentional sleep comments', async () => { | |
| const shortReasonError = shellTool.validateToolParams({ | |
| command: 'sleep 5 # intentional-sleep: wait', | |
| is_background: false, | |
| }); | |
| const overCapError = shellTool.validateToolParams({ | |
| command: | |
| 'sleep 601s # intentional-sleep: wait for MCP rate limit reset', | |
| is_background: false, | |
| }); | |
| expect(shortReasonError).toContain('reason is too short'); | |
| expect(shortReasonError).not.toContain('add a trailing comment like'); | |
| expect(overCapError).toContain('foreground sleeps over 10 minutes'); | |
| expect(overCapError).not.toContain('add a trailing comment like'); | |
| }); | |
| it('should allow sleep with a valid intentional sleep comment', async () => { | |
| const error = shellTool.validateToolParams({ | |
| command: 'sleep 5 # intentional-sleep: wait for MCP rate limit reset', | |
| is_background: false, | |
| }); | |
| expect(error).toBeNull(); | |
| }); |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Done in dcece9f. Added a validator-boundary happy-path test that asserts a valid standalone sleep 5 # intentional-sleep: wait for MCP rate limit reset returns null from validateToolParams.
| return { command, comment: null }; | ||
| } | ||
|
|
||
| function trimTrailingShellComment(command: string): string { |
There was a problem hiding this comment.
[Suggestion] The refactoring of trimTrailingShellComment into a wrapper around splitTrailingShellComment silently changes behavior for multi-line commands with mid-stream comments. Previously trimTrailingShellComment('ls # note\nsleep 5 &') returned 'ls ' (everything from # onward discarded). Now it returns 'ls \nsleep 5 &' (post-newline content preserved). The sole caller hasTopLevelTrailingBackgroundOperator (line 1299) will now detect a trailing & that was previously invisible, causing new validation errors for a class of inputs that previously passed silently.
This is arguably more correct, but it is undocumented and has no test. Consider adding a test for hasTopLevelTrailingBackgroundOperator with multi-line commented input, or preserving the old semantics in trimTrailingShellComment and only using splitTrailingShellComment's richer return value inside detectBlockedSleepPatternDetails.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Done in dcece9f. I preserved the pre-existing trimTrailingShellComment behavior for managed background validation by making only the sleep detector keep post-comment newline commands. Added a regression test for the multi-line commented input so this behavior is explicit.
| 'add a trailing comment like `# intentional-sleep: wait for MCP rate limit reset` (up to 10 minutes).'; | ||
| return ( | ||
| `Blocked: ${sleepPattern}. ` + | ||
| `Blocked: ${sleepPattern.description}. ` + |
There was a problem hiding this comment.
[Suggestion] This error message advertises the exact escape hatch syntax to the model. In auto/yolo approval mode, a prompt injection that tricks the model into running sleep 600 will first be blocked, then receive a step-by-step tutorial on how to retry successfully. The second attempt — armed with the error message's own instructions — succeeds, unlocking up to 10 minutes of foreground blocking.
Before this PR, the guard had no bypass path discoverable from the error alone. After this PR, the guard teaches its own defeat in a single retry round-trip. Consider either (a) not advertising the escape hatch syntax in the error (require the model to discover it from documentation), or (b) adding a user-visible notification when the escape hatch is used.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
I am leaving the standalone error hint in place. The escape hatch is meant to be discoverable to the agent at the moment it hits the validator; hiding the syntax would make the feature much less useful for the rate-limit backoff case this PR is addressing. The bypass remains bounded to standalone sleeps, requires a reason, is capped at 10 minutes, logs a debug breadcrumb, and chained sleeps no longer receive the concrete syntax hint after dcece9f. I think adding a broader user-visible notification would be a separate UX decision rather than part of this narrow validator fix.
| const intentionalSleepGuidance = | ||
| sleepPattern.intentionalSleepRejection ?? | ||
| 'If you genuinely need a standalone delay (rate limiting, deliberate pacing), ' + | ||
| 'add a trailing comment like `# intentional-sleep: wait for MCP rate limit reset` (up to 10 minutes).'; |
There was a problem hiding this comment.
[Suggestion] When intentionalSleepRejection is undefined (no escape hatch was attempted), this fallback guidance is shown for all blocked sleeps, including chained commands like sleep 5 && echo ok. But the escape hatch only applies to standalone sleeps (!rest in detectBlockedSleepPatternDetails). For chained commands, the model receives actionable advice (add a trailing comment like ...) that structurally cannot work — adding the comment will get blocked again with the same generic guidance, creating a potential retry loop.
Consider scoping the escape-hatch hint to standalone sleeps only. When rest is non-empty, use a different message explaining that the escape hatch comment only applies to standalone sleep commands.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Done in dcece9f. Sleep-pattern details now track whether the blocked sleep is standalone. Chained sleeps get standalone-only guidance instead of the concrete # intentional-sleep: syntax, and the validator test asserts that the chained case does not include the structurally invalid comment hint.
| // Require a real reason, not a trivial opt-out like "wait". | ||
| const MIN_INTENTIONAL_SLEEP_REASON_LENGTH = 8; | ||
|
|
||
| function hasIntentionalSleepCommentPrefix(comment: string | null): boolean { |
There was a problem hiding this comment.
[Suggestion] hasIntentionalSleepCommentPrefix and getIntentionalSleepReason duplicate the same null-check, trim, and startsWith(INTENTIONAL_SLEEP_COMMENT_PREFIX) logic. Since getIntentionalSleepReason already returns null when the prefix is absent, the separate predicate adds a layer of indirection without improving clarity. The call site (line 1078–1099) calls hasIntentionalSleepCommentPrefix first, then immediately calls getIntentionalSleepReason — both performing identical work on the same input.
Consider consolidating into a single function:
function getIntentionalSleepReason(comment: string | null): string | null {
if (comment === null) return null;
const trimmed = comment.trim();
if (!trimmed.startsWith(INTENTIONAL_SLEEP_COMMENT_PREFIX)) return null;
const reason = trimmed.slice(INTENTIONAL_SLEEP_COMMENT_PREFIX.length).trim();
return reason.length >= MIN_INTENTIONAL_SLEEP_REASON_LENGTH ? reason : null;
}Then the call site can distinguish "no prefix" from "prefix but short reason" by checking the comment string directly, eliminating the need for hasIntentionalSleepCommentPrefix.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Done in dcece9f. Removed the duplicate prefix helper and consolidated reason parsing so the call site trims/checks the prefix once, then passes the trimmed comment into getIntentionalSleepReason.
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — qwen3.7-max via Qwen Code /review
Local Verification ReportPR: #4708 — fix(core): allow intentional foreground sleep for backoff SummaryAdds a narrow escape hatch for foreground Changes span 3 files (+231/−10 lines):
Build & Type Check
Tests
Code ReviewSecurity — escape hatch is narrow and well-guarded:
Refactoring quality:
No issues found. Verdict✅ PASS — Safe to merge. Well-scoped feature with comprehensive test coverage and multiple safety guards against abuse. No regressions in existing tests. Verified by wenshao (local tmux parallel execution) |
tanzhenxin
left a comment
There was a problem hiding this comment.
Approving (re-reviewed at dcece9f).
The escape hatch is secure — smuggled commands in the trailing comment never execute, and chained/newline commands stay blocked (the detector keeps the reassembling split path). The June 4 commit also fixed the trailing-& background-guard regression cleanly: trimTrailingShellComment now truncates at the comment (keepCommandsAfterCommentNewline=false) while the detector keeps reassembly, so node server & # a\n# b is blocked again, matching main. Verified the two call sites get the right variant.
One non-blocking follow-up worth a later fix: the escape hatch — and its own block-message guidance — advertises 'up to 10 minutes', but the default foreground timeout is 120s, so sleep 300 # intentional-sleep: ... passes the guard yet is killed at 2 minutes unless the caller also sets timeout. Since the model only discovers this marker via that block message (it isn't in the tool description), it's actively told 'up to 10 minutes' and may rely on it. Suggest either lowering the advertised cap to the 120s default, or auto-raising the foreground timeout to the sleep duration when an intentional-sleep is detected. Not blocking — merging now.
Summary
# intentional-sleep: <reason>, capped at 10 minutes.sleep >= 2sinterception prevents accidental agent blocking, but it also blocks legitimate MCP/tool rate-limit backoff delays.Validation
sleep 5 # intentional-sleep: wait for MCP rate limit resetsleep 10m # intentional-sleep: wait for MCP rate limit resetsleep 601s # intentional-sleep: wait for MCP rate limit resetsleep 5 # waitsleep 5 && echo ok # intentional-sleep: wait for rate limit resetsleep 5 # intentional-sleep: wait for rate limit reset\necho ok# intentional-sleep:syntax hint; they explain that the escape hatch is standalone-only.packages/core/src/tools/shell.test.tspassed with 213 tests.packages/vscode-ide-companion, but no errors.integration-tests/cli/sleep-interception.test.tsnow includes a retry-with-intentional-sleepintegration case. The local run was attempted, but all four cases failed before reaching assertions because the headless CLI hit[API Error: Connection error. (cause: fetch failed)]; the existing three cases failed the same way.npm run preflightcompleted format, lint:ci, build, and typecheck, then failed in full test suites outside this change area. The touched shell test passed during the full core run.detectBlockedSleepPatternplus shell validation cases for valid intentional comments, rejected reasons, ordinary comments, chained sleeps, newline-hidden commands, duration cap, reason-length boundary, and preserved managed-background comment trimming.Scope / Risk
npm run preflightdid not pass locally because unrelated full-suite tests failed in this headless/worktree environment. The sleep-interception integration file could not complete locally because the headless CLI could not reach the model API. No screenshot or video is included because this is a shell validator behavior change with unit coverage.Testing Matrix
Testing matrix notes:
npm run preflightwas attempted but failed in unrelated full-suite tests as described above.Linked Issues / Bugs
Fixes #4707
Related: #3684, #3634