Skip to content

fix(core): allow intentional foreground sleep for backoff#4708

Merged
tanzhenxin merged 4 commits into
QwenLM:mainfrom
kkhomej33-netizen:fix/intentional-sleep-comments
Jun 8, 2026
Merged

fix(core): allow intentional foreground sleep for backoff#4708
tanzhenxin merged 4 commits into
QwenLM:mainfrom
kkhomej33-netizen:fix/intentional-sleep-comments

Conversation

@kkhomej33-netizen

@kkhomej33-netizen kkhomej33-netizen commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Summary

  • What changed: Added an explicit standalone sleep escape hatch for shell commands with a top-level trailing comment in the form # intentional-sleep: <reason>, capped at 10 minutes.
  • Why it changed: Foreground sleep >= 2s interception prevents accidental agent blocking, but it also blocks legitimate MCP/tool rate-limit backoff delays.
  • Reviewer focus: Please check the escape hatch remains narrow: ordinary comments still block, sleep chains still block with chain-specific guidance, commands after a comment newline are still detected for sleep interception, existing managed-background comment trimming behavior is preserved, rejected escape hatch attempts get contextual guidance, and sleeps above 10 minutes remain blocked.

Validation

  • Commands run:
    cd packages/core && npx vitest run src/tools/shell.test.ts
    cd packages/core && npm run typecheck
    cd packages/core && npm run build
    npx prettier --check packages/core/src/tools/shell.ts packages/core/src/tools/shell.test.ts integration-tests/cli/sleep-interception.test.ts
    npm run build
    npm run bundle
    npm run typecheck
    npx cross-env QWEN_SANDBOX=false npx vitest run cli/sleep-interception.test.ts
    npm run preflight
  • Prompts / inputs used:
    • sleep 5 # intentional-sleep: wait for MCP rate limit reset
    • sleep 10m # intentional-sleep: wait for MCP rate limit reset
    • sleep 601s # intentional-sleep: wait for MCP rate limit reset
    • sleep 5 # wait
    • sleep 5 && echo ok # intentional-sleep: wait for rate limit reset
    • sleep 5 # intentional-sleep: wait for rate limit reset\necho ok
  • Expected result:
    • Standalone intentional sleep up to 10 minutes is allowed.
    • Ordinary comments, chained commands, newline-hidden commands, short reasons, and intentional sleeps above 10 minutes remain blocked by sleep interception.
    • Rejected intentional sleep comments explain whether the reason is too short or the sleep exceeds the foreground cap instead of asking for the same comment again.
    • Chained sleeps do not receive a structurally invalid # intentional-sleep: syntax hint; they explain that the escape hatch is standalone-only.
  • Observed result:
    • Focused shell tests passed: packages/core/src/tools/shell.test.ts passed with 213 tests.
    • Core typecheck passed.
    • Core build passed.
    • Root build passed. It emitted existing lint warnings in packages/vscode-ide-companion, but no errors.
    • Root bundle passed.
    • Root typecheck passed.
    • Prettier check for the touched files passed.
    • integration-tests/cli/sleep-interception.test.ts now includes a retry-with-intentional-sleep integration case. The local run was attempted, but all four cases failed before reaching assertions because the headless CLI hit [API Error: Connection error. (cause: fetch failed)]; the existing three cases failed the same way.
    • npm run preflight completed format, lint:ci, build, and typecheck, then failed in full test suites outside this change area. The touched shell test passed during the full core run.
  • Quickest reviewer verification path:
    • Run the focused shell test file and inspect the detectBlockedSleepPattern plus shell validation cases for valid intentional comments, rejected reasons, ordinary comments, chained sleeps, newline-hidden commands, duration cap, reason-length boundary, and preserved managed-background comment trimming.
  • Evidence (output, logs, screenshots, video, JSON, before/after, etc.):
    cd packages/core && npx vitest run src/tools/shell.test.ts
    Test Files  1 passed (1)
    Tests  213 passed (213)
    
    cd packages/core && npm run typecheck
    tsc --noEmit
    exit 0
    
    cd packages/core && npm run build
    Successfully copied files.
    
    npx prettier --check packages/core/src/tools/shell.ts packages/core/src/tools/shell.test.ts integration-tests/cli/sleep-interception.test.ts
    All matched files use Prettier code style!
    
    npm run build
    exit 0
    
    npm run bundle
    All bundle assets copied to dist/
    
    npm run typecheck
    exit 0
    
    npx cross-env QWEN_SANDBOX=false npx vitest run cli/sleep-interception.test.ts
    failed before assertions: [API Error: Connection error. (cause: fetch failed)]
    
    npm run preflight
    format: completed
    lint:ci: completed
    build: completed
    typecheck: completed
    test:ci: failed outside touched files
      - packages/cli src/serve/server.test.ts: many server port null failures, plus NSPasteboard headless crash
      - packages/core enter-worktree.session.integ.test.ts / exit-worktree.session.integ.test.ts: 5 WorktreeSession sidecar assertions failed
      - packages/vscode-ide-companion src/ide-server.test.ts: 12 server-not-running / undefined port failures
    touched shell test passed during the full core run: src/tools/shell.test.ts (207 tests at that time; now 213 after review follow-ups)
    

Scope / Risk

  • Main risk or tradeoff: The shell guard now has an explicit comment-based escape hatch, so the parser must keep treating only standalone sleeps as eligible and cap the maximum foreground wait.
  • Not covered / not validated: Full npm run preflight did not pass locally because unrelated full-suite tests failed in this headless/worktree environment. The sleep-interception integration file could not complete locally because the headless CLI could not reach the model API. No screenshot or video is included because this is a shell validator behavior change with unit coverage.
  • Breaking changes / migration notes: None.

Testing Matrix

🍏 🪟 🐧
npm run ⚠️ ⚠️
npx ⚠️ ⚠️
Docker N/A ⚠️ ⚠️
Podman N/A N/A N/A
Seatbelt ⚠️ N/A N/A

Testing matrix notes:

  • macOS npm/npx focused validation passed for the touched core shell behavior.
  • macOS full npm run preflight was attempted but failed in unrelated full-suite tests as described above.
  • Windows/Linux/Docker were not tested locally.
  • Seatbelt-specific integration was not run; this change is pure shell command validation.

Linked Issues / Bugs

Fixes #4707

Related: #3684, #3634

Comment thread packages/core/src/tools/shell.ts Outdated
if (separator === null) return null;

const rest = separator.rest.trim();
if (!rest && hasIntentionalSleepComment(comment)) return null;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Two design concerns with the escape hatch:

  1. No duration capsleep 86400 # intentional-sleep: deliberate long backoff wait passes through unbounded. The original guard existed to prevent foreground blocking; the escape hatch removes the ceiling entirely. Consider capping at e.g. 60s and requiring is_background: true for anything longer:

    if (!rest && hasIntentionalSleepComment(comment)) {
      if (secs <= MAX_INTENTIONAL_SLEEP_SECS) return null;
      // Fall through — too long even with escape hatch
    }
  2. No observability — This return path is completely silent (no log, no telemetry, no span event). When debugging whether a multi-minute pause was intentional or a bug, there's no way to distinguish. A single structured log with the duration and reason would be valuable operationally.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, agreed on keeping the escape hatch bounded. I added a duration cap, but set it to 10 minutes instead of 60 seconds because rate-limit reset windows are often minute-scale and this matches the existing shell timeout ceiling. Sleeps above the cap now keep falling through to the normal blocked path.

I did not add a new telemetry/span event in this PR. With the 10-minute cap, the risk of silent multi-hour foreground blocking is removed, and the intentional reason remains visible in the shell command/tool invocation. I would rather keep this validator-only fix narrow unless maintainers want a broader observability event for all sleep-interception decisions.

}

const INTENTIONAL_SLEEP_COMMENT_PREFIX = 'intentional-sleep:';
const MIN_INTENTIONAL_SLEEP_REASON_LENGTH = 8;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The magic number 8 has no rationale comment and no boundary test. The tests cover 0 chars (intentional-sleep:) and 4 chars (intentional-sleep: wait), but not the exact boundary (7 vs 8 chars). A future maintainer cannot determine whether changing to 6 or 12 would break anything.

Suggested additions:

// Require ~2+ words to prevent trivially short opt-outs (e.g. "wait")
const MIN_INTENTIONAL_SLEEP_REASON_LENGTH = 8;

And a boundary test:

expect(detectBlockedSleepPattern('sleep 5 # intentional-sleep: 1234567')).toBe('standalone sleep 5');
expect(detectBlockedSleepPattern('sleep 5 # intentional-sleep: 12345678')).toBeNull();

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I added a short rationale comment for the minimum reason length and covered the 7/8 character boundary in the focused shell tests.

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The intentional-sleep escape hatch has zero integration/E2E coverage. integration-tests/cli/sleep-interception.test.ts has tests for sleep blocking and < 2s allowed sleeps, but no test for the escape hatch path. The unit tests validate detectBlockedSleepPattern directly, but the end-to-end flow — model receives the error mentioning the escape hatch, then retries with the correct comment format — is unverified. Consider adding an integration test that prompts the model to run sleep 5, expects it to be blocked, then verifies the model retries with an # intentional-sleep: comment and succeeds.

— qwen3.7-max via Qwen Code /review

secs <= MAX_INTENTIONAL_SLEEP_SECONDS &&
hasIntentionalSleepComment(comment)
) {
return null;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] When the escape hatch fires, this returns null silently — no log, no telemetry, no annotation. The file has 41+ uses of debugLogger for notable events. Consider adding debugLogger.debug('intentional sleep allowed', { duration: secs, reason: comment }) before the return. This is the only breadcrumb available when investigating why an agent session was blocked in a foreground sleep for up to 10 minutes.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in e95fa62. The allowed escape hatch path now logs debugLogger.debug("intentional sleep allowed", { durationSeconds, reason }) before returning, so a foreground delay up to the 10-minute cap has a session-log breadcrumb.

Comment thread packages/core/src/tools/shell.ts Outdated
) {
return null;
}
return rest

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] When the escape hatch is attempted but fails (duration over cap or reason too short), this fallthrough produces the same error string as when no escape hatch was attempted at all. A user who wrote sleep 700 # intentional-sleep: wait for MCP rate limit reset sees "add a trailing comment like # intentional-sleep: ..." — exactly what they already did. This creates retry loops. Consider having detectBlockedSleepPattern distinguish "no escape hatch attempted" from "escape hatch rejected: duration over cap / reason too short" so the error message can be contextual.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in e95fa62. The validator now uses internal sleep-pattern details so rejected escape hatch attempts get contextual guidance: too-short reasons ask for a clearer reason, and sleeps over 10 minutes tell the model to use is_background: true or Monitor instead of repeating the same comment hint.

Comment thread packages/core/src/tools/shell.ts Outdated
'Run blocking commands in the background with is_background: true. ' +
'For streaming events (watching logs, polling APIs), use the Monitor tool. ' +
'If you genuinely need a delay (rate limiting, deliberate pacing), keep it under 2 seconds.'
'If you genuinely need a standalone delay (rate limiting, deliberate pacing), ' +

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] This user-facing error message — which introduces the # intentional-sleep: escape hatch syntax to the model — has no test coverage anywhere in the codebase. If the hint text is changed or removed, no test will catch the regression. Consider adding a test that invokes the pre-validation check with a blocked sleep command and asserts the returned string contains intentional-sleep:.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in e95fa62. Added shell validation coverage asserting the blocked-sleep error introduces intentional-sleep:, plus coverage that rejected intentional comments do not fall back to the generic “add a trailing comment” hint.

@kkhomej33-netizen

Copy link
Copy Markdown
Contributor Author

Thanks, addressed the new review pass in e95fa62. I added the sleep-interception integration case that first asks the model to run sleep 5, then retries with sleep 2 # intentional-sleep: wait for MCP rate limit reset after the block. I also added focused validation tests for the user-facing hint and rejected escape hatch guidance, plus a debug log breadcrumb for allowed intentional sleeps. Local focused/unit validation passed (210 shell tests, core typecheck/build, root build/bundle/typecheck, touched-file Prettier). I attempted npx cross-env QWEN_SANDBOX=false npx vitest run cli/sleep-interception.test.ts; all four cases, including the existing three, failed before assertions with [API Error: Connection error. (cause: fetch failed)], so I could not complete the integration run locally.

expect(overCapError).toContain('foreground sleeps over 10 minutes');
expect(overCapError).not.toContain('add a trailing comment like');
});

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The validateToolParams tests cover escape hatch rejection (error contains 'intentional-sleep:', short reason, over-cap duration), but there is no test asserting that a valid intentional sleep comment returns null from validateToolParams. The feature's primary happy path — allowing the command through — has no integration-boundary assertion. A regression that always rejects would not be caught at this level.

Suggested change
it('should explain rejected intentional sleep comments', async () => {
const shortReasonError = shellTool.validateToolParams({
command: 'sleep 5 # intentional-sleep: wait',
is_background: false,
});
const overCapError = shellTool.validateToolParams({
command:
'sleep 601s # intentional-sleep: wait for MCP rate limit reset',
is_background: false,
});
expect(shortReasonError).toContain('reason is too short');
expect(shortReasonError).not.toContain('add a trailing comment like');
expect(overCapError).toContain('foreground sleeps over 10 minutes');
expect(overCapError).not.toContain('add a trailing comment like');
});
it('should allow sleep with a valid intentional sleep comment', async () => {
const error = shellTool.validateToolParams({
command: 'sleep 5 # intentional-sleep: wait for MCP rate limit reset',
is_background: false,
});
expect(error).toBeNull();
});

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in dcece9f. Added a validator-boundary happy-path test that asserts a valid standalone sleep 5 # intentional-sleep: wait for MCP rate limit reset returns null from validateToolParams.

return { command, comment: null };
}

function trimTrailingShellComment(command: string): string {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The refactoring of trimTrailingShellComment into a wrapper around splitTrailingShellComment silently changes behavior for multi-line commands with mid-stream comments. Previously trimTrailingShellComment('ls # note\nsleep 5 &') returned 'ls ' (everything from # onward discarded). Now it returns 'ls \nsleep 5 &' (post-newline content preserved). The sole caller hasTopLevelTrailingBackgroundOperator (line 1299) will now detect a trailing & that was previously invisible, causing new validation errors for a class of inputs that previously passed silently.

This is arguably more correct, but it is undocumented and has no test. Consider adding a test for hasTopLevelTrailingBackgroundOperator with multi-line commented input, or preserving the old semantics in trimTrailingShellComment and only using splitTrailingShellComment's richer return value inside detectBlockedSleepPatternDetails.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in dcece9f. I preserved the pre-existing trimTrailingShellComment behavior for managed background validation by making only the sleep detector keep post-comment newline commands. Added a regression test for the multi-line commented input so this behavior is explicit.

'add a trailing comment like `# intentional-sleep: wait for MCP rate limit reset` (up to 10 minutes).';
return (
`Blocked: ${sleepPattern}. ` +
`Blocked: ${sleepPattern.description}. ` +

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] This error message advertises the exact escape hatch syntax to the model. In auto/yolo approval mode, a prompt injection that tricks the model into running sleep 600 will first be blocked, then receive a step-by-step tutorial on how to retry successfully. The second attempt — armed with the error message's own instructions — succeeds, unlocking up to 10 minutes of foreground blocking.

Before this PR, the guard had no bypass path discoverable from the error alone. After this PR, the guard teaches its own defeat in a single retry round-trip. Consider either (a) not advertising the escape hatch syntax in the error (require the model to discover it from documentation), or (b) adding a user-visible notification when the escape hatch is used.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am leaving the standalone error hint in place. The escape hatch is meant to be discoverable to the agent at the moment it hits the validator; hiding the syntax would make the feature much less useful for the rate-limit backoff case this PR is addressing. The bypass remains bounded to standalone sleeps, requires a reason, is capped at 10 minutes, logs a debug breadcrumb, and chained sleeps no longer receive the concrete syntax hint after dcece9f. I think adding a broader user-visible notification would be a separate UX decision rather than part of this narrow validator fix.

Comment thread packages/core/src/tools/shell.ts Outdated
const intentionalSleepGuidance =
sleepPattern.intentionalSleepRejection ??
'If you genuinely need a standalone delay (rate limiting, deliberate pacing), ' +
'add a trailing comment like `# intentional-sleep: wait for MCP rate limit reset` (up to 10 minutes).';

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] When intentionalSleepRejection is undefined (no escape hatch was attempted), this fallback guidance is shown for all blocked sleeps, including chained commands like sleep 5 && echo ok. But the escape hatch only applies to standalone sleeps (!rest in detectBlockedSleepPatternDetails). For chained commands, the model receives actionable advice (add a trailing comment like ...) that structurally cannot work — adding the comment will get blocked again with the same generic guidance, creating a potential retry loop.

Consider scoping the escape-hatch hint to standalone sleeps only. When rest is non-empty, use a different message explaining that the escape hatch comment only applies to standalone sleep commands.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in dcece9f. Sleep-pattern details now track whether the blocked sleep is standalone. Chained sleeps get standalone-only guidance instead of the concrete # intentional-sleep: syntax, and the validator test asserts that the chained case does not include the structurally invalid comment hint.

Comment thread packages/core/src/tools/shell.ts Outdated
// Require a real reason, not a trivial opt-out like "wait".
const MIN_INTENTIONAL_SLEEP_REASON_LENGTH = 8;

function hasIntentionalSleepCommentPrefix(comment: string | null): boolean {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] hasIntentionalSleepCommentPrefix and getIntentionalSleepReason duplicate the same null-check, trim, and startsWith(INTENTIONAL_SLEEP_COMMENT_PREFIX) logic. Since getIntentionalSleepReason already returns null when the prefix is absent, the separate predicate adds a layer of indirection without improving clarity. The call site (line 1078–1099) calls hasIntentionalSleepCommentPrefix first, then immediately calls getIntentionalSleepReason — both performing identical work on the same input.

Consider consolidating into a single function:

function getIntentionalSleepReason(comment: string | null): string | null {
  if (comment === null) return null;
  const trimmed = comment.trim();
  if (!trimmed.startsWith(INTENTIONAL_SLEEP_COMMENT_PREFIX)) return null;
  const reason = trimmed.slice(INTENTIONAL_SLEEP_COMMENT_PREFIX.length).trim();
  return reason.length >= MIN_INTENTIONAL_SLEEP_REASON_LENGTH ? reason : null;
}

Then the call site can distinguish "no prefix" from "prefix but short reason" by checking the comment string directly, eliminating the need for hasIntentionalSleepCommentPrefix.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in dcece9f. Removed the duplicate prefix helper and consolidated reason parsing so the call site trims/checks the prefix once, then passes the trimmed comment into getIntentionalSleepReason.

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found. LGTM! ✅ — qwen3.7-max via Qwen Code /review

@wenshao

wenshao commented Jun 4, 2026

Copy link
Copy Markdown
Collaborator

Local Verification Report

PR: #4708 — fix(core): allow intentional foreground sleep for backoff
Branch: fix/intentional-sleep-commentsmain
Commit: dcece9f
Environment: macOS Darwin 25.4.0 (arm64), Node.js v22.17.0


Summary

Adds a narrow escape hatch for foreground sleep commands: a trailing comment # intentional-sleep: <reason> (≥ 8 chars) allows standalone sleeps up to 10 minutes, enabling legitimate MCP/tool rate-limit backoff delays without disabling the sleep interception safety net.

Changes span 3 files (+231/−10 lines):

  • packages/core/src/tools/shell.ts — refactors trimTrailingShellCommentsplitTrailingShellComment (returns both command and comment), adds detectBlockedSleepPatternDetails wrapper, implements the escape hatch with 3 guard rails (standalone-only, min reason length, duration cap)
  • packages/core/src/tools/shell.test.ts — 8 new unit tests covering: escape hatch mention in guidance, rejection explanations, valid intentional sleep, chain rejection, background comment trimming preservation, boundary tests for reason length (7 vs 8 chars), duration cap (601s)
  • integration-tests/cli/sleep-interception.test.ts — 1 new end-to-end test for retry flow

Build & Type Check

Step Result Notes
npm run build ✅ PASS (exit 0)
tsc --noEmit ✅ PASS (exit 0) No errors in PR-touched files. Pre-existing errors only in scripts/sync-computer-use-schemas.ts.

Tests

Suite Files Tests Result Notes
shell.test.ts (focused) 1 passed 213 passed ✅ ALL PASS All 8 new PR tests pass.
shell.test.js (compiled) 1 passed 202 passed ✅ ALL PASS Existing tests unaffected.
Full core suite ~718 passed, ~17 failed ~20k passed ⚠️ Pre-existing Failures are crawler timeouts, fetchGitDiff timeouts, AnthropicContentGenerator identity — all unrelated to this PR.
Integration (sleep-interception) 1 test failed 3 passed, 1 failed ⚠️ Pre-existing should allow sleep < 2s timed out (30s) — flaky LLM integration test, not PR-related. The new PR test should allow retrying blocked sleep with an intentional sleep comment passed.

Code Review

Security — escape hatch is narrow and well-guarded:

  1. Standalone-only: Chained commands (sleep 5 && echo ok) explicitly rejected — isStandalone check prevents hiding follow-up commands behind the comment
  2. Newline detection: splitTrailingShellComment with keepCommandsAfterCommentNewline=true re-appends commands after \n in comments — sleep 5 # intentional-sleep: reason\necho ok correctly detected as chained
  3. Minimum reason length: 8 chars prevents trivial opt-out like # intentional-sleep: wait (tested at boundary: 7 chars blocked, 8 chars allowed)
  4. Duration cap: 10 minutes (600s) — sleep 601s with valid comment still blocked
  5. Contextual guidance: Rejected attempts get specific error messages (reason too short vs. over cap) instead of generic guidance

Refactoring quality:

  • trimTrailingShellComment preserved as a thin wrapper over splitTrailingShellComment — no behavioral change for existing callers
  • detectBlockedSleepPattern preserved as public API, new detectBlockedSleepPatternDetails is internal
  • Quote/escape state machine unchanged — same coverage for single/double quotes, backticks, command substitution

No issues found.


Verdict

PASS — Safe to merge. Well-scoped feature with comprehensive test coverage and multiple safety guards against abuse. No regressions in existing tests.


Verified by wenshao (local tmux parallel execution)

@tanzhenxin tanzhenxin left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving (re-reviewed at dcece9f).

The escape hatch is secure — smuggled commands in the trailing comment never execute, and chained/newline commands stay blocked (the detector keeps the reassembling split path). The June 4 commit also fixed the trailing-& background-guard regression cleanly: trimTrailingShellComment now truncates at the comment (keepCommandsAfterCommentNewline=false) while the detector keeps reassembly, so node server & # a\n# b is blocked again, matching main. Verified the two call sites get the right variant.

One non-blocking follow-up worth a later fix: the escape hatch — and its own block-message guidance — advertises 'up to 10 minutes', but the default foreground timeout is 120s, so sleep 300 # intentional-sleep: ... passes the guard yet is killed at 2 minutes unless the caller also sets timeout. Since the model only discovers this marker via that block message (it isn't in the tool description), it's actively told 'up to 10 minutes' and may rely on it. Suggest either lowering the advertised cap to the 120s default, or auto-raising the foreground timeout to the sleep duration when an intentional-sleep is detected. Not blocking — merging now.

@tanzhenxin tanzhenxin merged commit 08d25a9 into QwenLM:main Jun 8, 2026
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type/bug Something isn't working as expected

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Foreground sleep interception blocks legitimate rate-limit backoff

3 participants