Skip to content

Add AUTO mode denial observability and caps#4476

Merged
qqqys merged 14 commits into
QwenLM:mainfrom
qqqys:auto-mode-safety-observability
Jun 2, 2026
Merged

Add AUTO mode denial observability and caps#4476
qqqys merged 14 commits into
QwenLM:mainfrom
qqqys:auto-mode-safety-observability

Conversation

@qqqys

@qqqys qqqys commented May 24, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • What changed: Added structured AUTO mode reason boundaries, a PermissionDenied hook for classifier-blocked calls, and a cumulative denial cap in addition to the existing consecutive caps.
  • Why it changed: AUTO mode needed clearer safety semantics, hook visibility for classifier denials, and protection against long sessions accumulating repeated denials without tripping a consecutive threshold.
  • Reviewer focus: Please review the separation between classifier-denied calls and fallback-to-ask, plus the total-denial reset behavior after user approval.
image

Validation

  • Commands run:
    cd packages/core && npx vitest run src/core/toolHookTriggers.test.ts src/permissions/autoMode.test.ts src/hooks/hookPlanner.test.ts src/hooks/hookSystem.test.ts src/hooks/hookEventHandler.test.ts
    cd packages/cli && npx vitest run src/ui/components/hooks/constants.test.ts
    cd packages/core && npx vitest run src/permissions/denialTracking.test.ts src/permissions/autoMode.test.ts
    npm run typecheck
    git diff --check
    npm run build
  • Prompts / inputs used: N/A
  • Expected result: PermissionDenied fires only for AUTO classifier blocks; fallback-to-ask remains a manual approval path; total denial cap triggers after 20 cumulative denials.
  • Observed result: Focused tests, typecheck, diff check, and build passed locally.
  • Quickest reviewer verification path:
    cd packages/core && npx vitest run src/permissions/denialTracking.test.ts src/permissions/autoMode.test.ts src/core/toolHookTriggers.test.ts
    cd packages/cli && npx vitest run src/ui/components/hooks/constants.test.ts
    npm run typecheck
  • Evidence: The passing commands above were run locally. The full build also completed; it reported pre-existing vscode companion lint warnings only.

Scope / Risk

  • Main risk or tradeoff: The total denial cap may force manual approval in very long AUTO sessions after repeated denied actions, even when denials are not consecutive. This is intentional safety behavior.
  • Not covered / not validated: Full telemetry parity and classifier request/message-id metadata are tracked separately.
  • Breaking changes / migration notes: Hook configs can now target PermissionDenied; existing hook behavior is unchanged.

Testing Matrix

🍏 🪟 🐧
npm run ⚠️ ⚠️
npx ⚠️ ⚠️
Docker ⚠️ N/A ⚠️
Podman ⚠️ N/A N/A
Seatbelt ⚠️ N/A N/A

Testing matrix notes:

  • macOS local checks passed.
  • Windows, Linux, Docker, Podman, and Seatbelt were not tested locally.

Linked Issues / Bugs

Related: #4475

@qqqys qqqys marked this pull request as ready for review May 24, 2026 07:54
@github-actions

Copy link
Copy Markdown
Contributor

📋 Review Summary

This PR introduces AUTO mode denial observability and caps, adding a PermissionDenied hook for classifier-blocked calls and a cumulative denial cap (20 total denials) alongside the existing consecutive caps. The implementation is well-structured with clear type definitions, comprehensive tests, and consistent patterns across CLI and core packages. The changes address safety concerns for long-running AUTO mode sessions while maintaining backward compatibility.

🔍 General Feedback

  • Strong type safety: New types (PermissionDeniedReason, FallbackToAskReason, AutoModeUnavailableReason) clearly delineate denial semantics and improve maintainability.
  • Good separation of concerns: Hook firing logic is cleanly separated from decision logic; applyAutoModeDecision centralizes outcome handling.
  • Comprehensive test coverage: Tests cover edge cases including total denial cap triggers, counter resets after approval, and reason mapping.
  • Consistent patterns: Error handling in hook functions follows established patterns (non-blocking, logged warnings, sentinel error messages).
  • Documentation quality: JSDoc comments explain the "why" behind design decisions, particularly for counter reset behavior.
  • Positive: The PR correctly identifies and addresses drift between CLI scheduler and ACP Session paths by sharing applyAutoModeDecision.

🎯 Specific Feedback

🟡 High

  • packages/core/src/permissions/denialTracking.ts:120-145 — The recordFallbackApprove function now resets total counters after the total denial cap is reached. This is a significant behavioral change from the original "telemetry-only" design. While documented, this means a single user approval after 20 denials completely erases the denial history. Consider whether this is the intended behavior or if total counters should decay more gradually (e.g., halve instead of zero) to prevent rapid re-triggering of the cap in marginally-improved sessions.

  • packages/cli/src/acp-integration/session/Session.ts:1989-2000 — The PermissionDenied hook is fired only when !this.config.getDisableAllHooks?.(). However, the hook result is not checked before returning the error response. If the hook were to return a non-empty hookError, this information is lost. Consider logging the hook result for observability even if it doesn't affect the blocking behavior.

🟢 Medium

  • packages/core/src/permissions/autoMode.ts:252-258 — The applyAutoModeDecision function maps classifier blocks to 'classifier_blocked' or 'classifier_unavailable' but doesn't include a third case for 'classifier_failure' (which appears in the PermissionDeniedReason type). The type definition includes 'classifier_failure' but no code path produces it. Either remove it from the type or add handling for classifier infrastructure failures that don't fit the "unavailable" category.

  • packages/core/src/permissions/denialTracking.ts:24 — The DenialFallbackReason type doesn't include 'total_denial' even though shouldFallback returns this reason. This type should be updated to match the implementation:

    export type DenialFallbackReason =
      | 'consecutive_block'
      | 'consecutive_unavailable'
      | 'total_denial';  // Missing!
  • packages/core/src/core/toolHookTriggers.ts:530 — The firePermissionDeniedHook function (based on the diff) follows the pattern of other hook functions but the return type PermissionDeniedHookResult has a comment "Reserved for future non-blocking hook diagnostics" with an optional hookError. This suggests the hook is purely observational. Consider making this explicit in the function JSDoc to set correct expectations for hook authors (they cannot modify or block the call).

🔵 Low

  • packages/cli/src/acp-integration/session/Session.ts:2131-2137 — The comment mentions "Parallels coreToolScheduler.ts" without a specific line reference. The original comment had "Parallels coreToolScheduler.ts:1705-1717" which was more helpful. Consider restoring the specific line numbers or adding a // TODO: keep in sync with coreToolScheduler.ts comment.

  • packages/core/src/permissions/autoMode.ts:188 — The type AutoModeDecision still has | { via: 'fallback' } without the reason field in the union, even though the code now returns { via: 'fallback', reason: FallbackToAskReason }. This appears to be an incomplete update—the old union member should be removed or updated to match the new shape.

  • packages/cli/src/ui/components/hooks/constants.test.ts:239-250 — The test count assertion ("should have 17 events") is brittle. Consider testing that DISPLAY_HOOK_EVENTS includes all expected events by name rather than asserting a specific length, which will break on future additions.

  • packages/core/src/permissions/denialTracking.ts:98-106 — The shouldFallback function checks total denials before consecutive denials. This ordering is correct for safety, but the function comment doesn't mention this priority. Consider adding a brief note: "Total denial cap takes precedence over consecutive caps."

✅ Highlights

  • Excellent type design: The separation of PermissionDeniedReason (classifier blocks) from FallbackToAskReason (manual approval triggers) shows deep understanding of the domain and prevents conflation of distinct failure modes.
  • Thoughtful counter semantics: The cross-reset behavior between block/unavailable counters and the reset-on-approve logic are well-documented with clear rationale about avoiding permanent session degradation.
  • Comprehensive validation: The test matrix and validation commands in the PR description demonstrate thorough local testing before review.
  • Proactive safety: The total denial cap (20 denials) protects against edge cases where the classifier avoids consecutive caps by alternating blocks with allows—a subtle but important attack vector.

@qqqys

qqqys commented May 24, 2026

Copy link
Copy Markdown
Collaborator Author

tmux E2E Test Report

Result: PASS

Date: 2026-05-24

Target: draft PR #4476

Scenario: verify the interactive hooks UI exposes the new PermissionDenied
event and its user-facing details.

Environment:

  • tmux: 3.6a
  • viewport: 200x50
  • command: npm run dev -- --approval-mode yolo
  • mode: interactive TUI

Steps performed:

  1. Started Qwen Code in a fresh tmux session.
  2. Waited for the interactive prompt to render.
  3. Submitted /hooks.
  4. Navigated to PermissionDenied.
  5. Opened the event detail screen.
  6. Captured the final rendered pane and cleaned up the tmux session.

Expected:

  • /hooks lists PermissionDenied.
  • The detail view describes the input payload fields:
    tool_name, tool_input, tool_use_id, and reason.
  • The detail view lists observational exit-code behavior.

Observed:

  • PermissionDenied appeared in the hooks UI.
  • The detail screen showed:
    • Input to command is JSON with tool_name, tool_input, tool_use_id, and reason.
    • 0: observe permission denial
    • Other: show stderr to user only
  • The session was cleaned up after final capture.

Notes:

  • This tmux run verifies the interactive hook-management surface.
  • Classifier-block execution behavior is covered by the focused unit tests
    listed in the PR validation section.

export type PermissionDeniedReason =
| 'classifier_blocked'
| 'classifier_unavailable'
| 'classifier_failure';

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Several type members are defined and exported but never produced by any code path: 'classifier_failure' in PermissionDeniedReason, 'plan_mode_floor' / 'org_ask_ceiling' in FallbackToAskReason, and the entire AutoModeUnavailableReason type.

Hook consumers matching on reason === 'classifier_failure' will never see it fire, and downstream code switching on FallbackToAskReason must handle variants that will never appear. The AutoModeUnavailableReason type is re-exported from permissions/index.ts with zero consumers.

Consider removing unreachable variants until they have a producer. If they represent planned features, track them in a follow-up issue rather than in the public type — or at minimum add // reserved for ... comments.

— qwen3.7-max via Qwen Code /review

| MessageBus
| undefined;
if (messageBus) {
void firePermissionDeniedHook(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The blocked path has no debug log, while the sibling fallback path (line ~1749) logs debugLogger.warn(...) with full counter state (consecutiveBlock, consecutiveUnavailable, totalBlock, totalUnavailable). The same asymmetry exists in Session.ts.

An operator debugging "AUTO mode is denying all tool calls" has zero server-side signal on the blocked path: no tool name, no reason (classifier_blocked vs classifier_unavailable), no counter state. Adding a symmetric debugLogger.warn here would close the observability gap.

Suggested change
void firePermissionDeniedHook(
case 'blocked':
debugLogger.warn(
`Auto mode blocked (${outcome.reason}): tool=${canonicalName}, ` +
`consecutiveBlock=${denialState.consecutiveBlock}, ` +
`consecutiveUnavailable=${denialState.consecutiveUnavailable}, ` +
`totalBlock=${denialState.totalBlock}, totalUnavailable=${denialState.totalUnavailable}`,
);

— qwen3.7-max via Qwen Code /review

* Fire a PermissionDenied event
* Called when AUTO mode denies a tool call without asking the user
*/
async firePermissionDeniedEvent(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] firePermissionDeniedEvent and the corresponding HookSystem.firePermissionDeniedEvent have no dedicated tests, while peer methods in the same files have extensive coverage: fireNotificationEvent (10+ tests), firePermissionRequestEvent (10+ tests), fireSubagentStartEvent (6+ tests).

Consider adding tests in hookEventHandler.test.ts following the established pattern — verifying that the correct PermissionDeniedInput is constructed and passed to executeHooks.

— qwen3.7-max via Qwen Code /review

export function recordFallbackApprove(
state: AutoModeDenialState,
): AutoModeDenialState {
if (

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] recordFallbackApprove now resets total denial counters when the total cap is reached, but both call sites (coreToolScheduler.ts:2218, Session.ts:2142) invoke it on every user approval in AUTO mode — not only when the fallback was denialTracking-forced. The guard is approvalMode === AUTO && isApproveOutcome(outcome), which also fires for unrelated manual confirmations (default 'ask' permissions, explicit ask rules).

Before this PR, this was harmless because consecutive counters were already 0 for non-fallback approvals, making recordFallbackApprove a no-op. Now the total-cap check (totalBlock + totalUnavailable >= 20) runs first and returns createDenialState(), silently resetting cumulative counters as a side effect of any approval. In a session where the user occasionally approves 'ask'-permission tools, the total denial cap may never trigger despite accumulating 20+ denials.

Consider plumbing a flag from shouldFallback to the post-approval handler, or splitting the total-cap reset into a separate function called only for denialTracking-forced fallbacks.

— qwen3.7-max via Qwen Code /review

return { kind: 'approved' };
case 'fallback':
return { kind: 'fallback' };
return { kind: 'fallback', reason: decision.reason };

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] AutoModeOutcome.reason for the fallback variant is computed here (forwarding decision.reason) but never consumed by any caller. In the scheduler, the debug log uses fallback.reason from shouldFallback(), not outcome.reason. In Session.ts, the case 'fallback': is a bare break.

The two reason sources can diverge: when both pmForcedAsk and shouldFallback are armed, outcome.reason is 'ask_rule' while the logged fallback.reason is 'consecutive_block' or 'total_denial'. Either use outcome.reason in the scheduler's debug log for accuracy, or drop the reason field from the fallback variant of AutoModeOutcome since no caller reads it.

— qwen3.7-max via Qwen Code /review

});

// ─── applyAutoModeDecision reason mapping ────────────────────────────────

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The new applyAutoModeDecision — blocked reason mapping describe block covers the two classifier paths (classifier_blocked, classifier_unavailable) but has no test for the via: 'fallback' branch. The fallback branch now returns { kind: 'fallback', reason: decision.reason } — a new shape carrying FallbackToAskReason — but this pass-through is never exercised at the applyAutoModeDecision level. The evaluateAutoMode tests verify the decision shape and stop before the outcome conversion.

If a future refactor accidentally adds a state mutation to the fallback branch (it's the only branch that does NOT call config.setAutoModeDenialState(...)) or drops the reason field, no test would catch it.

it('passes through fallback reason without mutating denial state', () => {
  const setAutoModeDenialState = vi.fn();
  const result = applyAutoModeDecision(
    { via: 'fallback', reason: 'denial_cap' },
    { setAutoModeDenialState } as unknown as Config,
    denialState,
  );
  expect(result).toEqual({ kind: 'fallback', reason: 'denial_cap' });
  expect(setAutoModeDenialState).not.toHaveBeenCalled();
});

— qwen3.7-max via Qwen Code /review

expect(setAutoModeDenialState).not.toHaveBeenCalled();
});

it('resets denial counters after approving a denialTracking fallback prompt', async () => {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The two new tests cover (1) no fallback callId + approve → no reset, and (2) fallback callId + approve → reset. But there's no negative test for (3) fallback callId + Cancel → no reset. The production code correctly guards this with isApproveOutcome(outcome), but without a test, a regression that removes this guard would silently allow cancel on a fallback prompt to reset denial counters — weakening denial-tracking protection for the rest of the session.

it('does not reset denial counters when user cancels a denialTracking fallback prompt', async () => {
  const { internals, toolCall, setAutoModeDenialState } =
    createSchedulerForDenialTrackingApprovalTest();
  internals.autoModeFallbackCallIds.add('call-1');

  await internals._handleConfirmationResponseInner(
    'call-1',
    toolCall,
    vi.fn().mockResolvedValue(undefined),
    ToolConfirmationOutcome.Cancel,
    new AbortController().signal,
  );

  expect(setAutoModeDenialState).not.toHaveBeenCalled();
});

— qwen3.7-max via Qwen Code /review

// fallback was specifically armed by denialTracking —
// a pmForcedAsk fallback isn't an audit-worthy event).
if (fallback.fallback) {
this.autoModeFallbackCallIds.add(reqInfo.callId);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] autoModeFallbackCallIds.add is gated on fallback.fallback (the stale shouldFallback predicate captured before evaluateAutoMode), not on the actual decision outcome. In evaluateAutoMode, pmForcedAsk is checked before skipClassifier — so when denialTracking is armed AND an ask rule matches the same tool call, evaluateAutoMode returns { via: 'fallback', reason: 'ask_rule' }. But because fallback.fallback was true at capture time, the callId is still added to autoModeFallbackCallIds. When the user approves the ask-rule prompt, _handleConfirmationResponseInner finds the callId and calls recordFallbackApprove, incorrectly resetting all denial counters (including totals).

The same issue exists in Session.ts:2007 where wasAutoModeDenialFallback = fallback.fallback reads from the same stale predicate.

Gate the tracking on the actual outcome reason:

Suggested change
this.autoModeFallbackCallIds.add(reqInfo.callId);
if (outcome.reason === 'denial_cap') {
this.autoModeFallbackCallIds.add(reqInfo.callId);

And in Session.ts:

wasAutoModeDenialFallback = outcome.reason === 'denial_cap';

— qwen3.7-max via Qwen Code /review

try {
await messageBus.request<HookExecutionRequest, HookExecutionResponse>(
{
type: MessageBusType.HOOK_EXECUTION_REQUEST,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Unlike every other fire*Hook function in this file (firePreToolUseHook, firePostToolUseHook, firePostToolUseFailureHook), this function:

  1. Does not check response.success from messageBus.request — a hook runner failure (success: false with an error message) is silently treated as success.
  2. Returns {} in the catch block instead of { hookError: message } — the PermissionDeniedHookResult interface declares hookError?: string but it is never populated (dead code).

Since callers use void (fire-and-forget), there is no immediate bug. But the inconsistency means the next person adding a hook will copy this pattern and miss the response check. Consider matching the established pattern:

const response = await messageBus.request<...>(...);
if (!response.success || !response.output) {
  const message = response.error?.message || 'hook runner returned no output';
  debugLogger.warn(`PermissionDenied hook response failure for ${toolName}: ${message}`);
  return { hookError: message };
}
return {};

And in the catch block: return { hookError: error instanceof Error ? error.message : String(error) };

— qwen3.7-max via Qwen Code /review

toolParams,
reqInfo.callId,
outcome.reason,
signal,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] No test in coreToolScheduler.test.ts verifies that firePermissionDeniedHook is actually called when AUTO mode blocks a tool call via the scheduler's case 'blocked' branch. The hook function has unit tests in toolHookTriggers.test.ts (3 cases), and the applyAutoModeDecision blocked-branch mapping is tested in autoMode.test.ts, but the scheduler integration path — verifying that messageBus.request is called with eventName: 'PermissionDenied' and the correct tool_name, tool_input, tool_use_id, and reason — is untested.

A regression in the wiring (wrong argument order, missing reason, or getDisableAllHooks guard removed) would go undetected.

— qwen3.7-max via Qwen Code /review

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Round 4 review (qwen3.7-max): no new high-confidence findings at SHA f74f0bb. All 4 confirmed issues are low confidence (terminal-only). 9 prior Suggestions remain open. 382 tests pass, tsc + eslint clean. — qwen3.7-max via Qwen Code /review

if (fallback.fallback) {
if (outcome.reason === 'denial_cap') {
this.autoModeFallbackCallIds.add(reqInfo.callId);
const fallbackReason = fallback.fallback

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The 'total_denial' else-branch in this ternary is unreachable. When outcome.reason === 'denial_cap', it is guaranteed that fallback.fallback === true — because 'denial_cap' is only produced by evaluateAutoMode when skipClassifier: true, which is set from fallback.fallback. So the ternary always evaluates to fallback.reason.

Note: when shouldFallback returns reason: 'total_denial', that value already flows through fallback.reason, so the string 'total_denial' appears in the log even without the dead branch.

Suggested change
const fallbackReason = fallback.fallback
debugLogger.warn(
`Auto mode fallback to manual approval (${fallback.reason}): ` +

— qwen3.7-max via Qwen Code /review

return earlyErrorResponse(new Error(outcome.errorMessage), fc.name);
case 'fallback':
// Drop through to the manual-approval flow below.
wasAutoModeDenialFallback = outcome.reason === 'denial_cap';

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Unlike the equivalent case 'fallback' branch in coreToolScheduler.ts (lines ~1755-1767), this path sets wasAutoModeDenialFallback but emits no debug log with the fallback reason or denial counter state. When the ACP/daemon path triggers a denial_cap fallback, operators cannot see why it was triggered or what the counter values were.

Consider mirroring the coreToolScheduler logging for parity:

case 'fallback':
  if (outcome.reason === 'denial_cap') {
    debugLogger.warn(
      `Auto mode fallback to manual approval (${fallback.reason}): ` +
        `consecutiveBlock=${denialState.consecutiveBlock}, ` +
        `consecutiveUnavailable=${denialState.consecutiveUnavailable}, ` +
        `totalBlock=${denialState.totalBlock}, ` +
        `totalUnavailable=${denialState.totalUnavailable}`,
    );
  }
  wasAutoModeDenialFallback = outcome.reason === 'denial_cap';
  break;

— qwen3.7-max via Qwen Code /review

// Caller (scheduler) has detected an armed fallback state; surface that
// so the call drops to manual approval instead of burning a classifier
// request that would deepen the denial streak.
if (input.skipClassifier) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] evaluateAutoMode collapses all shouldFallback reasons into a single 'denial_cap' via the skipClassifier: boolean parameter. The specific trigger reason (consecutive_block vs consecutive_unavailable vs total_denial) from shouldFallback is lost in the structured AutoModeOutcome.

In coreToolScheduler.ts, the specific reason is partially recovered from fallback.reason for the debug log, but outcome.reason remains 'denial_cap' regardless of which threshold triggered. In Session.ts, the specific reason is completely lost.

Consider passing the DenialFallbackReason through evaluateAutoMode (e.g., as skipClassifierReason?: DenialFallbackReason) so the structured outcome carries the specific trigger. This would also eliminate the dead fallbackReason ternary in coreToolScheduler.ts:1758-1760, since the specific reason would be available directly from outcome.

— qwen3.7-max via Qwen Code /review

case 'blocked':
debugLogger.warn(
`Auto mode blocked (${outcome.reason}): tool=${canonicalName}, ` +
`consecutiveBlock=${denialState.consecutiveBlock}, ` +

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The 4-field denial-state log template (consecutiveBlock=..., consecutiveUnavailable=..., totalBlock=..., totalUnavailable=...) is copy-pasted four times across this file and Session.ts (lines 1722, 1766 here; 1995, 2023 in Session.ts). If a future PR adds a counter to AutoModeDenialState, all four sites must be updated in lockstep.

Consider extracting a helper in denialTracking.ts:

export function formatDenialStateLog(state: AutoModeDenialState): string {
  return `consecutiveBlock=${state.consecutiveBlock}, ` +
    `consecutiveUnavailable=${state.consecutiveUnavailable}, ` +
    `totalBlock=${state.totalBlock}, ` +
    `totalUnavailable=${state.totalUnavailable}`;
}

— qwen3.7-max via Qwen Code /review

state: AutoModeDenialState,
): { fallback: true; reason: DenialFallbackReason } | { fallback: false } {
if (
state.totalBlock + state.totalUnavailable >=

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The expression state.totalBlock + state.totalUnavailable >= AUTO_MODE_DENIAL_LIMITS.maxTotalDenials appears verbatim in both shouldFallback (here) and recordFallbackApprove (line 142). If the cap calculation ever changes (weighted totals, configurable limit), both sites must be updated.

Extract a private helper:

function hasReachedTotalCap(state: AutoModeDenialState): boolean {
  return (
    state.totalBlock + state.totalUnavailable >=
    AUTO_MODE_DENIAL_LIMITS.maxTotalDenials
  );
}

— qwen3.7-max via Qwen Code /review

});

it('does NOT trigger on totalBlock alone (telemetry-only)', () => {
it('triggers fallback after 20 total denials even when they are not consecutive', () => {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The existing total-denial test carefully uses recordAllow to keep consecutive counters at zero, so the total and consecutive thresholds never overlap. No test verifies that when BOTH totalBlock + totalUnavailable >= 20 AND consecutiveBlock >= 3 are simultaneously true, shouldFallback returns 'total_denial' (not 'consecutive_block'). The precedence is implemented (total check comes first) but unguarded by test.

If someone reorders the if blocks, total_denial loses precedence and recordFallbackApprove would reset only consecutive counters instead of all counters — leaving total counters permanently pinned.

— qwen3.7-max via Qwen Code /review

totalUnavailable: 0,
};

it('maps classifier policy blocks to classifier_blocked', () => {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The blocked-path tests create setAutoModeDenialState = vi.fn() but only assert result.kind and result.reason. They never verify setAutoModeDenialState was called with the correct counter values (e.g., recordBlock(denialState) or recordUnavailable(denialState)). If the config.setAutoModeDenialState(recordBlock/recordUnavailable(denialState)) side-effect at autoMode.ts:244-248 were accidentally removed, these tests would still pass — but denial-tracking counters would stop incrementing and the total-denial cap would become unreachable.

Consider adding:

expect(setAutoModeDenialState).toHaveBeenCalledWith(
  expect.objectContaining({ consecutiveBlock: 1, totalBlock: 1 }),
);

— qwen3.7-max via Qwen Code /review

// a pmForcedAsk fallback isn't an audit-worthy event).
if (fallback.fallback) {
if (
outcome.reason === 'consecutive_block' ||

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The denial-fallback reason enumeration is hardcoded here as outcome.reason === 'consecutive_block' || outcome.reason === 'consecutive_unavailable' || outcome.reason === 'total_denial' — the same three-way check appears in Session.ts:2016-2019. Neither site has compile-time coupling to the DenialFallbackReason type in denialTracking.ts.

When a future maintainer adds a fourth value to DenialFallbackReason, nothing flags these two sites. The new reason would silently skip autoModeFallbackCallIds.add, so recordFallbackApprove would never fire and denial counters would never clear — permanently pinning the session to manual prompts.

Consider extracting a type guard in denialTracking.ts:

export function isDenialFallbackReason(
  reason: FallbackToAskReason,
): reason is DenialFallbackReason {
  return reason === 'consecutive_block' || reason === 'consecutive_unavailable' || reason === 'total_denial';
}

— qwen3.7-max via Qwen Code /review


// ─── applyAutoModeDecision reason mapping ────────────────────────────────

describe('applyAutoModeDecision — blocked reason mapping', () => {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The applyAutoModeDecision test describe block covers blocked and fallback outcomes but has no test for the classifier-approved path (via: 'classifier', shouldBlock: false). That branch at autoMode.ts:258 calls config.setAutoModeDenialState(recordAllow(denialState)) — the recovery mechanism that resets consecutive counters. If this recordAllow call were removed, sessions alternating one block and one approve would never reset consecutive counters, reaching the consecutive cap prematurely.

— qwen3.7-max via Qwen Code /review

expect(s.totalUnavailable).toBe(2);
});

it('resets total counters after the user approves a total-cap fallback prompt', () => {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The total-cap reset test alternates recordBlock + recordAllow, keeping consecutiveBlock at 0 when recordFallbackApprove is called. No test covers the case where the total cap is reached with non-zero consecutive counters (e.g., { consecutiveBlock: 2, consecutiveUnavailable: 0, totalBlock: 15, totalUnavailable: 5 }). Verifying createDenialState() clears all four fields in that scenario guards against a future refactor that might only clear totals and leave consecutives intact.

— qwen3.7-max via Qwen Code /review

wenshao
wenshao previously approved these changes May 24, 2026

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Round 7 review (qwen3.7-max) at SHA b8b86ea: no new findings. Clean refactoring commit that properly addresses 5 prior Suggestions (log template DRY, cap expression DRY, reason enumeration centralization, total-denial precedence test, cap overlap test). 8 prior inline comments are now stale (on earlier commits). 560 tests pass, tsc + eslint clean. — qwen3.7-max via Qwen Code /review

@LaZzyMan LaZzyMan left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

Solid feature with correct trigger semantics on the manual-approval path: PermissionDenied fires only from the AUTO classifier-block branch (not from fallback or manual deny), the cumulative-cap reset is full-wipe and gated on DenialFallbackReason, and the new fallback-marker set is the right primitive — fixing the prior "any AUTO approve resets" overshoot. Scheduler-path test coverage is good. One correctness concern below; a few minor items in the local notes aren't worth surfacing here.

1. PermissionRequest hook-handled approval bypasses the fallback-marker cleanup and cap-reset (severity: medium · confidence: very high)

The new cleanup and reset logic only fires through the scheduler's outer confirmation-response wrapper. When a PermissionRequest hook returns a decision and allows the call, the code calls onConfirm(ProceedOnce) directly — the originating callback, not the scheduler wrapper that owns the cleanup. The fallback-marker set keeps the call id, the cumulative denial counters stay at the threshold, and every subsequent classifier-eligible AUTO call keeps falling back to manual/hook handling instead of going through the classifier. The recovery semantics the cap reset is designed to provide are defeated for any user who has a PermissionRequest hook configured AND hits any DenialFallbackReason (consecutive_block, consecutive_unavailable, or total_denial). Same defect exists on the ACP path. Fix is small — route the hook-handled approval through the same cleanup, or duplicate the reset logic into the hook branch.

Verdict

COMMENT — manual-approval path is correct; hook-handled approval branch silently keeps the cap armed forever for any user who configures a PermissionRequest hook. Worth fixing before merge, but the bug is scoped to a specific configuration so I'm not blocking.

wenshao
wenshao previously approved these changes May 25, 2026

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Built this PR locally (b8b86eafc) and ran the author-listed checks plus an end-to-end behavioral harness that drives the built dist/ modules. All three claimed features check out.

Tests

Suite Result
Author-listed unit tests (toolHookTriggers, autoMode, denialTracking, hookPlanner, hookSystem, hookEventHandler, coreToolScheduler, cli hooks/constants) 594 / 594 pass
npm run typecheck (5 packages) 0 errors, 11s
packages/core full suite 9292 pass / 0 fail / 4 skip
packages/cli full suite 6792 pass / 2 fail (unrelated*) / 9 skip
-t PermissionDenied across core 8 / 8 pass

* The two cli failures are in serve/workspaceMemory.test.ts and serve/workspaceAgents.test.ts, both relying on fs.chmod(dir, 0o555) to force EACCES → 500. Under root those chmod bits are bypassed and the test gets 200. Same failures reproduce on main after reverting this PR's diff, so they're pre-existing and unrelated. (Worth a follow-up if (process.getuid?.() === 0) return; skip, same pattern as the existing win32 early-returns.)

End-to-end behavioral harness

I drove the built modules with a standalone ESM harness (no LLM/API dependency — mocks the classifier verdict shape and the MessageBus). 30 assertions covering:

Reason boundaries (autoMode.ts) — all ✅

  • via:'classifier', shouldBlock:true, unavailable:falsekind:'blocked', reason:'classifier_blocked', message "Blocked by auto mode policy: <reason>"
  • via:'classifier', shouldBlock:true, unavailable:truekind:'blocked', reason:'classifier_unavailable', message "Auto mode classifier unavailable (<reason>); action blocked for safety"
  • All 7 FallbackToAskReason values (safety_check, ask_rule, plan_mode_floor, org_ask_ceiling, consecutive_block, consecutive_unavailable, total_denial) flow through applyAutoModeDecision unchanged
  • Fast-paths route to kind:'approved' and leave denial state preserved

Cumulative cap (denialTracking.ts) — all ✅

  • AUTO_MODE_DENIAL_LIMITS = {maxConsecutiveBlock:3, maxConsecutiveUnavailable:2, maxTotalDenials:20}
  • Alternating block → allow to avoid consecutive_block=3: total cap fires at exactly cumulative 20 blocks with reason total_denial
  • total_denial takes precedence over consecutive_block when both apply (matches shouldFallback's top-of-function check)
  • Cross-reset: block×2 → unavailable zeros consecutiveBlock but keeps totalBlock=2
  • recordFallbackApprove on total-cap state → all counters reset; on consecutive-only state → only consecutive reset, totals preserved
  • recordAllow resets consecutive but never resets totals (key invariant for the cap)

PermissionDenied hook (toolHookTriggers.ts) — all ✅

  • HookEventName.PermissionDenied === 'PermissionDenied'
  • firePermissionDeniedHook dispatches MessageBus request with eventName: 'PermissionDenied', all five expected input fields (tool_name, tool_input, tool_use_id, reason, plus permission_mode if applicable)
  • Both reason: 'classifier_blocked' and 'classifier_unavailable' pass through
  • Runner success:false and thrown errors are both surfaced via hookError (no silent allow)
  • CLI scheduler (coreToolScheduler.ts:1731) and ACP Session (acp-integration/session/Session.ts:2000) both route through the shared applyAutoModeDecision and both fire the hook on outcome.kind === 'blocked' — the duplication that drove earlier review rounds is gone.

HookRegistry integration — ✅

A ~/.qwen/settings.json configured with hooks.PermissionDenied = [...] is accepted by HookRegistry.initialize() with no "Invalid hook event name" warning. Negative control with a fabricated NotARealEventName correctly produces the warning. Verified both via the registry harness and by launching the actual CLI binary with such a settings.json and watching stderr stay clean.

One nit (non-blocking)

packages/cli/src/config/settingsSchema.ts hooks.properties lists 12 hook events but is missing PermissionDenied. It's also missing TodoCreated and TodoCompleted, so this is an existing pattern, not a regression. Runtime acceptance comes from HookRegistry.isValidEventName (which uses Object.values(HookEventName)), so users can still configure the new hook — the schema gap is cosmetic / UI-only. Worth a small follow-up PR to add the three missing entries.

LGTM

Reason separation, the cumulative denial cap with total_denial precedence, and the new PermissionDenied hook are wired correctly across both the CLI scheduler and ACP Session paths. Approving.


let didRequestPermission = false;
let confirmationDetails: ToolCallConfirmationDetails | undefined;
const resetAutoModeFallbackAfterApproval = (

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] resetAutoModeFallbackAfterApproval silently resets denial counters with no debug logging. The fallback-onset path at line ~2014 logs "Auto mode fallback to manual approval" via debugLogger.warn with formatDenialStateLog, but this recovery/reset closure emits nothing.

Hook approvals are invisible — firePermissionRequestHook only logs in its catch block, never on success. The full audit trail for a hook-triggered counter reset is: (1) fallback-onset warning, (2) tool executes, (3) silence. An oncall engineer cannot distinguish "hook approved and reset counters" from "denial tracking silently broke."

Consider adding debugLogger.debug with before/after counter state:

const resetAutoModeFallbackAfterApproval = (
  outcome: ToolConfirmationOutcome,
) => {
  if (
    approvalMode === ApprovalMode.AUTO &&
    wasAutoModeDenialFallback &&
    isApproveOutcome(outcome)
  ) {
    const before = this.config.getAutoModeDenialState();
    const after = recordFallbackApprove(before);
    debugLogger.debug(
      `Auto mode denial counters reset after fallback approval: ` +
        `${formatDenialStateLog(before)} -> ${formatDenialStateLog(after)}`,
    );
    this.config.setAutoModeDenialState(after);
  }
};

The same gap exists in resetAutoModeFallbackAfterConfirmation in coreToolScheduler.ts:2342.

— qwen3.7-max via Qwen Code /review

@@ -2320,6 +2340,29 @@ export class CoreToolScheduler {
// (#4321 review-9 wenshao Critical).
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The method name and comment suggest this only acts on approval outcomes, but it unconditionally executes this.autoModeFallbackCallIds.delete(callId) regardless of outcome. The new hook-deny call site at line 1911 (ToolConfirmationOutcome.Cancel) depends entirely on this undocumented Set cleanup.

A future maintainer reading the name resetAutoModeFallbackAfterConfirmation and the comment "Cancel / abort do NOT reset" will conclude the Cancel call is dead code and remove it, leaking callIds into autoModeFallbackCallIds.

Consider splitting into two methods to make intent clear:

private clearAutoModeFallbackCallId(callId: string): void {
  this.autoModeFallbackCallIds.delete(callId);
}

private recordAutoModeFallbackResolution(
  callId: string,
  outcome: ToolConfirmationOutcome,
): void {
  const wasAutoModeFallback = this.autoModeFallbackCallIds.delete(callId);
  if (
    this.config.getApprovalMode() === ApprovalMode.AUTO &&
    wasAutoModeFallback &&
    isApproveOutcome(outcome)
  ) {
    this.config.setAutoModeDenialState(
      recordFallbackApprove(this.config.getAutoModeDenialState()),
    );
  }
}

Also consider adding debugLogger.debug with before/after counter state (same observability gap as Session.ts:2030).

— qwen3.7-max via Qwen Code /review

wenshao
wenshao previously approved these changes May 25, 2026
// (#4321 review-9 wenshao Critical).
}

private recordAutoModeFallbackResolution(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Incomplete rename: this method was renamed from resetAutoModeFallbackAfterConfirmation to recordAutoModeFallbackResolution, but the parallel closure in Session.ts:2030 still uses resetAutoModeFallbackAfterApproval. Both implement identical denial-fallback resolution logic — a maintainer grepping for the new name will miss the Session.ts code path (3 call sites at lines 2030, 2102, 2172).

Consider renaming the Session.ts closure to match (recordAutoModeFallbackResolution), or extracting the shared state-transition logic into a helper in denialTracking.ts so both call sites delegate to it (which would also resolve the duplicated 6-line block at Session.ts:2041-2048 / coreToolScheduler.ts:2361-2368).

— qwen3.7-max via Qwen Code /review

const before = this.config.getAutoModeDenialState();
const after = recordFallbackApprove(before);
debugLogger.debug(
`Auto mode denial counters reset after fallback approval: ` +

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Two issues with the new debug logging:

1. Log level inconsistency: This reset log uses debugLogger.debug() while all peer denial-tracking state-transition logs use debugLogger.warn() (blocked at :1722, fallback at :1759). The reset is the closing bracket of the same denial→fallback→reset lifecycle — using a lower level means it won't appear in the same log pipelines during incident investigation. Consider using .warn() for consistency, or adding a comment explaining why reset is below the operator-visibility threshold.

2. Missing test: No test in coreToolScheduler.test.ts or Session.test.ts asserts this log message is emitted. denialTracking.test.ts covers the pure recordFallbackApprove function but not the integration wiring to debugLogger. Consider adding at least one test per call site:

const debugSpy = vi.spyOn(debugLogger, 'debug').mockImplementation(() => {});
// trigger fallback-approval path...
expect(debugSpy).toHaveBeenCalledWith(
  expect.stringContaining('Auto mode denial counters reset'),
);

— qwen3.7-max via Qwen Code /review

`Auto mode denial counters reset after fallback approval: ` +
`${formatDenialStateLog(before)} -> ${formatDenialStateLog(after)}`,
);
this.config.setAutoModeDenialState(after);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Misleading log in no-op case: recordFallbackApprove returns the same state object unchanged when consecutiveBlock === 0 && consecutiveUnavailable === 0 (denialTracking.ts:171-173, confirmed by test at denialTracking.test.ts:250: expect(recordFallbackApprove(s)).toBe(s)). When this branch is reached (e.g., a concurrent recordAllow zeroed consecutive counters between the fallback-onset warn and this resolution), the log reads "counters reset" with identical before/after values — misleading during 3 AM debugging.

Consider guarding:

const after = recordFallbackApprove(before);
if (after !== before) {
  debugLogger.warn(
    `Auto mode denial counters reset after fallback approval: ` +
      `${formatDenialStateLog(before)} -> ${formatDenialStateLog(after)}`,
  );
  this.config.setAutoModeDenialState(after);
}

(Same change in Session.ts:2041-2048.)

— qwen3.7-max via Qwen Code /review

wenshao
wenshao previously approved these changes May 25, 2026

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Round 10 review (qwen3.7-max) at SHA 5eedfaab0: incremental commit cleanly addresses all 3 R9 Suggestions (rename consistency, log level, no-op log guard). 9 review agents + reverse audit found 0 high-confidence issues. 241/241 tests pass, eslint clean, CI all_pass 8/8. 4 low-confidence items noted in terminal (early-return test gap, log message precision, JSDoc contract, mock ergonomics) — none warrant inline comments. — qwen3.7-max via Qwen Code /review

@wenshao

wenshao commented May 26, 2026

Copy link
Copy Markdown
Collaborator

本地验证报告 — PR #4476

作为 maintainer 在本地用 tmux 跑了完整真实验证。结论:通过,建议合并

验证环境

平台 macOS 15.4 (Darwin 25.4.0)
Node / npm v22.17.0 / 11.8.0
工作树 /private/tmp/pr4476-test
PR HEAD 5eedfaab0 (test(core): assert auto fallback reset logging)
origin/main 的 merge-base 84f408017
origin/main HEAD e8b79d7721
git merge-tree origin/main pr-4476-test 干净(exit 0、无冲突标记)
提交数 12
tmux session pr4476-test(4 windows: install / unit-test / manual-repro / misc)

1. 单元测试 — 所有改动文件

Suite Tests
src/permissions/denialTracking.test.ts 25 ✅
src/permissions/autoMode.test.ts 33 ✅
src/core/coreToolScheduler.test.ts 167 ✅
src/core/toolHookTriggers.test.ts 54 ✅
src/hooks/hookPlanner.test.ts 58 ✅
src/hooks/hookSystem.test.ts 75 ✅
src/hooks/hookEventHandler.test.ts 118 ✅
src/hooks/hookAggregator.test.ts(间接) 41 ✅
src/config/config.test.ts(间接) 172 ✅
cli/src/acp-integration/session/Session.test.ts 74 ✅
cli/src/ui/components/hooks/constants.test.ts 31 ✅
合计 848 / 848 ✅

跨 11 个 test file,包括 PR 直接修改的文件 + 通过 grep -r 在 monorepo 里搜出来的所有间接引用 AUTO_MODE_DENIAL_LIMITS / recordFallbackApprove / shouldFallback / firePermissionDeniedHook / firePermissionDeniedEvent / PermissionDenied / isDenialFallbackReason / formatDenialStateLog / skipClassifierReason / FallbackToAskReason / PermissionDeniedReason 的测试。

2. 静态检查

CORE-TSC=0      # npx tsc --noEmit -p packages/core/tsconfig.json
CLI-TSC=0       # npx tsc --noEmit -p packages/cli/tsconfig.json
ESLINT=0        # 21 个改动文件,--max-warnings 0

3. 真实端到端复现(用构建产物)

构建后写了一个 独立 repro 脚本,直接 importdist/src/permissions/denialTracking.js 真实产物,覆盖 PR 描述里强调的全部行为契约:

Case 1: AUTO_MODE_DENIAL_LIMITS.maxTotalDenials === 20
  PASS  maxTotalDenials = 20
  PASS  maxConsecutiveBlock = 3   (existing)
  PASS  maxConsecutiveUnavailable = 2  (existing)

Case 2: alternating block/allow pattern → total_denial after 20
  PASS  totalBlock accumulated to 20 (got 20)
  PASS  consecutiveBlock stayed below cap (got 0, cap is 3)
  PASS  shouldFallback = { fallback: true, reason: total_denial }
  ─ 直接复现了 PR 想堵的攻击面:模型 block→block→allow 循环可以让连续计数永远清零,
    在 PR 之前能逃过所有 cap;现在 10 次循环(20 个 block)就触发 total_denial。

Case 3: consecutive_block fires when 3 consecutive blocks      PASS
Case 4: consecutive_unavailable fires when 2 consecutive       PASS
Case 5: total cap takes precedence over consecutive cap        PASS  reason=total_denial
Case 6: recordFallbackApprove full-reset on total_denial
  PASS  state fully cleared: consecutiveBlock=0, consecutiveUnavailable=0, totalBlock=0, totalUnavailable=0
  PASS  shouldFallback after reset = {"fallback":false}
  ─ 关键 UX 修复:到达 total cap 后用户手动放行,session 不会被永久钉死在 manual prompt。
Case 7: recordFallbackApprove preserves total on consecutive-only cap
  PASS  consecutive cleared, total preserved (totalBlock=3 stays at 3)
  ─ 对称性:只是触发了连续 cap 时的恢复**不**清空累计历史,long-running session 的累计语义保留。

Case 8: isDenialFallbackReason discrimination
  PASS  "consecutive_block" / "consecutive_unavailable" / "total_denial" → true
  PASS  "ask_rule" / "safety_check" / "plan_mode_floor" / "org_ask_ceiling" → false
  ─ 这点很关键:FallbackToAskReason ⊋ DenialFallbackReason,scheduler 用 isDenialFallbackReason
    区分「真正的 denial 自动 fallback」和「policy/ask-rule fallback」,前者才记入
    autoModeFallbackCallIds 并在用户放行时重置计数器。区分错就会破坏 PR 的核心契约。

Case 9: formatDenialStateLog output format
  PASS  exact: consecutiveBlock=2, consecutiveUnavailable=1, totalBlock=5, totalUnavailable=3
  ─ coreToolScheduler 和 Session 的 debug warn 直接拼这段;格式有任何漂移会破坏运维日志。

Case 10: blocks + unavailable mix toward total cap
  PASS  totalBlock=10 + totalUnavailable=10 = 20 → total_denial fires
  ─ 累计 cap 跨 block / unavailable 两类计数,不会被「换品种」绕过。

PR 4476 repro: 26 pass, 0 fail

4. 代码层面的观察

正向

  • autoModeFallbackCallIds: Set<string> 把 "这次 fallback 是 denialTracking 主动 fallback、不是 ask_rule" 的标记绑定到 callId,而不是在外部用一个 boolean 飘着 —— 修掉了之前 coreToolScheduler.ts:2185-2202 的隐患:原来对所有 AUTO 模式的人工放行都会调用 recordFallbackApprove,会把刚累积起来的 totalBlock 在一次 ask_rule 放行里被清空(旧逻辑里 recordFallbackApprove 不动 total,所以表面无害;但 PR 给 recordFallbackApprove 加了「total cap 时全清」的分支后,旧的无差别调用就会让累计 cap 形同虚设)。Case 7 / 8 一起验证了这一对修复是自洽的。
  • Session.ts 也用同一个语义(wasAutoModeDenialFallback boolean),代码路径与 scheduler 镜像对齐;两边的 recordAutoModeFallbackResolution 包装函数都把 before/after 日志打全,运维诊断友好。
  • firePermissionDeniedHookvoid 抛入(fire-and-forget),不阻塞用户响应;AbortSignal 传下去支持中途取消。
  • 类型层:AutoModeUnavailableReason / FallbackToAskReason / PermissionDeniedReason 三个 union 把过去字符串化的 reason 都收口;FallbackToAskReason = 'safety_check' | 'ask_rule' | 'plan_mode_floor' | 'org_ask_ceiling' | DenialFallbackReason 这种包含关系保证了 switch 穷尽性。
  • evaluateAutoModeskipClassifier: booleanskipClassifierReason: DenialFallbackReason 把「为什么 skip」信息从布尔升级到结构化原因,让下游能讲清「为什么 fallback」而不是只能讲「fallback 了」。
  • 新增的 PermissionDenied hook 只在 AUTO 分类器拦截时触发,不掺合 fallback-to-ask;这就避免了观察者侧把「manual 路径上用户点了 deny」误判成「自动模式拦截」。

潜在风险(不阻塞,仅记录)

  • maxTotalDenials = 20 是硬编码;如果有团队跑特别长的 AUTO session 会比较容易撞上。但 PR 描述里明确说这是 intentional safety behavior,且 recordFallbackApprove 一次放行就重置 —— UX 上是「卡一下确认就继续跑」,可接受。
  • Session.ts 和 coreToolScheduler.ts 的 fallback resolution 逻辑各写了一份(PR 自己也注释了「duplicated from coreToolScheduler.ts」)。这次也是同步改两份,没漏改。但如果以后只改一边会再次漂移。建议未来抽到 permissions/ 下的共享 helper,本 PR 不阻塞。
  • firePermissionDeniedHookvoid 启动后没有任何 await,如果 hook runner 自己 hang 住会留个游离 promise。AbortSignal 能救场,但前提是上游 cancel。

5. 没在本地覆盖

  • Windows / Linux 真机(仅 macOS)—— 但 CI 矩阵 3 个平台都 ✅。
  • ACP 协议层端到端(需 ACP client 联调)—— Session.test.ts 的 74 个用例已覆盖核心分支。
  • 实际 LLM 触发的 classifier_blocked 流(需在线分类器)—— 这层 PR 没动,复用现有路径。

结论

维度 状态
全部单元测试(11 文件 / 848 用例)
TSC 类型检查(core + cli)
ESLint(21 改动文件,零警告)
真实路径端到端 repro(26 项契约断言)
git merge-tree 与最新 main ✅ 无冲突
CI 远端(Lint / 3 平台 Test / CodeQL)
GitHub mergeable 标志 ✅ MERGEABLE
Breaking change ✅ 无(新增 hook 事件、新增 reason,向后兼容)
行为契约(PR claims) ✅ 全部独立复现验证

建议合并。 没有阻塞问题。后续可以单开 issue 跟踪 Session.ts / coreToolScheduler.ts 的 fallback resolution 抽公共 helper。

— wenshao

@longbinlai

Copy link
Copy Markdown

⚠️ Cross-PR Conflict Detected — Large Conflict Group (5 PRs)

This PR is part of a conflict group with PR #4507, PR #4520, PR #4545, and PR #4546. These PRs share modifications to core daemon and session handling code.

Key shared functions:

  • promptpackages/cli/src/acp-integration/session/Session.ts
  • expectCompressBeforeSendpackages/cli/src/acp-integration/session/Session.test.ts
  • waitFor — multiple test files
  • _schedule / schedulepackages/core/src/core/coreToolScheduler.ts
  • getHookMatcherTarget, getHookExitCodes, getHookShortDescriptionpackages/cli/src/ui/components/hooks/constants.ts
  • applyAutoModeDecision, recordFallbackApprovepackages/core/src/permissions/

Recommendation: This is the largest conflict group touching core daemon, ACP bridge, hooks, and auto-mode functionality. Strongly recommend coordinating with all PR authors and performing integration testing before merging.


This conflict was detected automatically by CodeScope skills — structural analysis of call graphs and file modifications.

@longbinlai

Copy link
Copy Markdown

⚠️ Cross-PR Conflict Detected — Large Conflict Group (5 PRs)

This PR is part of a conflict group with PR #4507, PR #4520, PR #4545, and PR #4546. These PRs share modifications to core daemon and session handling code.

Key shared functions:

  • promptpackages/cli/src/acp-integration/session/Session.ts
  • expectCompressBeforeSendpackages/cli/src/acp-integration/session/Session.test.ts
  • waitFor — multiple test files
  • _schedule / schedulepackages/core/src/core/coreToolScheduler.ts
  • getHookMatcherTarget, getHookExitCodes, getHookShortDescriptionpackages/cli/src/ui/components/hooks/constants.ts
  • applyAutoModeDecision, recordFallbackApprovepackages/core/src/permissions/

Recommendation: This is the largest conflict group touching core daemon, ACP bridge, hooks, and auto-mode functionality. Strongly recommend coordinating with all PR authors and performing integration testing before merging.


This conflict was detected automatically by CodeScope skills — structural analysis of call graphs and file modifications.

…e-4476-conflicts

# Conflicts:
#	packages/cli/src/acp-integration/session/Session.ts
#	packages/cli/src/ui/components/hooks/constants.ts
#	packages/core/src/core/coreToolScheduler.test.ts
#	packages/core/src/hooks/hookEventHandler.test.ts
#	packages/core/src/hooks/hookEventHandler.ts
#	packages/core/src/hooks/hookSystem.test.ts
#	packages/core/src/hooks/hookSystem.ts
#	packages/core/src/hooks/types.ts
#	packages/core/src/permissions/index.ts
wenshao
wenshao previously approved these changes May 29, 2026
@qqqys qqqys requested review from pomelo-nwu and yiliang114 June 1, 2026 02:15
…ode-safety-observability

# Conflicts:
#	packages/cli/src/ui/components/hooks/constants.test.ts

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found. LGTM! ✅ — qwen3.7-max via Qwen Code /review

@qqqys qqqys merged commit 1a5b979 into QwenLM:main Jun 2, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type/feature-request New feature or enhancement request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants