Add AUTO mode denial observability and caps#4476
Conversation
📋 Review SummaryThis PR introduces AUTO mode denial observability and caps, adding a 🔍 General Feedback
🎯 Specific Feedback🟡 High
🟢 Medium
🔵 Low
✅ Highlights
|
tmux E2E Test ReportResult: PASS Date: 2026-05-24 Target: draft PR #4476 Scenario: verify the interactive hooks UI exposes the new Environment:
Steps performed:
Expected:
Observed:
Notes:
|
| export type PermissionDeniedReason = | ||
| | 'classifier_blocked' | ||
| | 'classifier_unavailable' | ||
| | 'classifier_failure'; |
There was a problem hiding this comment.
[Suggestion] Several type members are defined and exported but never produced by any code path: 'classifier_failure' in PermissionDeniedReason, 'plan_mode_floor' / 'org_ask_ceiling' in FallbackToAskReason, and the entire AutoModeUnavailableReason type.
Hook consumers matching on reason === 'classifier_failure' will never see it fire, and downstream code switching on FallbackToAskReason must handle variants that will never appear. The AutoModeUnavailableReason type is re-exported from permissions/index.ts with zero consumers.
Consider removing unreachable variants until they have a producer. If they represent planned features, track them in a follow-up issue rather than in the public type — or at minimum add // reserved for ... comments.
— qwen3.7-max via Qwen Code /review
| | MessageBus | ||
| | undefined; | ||
| if (messageBus) { | ||
| void firePermissionDeniedHook( |
There was a problem hiding this comment.
[Suggestion] The blocked path has no debug log, while the sibling fallback path (line ~1749) logs debugLogger.warn(...) with full counter state (consecutiveBlock, consecutiveUnavailable, totalBlock, totalUnavailable). The same asymmetry exists in Session.ts.
An operator debugging "AUTO mode is denying all tool calls" has zero server-side signal on the blocked path: no tool name, no reason (classifier_blocked vs classifier_unavailable), no counter state. Adding a symmetric debugLogger.warn here would close the observability gap.
| void firePermissionDeniedHook( | |
| case 'blocked': | |
| debugLogger.warn( | |
| `Auto mode blocked (${outcome.reason}): tool=${canonicalName}, ` + | |
| `consecutiveBlock=${denialState.consecutiveBlock}, ` + | |
| `consecutiveUnavailable=${denialState.consecutiveUnavailable}, ` + | |
| `totalBlock=${denialState.totalBlock}, totalUnavailable=${denialState.totalUnavailable}`, | |
| ); |
— qwen3.7-max via Qwen Code /review
| * Fire a PermissionDenied event | ||
| * Called when AUTO mode denies a tool call without asking the user | ||
| */ | ||
| async firePermissionDeniedEvent( |
There was a problem hiding this comment.
[Suggestion] firePermissionDeniedEvent and the corresponding HookSystem.firePermissionDeniedEvent have no dedicated tests, while peer methods in the same files have extensive coverage: fireNotificationEvent (10+ tests), firePermissionRequestEvent (10+ tests), fireSubagentStartEvent (6+ tests).
Consider adding tests in hookEventHandler.test.ts following the established pattern — verifying that the correct PermissionDeniedInput is constructed and passed to executeHooks.
— qwen3.7-max via Qwen Code /review
| export function recordFallbackApprove( | ||
| state: AutoModeDenialState, | ||
| ): AutoModeDenialState { | ||
| if ( |
There was a problem hiding this comment.
[Suggestion] recordFallbackApprove now resets total denial counters when the total cap is reached, but both call sites (coreToolScheduler.ts:2218, Session.ts:2142) invoke it on every user approval in AUTO mode — not only when the fallback was denialTracking-forced. The guard is approvalMode === AUTO && isApproveOutcome(outcome), which also fires for unrelated manual confirmations (default 'ask' permissions, explicit ask rules).
Before this PR, this was harmless because consecutive counters were already 0 for non-fallback approvals, making recordFallbackApprove a no-op. Now the total-cap check (totalBlock + totalUnavailable >= 20) runs first and returns createDenialState(), silently resetting cumulative counters as a side effect of any approval. In a session where the user occasionally approves 'ask'-permission tools, the total denial cap may never trigger despite accumulating 20+ denials.
Consider plumbing a flag from shouldFallback to the post-approval handler, or splitting the total-cap reset into a separate function called only for denialTracking-forced fallbacks.
— qwen3.7-max via Qwen Code /review
| return { kind: 'approved' }; | ||
| case 'fallback': | ||
| return { kind: 'fallback' }; | ||
| return { kind: 'fallback', reason: decision.reason }; |
There was a problem hiding this comment.
[Suggestion] AutoModeOutcome.reason for the fallback variant is computed here (forwarding decision.reason) but never consumed by any caller. In the scheduler, the debug log uses fallback.reason from shouldFallback(), not outcome.reason. In Session.ts, the case 'fallback': is a bare break.
The two reason sources can diverge: when both pmForcedAsk and shouldFallback are armed, outcome.reason is 'ask_rule' while the logged fallback.reason is 'consecutive_block' or 'total_denial'. Either use outcome.reason in the scheduler's debug log for accuracy, or drop the reason field from the fallback variant of AutoModeOutcome since no caller reads it.
— qwen3.7-max via Qwen Code /review
| }); | ||
|
|
||
| // ─── applyAutoModeDecision reason mapping ──────────────────────────────── | ||
|
|
There was a problem hiding this comment.
[Suggestion] The new applyAutoModeDecision — blocked reason mapping describe block covers the two classifier paths (classifier_blocked, classifier_unavailable) but has no test for the via: 'fallback' branch. The fallback branch now returns { kind: 'fallback', reason: decision.reason } — a new shape carrying FallbackToAskReason — but this pass-through is never exercised at the applyAutoModeDecision level. The evaluateAutoMode tests verify the decision shape and stop before the outcome conversion.
If a future refactor accidentally adds a state mutation to the fallback branch (it's the only branch that does NOT call config.setAutoModeDenialState(...)) or drops the reason field, no test would catch it.
it('passes through fallback reason without mutating denial state', () => {
const setAutoModeDenialState = vi.fn();
const result = applyAutoModeDecision(
{ via: 'fallback', reason: 'denial_cap' },
{ setAutoModeDenialState } as unknown as Config,
denialState,
);
expect(result).toEqual({ kind: 'fallback', reason: 'denial_cap' });
expect(setAutoModeDenialState).not.toHaveBeenCalled();
});— qwen3.7-max via Qwen Code /review
| expect(setAutoModeDenialState).not.toHaveBeenCalled(); | ||
| }); | ||
|
|
||
| it('resets denial counters after approving a denialTracking fallback prompt', async () => { |
There was a problem hiding this comment.
[Suggestion] The two new tests cover (1) no fallback callId + approve → no reset, and (2) fallback callId + approve → reset. But there's no negative test for (3) fallback callId + Cancel → no reset. The production code correctly guards this with isApproveOutcome(outcome), but without a test, a regression that removes this guard would silently allow cancel on a fallback prompt to reset denial counters — weakening denial-tracking protection for the rest of the session.
it('does not reset denial counters when user cancels a denialTracking fallback prompt', async () => {
const { internals, toolCall, setAutoModeDenialState } =
createSchedulerForDenialTrackingApprovalTest();
internals.autoModeFallbackCallIds.add('call-1');
await internals._handleConfirmationResponseInner(
'call-1',
toolCall,
vi.fn().mockResolvedValue(undefined),
ToolConfirmationOutcome.Cancel,
new AbortController().signal,
);
expect(setAutoModeDenialState).not.toHaveBeenCalled();
});— qwen3.7-max via Qwen Code /review
| // fallback was specifically armed by denialTracking — | ||
| // a pmForcedAsk fallback isn't an audit-worthy event). | ||
| if (fallback.fallback) { | ||
| this.autoModeFallbackCallIds.add(reqInfo.callId); |
There was a problem hiding this comment.
[Suggestion] autoModeFallbackCallIds.add is gated on fallback.fallback (the stale shouldFallback predicate captured before evaluateAutoMode), not on the actual decision outcome. In evaluateAutoMode, pmForcedAsk is checked before skipClassifier — so when denialTracking is armed AND an ask rule matches the same tool call, evaluateAutoMode returns { via: 'fallback', reason: 'ask_rule' }. But because fallback.fallback was true at capture time, the callId is still added to autoModeFallbackCallIds. When the user approves the ask-rule prompt, _handleConfirmationResponseInner finds the callId and calls recordFallbackApprove, incorrectly resetting all denial counters (including totals).
The same issue exists in Session.ts:2007 where wasAutoModeDenialFallback = fallback.fallback reads from the same stale predicate.
Gate the tracking on the actual outcome reason:
| this.autoModeFallbackCallIds.add(reqInfo.callId); | |
| if (outcome.reason === 'denial_cap') { | |
| this.autoModeFallbackCallIds.add(reqInfo.callId); |
And in Session.ts:
wasAutoModeDenialFallback = outcome.reason === 'denial_cap';— qwen3.7-max via Qwen Code /review
| try { | ||
| await messageBus.request<HookExecutionRequest, HookExecutionResponse>( | ||
| { | ||
| type: MessageBusType.HOOK_EXECUTION_REQUEST, |
There was a problem hiding this comment.
[Suggestion] Unlike every other fire*Hook function in this file (firePreToolUseHook, firePostToolUseHook, firePostToolUseFailureHook), this function:
- Does not check
response.successfrommessageBus.request— a hook runner failure (success: falsewith an error message) is silently treated as success. - Returns
{}in thecatchblock instead of{ hookError: message }— thePermissionDeniedHookResultinterface declareshookError?: stringbut it is never populated (dead code).
Since callers use void (fire-and-forget), there is no immediate bug. But the inconsistency means the next person adding a hook will copy this pattern and miss the response check. Consider matching the established pattern:
const response = await messageBus.request<...>(...);
if (!response.success || !response.output) {
const message = response.error?.message || 'hook runner returned no output';
debugLogger.warn(`PermissionDenied hook response failure for ${toolName}: ${message}`);
return { hookError: message };
}
return {};And in the catch block: return { hookError: error instanceof Error ? error.message : String(error) };
— qwen3.7-max via Qwen Code /review
| toolParams, | ||
| reqInfo.callId, | ||
| outcome.reason, | ||
| signal, |
There was a problem hiding this comment.
[Suggestion] No test in coreToolScheduler.test.ts verifies that firePermissionDeniedHook is actually called when AUTO mode blocks a tool call via the scheduler's case 'blocked' branch. The hook function has unit tests in toolHookTriggers.test.ts (3 cases), and the applyAutoModeDecision blocked-branch mapping is tested in autoMode.test.ts, but the scheduler integration path — verifying that messageBus.request is called with eventName: 'PermissionDenied' and the correct tool_name, tool_input, tool_use_id, and reason — is untested.
A regression in the wiring (wrong argument order, missing reason, or getDisableAllHooks guard removed) would go undetected.
— qwen3.7-max via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
Round 4 review (qwen3.7-max): no new high-confidence findings at SHA f74f0bb. All 4 confirmed issues are low confidence (terminal-only). 9 prior Suggestions remain open. 382 tests pass, tsc + eslint clean. — qwen3.7-max via Qwen Code /review
| if (fallback.fallback) { | ||
| if (outcome.reason === 'denial_cap') { | ||
| this.autoModeFallbackCallIds.add(reqInfo.callId); | ||
| const fallbackReason = fallback.fallback |
There was a problem hiding this comment.
[Suggestion] The 'total_denial' else-branch in this ternary is unreachable. When outcome.reason === 'denial_cap', it is guaranteed that fallback.fallback === true — because 'denial_cap' is only produced by evaluateAutoMode when skipClassifier: true, which is set from fallback.fallback. So the ternary always evaluates to fallback.reason.
Note: when shouldFallback returns reason: 'total_denial', that value already flows through fallback.reason, so the string 'total_denial' appears in the log even without the dead branch.
| const fallbackReason = fallback.fallback | |
| debugLogger.warn( | |
| `Auto mode fallback to manual approval (${fallback.reason}): ` + |
— qwen3.7-max via Qwen Code /review
| return earlyErrorResponse(new Error(outcome.errorMessage), fc.name); | ||
| case 'fallback': | ||
| // Drop through to the manual-approval flow below. | ||
| wasAutoModeDenialFallback = outcome.reason === 'denial_cap'; |
There was a problem hiding this comment.
[Suggestion] Unlike the equivalent case 'fallback' branch in coreToolScheduler.ts (lines ~1755-1767), this path sets wasAutoModeDenialFallback but emits no debug log with the fallback reason or denial counter state. When the ACP/daemon path triggers a denial_cap fallback, operators cannot see why it was triggered or what the counter values were.
Consider mirroring the coreToolScheduler logging for parity:
case 'fallback':
if (outcome.reason === 'denial_cap') {
debugLogger.warn(
`Auto mode fallback to manual approval (${fallback.reason}): ` +
`consecutiveBlock=${denialState.consecutiveBlock}, ` +
`consecutiveUnavailable=${denialState.consecutiveUnavailable}, ` +
`totalBlock=${denialState.totalBlock}, ` +
`totalUnavailable=${denialState.totalUnavailable}`,
);
}
wasAutoModeDenialFallback = outcome.reason === 'denial_cap';
break;— qwen3.7-max via Qwen Code /review
| // Caller (scheduler) has detected an armed fallback state; surface that | ||
| // so the call drops to manual approval instead of burning a classifier | ||
| // request that would deepen the denial streak. | ||
| if (input.skipClassifier) { |
There was a problem hiding this comment.
[Suggestion] evaluateAutoMode collapses all shouldFallback reasons into a single 'denial_cap' via the skipClassifier: boolean parameter. The specific trigger reason (consecutive_block vs consecutive_unavailable vs total_denial) from shouldFallback is lost in the structured AutoModeOutcome.
In coreToolScheduler.ts, the specific reason is partially recovered from fallback.reason for the debug log, but outcome.reason remains 'denial_cap' regardless of which threshold triggered. In Session.ts, the specific reason is completely lost.
Consider passing the DenialFallbackReason through evaluateAutoMode (e.g., as skipClassifierReason?: DenialFallbackReason) so the structured outcome carries the specific trigger. This would also eliminate the dead fallbackReason ternary in coreToolScheduler.ts:1758-1760, since the specific reason would be available directly from outcome.
— qwen3.7-max via Qwen Code /review
| case 'blocked': | ||
| debugLogger.warn( | ||
| `Auto mode blocked (${outcome.reason}): tool=${canonicalName}, ` + | ||
| `consecutiveBlock=${denialState.consecutiveBlock}, ` + |
There was a problem hiding this comment.
[Suggestion] The 4-field denial-state log template (consecutiveBlock=..., consecutiveUnavailable=..., totalBlock=..., totalUnavailable=...) is copy-pasted four times across this file and Session.ts (lines 1722, 1766 here; 1995, 2023 in Session.ts). If a future PR adds a counter to AutoModeDenialState, all four sites must be updated in lockstep.
Consider extracting a helper in denialTracking.ts:
export function formatDenialStateLog(state: AutoModeDenialState): string {
return `consecutiveBlock=${state.consecutiveBlock}, ` +
`consecutiveUnavailable=${state.consecutiveUnavailable}, ` +
`totalBlock=${state.totalBlock}, ` +
`totalUnavailable=${state.totalUnavailable}`;
}— qwen3.7-max via Qwen Code /review
| state: AutoModeDenialState, | ||
| ): { fallback: true; reason: DenialFallbackReason } | { fallback: false } { | ||
| if ( | ||
| state.totalBlock + state.totalUnavailable >= |
There was a problem hiding this comment.
[Suggestion] The expression state.totalBlock + state.totalUnavailable >= AUTO_MODE_DENIAL_LIMITS.maxTotalDenials appears verbatim in both shouldFallback (here) and recordFallbackApprove (line 142). If the cap calculation ever changes (weighted totals, configurable limit), both sites must be updated.
Extract a private helper:
function hasReachedTotalCap(state: AutoModeDenialState): boolean {
return (
state.totalBlock + state.totalUnavailable >=
AUTO_MODE_DENIAL_LIMITS.maxTotalDenials
);
}— qwen3.7-max via Qwen Code /review
| }); | ||
|
|
||
| it('does NOT trigger on totalBlock alone (telemetry-only)', () => { | ||
| it('triggers fallback after 20 total denials even when they are not consecutive', () => { |
There was a problem hiding this comment.
[Suggestion] The existing total-denial test carefully uses recordAllow to keep consecutive counters at zero, so the total and consecutive thresholds never overlap. No test verifies that when BOTH totalBlock + totalUnavailable >= 20 AND consecutiveBlock >= 3 are simultaneously true, shouldFallback returns 'total_denial' (not 'consecutive_block'). The precedence is implemented (total check comes first) but unguarded by test.
If someone reorders the if blocks, total_denial loses precedence and recordFallbackApprove would reset only consecutive counters instead of all counters — leaving total counters permanently pinned.
— qwen3.7-max via Qwen Code /review
| totalUnavailable: 0, | ||
| }; | ||
|
|
||
| it('maps classifier policy blocks to classifier_blocked', () => { |
There was a problem hiding this comment.
[Suggestion] The blocked-path tests create setAutoModeDenialState = vi.fn() but only assert result.kind and result.reason. They never verify setAutoModeDenialState was called with the correct counter values (e.g., recordBlock(denialState) or recordUnavailable(denialState)). If the config.setAutoModeDenialState(recordBlock/recordUnavailable(denialState)) side-effect at autoMode.ts:244-248 were accidentally removed, these tests would still pass — but denial-tracking counters would stop incrementing and the total-denial cap would become unreachable.
Consider adding:
expect(setAutoModeDenialState).toHaveBeenCalledWith(
expect.objectContaining({ consecutiveBlock: 1, totalBlock: 1 }),
);— qwen3.7-max via Qwen Code /review
| // a pmForcedAsk fallback isn't an audit-worthy event). | ||
| if (fallback.fallback) { | ||
| if ( | ||
| outcome.reason === 'consecutive_block' || |
There was a problem hiding this comment.
[Suggestion] The denial-fallback reason enumeration is hardcoded here as outcome.reason === 'consecutive_block' || outcome.reason === 'consecutive_unavailable' || outcome.reason === 'total_denial' — the same three-way check appears in Session.ts:2016-2019. Neither site has compile-time coupling to the DenialFallbackReason type in denialTracking.ts.
When a future maintainer adds a fourth value to DenialFallbackReason, nothing flags these two sites. The new reason would silently skip autoModeFallbackCallIds.add, so recordFallbackApprove would never fire and denial counters would never clear — permanently pinning the session to manual prompts.
Consider extracting a type guard in denialTracking.ts:
export function isDenialFallbackReason(
reason: FallbackToAskReason,
): reason is DenialFallbackReason {
return reason === 'consecutive_block' || reason === 'consecutive_unavailable' || reason === 'total_denial';
}— qwen3.7-max via Qwen Code /review
|
|
||
| // ─── applyAutoModeDecision reason mapping ──────────────────────────────── | ||
|
|
||
| describe('applyAutoModeDecision — blocked reason mapping', () => { |
There was a problem hiding this comment.
[Suggestion] The applyAutoModeDecision test describe block covers blocked and fallback outcomes but has no test for the classifier-approved path (via: 'classifier', shouldBlock: false). That branch at autoMode.ts:258 calls config.setAutoModeDenialState(recordAllow(denialState)) — the recovery mechanism that resets consecutive counters. If this recordAllow call were removed, sessions alternating one block and one approve would never reset consecutive counters, reaching the consecutive cap prematurely.
— qwen3.7-max via Qwen Code /review
| expect(s.totalUnavailable).toBe(2); | ||
| }); | ||
|
|
||
| it('resets total counters after the user approves a total-cap fallback prompt', () => { |
There was a problem hiding this comment.
[Suggestion] The total-cap reset test alternates recordBlock + recordAllow, keeping consecutiveBlock at 0 when recordFallbackApprove is called. No test covers the case where the total cap is reached with non-zero consecutive counters (e.g., { consecutiveBlock: 2, consecutiveUnavailable: 0, totalBlock: 15, totalUnavailable: 5 }). Verifying createDenialState() clears all four fields in that scenario guards against a future refactor that might only clear totals and leave consecutives intact.
— qwen3.7-max via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
Round 7 review (qwen3.7-max) at SHA b8b86ea: no new findings. Clean refactoring commit that properly addresses 5 prior Suggestions (log template DRY, cap expression DRY, reason enumeration centralization, total-denial precedence test, cap overlap test). 8 prior inline comments are now stale (on earlier commits). 560 tests pass, tsc + eslint clean. — qwen3.7-max via Qwen Code /review
LaZzyMan
left a comment
There was a problem hiding this comment.
Review
Solid feature with correct trigger semantics on the manual-approval path: PermissionDenied fires only from the AUTO classifier-block branch (not from fallback or manual deny), the cumulative-cap reset is full-wipe and gated on DenialFallbackReason, and the new fallback-marker set is the right primitive — fixing the prior "any AUTO approve resets" overshoot. Scheduler-path test coverage is good. One correctness concern below; a few minor items in the local notes aren't worth surfacing here.
1. PermissionRequest hook-handled approval bypasses the fallback-marker cleanup and cap-reset (severity: medium · confidence: very high)
The new cleanup and reset logic only fires through the scheduler's outer confirmation-response wrapper. When a PermissionRequest hook returns a decision and allows the call, the code calls onConfirm(ProceedOnce) directly — the originating callback, not the scheduler wrapper that owns the cleanup. The fallback-marker set keeps the call id, the cumulative denial counters stay at the threshold, and every subsequent classifier-eligible AUTO call keeps falling back to manual/hook handling instead of going through the classifier. The recovery semantics the cap reset is designed to provide are defeated for any user who has a PermissionRequest hook configured AND hits any DenialFallbackReason (consecutive_block, consecutive_unavailable, or total_denial). Same defect exists on the ACP path. Fix is small — route the hook-handled approval through the same cleanup, or duplicate the reset logic into the hook branch.
Verdict
COMMENT — manual-approval path is correct; hook-handled approval branch silently keeps the cap armed forever for any user who configures a PermissionRequest hook. Worth fixing before merge, but the bug is scoped to a specific configuration so I'm not blocking.
wenshao
left a comment
There was a problem hiding this comment.
Built this PR locally (b8b86eafc) and ran the author-listed checks plus an end-to-end behavioral harness that drives the built dist/ modules. All three claimed features check out.
Tests
| Suite | Result |
|---|---|
| Author-listed unit tests (toolHookTriggers, autoMode, denialTracking, hookPlanner, hookSystem, hookEventHandler, coreToolScheduler, cli hooks/constants) | 594 / 594 pass |
npm run typecheck (5 packages) |
0 errors, 11s |
packages/core full suite |
9292 pass / 0 fail / 4 skip |
packages/cli full suite |
6792 pass / 2 fail (unrelated*) / 9 skip |
-t PermissionDenied across core |
8 / 8 pass |
* The two cli failures are in serve/workspaceMemory.test.ts and serve/workspaceAgents.test.ts, both relying on fs.chmod(dir, 0o555) to force EACCES → 500. Under root those chmod bits are bypassed and the test gets 200. Same failures reproduce on main after reverting this PR's diff, so they're pre-existing and unrelated. (Worth a follow-up if (process.getuid?.() === 0) return; skip, same pattern as the existing win32 early-returns.)
End-to-end behavioral harness
I drove the built modules with a standalone ESM harness (no LLM/API dependency — mocks the classifier verdict shape and the MessageBus). 30 assertions covering:
Reason boundaries (autoMode.ts) — all ✅
via:'classifier', shouldBlock:true, unavailable:false→kind:'blocked', reason:'classifier_blocked', message"Blocked by auto mode policy: <reason>"via:'classifier', shouldBlock:true, unavailable:true→kind:'blocked', reason:'classifier_unavailable', message"Auto mode classifier unavailable (<reason>); action blocked for safety"- All 7
FallbackToAskReasonvalues (safety_check,ask_rule,plan_mode_floor,org_ask_ceiling,consecutive_block,consecutive_unavailable,total_denial) flow throughapplyAutoModeDecisionunchanged - Fast-paths route to
kind:'approved'and leave denial state preserved
Cumulative cap (denialTracking.ts) — all ✅
AUTO_MODE_DENIAL_LIMITS = {maxConsecutiveBlock:3, maxConsecutiveUnavailable:2, maxTotalDenials:20}- Alternating
block → allowto avoidconsecutive_block=3: total cap fires at exactly cumulative 20 blocks with reasontotal_denial total_denialtakes precedence overconsecutive_blockwhen both apply (matchesshouldFallback's top-of-function check)- Cross-reset:
block×2 → unavailablezerosconsecutiveBlockbut keepstotalBlock=2 recordFallbackApproveon total-cap state → all counters reset; on consecutive-only state → only consecutive reset, totals preservedrecordAllowresets consecutive but never resets totals (key invariant for the cap)
PermissionDenied hook (toolHookTriggers.ts) — all ✅
HookEventName.PermissionDenied === 'PermissionDenied'firePermissionDeniedHookdispatches MessageBus request witheventName: 'PermissionDenied', all five expected input fields (tool_name,tool_input,tool_use_id,reason, pluspermission_modeif applicable)- Both
reason: 'classifier_blocked'and'classifier_unavailable'pass through - Runner
success:falseand thrown errors are both surfaced viahookError(no silent allow) - CLI scheduler (
coreToolScheduler.ts:1731) and ACP Session (acp-integration/session/Session.ts:2000) both route through the sharedapplyAutoModeDecisionand both fire the hook onoutcome.kind === 'blocked'— the duplication that drove earlier review rounds is gone.
HookRegistry integration — ✅
A ~/.qwen/settings.json configured with hooks.PermissionDenied = [...] is accepted by HookRegistry.initialize() with no "Invalid hook event name" warning. Negative control with a fabricated NotARealEventName correctly produces the warning. Verified both via the registry harness and by launching the actual CLI binary with such a settings.json and watching stderr stay clean.
One nit (non-blocking)
packages/cli/src/config/settingsSchema.ts hooks.properties lists 12 hook events but is missing PermissionDenied. It's also missing TodoCreated and TodoCompleted, so this is an existing pattern, not a regression. Runtime acceptance comes from HookRegistry.isValidEventName (which uses Object.values(HookEventName)), so users can still configure the new hook — the schema gap is cosmetic / UI-only. Worth a small follow-up PR to add the three missing entries.
LGTM
Reason separation, the cumulative denial cap with total_denial precedence, and the new PermissionDenied hook are wired correctly across both the CLI scheduler and ACP Session paths. Approving.
|
|
||
| let didRequestPermission = false; | ||
| let confirmationDetails: ToolCallConfirmationDetails | undefined; | ||
| const resetAutoModeFallbackAfterApproval = ( |
There was a problem hiding this comment.
[Suggestion] resetAutoModeFallbackAfterApproval silently resets denial counters with no debug logging. The fallback-onset path at line ~2014 logs "Auto mode fallback to manual approval" via debugLogger.warn with formatDenialStateLog, but this recovery/reset closure emits nothing.
Hook approvals are invisible — firePermissionRequestHook only logs in its catch block, never on success. The full audit trail for a hook-triggered counter reset is: (1) fallback-onset warning, (2) tool executes, (3) silence. An oncall engineer cannot distinguish "hook approved and reset counters" from "denial tracking silently broke."
Consider adding debugLogger.debug with before/after counter state:
const resetAutoModeFallbackAfterApproval = (
outcome: ToolConfirmationOutcome,
) => {
if (
approvalMode === ApprovalMode.AUTO &&
wasAutoModeDenialFallback &&
isApproveOutcome(outcome)
) {
const before = this.config.getAutoModeDenialState();
const after = recordFallbackApprove(before);
debugLogger.debug(
`Auto mode denial counters reset after fallback approval: ` +
`${formatDenialStateLog(before)} -> ${formatDenialStateLog(after)}`,
);
this.config.setAutoModeDenialState(after);
}
};The same gap exists in resetAutoModeFallbackAfterConfirmation in coreToolScheduler.ts:2342.
— qwen3.7-max via Qwen Code /review
| @@ -2320,6 +2340,29 @@ export class CoreToolScheduler { | |||
| // (#4321 review-9 wenshao Critical). | |||
| } | |||
|
|
|||
There was a problem hiding this comment.
[Suggestion] The method name and comment suggest this only acts on approval outcomes, but it unconditionally executes this.autoModeFallbackCallIds.delete(callId) regardless of outcome. The new hook-deny call site at line 1911 (ToolConfirmationOutcome.Cancel) depends entirely on this undocumented Set cleanup.
A future maintainer reading the name resetAutoModeFallbackAfterConfirmation and the comment "Cancel / abort do NOT reset" will conclude the Cancel call is dead code and remove it, leaking callIds into autoModeFallbackCallIds.
Consider splitting into two methods to make intent clear:
private clearAutoModeFallbackCallId(callId: string): void {
this.autoModeFallbackCallIds.delete(callId);
}
private recordAutoModeFallbackResolution(
callId: string,
outcome: ToolConfirmationOutcome,
): void {
const wasAutoModeFallback = this.autoModeFallbackCallIds.delete(callId);
if (
this.config.getApprovalMode() === ApprovalMode.AUTO &&
wasAutoModeFallback &&
isApproveOutcome(outcome)
) {
this.config.setAutoModeDenialState(
recordFallbackApprove(this.config.getAutoModeDenialState()),
);
}
}Also consider adding debugLogger.debug with before/after counter state (same observability gap as Session.ts:2030).
— qwen3.7-max via Qwen Code /review
| // (#4321 review-9 wenshao Critical). | ||
| } | ||
|
|
||
| private recordAutoModeFallbackResolution( |
There was a problem hiding this comment.
[Suggestion] Incomplete rename: this method was renamed from resetAutoModeFallbackAfterConfirmation to recordAutoModeFallbackResolution, but the parallel closure in Session.ts:2030 still uses resetAutoModeFallbackAfterApproval. Both implement identical denial-fallback resolution logic — a maintainer grepping for the new name will miss the Session.ts code path (3 call sites at lines 2030, 2102, 2172).
Consider renaming the Session.ts closure to match (recordAutoModeFallbackResolution), or extracting the shared state-transition logic into a helper in denialTracking.ts so both call sites delegate to it (which would also resolve the duplicated 6-line block at Session.ts:2041-2048 / coreToolScheduler.ts:2361-2368).
— qwen3.7-max via Qwen Code /review
| const before = this.config.getAutoModeDenialState(); | ||
| const after = recordFallbackApprove(before); | ||
| debugLogger.debug( | ||
| `Auto mode denial counters reset after fallback approval: ` + |
There was a problem hiding this comment.
[Suggestion] Two issues with the new debug logging:
1. Log level inconsistency: This reset log uses debugLogger.debug() while all peer denial-tracking state-transition logs use debugLogger.warn() (blocked at :1722, fallback at :1759). The reset is the closing bracket of the same denial→fallback→reset lifecycle — using a lower level means it won't appear in the same log pipelines during incident investigation. Consider using .warn() for consistency, or adding a comment explaining why reset is below the operator-visibility threshold.
2. Missing test: No test in coreToolScheduler.test.ts or Session.test.ts asserts this log message is emitted. denialTracking.test.ts covers the pure recordFallbackApprove function but not the integration wiring to debugLogger. Consider adding at least one test per call site:
const debugSpy = vi.spyOn(debugLogger, 'debug').mockImplementation(() => {});
// trigger fallback-approval path...
expect(debugSpy).toHaveBeenCalledWith(
expect.stringContaining('Auto mode denial counters reset'),
);— qwen3.7-max via Qwen Code /review
| `Auto mode denial counters reset after fallback approval: ` + | ||
| `${formatDenialStateLog(before)} -> ${formatDenialStateLog(after)}`, | ||
| ); | ||
| this.config.setAutoModeDenialState(after); |
There was a problem hiding this comment.
[Suggestion] Misleading log in no-op case: recordFallbackApprove returns the same state object unchanged when consecutiveBlock === 0 && consecutiveUnavailable === 0 (denialTracking.ts:171-173, confirmed by test at denialTracking.test.ts:250: expect(recordFallbackApprove(s)).toBe(s)). When this branch is reached (e.g., a concurrent recordAllow zeroed consecutive counters between the fallback-onset warn and this resolution), the log reads "counters reset" with identical before/after values — misleading during 3 AM debugging.
Consider guarding:
const after = recordFallbackApprove(before);
if (after !== before) {
debugLogger.warn(
`Auto mode denial counters reset after fallback approval: ` +
`${formatDenialStateLog(before)} -> ${formatDenialStateLog(after)}`,
);
this.config.setAutoModeDenialState(after);
}(Same change in Session.ts:2041-2048.)
— qwen3.7-max via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
Round 10 review (qwen3.7-max) at SHA 5eedfaab0: incremental commit cleanly addresses all 3 R9 Suggestions (rename consistency, log level, no-op log guard). 9 review agents + reverse audit found 0 high-confidence issues. 241/241 tests pass, eslint clean, CI all_pass 8/8. 4 low-confidence items noted in terminal (early-return test gap, log message precision, JSDoc contract, mock ergonomics) — none warrant inline comments. — qwen3.7-max via Qwen Code /review
本地验证报告 — PR #4476作为 maintainer 在本地用 tmux 跑了完整真实验证。结论:通过,建议合并。 验证环境
1. 单元测试 — 所有改动文件
跨 11 个 test file,包括 PR 直接修改的文件 + 通过 2. 静态检查3. 真实端到端复现(用构建产物)构建后写了一个 独立 repro 脚本,直接 4. 代码层面的观察正向:
潜在风险(不阻塞,仅记录):
5. 没在本地覆盖
结论
建议合并。 没有阻塞问题。后续可以单开 issue 跟踪 Session.ts / coreToolScheduler.ts 的 fallback resolution 抽公共 helper。 — wenshao |
|
This PR is part of a conflict group with PR #4507, PR #4520, PR #4545, and PR #4546. These PRs share modifications to core daemon and session handling code. Key shared functions:
Recommendation: This is the largest conflict group touching core daemon, ACP bridge, hooks, and auto-mode functionality. Strongly recommend coordinating with all PR authors and performing integration testing before merging. This conflict was detected automatically by CodeScope skills — structural analysis of call graphs and file modifications. |
|
This PR is part of a conflict group with PR #4507, PR #4520, PR #4545, and PR #4546. These PRs share modifications to core daemon and session handling code. Key shared functions:
Recommendation: This is the largest conflict group touching core daemon, ACP bridge, hooks, and auto-mode functionality. Strongly recommend coordinating with all PR authors and performing integration testing before merging. This conflict was detected automatically by CodeScope skills — structural analysis of call graphs and file modifications. |
…e-4476-conflicts # Conflicts: # packages/cli/src/acp-integration/session/Session.ts # packages/cli/src/ui/components/hooks/constants.ts # packages/core/src/core/coreToolScheduler.test.ts # packages/core/src/hooks/hookEventHandler.test.ts # packages/core/src/hooks/hookEventHandler.ts # packages/core/src/hooks/hookSystem.test.ts # packages/core/src/hooks/hookSystem.ts # packages/core/src/hooks/types.ts # packages/core/src/permissions/index.ts
…ode-safety-observability # Conflicts: # packages/cli/src/ui/components/hooks/constants.test.ts
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — qwen3.7-max via Qwen Code /review
Summary
Validation
Scope / Risk
PermissionDenied; existing hook behavior is unchanged.Testing Matrix
Testing matrix notes:
Linked Issues / Bugs
Related: #4475