feat(session): gate repeated successful tool calls#406
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (6)
🚧 Files skipped from review as they are similar to previous changes (5)
📝 WalkthroughWalkthroughSession loop diagnostics were rewritten to track structured success/failure signatures and counters, introduce outcome-prefixed signature keys, add threshold-driven reminder/block/stop decisions, and propagate richer loop metadata through observation, parent-state derivation, gating, synthetic block/stop recording, export, renderer, and tests. ChangesLoop-Gate Success & Failure Tracking (single cohesive DAG)
Sequence DiagramsequenceDiagram
participant Tool as Tool
participant Proc as Processor
participant Diag as Diagnostics
participant Gate as GateEvaluator
participant LLM as Model
Tool->>Proc: tool call completes (success)
Proc->>Diag: observeToolCall(success record)
Diag->>Diag: signatureKey(outcome:"success", kind, tool, hash)
Diag->>Diag: increment success bucket, compute repeats
alt repeats >= LOOP_THRESHOLDS.reminderAt
Diag->>Proc: emit success reminder / record loop outcome:"success"
end
Tool->>Proc: tool call errors (failure)
Proc->>Diag: observeToolError(error record)
Diag->>Diag: signatureKey(outcome:"failure", kind, tool, hash)
Diag->>Diag: increment failure bucket, compute reminders
alt failures >= LOOP_THRESHOLDS.reminderAt
Diag->>Proc: emit failure reminder / record loop outcome:"failure"
end
Proc->>Gate: applyLoopGate(parentLoopState)
Gate->>Diag: queryGateAction(outcome:"failure")
Gate->>Diag: queryGateAction(outcome:"success")
Gate->>Gate: chooseGateDecision
alt decision.block
Gate->>Proc: recordSyntheticBlock(outcome, completedCount, ...)
else decision.stop
Gate->>Proc: recordSyntheticStop(outcome, completedCount, ...)
end
Gate->>LLM: return gating action or allow continuation
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Review rate limit: 9/10 reviews remaining, refill in 6 minutes. Comment |
There was a problem hiding this comment.
Code Review
This pull request implements tracking and gating for repeated successful tool requests, mirroring the existing logic for failed requests. It introduces LoopOutcome and LOOP_THRESHOLDS to standardize the loop detection lifecycle (reminders at 3, blocks at 4, and stops at 5 occurrences). Additionally, the system now tracks step indices to avoid blocking parallel tool calls within the same execution step. A redundant logical condition in the gating logic was identified for simplification.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
packages/opencode/test/session/prompt-effect.test.ts (1)
583-617: ⚡ Quick winAdd a mixed success/failure loop-gate regression.
These new cases only exercise failure-only histories. Since the gate now merges independent success and failure counters, add one case where the same signature has a failure-side
blockcandidate and a success-sidestopcandidate, and assert the turn stops. That would catch the precedence bug above.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opencode/test/session/prompt-effect.test.ts` around lines 583 - 617, Add a new regression test alongside "loop gate blocks repeated tool errors across model steps" that builds a mixed success/failure history: create a session via Session.Service and send parts via SessionPrompt.Service/llm so the same tool signature produces both failing invocations that would generate a failure-side "block" candidate and a successful invocation that would produce a success-side "stop" candidate; then invoke prompt.loop (SessionPrompt.Service.loop) and assert the loop decision honors the success-side stop (i.e., the turn stops), checking MessageV2.filterCompactedEffect for the tool parts and that the final decision reflects a "stop" outcome rather than being blocked by the failure-side candidate. Ensure the test uses the same signature, verifies both candidate types (loopAction "block" and "stop"), and asserts the overall turn stopped.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/opencode/src/session/processor.ts`:
- Around line 314-325: In buildLoopContext (session/processor.ts) the current
logic assigns parts before the first "step-start" in an assistant message a
lower stepIndex (and observedStepIndex = -1), which lets late "step-start"
events make same-step tool bursts look like prior-step history; fix by tracking
whether this message has seen any "step-start" yet (e.g., a local boolean like
sawStepStart while iterating message.parts) and when handling tool parts for the
active assistant message (message.info.id === ctx.assistantMessage.id) set
observedStepIndex to currentStepIndex if no step-start has been seen (instead of
-1); ensure loopWithStep uses loop.stepIndex ?? observedStepIndex so early tool
parts in the current assistant step get the same-step index rather than being
treated as prior-step.
In `@packages/opencode/src/session/prompt.ts`:
- Around line 137-151: The current merge picks failureDecision over
successDecision which can downgrade a "stop" to a "block"; change the merge
logic after computing failureDecision and successDecision (from
SessionDiagnostics.queryGateAction) to pick the strongest action across both: if
either.failureDecision.action === "stop" || successDecision.action === "stop"
then decision = { ...choose stop decision (prefer the one with stop) }, else if
either has action === "block" then decision = { ...choose block decision }, else
decision = the observe outcome; ensure you preserve other metadata from the
chosen decision object (not just the action).
---
Nitpick comments:
In `@packages/opencode/test/session/prompt-effect.test.ts`:
- Around line 583-617: Add a new regression test alongside "loop gate blocks
repeated tool errors across model steps" that builds a mixed success/failure
history: create a session via Session.Service and send parts via
SessionPrompt.Service/llm so the same tool signature produces both failing
invocations that would generate a failure-side "block" candidate and a
successful invocation that would produce a success-side "stop" candidate; then
invoke prompt.loop (SessionPrompt.Service.loop) and assert the loop decision
honors the success-side stop (i.e., the turn stops), checking
MessageV2.filterCompactedEffect for the tool parts and that the final decision
reflects a "stop" outcome rather than being blocked by the failure-side
candidate. Ensure the test uses the same signature, verifies both candidate
types (loopAction "block" and "stop"), and asserts the overall turn stopped.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: cbcd8d9f-41c0-43b7-97eb-20ab19995c1c
📒 Files selected for processing (11)
packages/opencode/src/session/diagnostics.tspackages/opencode/src/session/export.tspackages/opencode/src/session/loop-renderer.tspackages/opencode/src/session/processor.tspackages/opencode/src/session/prompt.tspackages/opencode/test/session/compaction.test.tspackages/opencode/test/session/diagnostics.test.tspackages/opencode/test/session/export.test.tspackages/opencode/test/session/loop-gate.test.tspackages/opencode/test/session/loop-renderer.test.tspackages/opencode/test/session/prompt-effect.test.ts
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/opencode/src/session/processor.ts (1)
209-218:⚠️ Potential issue | 🟠 Major | ⚡ Quick winKeep input-hash-only successes in
loopRecords.
observeToolCall()still builds its reminder history fromloopRecords(parentID), but this helper drops any completed success whereloop.targetHashis missing. That means tools that only produce aninputHashnever setloopRecoverFiredFor, so their success repeats can never advance to block/stop even though this PR is supposed to gate(tool, inputHash | targetHash)signatures.Suggested fix
- if (!loop?.inputHash || !loop.targetHash) return [] + if (!loop?.inputHash) return [] if (loop.errorFingerprint || loop.loopAction) return [] return [ { sessionID: ctx.sessionID, parentID, tool: part.tool, inputHash: loop.inputHash, - targetHash: loop.targetHash, + targetHash: loop.targetHash ?? "", metadata: { diagnostics: { loop } }, } satisfies SessionDiagnostics.ToolCallRecord, ]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opencode/src/session/processor.ts` around lines 209 - 218, The helper loopRecords currently rejects loop entries that lack loop.targetHash, causing successes that only have loop.inputHash to be dropped; update the predicate in loopRecords (the check around toolDiagnostics(part)?.loop) to only require loop.inputHash (i.e., change the condition from "if (!loop?.inputHash || !loop.targetHash) return []" to "if (!loop?.inputHash) return []") so entries with an inputHash but no targetHash are kept for observeToolCall() to build reminder history and set loopRecoverFiredFor; keep the other exclusions (errorFingerprint, loopAction, completed status, role/parentID checks) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@packages/opencode/src/session/processor.ts`:
- Around line 209-218: The helper loopRecords currently rejects loop entries
that lack loop.targetHash, causing successes that only have loop.inputHash to be
dropped; update the predicate in loopRecords (the check around
toolDiagnostics(part)?.loop) to only require loop.inputHash (i.e., change the
condition from "if (!loop?.inputHash || !loop.targetHash) return []" to "if
(!loop?.inputHash) return []") so entries with an inputHash but no targetHash
are kept for observeToolCall() to build reminder history and set
loopRecoverFiredFor; keep the other exclusions (errorFingerprint, loopAction,
completed status, role/parentID checks) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 27b157f4-6e2c-4093-aea7-fb50224c723e
📒 Files selected for processing (5)
packages/opencode/src/session/diagnostics.tspackages/opencode/src/session/processor.tspackages/opencode/src/session/prompt.tspackages/opencode/test/session/loop-gate.test.tspackages/opencode/test/session/prompt-effect.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/opencode/test/session/loop-gate.test.ts
- packages/opencode/src/session/diagnostics.ts
|
Handled the non-inline CodeRabbit note in c4340ee: |
Fix #406 follow-up false positives in the session loop gate. Problem - PR #406 made successful repeated tool calls observable, but same-target success repeats were treated as hard loop evidence. - Normal sessions can read different ranges of the same file, or rerun the same command after a file mutation, while still making progress. Policy changes - success + same_target is only a weak reminder signal and never directly hard-blocks or stops. - success + exact input only gates when prior completed records have the same output and the same parent-turn mutationEpoch. - mutationEpoch is parent-turn scoped across assistant messages, not message-local. - failure gates keep the stricter target/input protections. - synthetic block/stop diagnostics preserve attemptedInput with size guarding. - sanitized exports redact loop attemptedInput, and success loop exports no longer pretend completedFailures. Regression coverage - read same file with different ranges does not block or stop. - successful exact input with changed output does not gate. - successful exact input across mutation epochs does not gate. - cross-assistant-message parent-turn mutationEpoch reset is covered. - bash file mutation produces patch tracking and resets successful exact-input gating. - repeated same-output success signature still blocks, and repeated blocked signature still stops. Verification - bun --cwd packages/opencode test test/session/loop-gate.test.ts test/session/diagnostics.test.ts test/session/processor-effect.test.ts test/session/prompt-effect.test.ts - bun --cwd packages/opencode test test/session - bun run --cwd packages/opencode typecheck - git diff --check - GitHub Actions and CodeRabbit were green before merge.
Summary
Implement a unified structural loop gate for repeated tool calls:
completedFailures.Why
#279 showed that PawWork could detect repeated failures, but repeated successful low-value tool calls could still loop until the user interrupted. #378 improved provider/tool-call matching, but it does not solve successful-repeat loops.
Related Issue
Closes #279.
Human Review Status
Pending. A human should make the final merge decision after reviewing the final diff and verification evidence.
Review Focus
Please focus on loop-gate semantics:
Risk Notes
Behavior change in the session tool loop gate. Failure loops now use the same 3 / 4 / 5 occurrence ladder as success loops instead of the older looser thresholds. The main residual risk is real-provider streaming timing around highly parallel tool calls, though local prompt-effect coverage exercises the same-step and cross-step boundaries.
No dependencies, migrations, generated files, credentials, permissions, or visible UI changes.
How To Verify
Screenshots or Recordings
Not applicable. No visible UI changes.
Checklist
dev, and my PR title and commit messages use Conventional Commits in EnglishSummary by CodeRabbit
New Features
Bug Fixes
Tests