feat(goal): abandon impossible /goal conditions instead of looping#322
Conversation
Ported from QwenLM#4230, adapted to our goal subsystem (core/src/goal, single goalEvaluator) rather than upstream's goalJudge. The goal evaluator can now return a terminal "impossible" verdict — used only when the condition is genuinely unachievable in this session (self- contradictory, depends on an unavailable resource, or all reasonable approaches are exhausted). When it fires, the autonomous /goal loop stops and the goal is recorded as failed instead of continuing forever. Guardrails mirror upstream: the assistant claiming impossibility is evidence, not proof; "impossible" is only honoured alongside met:false; and an evaluator error or empty reply never fails a goal (stays met:false, continues). - types: `GoalEvaluationResult.impossible`, `GoalState.failedAt` - goalEvaluator: prompt + parser support the third verdict shape - GoalManager: `markImpossible(reason)` / `getLastFailedGoal()`; reset clears it - client: impossible verdict → markImpossible + stop (no continuation) - goalCommand: `/goal` status reports an abandoned-as-impossible goal, preferring the most recent terminal outcome - tests: parser, manager, and status-text coverage (60 goal tests pass) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
👀 Quinn is reviewing — verdict (PASS / WARN / FAIL) + findings to follow. |
|
Warning Review limit reached
More reviews will be available in 7 minutes and 36 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (8)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
QA Audit — PR #322 | VERDICT: PASS CI Status
Diff Review
Observations
✅ Submitted APPROVE review on |
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
…nce (#344) * fix(goal): stop runaway loops and let the judge credit artifact evidence Goal mode could loop forever re-confirming a finished task. Three causes, all fixed here on top of the existing impossible-goal abandonment (#322): - The evaluator judged only execution-shaped evidence (test output, exit codes), so a docs/research goal whose deliverable is files-that-exist could never be marked met. The judge now accepts the requested artifact itself as evidence and is told to judge the condition exactly as written, without inventing stricter acceptance criteria. - The evidence window was the last 20 raw tool calls, so read-only churn on re-confirmation turns pushed the original writes out of view and the judge lost sight of the work that satisfied the goal. Replaced with summarizeToolCallsForGoal: dedupes repeats into counts, classifies state-changing actions vs read-only inspections, and keeps every distinct action while capping read noise — the deliverable stays visible no matter how long the agent loops. - Added a stall guard: when the evaluator returns the same not-met reason GOAL_STALL_LIMIT (3) turns running without converging, the loop clears the goal and spends one final turn reporting to the user instead of re-firing. Distinct from an impossible verdict, which still ends the loop immediately. Tests: 56 goal-module tests (incl. stall detection + summary) and 64 client tests pass; typecheck, eslint, prettier clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(goal): avoid polynomial-redos in normalizeReason CodeQL flagged `/[.!?]+$/` (js/polynomial-redos, high): it backtracks polynomially on strings with many '!', and the reason text it runs on comes from the evaluator model (uncontrolled input). Replace it with a trailing- character scan that strips the same punctuation in linear time. Behavior is unchanged; existing stall-detection tests still pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Automaker <automaker@localhost> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
What
Port of QwenLM/qwen-code#4230 ("fail impossible goals"), adapted to our goal subsystem (
packages/core/src/goal, singlegoalEvaluator) rather than upstream'sgoalJudgearchitecture./goalkeeps the agent working turn-after-turn until a small model confirms the condition holds. Today, a genuinely unachievable condition loops forever (until/goal clearor a budget cap). This adds a terminal "impossible" verdict: when the evaluator determines the condition can never be satisfied in this session, the autonomous loop stops and the goal is recorded as failed.Guardrails (mirrors upstream intent)
impossible: trueis only used when the condition is genuinely unachievable — self-contradictory, depends on an unavailable resource/capability, or all reasonable approaches are exhausted and the transcript confirms no path forward. Specifically:{met: false}withoutimpossible."impossibleis only honoured alongsidemet: false(ignored on a met verdict).met: falseand continues, so the guardrail can't strand a solvable goal.Changes
GoalEvaluationResult.impossible,GoalState.failedAtimpossible(only when not met)markImpossible(reason)→ terminallastFailedstate;getLastFailedGoal();reset()clears itmarkImpossibleand stop (no continuation); logs the abandonment/goalstatus reports an abandoned-as-impossible goal, preferring the most recent terminal outcomeTests
60 goal tests pass — parser (impossible parsed when not-met, ignored when met, omitted when absent/false), manager (
markImpossible/getLastFailedGoal/reset), and status-text (impossible report, most-recent-terminal preference). tsc + eslint clean.Note / possible follow-up
I deliberately scoped out upstream's dedicated
GoalStatusMessageUI component. Terminal state is surfaced via the pill clearing +/goalstatus (consistent with how achieved goals already work). A richer inline "goal failed" message could be a small follow-up if wanted.🤖 Generated with Claude Code