Skip to content

feat(core): fail impossible goals#4230

Merged
wenshao merged 8 commits into
QwenLM:mainfrom
qqqys:feat/goal-impossible
May 17, 2026
Merged

feat(core): fail impossible goals#4230
wenshao merged 8 commits into
QwenLM:mainfrom
qqqys:feat/goal-impossible

Conversation

@qqqys

@qqqys qqqys commented May 17, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • What changed: add an impossible judge outcome for /goal, terminate that goal as failed, and render/restore the failed terminal state in the CLI.
  • Why it changed: impossible goals should clear instead of looping until the iteration cap.
  • Reviewer focus: judge prompt/schema semantics and failed terminal UI wording.

Validation

  • Commands run:

    cd packages/core && npx vitest run src/goals/goalJudge.test.ts src/goals/goalHook.test.ts
    cd packages/cli && npx vitest run src/ui/components/messages/GoalStatusMessage.test.tsx src/ui/commands/goalCommand.test.ts src/ui/utils/restoreGoal.test.ts
    npm run build
    npm run typecheck
    git diff --check

  • Observed result: targeted tests passed, build passed, typecheck passed. npm run build reported an existing lint warning in packages/vscode-ide-companion/src/utils/editorGroupUtils.ts.

Scope / Risk

  • Main risk or tradeoff: the judge prompt may classify a goal as impossible too aggressively; the prompt limits this to genuinely unachievable cases.
  • Not covered / not validated: no interactive TUI screenshot/video was captured.
  • Breaking changes / migration notes: none expected.

Linked Issues / Bugs

Related #4228

@github-actions

Copy link
Copy Markdown
Contributor

📋 Review Summary

This PR introduces a new "failed" terminal state for goals that are judged as genuinely impossible to achieve, preventing the agent from looping until the iteration cap. The implementation adds an impossible flag to the judge's JSON response schema, updates the goal hook to handle this verdict, and renders the failed state in the CLI UI. The change is well-scoped and addresses a real usability issue where impossible goals would otherwise burn iterations unnecessarily.

🔍 General Feedback

  • Clean separation of concerns: The core logic (judge schema + hook callback) is cleanly separated from UI rendering, making the change easy to trace across packages.
  • Good test coverage: Tests cover the new failed kind in both the judge parsing and the UI components, including the restore/clear behavior.
  • Consistent with existing patterns: The failed kind follows the same structure as achieved/aborted, minimizing cognitive load for future maintainers.
  • Risk acknowledged: The PR description correctly identifies the main risk (over-aggressive classification) and mitigates it with clear prompt guidance.

🎯 Specific Feedback

🟡 High

  • File: packages/core/src/goals/goalJudge.ts:34-40 — The impossible flag is added to RESPONSE_SCHEMA but marked as optional in required array. This is correct for backward compatibility, but consider adding validation that impossible: true can only appear when ok: false. Currently, a malformed response like {ok: true, impossible: true} would pass through without warning. Consider adding:

    if (impossible && ok) {
      debugLogger.debug('Judge returned impossible=true with ok=true; ignoring impossible flag');
      return { ok, reason: reasonText };
    }
  • File: packages/core/src/goals/goalHook.ts:187-197 — The impossible verdict path calls finishGoal with kind: 'failed', but doesn't include a systemMessage field like the aborted path does (line 206). This could lead to inconsistent behavior if downstream code expects systemMessage on all terminal events. Consider whether verdict.reason should also be passed as systemMessage for parity, or document why it's not needed.

🟢 Medium

  • File: packages/cli/src/ui/components/messages/GoalStatusMessage.tsx:84-90 — The new failed case returns the same prefix: '!' and prefixColor: theme.status.error as the warning style. Consider whether failed should use a distinct visual treatment from aborted (warning) since they represent semantically different outcomes: "couldn't be done" vs. "gave up trying." If the current design is intentional, a brief comment explaining why both share the ! prefix would help future maintainers.

  • File: packages/cli/src/ui/utils/restoreGoal.ts:23-34 — The findGoalToRestore function now correctly returns null for failed goals (tested in line 102-110 of the test file), but the JSDoc comment still says "(achieved / cleared / aborted)" without mentioning failed. Update the comment to:

    * or `null` if the last goal_status was terminal/cancelled
    * (achieved / failed / cleared / aborted) or none exists.
  • File: packages/cli/src/ui/utils/restoreGoal.ts:47-58 — Similarly, findLastTerminalGoal JSDoc says "(achieved / aborted)" but should include failed:

    * Finds the most recent terminal (achieved / failed / aborted) goal_status item in

🔵 Low

  • File: packages/core/src/goals/goalJudge.ts:58-62 — The updated prompt guidance for impossible is thorough, but the phrase "the transcript confirms there is no path forward" could be tightened. Consider adding an explicit example of what not to classify as impossible (e.g., "Do not use this for goals that are merely difficult or time-consuming").

  • File: packages/cli/src/ui/commands/goalCommand.ts:60-67 — The formatTerminalSummary function now handles failed in the title logic, but the variable name title could be more descriptive since it's used in both the checking branch (line 47) and the terminal branch. Consider renaming to cardTitle or statusTitle for clarity.

  • File: packages/core/src/goals/activeGoalStore.ts:73 — The type definition GoalTerminalKind = 'achieved' | 'aborted' | 'failed' is updated correctly, but consider adding a brief JSDoc comment explaining the semantic difference between aborted (system gave up) and failed (genuinely impossible) for future contributors who encounter this type.

✅ Highlights

  • Excellent test coverage: The new failed kind is tested across all layers — judge parsing (goalJudge.test.ts), hook behavior (goalHook.test.ts), UI rendering (GoalStatusMessage.test.tsx), command behavior (goalCommand.test.ts), and restore logic (restoreGoal.test.ts). This multi-layer testing approach is exemplary.

  • Thoughtful schema evolution: The impossible field is added as an optional boolean with backward-compatible parsing logic. The parseJudgeReply function (lines 336-343 in goalJudge.ts) correctly handles the new field without breaking existing judge responses.

  • Clear UI differentiation: The GoalStatusMessage component now renders "Goal could not be achieved" for failed vs. "Goal aborted" for iteration caps, giving users immediate visibility into why their goal stopped.

  • Proper state cleanup: The failed kind correctly prevents goal restoration on session resume (tested in restoreGoal.test.ts:102-110), which is the right behavior since impossible goals shouldn't auto-restart.

@qqqys qqqys marked this pull request as ready for review May 17, 2026 05:53

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 Code Review — PR #4230

Scope: 13 files (+243/-29) | Build: ✅ | Tests: 76/76 pass | Typecheck: ✅ | Lint:

This PR adds an "impossible" terminal state for /goal, allowing the judge to signal that a goal is genuinely unachievable. The implementation is clean and well-structured. I found 1 critical issue, 2 suggestions, and 2 minor comment updates.


🔴 Critical — Prompt injection via judgeReason (goalHook.ts:48-57)

The continuationReasonForGoal function now interpolates the judge's free-form verdict.reason into the continuation prompt fed back to the working model. The old code explicitly avoided this with a comment explaining why:

"Keep the judge's free-form diagnostic in goal state/UI only. The Stop hook reason is fed back to the model as the next continuation prompt, so it must be a fixed instruction derived from the original user goal rather than untrusted transcript-derived judge text."

The new code injects it verbatim (with only a soft "Hook feedback only" label). This is a prompt-injection surface that affects all goals, not just the impossible path. A compromised or confused judge can feed arbitrary instructions into the model's continuation context.

The test at goalHook.test.ts was explicitly changed to assert that "rm -rf /" IS present in the continuation reason — converting a security guardrail into a regression test for the injection.


🟡 Suggestion — impossible bypasses iteration-cap safety (goalHook.ts:192-202)

The control flow is: verdict.okverdict.impossible → iteration-cap check. A single judge hallucination (impossible: true on iteration 1) kills the goal immediately with no guardrail. Consider requiring impossible on ≥2 consecutive evaluations or adding a minimum iteration floor:

if (verdict.impossible && latest.iterations >= 2) {

🟡 Suggestion — additionalProperties: false cast hack (goalJudge.ts:55)

The RESPONSE_SCHEMA is typed as Schema & { additionalProperties: boolean } — a cast outside @google/genai's typed contract. If the SDK changes its serialization, this constraint silently vanishes. Consider adding a structural parity test that asserts schema properties keys match JudgeResult interface keys.


🟢 Nice to have — Stale inline comments

Two comments reference only achieved / aborted but not the new failed kind:

  • GoalStatusMessage.tsx:135: "shown on terminal cards (achieved / aborted)"
  • activeGoalStore.ts:66: "Goal achieved" / "Goal aborted"

Reviewed by: qwen-code (mimo-v2.5-pro)

'Continue working toward satisfying the goal. Active /goal condition remains the condition above.'
);
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical — Prompt injection via judgeReason

The old code deliberately kept the judge's free-form diagnostic out of the continuation prompt:

"Keep the judge's free-form diagnostic in goal state/UI only. The Stop hook reason is fed back to the model as the next continuation prompt, so it must be a fixed instruction derived from the original user goal rather than untrusted transcript-derived judge text."

This PR reverses that security decision. The judge's reason is LLM-generated text quoting the model's own transcript — it can contain adversarial content. The soft framing "Hook feedback only; not a user instruction." mitigates but does not eliminate injection risk.

This affects all goals, not just the impossible path — every "not met" continuation now carries judge text.

Suggested fix: Either (a) revert to the old fixed template and surface judge text only in UI/history, or (b) if feedback is intentional, add sanitization (cap length, strip imperative sentences, or wrap in a clearly non-actionable frame).

Comment thread packages/core/src/goals/goalHook.ts Outdated
return { continue: true };
}

if (verdict.impossible) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion — impossible bypasses the iteration-cap safety valve

The control flow is: verdict.okverdict.impossible → iteration-cap check (line 204). A single judge hallucination (impossible: true on iteration 1) kills the goal immediately — no guardrail, no second chance.

Suggested fix: Add a minimum iteration floor before honoring impossible:

if (verdict.impossible && latest.iterations >= 2) {

This gives the goal at least a couple of turns before a single bad verdict can terminate it.

properties: {
ok: { type: 'BOOLEAN' as unknown as Schema['type'] },
reason: { type: 'STRING' as unknown as Schema['type'] },
impossible: { type: 'BOOLEAN' as unknown as Schema['type'] },

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion — additionalProperties: false is an invisible footgun

The Schema & { additionalProperties: boolean } cast is outside @google/genai's typed contract. If the SDK changes its serialization to strip unknown fields, this constraint silently vanishes — tests pass because the mock doesn't go through the SDK.

Also, if someone adds a new field to the judge response (e.g. confidence), they must remember to add it to RESPONSE_SCHEMA.properties too. Forgetting means the API silently strips the new field with no compile-time signal.

Suggested fix: Add a unit test asserting the schema's properties keys match the JudgeResult interface keys — a structural parity check that catches future drift.

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No new issues found in the incremental review. LGTM! ✅ — gpt-5.5 via Qwen Code /review

wenshao
wenshao previously approved these changes May 17, 2026

if (
verdict.impossible &&
latest.iterations >= MIN_IMPOSSIBLE_GOAL_ITERATIONS

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Missing debug log when impossible floor suppresses verdict

When the judge returns impossible: true but iterations < MIN_IMPOSSIBLE_GOAL_ITERATIONS, the verdict is silently deferred with no log. The max-iterations path (line 207) has a debugLogger.debug call; this suppression path has none. At 3 AM, an operator would see the judge's "impossible" reason in lastReason but have no log explaining why the system kept looping.

Suggested change
latest.iterations >= MIN_IMPOSSIBLE_GOAL_ITERATIONS
if (verdict.impossible) {
if (latest.iterations >= MIN_IMPOSSIBLE_GOAL_ITERATIONS) {
finishGoal(config, sessionId, latest, {
kind: 'failed',
condition: latest.condition,
iterations: latest.iterations,
durationMs: Date.now() - latest.setAt,
lastReason: verdict.reason,
});
return { continue: true };
}
debugLogger.debug(
`Impossible verdict suppressed: iterations=${latest.iterations} < floor=${MIN_IMPOSSIBLE_GOAL_ITERATIONS}; continuing loop.`,
);
}

— mimo-v2.5-pro via Qwen Code /review

export const GOAL_JUDGE_TIMEOUT_MS = 25_000;
export const GOAL_HOOK_TIMEOUT_SECONDS = 30;
export const GOAL_HOOK_TIMEOUT_MS = GOAL_HOOK_TIMEOUT_SECONDS * 1000;
export const MIN_IMPOSSIBLE_GOAL_ITERATIONS = 2;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] MIN_IMPOSSIBLE_GOAL_ITERATIONS lacks documentation

The exported constant has no JSDoc explaining its rationale. The value "2" is a policy decision — it means the model gets one continuation turn after the first impossible verdict. A future maintainer encountering "my impossible goal took 3 turns to stop" will have to reverse-engineer the control flow.

Suggested change
export const MIN_IMPOSSIBLE_GOAL_ITERATIONS = 2;
/**
* Minimum iteration count before an `impossible` judge verdict is accepted.
* Gives the model at least one continuation turn after the judge first flags
* impossibility, guarding against premature failure on a single bad-judgment
* turn. The goal terminates on the second consecutive impossible verdict.
*/
export const MIN_IMPOSSIBLE_GOAL_ITERATIONS = 2;

— mimo-v2.5-pro via Qwen Code /review

@tanzhenxin tanzhenxin added the type/feature-request New feature or enhancement request label May 17, 2026
the condition is genuinely unachievable rather than deferring to the assistant's
self-assessment. Do not use it just because progress is slow or evidence is
currently missing. When in doubt, return {"ok": false} without "impossible".`;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good prompt guardrails. "The assistant claiming the goal is impossible is evidence, not proof" is the key line — it prevents the model from self-certifying impossibility to escape a difficult goal. The "when in doubt, return ok:false without impossible" fallback is also the right default.


export const JUDGE_RESULT_SCHEMA_KEYS = [
'ok',
'reason',

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] JudgeResult.impossible has undocumented mutual exclusion with ok

parseJudgeReply strips impossible when ok is true:

...(impossible && !ok ? { impossible: true } : {}),

The test covers this (ignores impossible=true when the judge also reports ok=true), but the JudgeResult interface doesn't document that impossible is only meaningful when ok is false. A future code path checking verdict.impossible without first verifying !verdict.ok could misclassify an achieved goal as impossible.

Suggested change
'reason',
/**
* Whether the goal is genuinely impossible in this session.
* Only meaningful when `ok` is `false`. If `ok` is `true`,
* this field is always absent.
*/
impossible?: boolean;

— DeepSeek/deepseek-v4-pro via Qwen Code /review

@@ -126,7 +132,8 @@ export const GoalStatusMessage: React.FC<GoalStatusMessageProps> = ({
<Text wrap="wrap">{condition}</Text>
</Box>

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Switch default silently absorbs future GoalStatusKind variants

The case 'aborted': + default: share one return block. This PR correctly adds case 'failed': before the default, but if GoalStatusKind gains another variant later, TypeScript won't catch the missing case — the new kind will silently render as "Goal aborted" with a warning icon.

Suggested change
</Box>
case 'aborted':
return {
prefix: '!',
prefixColor: theme.status.warning,
title: 'Goal aborted',
};
// No default — let TypeScript enforce exhaustiveness.

— DeepSeek/deepseek-v4-pro via Qwen Code /review

return { continue: true };
}

// Give the latest assistant output one final evaluation before aborting.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Missing debugLogger.debug() on the actual failed termination path

The suppressed-impossible path (below floor) has a debugLogger.debug() call, which is great. But the actual termination path where the goal IS marked as failed (iterations ≥ MIN) calls finishGoal without any debug log. At 3 AM, if a user reports their goal was prematurely marked impossible, there's no structured evidence that the judge returned impossible: true — only the freeform lastReason text in UI history.

Suggested change
// Give the latest assistant output one final evaluation before aborting.
debugLogger.debug(
`Goal judge ruled impossible; clearing goal.`,
{ reason: verdict.reason, iterations: latest.iterations },
);
finishGoal(config, sessionId, latest, {

— DeepSeek/deepseek-v4-pro via Qwen Code /review

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional notes (could not be mapped to specific diff lines)

Suggestion — Removed prompt injection safety comment. The PR removed the comment explaining why the continuation prompt must use continuationReasonForGoal(condition) instead of verdict.reason: verdict.reason is untrusted model-generated text and feeding it back as the continuation prompt would be a prompt injection vector. The safety property is encoded in the runtime string but the design rationale is lost. Consider restoring the comment or moving it to continuationReasonForGoal's JSDoc.

Suggestion — Missing test for impossible precedence over MAX_GOAL_ITERATIONS. The code correctly checks impossible before the iteration cap, so failed takes precedence over aborted when both conditions are true. But no test verifies this — if the check order is accidentally reversed in a future refactor, the behavior would silently regress.

Suggestion — Missing integration test for impossible→failed pipeline. Only unit-level tests with mocked judgeGoal cover the new path. An integration test through the real HookSystem would catch bugs in how judgeGoalWithTimeout interacts with the impossible verdict (e.g., if the timeout wrapper accidentally swallows impossible).


Verdict: Comment — no Critical issues. Build ✅, tests 77/77 ✅.

— DeepSeek/deepseek-v4-pro via Qwen Code /review

: never;

export const JUDGE_RESULT_SCHEMA_COVERS_INTERFACE: SchemaCoversJudgeResult = true;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this exported constant is never read at runtime — it exists solely to trigger the SchemaCoversJudgeResult compile-time check. Consider making it unexported (just const) or adding a brief // compile-time only annotation, since it currently shows up in the public API surface but has no runtime consumers.

export const JUDGE_RESULT_SCHEMA_COVERS_INTERFACE: SchemaCoversJudgeResult = true;

const RESPONSE_SCHEMA: Schema & { additionalProperties: boolean } = {
// Schema typing in @google/genai uses an enum-like Type, but accepts the

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The additionalProperties field is bolted on via Schema & { additionalProperties: boolean } because @google/genai's Schema type does not include it. This works but is fragile — if the upstream type adds the property with a different shape, the intersection could silently conflict. Consider a // @ts-expect-error or a short comment noting the upstream gap so future readers understand the intent.

line). One Text + natural wrap keeps the continuation flush. */}
{(kind === 'achieved' || kind === 'aborted') && lastReason?.trim() ? (
{(kind === 'achieved' || kind === 'aborted' || kind === 'failed') &&
lastReason?.trim() ? (

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This disjunction is growing — it now matches all terminal kinds except set, cleared, and checking. The same pattern appears in restoreGoal.ts:findLastTerminalGoal and goalCommand.ts:formatTerminalSummary. Consider extracting a shared isTerminalGoalKind() helper or a TERMINAL_GOAL_KINDS set so adding another terminal kind in the future only requires one change instead of N.

return { decision: 'block', reason: continuationReasonForGoal(condition) };
return {
decision: 'block',
reason: continuationReasonForGoal(condition),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old code had a multi-line comment here explaining why the stop hook reason uses fixed text (continuationReasonForGoal) instead of passing through the judge's free-form diagnostic. That security rationale (untrusted transcript-derived text must not flow into the model's continuation prompt) is non-obvious and was removed in this reformatting. Worth preserving — new contributors may wonder why verdict.reason isn't included in the hook response.

durationMs: Date.now() - latest.setAt,
lastReason: verdict.reason,
});
return { continue: true };

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] failed terminal state missing systemMessage

The failed path returns { continue: true } with no systemMessage, while the aborted path (line 222) returns { continue: true, systemMessage: GOAL_ABORTED_REASON }. After a failed outcome, the user gets no explicit hint to rephrase and re-attempt the goal — unlike for aborted goals which include actionable guidance ("Re-set with /goal <condition> if you still need it").

Consider adding a GOAL_FAILED_REASON constant and including it as systemMessage in both the finishGoal call and the return, matching the aborted pattern:

const GOAL_FAILED_REASON =
  'Goal determined to be impossible; cleared. Re-phrase and re-set with `/goal <condition>` to try a different approach.';

— glm-5.1 via Qwen Code /review


// Compile-time only: fails if JudgeResult grows a key that the response schema
// key list does not include.
const JUDGE_RESULT_SCHEMA_COVERS_INTERFACE: SchemaCoversJudgeResult = true;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Compile-time assertion is exported but never imported

JUDGE_RESULT_SCHEMA_COVERS_INTERFACE is exported but never imported anywhere in the codebase. Its sole purpose is to trigger a compile-time type error when JUDGE_RESULT_SCHEMA_KEYS diverges from JudgeResult. An aggressive "remove unused exports" lint rule or manual cleanup could delete it, silently breaking the compile-time safety net.

Consider either:

  1. Keeping it unexported with a void statement (which the diff already has) — drop the export keyword, or
  2. Adding a comment explaining why it must be kept.

— glm-5.1 via Qwen Code /review

wenshao
wenshao previously approved these changes May 17, 2026

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No new issues found. Existing 7 open review comments already cover all concerns — no additional findings in this pass. LGTM! ✅ — DeepSeek/deepseek-v4-pro via Qwen Code /review

ok,
reason: reasonText,
...(impossible && !ok ? { impossible: true } : {}),
};

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] parseJudgeReply uses strict === true for impossible

const impossible = (payload as { impossible?: unknown }).impossible === true;

If the model returns "true" (string) or 1 (number), the verdict is silently treated as not-impossible and the goal continues looping. The structured-output schema requires boolean, but parseJudgeReply already has fallback parsing for malformed JSON (bracket-search in start/end), suggesting the model doesn't always respect the schema.

Consider adding a tolerance test for non-boolean impossible values and optionally using Boolean() coercion to be defensive.

Suggested change
};
const impossible = Boolean((payload as { impossible?: unknown }).impossible);

— DeepSeek/deepseek-v4-pro via Qwen Code /review

@@ -81,13 +85,20 @@ export const GoalStatusMessage: React.FC<GoalStatusMessageProps> = ({
prefixColor: theme.text.secondary,
title: 'Goal cleared',

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] GoalStatusMessage is a pure functional component but not wrapped in React.memo

In the Ink/React rendering model, every streaming event triggers a parent re-render, causing all historical GoalStatusMessage instances in the scrollback to re-render even though their props (kind, condition, iterations, durationMs, lastReason) are immutable once the card is rendered.

Suggested change
title: 'Goal cleared',
export const GoalStatusMessage: React.FC<GoalStatusMessageProps> = React.memo(({

— DeepSeek/deepseek-v4-pro via Qwen Code /review

| 'checking';

export const TERMINAL_GOAL_STATUS_KINDS = [
'achieved',

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. The satisfies readonly GoalStatusKind[] constraint plus isTerminalGoalStatusKind type guard means adding a new terminal kind to GoalTerminalKind without adding it here produces a type error at the assertNever call sites. Single source of truth for terminal-kind membership — exactly what I was suggesting.

@@ -209,9 +239,12 @@ export function createGoalStopHookCallback(args: {
recordGoalIteration(sessionId, verdict.reason);
// Keep the judge's free-form diagnostic in goal state/UI only. The Stop
// hook reason is fed back to the model as the next continuation prompt, so

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for restoring this. The security rationale (untrusted judge text must not flow into the continuation prompt) is non-obvious and worth preserving for future contributors.

// Compile-time only: fails if JudgeResult grows a key that the response schema
// key list does not include.
const JUDGE_RESULT_SCHEMA_COVERS_INTERFACE: SchemaCoversJudgeResult = true;
void JUDGE_RESULT_SCHEMA_COVERS_INTERFACE;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean fix — unexported with void to suppress the unused-variable warning. The // Compile-time only comment makes the intent immediately clear.

@wenshao

wenshao commented May 17, 2026

Copy link
Copy Markdown
Collaborator

Local validation report

Verified on 9f2430a in a clean checkout.

Unit / typecheck

Command Result
npx vitest run packages/core/src/goals/ 46/46 passed across 4 files
npm run typecheck --workspace packages/core exit 0
npx vitest run packages/cli/src/ui/commands/goalCommand.test.ts packages/cli/src/ui/components/messages/GoalStatusMessage.test.tsx packages/cli/src/ui/utils/restoreGoal.test.ts 37/37 passed across 3 files
npm run typecheck --workspace packages/cli exit 0

Spot check

  • parseJudgeReply impossible-detection (goalJudge.ts:374) — the strict === true works when the structured schema is honored. The post-approval [Suggestion] flagged that string "true" or number 1 from a misbehaving model would fall through silently to "not impossible." Non-blocking but worth folding into a follow-up that widens the predicate or asserts the schema before the cast.
  • isTerminalGoalStatusKind (types.ts:513) — satisfies readonly GoalStatusKind[] + the type guard make GoalTerminalKind a single source of truth; adding a new terminal kind without registering it here produces a tsc error at the assertNever call sites. Confirmed by toggling the array locally — npm run typecheck fails as expected if the constant drifts.
  • goalHook.ts:241 security-rationale comment about not letting untrusted judge text flow into the continuation prompt — preserved as requested.
  • GoalStatusMessage.tsx:86 is a pure functional component; the post-approval React.memo suggestion is a perf nit specifically for Ink scrollback re-renders. Not a correctness issue.

GitHub CI

All 8 required checks green on 9f2430a: Lint / CodeQL ×2 / Test (mac+ubuntu+win) / Classify / Coverage skip.

@wenshao wenshao merged commit f84ddd4 into QwenLM:main May 17, 2026
9 checks passed
mabry1985 added a commit to protoLabsAI/protoCLI that referenced this pull request May 27, 2026
)

Ported from QwenLM#4230, adapted to our goal subsystem
(core/src/goal, single goalEvaluator) rather than upstream's goalJudge.

The goal evaluator can now return a terminal "impossible" verdict — used only
when the condition is genuinely unachievable in this session (self-
contradictory, depends on an unavailable resource, or all reasonable
approaches are exhausted). When it fires, the autonomous /goal loop stops and
the goal is recorded as failed instead of continuing forever.

Guardrails mirror upstream: the assistant claiming impossibility is evidence,
not proof; "impossible" is only honoured alongside met:false; and an evaluator
error or empty reply never fails a goal (stays met:false, continues).

- types: `GoalEvaluationResult.impossible`, `GoalState.failedAt`
- goalEvaluator: prompt + parser support the third verdict shape
- GoalManager: `markImpossible(reason)` / `getLastFailedGoal()`; reset clears it
- client: impossible verdict → markImpossible + stop (no continuation)
- goalCommand: `/goal` status reports an abandoned-as-impossible goal,
  preferring the most recent terminal outcome
- tests: parser, manager, and status-text coverage (60 goal tests pass)

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
mabry1985 added a commit to protoLabsAI/protoCLI that referenced this pull request May 27, 2026
…sion, SDK canUseTool timeout) (#331)

Documents behavior shipped in 0.51.x–0.52.0 that lacked coverage:

- guides/goal.md: the evaluator's three verdicts (met / not met / impossible),
  the conservative impossible→abandoned terminal state, and how `/goal` status
  reports an abandoned goal. (QwenLM#4230)
- explanation/agent-harness.md: new "Reactive compression" section — proactive
  vs. reactive compression and the one-shot force-compress-and-retry on a
  provider context-overflow rejection. (QwenLM#3879)
- reference/sdk-api.md + contributing/sdk-typescript.md: document the
  `timeout.canUseTool` option and that the SDK now forwards it to the CLI so
  the control-plane timeout matches the callback's. (QwenLM#4491)

(The `--max-tool-calls` / `--max-wall-time` budgets were already documented in
guides/run-headless.md when those features shipped.)

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type/feature-request New feature or enhancement request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants