Roadmap: harden /goal into a reliable long-horizon workflow primitive #4228

@qqqys

Summary

Now that the base /goal design has been discussed in #4074 and the Stop-hook runaway cap follow-up was captured in #4206, it would be useful to track the next product/technical roadmap for making goal-driven work reliable in Qwen Code.

This issue is intentionally not limited to the first /goal command. It is a staged plan for turning /goal from a single slash command into a dependable long-horizon workflow primitive: users set an outcome, Qwen Code keeps making progress, shows why it is continuing or stopping, and fails safely when the goal is impossible or underspecified.

Motivation

/goal is valuable because it removes repeated "keep going" prompts from multi-step work. But the first implementation should be followed by a few layers that make the feature trustworthy in real sessions:

  • clear progress visibility while the goal is active;
  • strict but debuggable goal evaluation;
  • safe stop / recovery behavior for impossible goals;
  • resume/non-interactive semantics;
  • observability for why the loop continued or ended;
  • reusable infrastructure for future agent workflows beyond /goal.

Without these follow-ups, /goal risks becoming either too silent (user cannot tell what is happening) or too aggressive (agent keeps looping without useful progress).

Proposed staged roadmap

Phase 1 — UX hardening for the active goal

Goal: make /goal understandable and controllable during interactive use.

  • Show a stable active-goal indicator in the footer/status area.
  • /goal with no args should show:
    • current condition;
    • elapsed time or turns since set;
    • latest judge result/reason, if available;
    • how to clear or replace it.
  • /goal <new condition> should replace the current goal, not stack multiple goals.
  • /goal clear should remove the active hook and clearly report that auto-continuation is disabled.
  • Keep the visible text compact; avoid dumping the full judge prompt or transcript.

Acceptance criteria:

  • Users can always tell whether a goal is active.
  • Users can inspect the latest reason the goal has not been met.
  • Users can clear or replace a goal without restarting the session.
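
The three command forms above (no args, clear, new condition) could be sketched roughly as follows. This is a minimal illustration and not the Qwen Code implementation: `GoalState`, `GoalResult`, `handleGoalCommand`, and the message strings are hypothetical names chosen for the example.

```typescript
// Hypothetical session-scoped goal state; field names are illustrative only.
interface GoalState {
  condition: string;
  turnsSinceSet: number;
  lastJudgeReason?: string;
}

interface GoalResult {
  state: GoalState | null; // null means no active goal
  message: string; // compact text shown to the user
}

// Handles `/goal`, `/goal clear`, and `/goal <new condition>`.
function handleGoalCommand(args: string, current: GoalState | null): GoalResult {
  const trimmed = args.trim();

  // No args: report status without changing anything.
  if (trimmed === "") {
    if (current === null) {
      return { state: null, message: "No active goal." };
    }
    const reason = current.lastJudgeReason ?? "not evaluated yet";
    return {
      state: current,
      message:
        `Goal: ${current.condition} | turns: ${current.turnsSinceSet} | ` +
        `latest judge reason: ${reason} | use "/goal clear" to remove`,
    };
  }

  // Clear: drop the goal and state that auto-continuation is off.
  if (trimmed === "clear") {
    return { state: null, message: "Goal cleared; auto-continuation is disabled." };
  }

  // Anything else replaces the current goal instead of stacking.
  return {
    state: { condition: trimmed, turnsSinceSet: 0 },
    message: `Goal set: ${trimmed}`,
  };
}
```

Returning the next state instead of mutating a global keeps the replace-not-stack rule easy to test in isolation.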

Phase 2 — Judge reliability and evidence discipline

Goal: make the LLM judge strict enough to be useful, but not a black box.

  • Feed the judge only the relevant transcript window since the goal was set, plus compact tool results/evidence.
  • Bias the judge toward not met when required evidence is missing, while requiring concise reasons.
  • Ensure the main agent is encouraged to surface concrete evidence before stopping, e.g. test output, lint output, git status, created PR URL, etc.
  • Record the latest judge decision in session state for /goal status display.
  • Add structured tests for:
    • met vs not met;
    • missing evidence;
    • partial completion;
    • impossible/ambiguous goals.
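
One way to bound what the judge sees is sketched below. It assumes a hypothetical `TranscriptEntry` shape with a turn number and role; the limits (`maxEntries`, `maxToolChars`) and the choice to keep the tail of long tool output are illustrative assumptions, not settled design.

```typescript
// Hypothetical transcript entry; the real session format may differ.
interface TranscriptEntry {
  turn: number;
  role: "user" | "assistant" | "tool";
  text: string;
}

// Keeps only entries since the goal was set, caps how many are sent,
// and truncates long tool output to its last `maxToolChars` characters.
// Assumes maxEntries >= 1 and maxToolChars >= 1.
function buildJudgeWindow(
  entries: TranscriptEntry[],
  goalSetTurn: number,
  maxEntries: number,
  maxToolChars: number,
): TranscriptEntry[] {
  return entries
    .filter((e) => e.turn >= goalSetTurn)
    .slice(-maxEntries)
    .map((e) =>
      e.role === "tool" && e.text.length > maxToolChars
        ? { ...e, text: e.text.slice(-maxToolChars) }
        : e,
    );
}
```

The tail is kept here on the assumption that pass/fail summaries usually appear at the end of test and lint output; a real implementation might prefer a smarter summary.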

Acceptance criteria:

  • The judge returns { met: boolean, reason: string } or equivalent structured output.
  • Missing evidence is treated as not met.
  • The latest judge reason is available to the UI/status path.
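
A fail-closed parser for the structured output might look like the sketch below. It assumes the judge replies with JSON matching `{ met, reason }`; the function name `parseJudgeDecision` and the fallback messages are hypothetical. It covers only malformed or incomplete output. Deciding whether the transcript contains enough evidence remains the judge prompt's job.

```typescript
interface JudgeDecision {
  met: boolean;
  reason: string;
}

// Parses raw judge output. Anything malformed or incomplete is "not met".
function parseJudgeDecision(raw: string): JudgeDecision {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { met: false, reason: "Judge output was not valid JSON." };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { met: false, reason: "Judge output was not an object." };
  }
  const { met, reason } = parsed as { met?: unknown; reason?: unknown };
  const text = typeof reason === "string" ? reason.trim() : "";

  // Only a literal `true` with a non-empty reason counts as met.
  if (met === true && text !== "") {
    return { met: true, reason: text };
  }
  return {
    met: false,
    reason: text || "Judge gave no reason; treating as not met.",
  };
}
```

For example, `parseJudgeDecision('{"met": "true", "reason": "x"}')` yields `met: false` because the string `"true"` is not the boolean `true`.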

Phase 3 — Runaway protection and recovery

Goal: prevent /goal from trapping users in long loops.

Related: #4206.

  • Add or align with a generic consecutive Stop/SubagentStop hook blocking cap.
  • Surface a concise warning when the cap is reached, e.g. goal blocked stopping N times and was paused/overridden.
  • Consider pausing rather than deleting the goal when the cap is reached, so the user can refine or clear it.
  • Keep a broader max-iteration cap as a second safety net.
  • Ensure cancellation/interrupt works immediately even when the goal hook is active.

Acceptance criteria:

  • An impossible goal cannot continue indefinitely.
  • The user sees why auto-continuation stopped.
  • Ctrl+C / cancellation remains responsive.
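
The consecutive-block cap with pause-instead-of-delete could be sketched like this. The class name `StopBlockCap`, the `StopDecision` shape, and the cap value are assumptions for illustration; the actual hook API in #4206 may look different.

```typescript
type StopDecision =
  | { action: "allow" }
  | { action: "block"; reason: string }
  | { action: "pause"; warning: string };

// Counts consecutive blocked stop attempts and pauses the goal at the cap.
class StopBlockCap {
  private readonly maxConsecutiveBlocks: number;
  private consecutiveBlocks = 0;
  private paused = false;

  constructor(maxConsecutiveBlocks: number) {
    this.maxConsecutiveBlocks = maxConsecutiveBlocks;
  }

  // Called each time the agent tries to stop while a goal is set.
  onStopAttempt(judgeMet: boolean, judgeReason: string): StopDecision {
    // A paused goal or a met goal never blocks stopping.
    if (this.paused || judgeMet) {
      this.consecutiveBlocks = 0;
      return { action: "allow" };
    }
    // Cap reached: pause (not delete) so the user can refine or clear the goal.
    if (this.consecutiveBlocks >= this.maxConsecutiveBlocks) {
      this.paused = true;
      return {
        action: "pause",
        warning: `Goal blocked stopping ${this.consecutiveBlocks} times and was paused.`,
      };
    }
    this.consecutiveBlocks += 1;
    return { action: "block", reason: judgeReason };
  }

  // Called when the user explicitly resumes or refines the goal.
  resume(): void {
    this.paused = false;
    this.consecutiveBlocks = 0;
  }
}
```

With a cap of 2, the first two unmet stop attempts are blocked, the third returns the pause warning, and later attempts are allowed until `resume()` is called.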

Phase 4 — Resume and non-interactive semantics

Goal: define how /goal behaves outside a single live TUI session.

Open design questions:

  • Should active goals restore on /resume / --continue, or should they be session-only?
  • If restored, should elapsed/turn counters reset or persist?
  • In non-interactive mode, should qwen -p "/goal npm test exits 0" keep running until the judge marks it met, with the same caps?
  • What summary should be printed when a headless goal ends because it was met, capped, interrupted, or failed?

Suggested direction:

  • Start with interactive session-scoped goals only.
  • Add explicit non-interactive support only after caps, status summaries, and cancellation are stable.
  • If resume support is added, restore the goal condition but reset operational counters to avoid stale loop accounting.
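
If resume support is added along the suggested lines, restoring could be as small as the sketch below. `PersistedGoal` and `restoreGoalOnResume` are hypothetical names, and resetting the set-time to the resume time is one possible answer to the open question about elapsed counters, not a decision.

```typescript
// Hypothetical persisted shape; which fields are stored is an open question.
interface PersistedGoal {
  condition: string;
  consecutiveBlocks: number;
  totalIterations: number;
  setAtMs: number;
}

// Restores the goal condition but resets operational counters,
// so stale loop accounting from the old session does not carry over.
function restoreGoalOnResume(
  saved: PersistedGoal | null,
  nowMs: number,
): PersistedGoal | null {
  if (saved === null) {
    return null;
  }
  return {
    condition: saved.condition,
    consecutiveBlocks: 0,
    totalIterations: 0,
    setAtMs: nowMs, // assumption: elapsed time restarts on resume
  };
}
```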

Acceptance criteria:

  • The chosen resume behavior is documented.
  • Non-interactive behavior is either explicitly unsupported or tested.
  • Headless runs have clear terminal output for met/capped/interrupted outcomes.
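
For the headless summary, a discriminated union keeps the four end states explicit. This is only a sketch: the `GoalOutcome` type, the output wording, and the exit codes are assumptions made for the example (130 follows the common shell convention for SIGINT), not defined Qwen Code behavior.

```typescript
type GoalOutcome =
  | { kind: "met"; reason: string }
  | { kind: "capped"; blocks: number; reason: string }
  | { kind: "interrupted" }
  | { kind: "failed"; error: string };

// Maps an outcome to one terminal line plus a process exit code.
// Exit codes here are illustrative assumptions.
function summarizeOutcome(o: GoalOutcome): { line: string; exitCode: number } {
  switch (o.kind) {
    case "met":
      return { line: `goal met: ${o.reason}`, exitCode: 0 };
    case "capped":
      return {
        line: `goal not met after ${o.blocks} blocked stops: ${o.reason}`,
        exitCode: 2,
      };
    case "interrupted":
      return { line: "goal interrupted by user", exitCode: 130 };
    case "failed":
      return { line: `goal evaluation failed: ${o.error}`, exitCode: 1 };
  }
}
```

Distinct exit codes would let CI scripts tell a capped goal apart from a judge or tool failure without parsing the text.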

Phase 5 — Generalize into an agent workflow primitive

Goal: make /goal a stepping stone toward broader long-horizon agent control.

Possible follow-ups:

  • Goal templates: tests green, PR ready, issue triaged, migration complete.
  • Goal-aware todo integration: map judge reasons into todo updates without requiring manual prompting.
  • Goal progress events for logs/telemetry/debugging.
  • A reusable GoalController or ContinuationController abstraction instead of keeping all logic inside one slash command.
  • Integration with future plan/checkpoint/session-management work.

Acceptance criteria:

  • /goal logic is not tightly coupled to one command if future continuation workflows need it.
  • The same continuation/evaluation primitives can support plan execution, PR readiness checks, or CI self-fix loops.
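
A reusable controller might separate the evaluation step from the loop decision, as in the sketch below. `ContinuationController`, `Evaluator`, and `ProgressEvent` are hypothetical names; the point is only that the evaluator is injected, so the same loop could be driven by an LLM judge, a CI status check, or a plan checklist.

```typescript
interface Evaluation {
  met: boolean;
  reason: string;
}

// Any async check can act as the evaluator (LLM judge, CI status, etc.).
type Evaluator = (condition: string, evidence: string[]) => Promise<Evaluation>;

// Progress events for logs, telemetry, or a status UI.
type ProgressEvent =
  | { type: "continued"; reason: string }
  | { type: "ended"; reason: string };

class ContinuationController {
  private readonly condition: string;
  private readonly evaluate: Evaluator;
  private readonly onEvent: (e: ProgressEvent) => void;

  constructor(
    condition: string,
    evaluate: Evaluator,
    onEvent: (e: ProgressEvent) => void,
  ) {
    this.condition = condition;
    this.evaluate = evaluate;
    this.onEvent = onEvent;
  }

  // Returns true when the agent should keep going.
  async shouldContinue(evidence: string[]): Promise<boolean> {
    const result = await this.evaluate(this.condition, evidence);
    this.onEvent(
      result.met
        ? { type: "ended", reason: result.reason }
        : { type: "continued", reason: result.reason },
    );
    return !result.met;
  }
}
```

Caps and cancellation are left out to keep the sketch short; in practice they would wrap `shouldContinue` rather than live inside the evaluator.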

Suggested implementation slicing

  1. Small PR: active-goal status display + /goal inspect/replace/clear polish.
  2. Small PR: store and expose latest judge decision/reason.
  3. Safety PR: generic Stop-hook blocking cap and capped-loop UI warning (see #4206, "Consider adding a configurable Stop hook blocking cap for /goal loops").
  4. Design issue/PR: document resume and non-interactive semantics before enabling them broadly.
  5. Refactor PR: extract goal loop state/evaluation into a reusable controller if the command grows too large.

Risks / mitigations

  • False positive completion → require evidence in transcript; allow users to re-set the goal with a stricter condition.
  • False negative completion → cap consecutive blocks; show latest judge reason so the user can refine the condition.
  • Cost from repeated side queries → use a fast model and a bounded transcript window.
  • User confusion → visible active-goal pill plus the /goal status output (the no-args form).
  • Over-engineering → keep Phases 1-3 small and defer templates/general controller until usage proves the need.

Related issues

  • #4074: base /goal design.
  • #4206: configurable Stop-hook blocking cap for /goal loops.

Why track this separately?

#4074 is the base feature. #4206 is one safety guardrail. This issue tracks the broader product direction: making goal-driven work observable, controllable, resumable, and reusable as Qwen Code evolves toward longer-running agent workflows.
