Roadmap: harden /goal into a reliable long-horizon workflow primitive

## Summary

Now that the base `/goal` design has been discussed in #4074 and the Stop-hook runaway cap follow-up was captured in #4206, it would be useful to track the next product/technical roadmap for making goal-driven work reliable in Qwen Code.

This issue is intentionally **not** limited to the first `/goal` command. It is a staged plan for turning `/goal` from a single slash command into a dependable long-horizon workflow primitive: users set an outcome, Qwen Code keeps making progress, shows why it is continuing or stopping, and fails safely when the goal is impossible or underspecified.

## Motivation

`/goal` is valuable because it removes repeated "keep going" prompts from multi-step work. But the first implementation should be followed by a few layers that make the feature trustworthy in real sessions:

- clear progress visibility while the goal is active;
- strict but debuggable goal evaluation;
- safe stop / recovery behavior for impossible goals;
- resume/non-interactive semantics;
- observability for why the loop continued or ended;
- reusable infrastructure for future agent workflows beyond `/goal`.

Without these follow-ups, `/goal` risks becoming either too silent (user cannot tell what is happening) or too aggressive (agent keeps looping without useful progress).

## Proposed staged roadmap

### Phase 1 — UX hardening for the active goal

Goal: make `/goal` understandable and controllable during interactive use.

- Show a stable active-goal indicator in the footer/status area.
- `/goal` with no args should show:
  - current condition;
  - elapsed time or turns since set;
  - latest judge result/reason, if available;
  - how to clear or replace it.
- `/goal <new condition>` should replace the current goal, not stack multiple goals.
- `/goal clear` should remove the active hook and clearly report that auto-continuation is disabled.
- Keep the visible text compact; avoid dumping the full judge prompt or transcript.
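
The inspect/replace/clear behavior listed above can be sketched as a small parser plus a reducer over a single goal slot. This is only an illustration under assumptions: `GoalCommand`, `ActiveGoal`, `parseGoalArgs`, and `applyGoalCommand` are hypothetical names, not existing Qwen Code APIs, and the message wording is a placeholder.

```typescript
// Hypothetical types; none of these names come from the Qwen Code codebase.
type GoalCommand =
  | { kind: "status" }
  | { kind: "clear" }
  | { kind: "set"; condition: string };

interface ActiveGoal {
  condition: string;
  turnsSinceSet: number;
  lastReason?: string;
}

// Parse the text after "/goal" into one of three actions.
function parseGoalArgs(args: string): GoalCommand {
  const trimmed = args.trim();
  if (trimmed === "") return { kind: "status" };
  if (trimmed === "clear") return { kind: "clear" };
  return { kind: "set", condition: trimmed };
}

// Apply a command to the single goal slot; returns the new slot and a compact message.
function applyGoalCommand(
  current: ActiveGoal | null,
  cmd: GoalCommand,
): { goal: ActiveGoal | null; message: string } {
  switch (cmd.kind) {
    case "status":
      if (!current) return { goal: null, message: "No active goal." };
      return {
        goal: current,
        message:
          `Goal: ${current.condition} | turns: ${current.turnsSinceSet} | ` +
          `last reason: ${current.lastReason ?? "n/a"} | /goal clear to stop`,
      };
    case "clear":
      return { goal: null, message: "Goal cleared. Auto-continuation is disabled." };
    case "set":
      // Replace, never stack: any previous goal is discarded.
      return {
        goal: { condition: cmd.condition, turnsSinceSet: 0 },
        message: `Goal set: ${cmd.condition}`,
      };
  }
}
```

Keeping one slot rather than a list makes "replace, not stack" true by construction, and the status message stays a single line.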

Acceptance criteria:

- [ ] Users can always tell whether a goal is active.
- [ ] Users can inspect the latest reason the goal has not been met.
- [ ] Users can clear or replace a goal without restarting the session.

### Phase 2 — Judge reliability and evidence discipline

Goal: make the LLM judge strict enough to be useful, but not a black box.

- Feed the judge only the relevant transcript window since the goal was set, plus compact tool results/evidence.
- Bias the judge toward **not met** when required evidence is missing, while requiring concise reasons.
- Ensure the main agent is encouraged to surface concrete evidence before stopping, e.g. test output, lint output, `git status`, created PR URL, etc.
- Record the latest judge decision in session state for `/goal` status display.
- Add structured tests for:
  - met vs not met;
  - missing evidence;
  - partial completion;
  - impossible/ambiguous goals.
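
One way to bound what the judge sees is sketched below, assuming a flat transcript with per-entry turn numbers. `TranscriptEntry`, `buildJudgeWindow`, and both default limits are hypothetical and only illustrate the "window since the goal was set, plus compact tool results" idea.

```typescript
// Hypothetical message shape; not the actual Qwen Code transcript type.
interface TranscriptEntry {
  turn: number;
  role: "user" | "assistant" | "tool";
  text: string;
}

// Keep only entries since the goal was set, cap the window at the most
// recent `maxEntries` entries, and truncate long tool output.
function buildJudgeWindow(
  transcript: TranscriptEntry[],
  goalSetTurn: number,
  maxEntries = 20,
  maxToolChars = 500,
): TranscriptEntry[] {
  return transcript
    .filter((e) => e.turn >= goalSetTurn)
    .slice(-maxEntries)
    .map((e) =>
      e.role === "tool" && e.text.length > maxToolChars
        ? // Keep the tail, assuming summaries (e.g. test totals) come last.
          { ...e, text: e.text.slice(-maxToolChars) }
        : e,
    );
}
```

Both limits also bound the cost of each judge call, which is relevant to the side-query cost risk listed later in this issue.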

Acceptance criteria:

- [ ] The judge returns `{ met: boolean, reason: string }` or equivalent structured output.
- [ ] Missing evidence is treated as not met.
- [ ] The latest judge reason is available to the UI/status path.
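
As a sketch of the structured-output criterion, the parser below accepts only a well-formed `{ met, reason }` object and treats everything else as not met. Treating malformed judge output the same way as missing evidence is an assumption made here, and `JudgeVerdict` and `parseJudgeVerdict` are hypothetical names.

```typescript
interface JudgeVerdict {
  met: boolean;
  reason: string;
}

// Parse raw judge output. Anything malformed is treated as "not met",
// so a broken judge response never ends the goal early.
function parseJudgeVerdict(raw: string): JudgeVerdict {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === "object" && parsed !== null) {
      const obj = parsed as Record<string, unknown>;
      if (typeof obj.met === "boolean" && typeof obj.reason === "string") {
        return { met: obj.met, reason: obj.reason.trim() || "No reason given." };
      }
    }
  } catch {
    // Fall through to the fail-closed default below.
  }
  return { met: false, reason: "Judge output was not valid structured JSON." };
}
```

Failing closed only stays safe in combination with the runaway cap from Phase 3; without a cap, a judge that always returns malformed output would keep the loop running.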

### Phase 3 — Runaway protection and recovery

Goal: prevent `/goal` from trapping users in long loops.

Related: #4206.

- Add or align with a generic consecutive Stop/SubagentStop hook blocking cap.
- Surface a concise warning when the cap is reached, e.g. "goal blocked stopping N times and was paused/overridden".
- Consider pausing rather than deleting the goal when the cap is reached, so the user can refine or clear it.
- Keep a broader max-iteration cap as a second safety net.
- Ensure cancellation/interrupt works immediately even when the goal hook is active.
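
A minimal sketch of the consecutive-block cap with pause-on-cap behavior follows. The state shape, the `onStopAttempt` name, and the default cap of 5 are illustrative assumptions; #4206 may settle on different semantics.

```typescript
// Hypothetical state; the default cap is illustrative, not a value from #4206.
interface GoalLoopState {
  consecutiveBlocks: number;
  paused: boolean;
}

type StopDecision =
  | { action: "allow_stop"; warning?: string }
  | { action: "block_stop"; reason: string };

// Called each time the agent tries to stop while a goal is set.
function onStopAttempt(
  state: GoalLoopState,
  verdict: { met: boolean; reason: string },
  maxConsecutiveBlocks = 5,
): { state: GoalLoopState; decision: StopDecision } {
  if (state.paused || verdict.met) {
    return { state: { ...state, consecutiveBlocks: 0 }, decision: { action: "allow_stop" } };
  }
  const next = state.consecutiveBlocks + 1;
  if (next > maxConsecutiveBlocks) {
    // Pause instead of deleting, so the user can refine or clear the goal.
    return {
      state: { consecutiveBlocks: 0, paused: true },
      decision: {
        action: "allow_stop",
        warning:
          `Goal blocked stopping ${maxConsecutiveBlocks} times and was paused. ` +
          `Last reason: ${verdict.reason}`,
      },
    };
  }
  return {
    state: { consecutiveBlocks: next, paused: false },
    decision: { action: "block_stop", reason: verdict.reason },
  };
}
```

In this sketch the counter resets only when the goal is met or paused; a real implementation would presumably also reset it on new user input, and the broader max-iteration cap would sit outside this function.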

Acceptance criteria:

- [ ] An impossible goal cannot continue indefinitely.
- [ ] The user sees why auto-continuation stopped.
- [ ] Ctrl+C / cancellation remains responsive.

### Phase 4 — Resume and non-interactive semantics

Goal: define how `/goal` behaves outside a single live TUI session.

Open design questions:

- Should active goals restore on `/resume` / `--continue`, or should they be session-only?
- If restored, should elapsed/turn counters reset or persist?
- In non-interactive mode, should `qwen -p "/goal npm test exits 0"` keep running until the judge marks it met, with the same caps?
- What summary should be printed when a headless goal ends because it was met, capped, interrupted, or failed?

Suggested direction:

- Start with interactive session-scoped goals only.
- Add explicit non-interactive support only after caps, status summaries, and cancellation are stable.
- If resume support is added, restore the goal condition but reset operational counters to avoid stale loop accounting.
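
If resume support is added along the suggested lines, the restore step could look like the sketch below. `PersistedGoal` and `restoreGoalOnResume` are hypothetical, and dropping the previous judge reason as stale is an additional assumption beyond what is proposed above.

```typescript
// Hypothetical persisted shape; not an existing Qwen Code session format.
interface PersistedGoal {
  condition: string;
  turnsSinceSet: number;
  consecutiveBlocks: number;
  lastReason?: string;
}

// Restore the condition but reset operational counters, so loop
// accounting from the previous session cannot trip caps in the new one.
function restoreGoalOnResume(saved: PersistedGoal | undefined): PersistedGoal | undefined {
  if (!saved) return undefined;
  return {
    condition: saved.condition,
    turnsSinceSet: 0,
    consecutiveBlocks: 0,
    // lastReason is intentionally omitted (assumption: it is stale after a resume).
  };
}
```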

Acceptance criteria:

- [ ] The chosen resume behavior is documented.
- [ ] Non-interactive behavior is either explicitly unsupported or tested.
- [ ] Headless runs have clear terminal output for met/capped/interrupted outcomes.
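
For the headless-output criterion, one option is a single summary line per terminal outcome. The sketch below models the four outcomes named in the open questions as a discriminated union; `GoalOutcome`, `summarizeOutcome`, and the line wording are hypothetical, and exit codes are deliberately left out.

```typescript
// Hypothetical outcome model covering met / capped / interrupted / failed.
type GoalOutcome =
  | { kind: "met"; reason: string }
  | { kind: "capped"; blocks: number; lastReason: string }
  | { kind: "interrupted" }
  | { kind: "failed"; error: string };

// Produce one terminal line describing how the headless goal run ended.
function summarizeOutcome(condition: string, outcome: GoalOutcome): string {
  switch (outcome.kind) {
    case "met":
      return `goal met: ${condition} (${outcome.reason})`;
    case "capped":
      return (
        `goal not met: ${condition} ` +
        `(stopped after ${outcome.blocks} blocked stops; last reason: ${outcome.lastReason})`
      );
    case "interrupted":
      return `goal interrupted: ${condition}`;
    case "failed":
      return `goal failed: ${condition} (${outcome.error})`;
  }
}
```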

### Phase 5 — Generalize into an agent workflow primitive

Goal: make `/goal` a stepping stone toward broader long-horizon agent control.

Possible follow-ups:

- Goal templates: `tests green`, `PR ready`, `issue triaged`, `migration complete`.
- Goal-aware todo integration: map judge reasons into todo updates without requiring manual prompting.
- Goal progress events for logs/telemetry/debugging.
- A reusable `GoalController` or `ContinuationController` abstraction instead of keeping all logic inside one slash command.
- Integration with future plan/checkpoint/session-management work.

Acceptance criteria:

- [ ] `/goal` logic is not tightly coupled to one command if future continuation workflows need it.
- [ ] The same continuation/evaluation primitives can support plan execution, PR readiness checks, or CI self-fix loops.
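
To make the decoupling concrete, here is a rough sketch of a controller that owns the loop state and takes the evaluator as a parameter. `ContinuationController` is named in the list above, but this shape, the `Evaluator` type, and the `maxBlocks` default are assumptions rather than a proposed final API.

```typescript
interface Verdict {
  met: boolean;
  reason: string;
}

// An evaluator decides whether the condition holds. It could call an LLM
// judge, inspect CI status, or check a plan; the controller does not care.
type Evaluator = (condition: string, evidence: string[]) => Promise<Verdict>;

class ContinuationController {
  private readonly condition: string;
  private readonly evaluate: Evaluator;
  private readonly maxBlocks: number;
  private blocks = 0;
  private last: Verdict | null = null;

  constructor(condition: string, evaluate: Evaluator, maxBlocks = 5) {
    this.condition = condition;
    this.evaluate = evaluate;
    this.maxBlocks = maxBlocks;
  }

  // Returns true if the agent should keep going, false if it may stop.
  async shouldContinue(evidence: string[]): Promise<boolean> {
    this.last = await this.evaluate(this.condition, evidence);
    if (this.last.met) return false;
    this.blocks += 1;
    return this.blocks <= this.maxBlocks;
  }

  // Latest verdict, for status display or telemetry.
  get lastVerdict(): Verdict | null {
    return this.last;
  }
}
```

Under this shape, `/goal` would be one caller that passes an LLM-judge evaluator, while a CI self-fix loop or PR readiness check could pass a different evaluator and reuse the same cap and status plumbing.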

## Suggested implementation slicing

1. **Small PR:** active-goal status display + `/goal` inspect/replace/clear polish.
2. **Small PR:** store and expose latest judge decision/reason.
3. **Safety PR:** generic Stop-hook blocking cap and capped-loop UI warning (#4206).
4. **Design issue/PR:** document resume and non-interactive semantics before enabling them broadly.
5. **Refactor PR:** extract goal loop state/evaluation into a reusable controller if the command grows too large.

## Risks / mitigations

- **False positive completion** → require evidence in transcript; allow users to re-set the goal with a stricter condition.
- **False negative completion** → cap consecutive blocks; show latest judge reason so the user can refine the condition.
- **Cost from repeated side queries** → use a fast model and a bounded transcript window.
- **User confusion** → visible active-goal pill plus `/goal` status command.
- **Over-engineering** → keep Phases 1-3 small and defer templates and the general controller until usage proves the need.

## Related issues

- #4074 — base `/goal` slash command proposal.
- #4206 — Stop-hook blocking cap / runaway-loop guardrail.

## Why track this separately?

#4074 is the base feature. #4206 is one safety guardrail. This issue tracks the broader product direction: making goal-driven work observable, controllable, resumable, and reusable as Qwen Code evolves toward longer-running agent workflows.


Roadmap: harden /goal into a reliable long-horizon workflow primitive #4228
