
v0.8.6 feat: Goal mode — stated objective with token budget, self-verification, and continuation prompts #397

@Hmbown

Description

Pitch

This brings us to parity with Codex. Goal mode is the missing UX primitive for "work on this thing well" sessions: the user states an objective, and the agent acquires it, plans, executes, and self-verifies against a checklist before declaring it complete. A goal is persisted at the thread level, survives session restarts, and has an optional token budget that triggers a graceful wind-down rather than a hard stop.

Reference design

Codex-main implements this in `/Volumes/VIXinSSD/codex-main/codex-rs/core/src/goals.rs` (1593 LOC) plus:

  • `tools/src/goal_tool.rs` — three tools: `get_goal`, `create_goal`, `update_goal`
  • `core/templates/goals/continuation.md` — continuation prompt re-injecting objective + budget every turn
  • `core/templates/goals/budget_limit.md` — graceful wind-down when budget exhausts
  • `tui/src/chatwidget/goal_menu.rs` — UI surface
  • `tui/src/app/thread_goal_actions.rs` — actions

We don't need to copy the code (license check first), but the design is well-validated and worth following:

State machine

  • `active` — pursuing the objective
  • `paused` — user-initiated halt; resumable
  • `complete` — model called `update_goal status: complete` after self-verification
  • `budget_limited` — token budget exhausted; wind-down prompt active
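The four states and the rule about who may move between them can be sketched as a small transition function. The `Actor` type and its `Runtime` variant are hypothetical names used only for illustration. `Runtime` stands for whatever component enforces the user-set budget. This is not Codex's implementation. In this sketch `complete` is terminal, and the model's only permitted move is `active -> complete`.

```rust
/// The four goal states listed in the issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GoalStatus {
    Active,
    Paused,
    Complete,
    BudgetLimited,
}

/// Who is requesting the transition (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Actor {
    User,
    Model,
    Runtime, // enforces the user-set token budget
}

/// Returns the new status if the transition is allowed, otherwise an error message.
pub fn transition(from: GoalStatus, to: GoalStatus, actor: Actor) -> Result<GoalStatus, String> {
    use GoalStatus::*;
    let allowed = match (from, to, actor) {
        // The model may only mark an active goal complete (after self-verification).
        (Active, Complete, Actor::Model) => true,
        // Pause and resume are user-only.
        (Active, Paused, Actor::User) => true,
        (Paused, Active, Actor::User) => true,
        // Budget exhaustion is detected by the runtime, never chosen by the model.
        (Active, BudgetLimited, Actor::Runtime) => true,
        _ => false,
    };
    if allowed {
        Ok(to)
    } else {
        Err(format!("{:?} cannot move goal from {:?} to {:?}", actor, from, to))
    }
}
```

Encoding the actor in the transition table keeps the "model can only say complete" rule in one place. Tools and UI handlers then do not each have to re-implement it.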

Anti-rationalization safeguards

The continuation prompt is explicit:

Do not rely on intent, partial progress, elapsed effort, memory of earlier work, or a plausible final answer as proof of completion. … Treat uncertainty as not achieved; do more verification or continue the work.
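A per-turn continuation prompt that re-injects the objective and budget might be assembled as follows. `GoalSnapshot` and `render_continuation` are hypothetical names. Only the two safeguard sentences come from the quote above; they are emitted as separate lines and do not fill in the elided part. The rest of the wording is a placeholder, not the Codex template.

```rust
/// Hypothetical inputs for the continuation prompt; field names are illustrative.
pub struct GoalSnapshot<'a> {
    pub objective: &'a str,
    pub tokens_used: u64,
    pub token_budget: Option<u64>,
}

/// Builds the text re-injected at the start of every turn while a goal is active.
pub fn render_continuation(goal: &GoalSnapshot) -> String {
    let mut out = String::new();
    out.push_str("Active goal: ");
    out.push_str(goal.objective);
    out.push('\n');
    match goal.token_budget {
        Some(budget) => {
            // saturating_sub keeps "remaining" at 0 if usage overshoots the budget.
            let remaining = budget.saturating_sub(goal.tokens_used);
            out.push_str(&format!(
                "Token budget: {} used of {} ({} remaining)\n",
                goal.tokens_used, budget, remaining
            ));
        }
        None => out.push_str("Token budget: none\n"),
    }
    // Safeguard sentences quoted in the issue.
    out.push_str("Do not rely on intent, partial progress, elapsed effort, memory of earlier work, or a plausible final answer as proof of completion.\n");
    out.push_str("Treat uncertainty as not achieved; do more verification or continue the work.\n");
    // Placeholder instruction tying the prompt to the update_goal tool.
    out.push_str("Call update_goal with status: complete only after verifying every checklist item.\n");
    out
}
```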

The `update_goal` tool exposes only `status: complete`; pause, resume, and budget limiting are user-controlled, not model-controlled. This is critical: it prevents the model from rationalizing its way out of unfinished work.

Three tools

  • `create_goal` — fails if a goal already exists (one active goal per thread)
  • `get_goal` — read-only, returns objective + status + budget + remaining
  • `update_goal` — only `status: complete` allowed
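The three tools map naturally onto a thread-keyed store. In the sketch below, `GoalStore`, `GoalView`, and the in-memory `HashMap` are assumptions for illustration. Real persistence would write to disk so goals survive restarts. As the issue states, `create_goal` fails if any goal already exists for the thread. `update_goal` rejects every status string other than `"complete"`.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status { Active, Paused, Complete, BudgetLimited }

#[derive(Debug, Clone, PartialEq)]
pub struct Goal {
    pub objective: String,
    pub status: Status,
    pub token_budget: Option<u64>,
    pub tokens_used: u64,
}

/// What get_goal returns: objective + status + budget + remaining.
#[derive(Debug, Clone, PartialEq)]
pub struct GoalView {
    pub objective: String,
    pub status: Status,
    pub token_budget: Option<u64>,
    pub remaining: Option<u64>,
}

/// Hypothetical in-memory store keyed by thread id.
#[derive(Default)]
pub struct GoalStore {
    goals: HashMap<String, Goal>,
}

impl GoalStore {
    /// Fails if the thread already has a goal (one goal per thread).
    pub fn create_goal(&mut self, thread_id: &str, objective: &str, token_budget: Option<u64>) -> Result<(), String> {
        if self.goals.contains_key(thread_id) {
            return Err("a goal already exists for this thread".to_string());
        }
        let goal = Goal { objective: objective.to_string(), status: Status::Active, token_budget, tokens_used: 0 };
        self.goals.insert(thread_id.to_string(), goal);
        Ok(())
    }

    /// Read-only view; `remaining` is None when no budget was set.
    pub fn get_goal(&self, thread_id: &str) -> Option<GoalView> {
        self.goals.get(thread_id).map(|g| GoalView {
            objective: g.objective.clone(),
            status: g.status,
            token_budget: g.token_budget,
            remaining: g.token_budget.map(|b| b.saturating_sub(g.tokens_used)),
        })
    }

    /// Model-facing update: only "complete" is accepted, and only from Active.
    pub fn update_goal(&mut self, thread_id: &str, status: &str) -> Result<(), String> {
        if status != "complete" {
            return Err(format!("update_goal only accepts status: complete (got {status:?})"));
        }
        let goal = self.goals.get_mut(thread_id).ok_or("no goal for this thread")?;
        if goal.status != Status::Active {
            return Err(format!("goal is {:?}, not active", goal.status));
        }
        goal.status = Status::Complete;
        Ok(())
    }
}
```

A user command such as `/goal clear` would remove the entry before a new goal can be created.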

Scope: split across multiple PRs

Phase A — minimal goal lifecycle:

Phase B — token budget:

  • Optional `--budget ` flag on `/goal`
  • Per-goal token tracking (cumulative across the goal's turns)
  • Budget-limit prompt swap when exhausted
  • Footer chip showing remaining budget
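The Phase B pieces can be sketched together. The sketch accumulates usage per goal, swaps to the wind-down prompt when nothing remains, and formats a footer chip. `GoalBudget`, `PromptKind`, and the chip wording are hypothetical. The sketch counts input and output tokens equally, which Phase D may revisit.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PromptKind {
    Continuation, // normal per-turn continuation prompt
    BudgetLimit,  // graceful wind-down prompt
}

/// Hypothetical per-goal token accountant.
#[derive(Debug)]
pub struct GoalBudget {
    pub limit: Option<u64>,
    pub used: u64,
}

impl GoalBudget {
    /// Adds one turn's usage; totals accumulate across all of the goal's turns.
    pub fn record_turn(&mut self, input_tokens: u64, output_tokens: u64) {
        self.used = self.used.saturating_add(input_tokens).saturating_add(output_tokens);
    }

    /// Remaining tokens, clamped at zero; None when the goal has no budget.
    pub fn remaining(&self) -> Option<u64> {
        self.limit.map(|l| l.saturating_sub(self.used))
    }

    /// Which prompt to inject next turn: the wind-down prompt once the budget is spent.
    pub fn next_prompt(&self) -> PromptKind {
        match self.remaining() {
            Some(0) => PromptKind::BudgetLimit,
            _ => PromptKind::Continuation,
        }
    }

    /// Short text for the footer chip.
    pub fn footer_chip(&self) -> String {
        match self.remaining() {
            None => "goal: no budget".to_string(),
            Some(r) if r >= 1_000 => format!("goal: {:.1}k left", r as f64 / 1_000.0),
            Some(r) => format!("goal: {r} left"),
        }
    }
}
```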

Phase C — UI polish:

  • Goal menu (active / paused / complete / budget_limited states)
  • `/goal pause` / `/goal resume` / `/goal clear`
  • Snapshot tests for each menu state
  • `/help` section
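The `/goal` subcommands could be parsed along these lines. The command syntax below is an assumption: bare `/goal` shows the goal, and `/goal <objective> [--budget N]` creates one. The budget argument is assumed to be a plain token count, since the issue does not name the flag's value. `GoalCommand` and `parse_goal_command` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
pub enum GoalCommand {
    Create { objective: String, budget: Option<u64> },
    Show,
    Pause,
    Resume,
    Clear,
}

/// Parses the text after "/goal" (assumed syntax, see lead-in).
pub fn parse_goal_command(args: &str) -> Result<GoalCommand, String> {
    let args = args.trim();
    match args {
        "" => return Ok(GoalCommand::Show),
        "pause" => return Ok(GoalCommand::Pause),
        "resume" => return Ok(GoalCommand::Resume),
        "clear" => return Ok(GoalCommand::Clear),
        _ => {}
    }
    let mut objective_words = Vec::new();
    let mut budget = None;
    let mut tokens = args.split_whitespace();
    while let Some(tok) = tokens.next() {
        if tok == "--budget" {
            // The value after --budget is assumed to be a token count.
            let value = tokens.next().ok_or("--budget needs a token count")?;
            let n: u64 = value.parse().map_err(|_| format!("invalid budget: {value}"))?;
            budget = Some(n);
        } else {
            objective_words.push(tok);
        }
    }
    if objective_words.is_empty() {
        return Err("missing objective".to_string());
    }
    Ok(GoalCommand::Create { objective: objective_words.join(" "), budget })
}
```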

Phase D — DeepSeek-unique: V4 cycle integration:

  • Goal continuation prompt mentions cycle boundaries and uses our `cycle_manager`
  • Cycle-aware budget accounting (cache hits don't drain the budget the same way)
  • Differentiator from Codex: V4-Pro vs. V4-Flash routing decisions are budget-aware
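One possible shape for cycle-aware accounting and budget-aware routing follows. Everything in the sketch is an assumption: the `TurnUsage` split, the 10% cache-hit weight, and the rule that routes to Flash under 25% remaining. The weight is an arbitrary placeholder, not DeepSeek's actual cache-hit pricing ratio. Integer per-mille arithmetic avoids floating-point drift in the running total.

```rust
/// Hypothetical per-turn usage split; field names are illustrative.
pub struct TurnUsage {
    pub cache_hit_input: u64,
    pub cache_miss_input: u64,
    pub output: u64,
}

#[derive(Debug, PartialEq)]
pub enum Model {
    V4Pro,
    V4Flash,
}

/// Placeholder weight: a cache-hit input token costs 100/1000 (10%) of a normal one.
const CACHE_HIT_WEIGHT_PER_MILLE: u64 = 100;

/// Budget charge for one turn; cache hits are discounted, everything else counts fully.
pub fn budget_charge(u: &TurnUsage) -> u64 {
    let cached = u.cache_hit_input * CACHE_HIT_WEIGHT_PER_MILLE / 1_000;
    cached + u.cache_miss_input + u.output
}

/// Placeholder routing rule: stay on Pro until less than a quarter of the budget remains.
pub fn pick_model(remaining: Option<u64>, limit: Option<u64>) -> Model {
    match (remaining, limit) {
        (Some(r), Some(l)) if r < l / 4 => Model::V4Flash,
        _ => Model::V4Pro, // no budget set, or plenty left
    }
}
```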

Acceptance (Phase A)

Open questions

  • Multiple goals per thread? Codex says one. That is probably right, but a goal stack might map cleanly onto our `cycle_manager`.
  • Goal templates? Codex doesn't have them; we could ship 3-4 starter templates ("fix issue", "refactor module", "add feature") that pre-fill the objective + verification checklist.
  • Differentiator beyond parity: a goal could automatically emit a PR-attempt envelope when it self-marks complete. That would close the loop in a way Codex doesn't.

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request), v0.8.6 (Targeting v0.8.6)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

    Issue actions