Pitch
Codex parity. Goal mode is the missing UX primitive for "work on this thing well" sessions. The user states an objective; the agent acquires it, plans, executes, and self-verifies against a checklist before declaring it complete. The goal is persisted at the thread level, survives session restarts, and has an optional token budget that triggers a graceful wind-down rather than a hard stop.
Reference design
Codex-main implements this in `/Volumes/VIXinSSD/codex-main/codex-rs/core/src/goals.rs` (1593 LOC) plus:
- `tools/src/goal_tool.rs` — three tools: `get_goal`, `create_goal`, `update_goal`
- `core/templates/goals/continuation.md` — continuation prompt re-injecting objective + budget every turn
- `core/templates/goals/budget_limit.md` — graceful wind-down when budget exhausts
- `tui/src/chatwidget/goal_menu.rs` — UI surface
- `tui/src/app/thread_goal_actions.rs` — actions
We don't need to copy the code (license check first), but the design is well-validated and worth following:
State machine
- `active` — pursuing the objective
- `paused` — user-initiated halt; resumable
- `complete` — model called `update_goal status: complete` after self-verification
- `budget_limited` — token budget exhausted; wind-down prompt active
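A minimal sketch of how these four states and their ownership could be encoded. The state names come from the list above; the `Actor` split, the `can_transition` function, and the exact transition table are assumptions made for illustration and are not taken from the Codex source.

```rust
/// Goal lifecycle states, mirroring the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GoalStatus {
    Active,
    Paused,
    Complete,
    BudgetLimited,
}

/// Who is requesting a transition (hypothetical grouping).
/// `Host` stands for the user or the application acting on the user's behalf.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Actor {
    Model,
    Host,
}

/// Returns true if `actor` may move a goal from `from` to `to`.
/// The table below is an assumed reading of the design, not a confirmed spec.
pub fn can_transition(actor: Actor, from: GoalStatus, to: GoalStatus) -> bool {
    use Actor::*;
    use GoalStatus::*;
    match (actor, from, to) {
        // The model may only declare an active goal complete.
        (Model, Active, Complete) => true,
        // Pausing and resuming are reserved for the host side.
        (Host, Active, Paused) | (Host, Paused, Active) => true,
        // Budget exhaustion is applied by the host side, never by the model.
        (Host, Active, BudgetLimited) => true,
        // Everything else is rejected, including any exit from `Complete`.
        _ => false,
    }
}
```

Keeping the table in one function makes it easy to test that the model has no path to `paused` or `budget_limited`.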
Anti-rationalization safeguards
The continuation prompt is explicit:
Do not rely on intent, partial progress, elapsed effort, memory of earlier work, or a plausible final answer as proof of completion. … Treat uncertainty as not achieved; do more verification or continue the work.
The `update_goal` tool only exposes `status: complete`; pause, resume, and budget limit are user-controlled, not model-controlled. This is critical to prevent the model from rationalizing its way out of unfinished work.
Three tools
- `create_goal` — fails if a goal already exists (one active goal per thread)
- `get_goal` — read-only, returns objective + status + budget + remaining
- `update_goal` — only `status: complete` allowed
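The tool contracts above could look roughly like this against an in-memory slot. This is a sketch under assumptions: `ThreadGoal`, `Goal`, `GoalError`, `ModelStatusUpdate`, and the field names are hypothetical, persistence is omitted, and the rule that only an active goal can be completed is an assumption rather than something the reference design is known to enforce.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Active,
    Paused,
    Complete,
    BudgetLimited,
}

/// What `get_goal` returns: objective, status, budget, and usage.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Goal {
    pub objective: String,
    pub status: Status,
    pub token_budget: Option<u64>,
    pub tokens_used: u64,
}

impl Goal {
    /// Remaining budget, or `None` when no budget was set.
    pub fn remaining(&self) -> Option<u64> {
        self.token_budget.map(|b| b.saturating_sub(self.tokens_used))
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum GoalError {
    AlreadyExists,
    NoGoal,
    NotActive,
}

/// The only status value the model-facing tool accepts.
pub enum ModelStatusUpdate {
    Complete,
}

/// Per-thread goal slot (hypothetical stand-in for persisted thread state).
#[derive(Default)]
pub struct ThreadGoal {
    goal: Option<Goal>,
}

impl ThreadGoal {
    /// Fails if the thread already has a goal.
    pub fn create_goal(&mut self, objective: &str, token_budget: Option<u64>) -> Result<(), GoalError> {
        if self.goal.is_some() {
            return Err(GoalError::AlreadyExists);
        }
        self.goal = Some(Goal {
            objective: objective.to_string(),
            status: Status::Active,
            token_budget,
            tokens_used: 0,
        });
        Ok(())
    }

    /// Read-only view of the current goal, if any.
    pub fn get_goal(&self) -> Option<&Goal> {
        self.goal.as_ref()
    }

    /// Marks the goal complete; the argument type admits no other status.
    pub fn update_goal(&mut self, _status: ModelStatusUpdate) -> Result<(), GoalError> {
        let goal = self.goal.as_mut().ok_or(GoalError::NoGoal)?;
        if goal.status != Status::Active {
            // Assumption: only an active goal can be completed.
            return Err(GoalError::NotActive);
        }
        goal.status = Status::Complete;
        Ok(())
    }
}
```

Encoding the model-facing status as a single-variant enum means a request for `paused` or `budget_limited` fails at schema validation instead of relying on a runtime check.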
Scope: split across multiple PRs
Phase A — minimal goal lifecycle:
Phase B — token budget:
Phase C — UI polish:
Phase D — DeepSeek-unique: V4 cycle integration:
Acceptance (Phase A)
Open questions
- Multiple goals per thread? Codex says one. Probably right — but a goal stack might map cleanly onto our cycle_manager.
- Goal templates? Codex doesn't have them; we could ship 3-4 starter templates ("fix issue", "refactor module", "add feature") that pre-fill the objective + verification checklist.
- Differentiator beyond parity: a goal could emit a PR-attempt envelope automatically when it self-marks complete. That would close the loop in a way Codex doesn't.