Add /goal session continuation command #85723
Dependency Changes Detected
This PR changes dependency-related files. Maintainers should confirm these changes are intentional. Changed files:
Maintainer follow-up:
|
1e24225 to b6e47e8
|
Codex review: needs maintainer review before merge. Reviewed May 27, 2026, 2:16 PM ET / 18:16 UTC.
Summary
PR surface: Source +1036, Tests +1128, Docs +175, Config +38, Generated 0, Other +17. Total +2394 across 41 files.
Reproducibility: not applicable. This is a new user-facing feature and SDK capability, not a report of broken current-main behavior. Current-main search shows the requested
Review metrics: 3 noteworthy metrics.
Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Rank-up moves:
Mantis proof suggestion Risk before merge
Maintainer options:
Next step before merge
Security
Review details
Best possible solution: Land only after maintainer sign-off on the bundled product surface and SDK contract, with the compatibility and transport risks either explicitly accepted or reduced by an additional live transport proof.
Do we have a high-confidence way to reproduce the issue? Not applicable: this is a new user-facing feature and SDK capability, not a report of broken current-main behavior. Current-main search shows the requested
Is this the best way to solve the issue? Unclear pending maintainer judgment: the implementation is a coherent bundled plugin plus generic continuation lease API, but whether this is the right core product/API surface must be accepted by maintainers before merge.
AGENTS.md: found and applied where relevant.
Codex review notes: model gpt-5.5, reasoning high; reviewed against c0f16460d748.
Label changes:
Label justifications:
Evidence reviewed
PR surface: Source +1036, Tests +1128, Docs +175, Config +38, Generated 0, Other +17. Total +2394 across 41 files.
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
b6e47e8 to 3abbf4a
|
Addressed the two concrete ClawSweeper code findings in head
Focused verification after the patch: The broader @clawsweeper re-review |
|
🦞👀 Command router queued. I will update this comment with the next step. |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
For the reviewer: this PR depends on #85722, which @clawsweeper closed as superseded by another PR that covers a completely different topic and bears no similarity to it (as far as I can tell). If this PR is interesting enough, please reopen it or ask for it to be pulled into this PR. |
3abbf4a to d9650ac
|
All alerts resolved. Learn more about Socket for GitHub. This PR previously contained dependency changes with security issues that have been resolved, removed, or ignored. |
|
@clawsweeper re-review Current head Please re-review/update the durable PR state for this head; this looks infra-side rather than an author-code failure. |
|
@clawsweeper re-review Current head
I am not asking for another Telegram/Mantis proof yet because the structural proof gate is now passing and the existing Mantis failure was infrastructure-side: Crabbox could not start a Telegram Desktop lease, so no product behavior was exercised. If exact-current-head transport proof is still required after this re-review, I can run a fresh canary proof and/or retry Telegram proof with the lease issue addressed. |
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
|
Thanks for pushing this forward. I am going to close this PR and reimplement the feature from the core/thread-goal boundary instead of landing the current bundled-plugin shape.
The useful product idea is right, but after comparing it with the Codex goal implementation, this needs to be owned by session/runtime state rather than by a plugin-local JSON store and Cron-backed continuation lease. In Codex, goals are persisted thread state with runtime-owned accounting, hidden continuation context, app/server notifications, and model tools restricted to create/get/update. This PR instead makes
I am also not landing this branch mechanically because it is currently conflicting with
I will preserve the good parts of the behavior in a replacement PR: core-owned persisted goal state, runtime-owned continuation/accounting, a thin |
Summary
goal extension with /goal help, /goal start, /goal status, /goal events, /goal pause, /goal resume, /goal done, and /goal clear
goal_status tool so the active session can report continue, done, blocked, paused, or waiting_approval
Scope
This PR now carries both the session-scoped continuation lease workflow API and the bundled /goal user flow that exercises it. The earlier split-out scaffold PR (#85722) was closed, so this branch targets main directly rather than depending on a stacked prerequisite.
AI assistance and review transparency
401 Unauthorized; I replaced that with manual diff/log review and focused local validation instead of claiming a Codex review pass.
Why
The continuation lease API is a runtime primitive, and that is hard to judge without a real user flow. /goal is that user flow: a human starts an objective, inspects what happened, pauses or resumes continuation, and finishes or clears the goal while the model can request only bounded same-session continuation through goal_status.
The user experience is not only "let the agent continue." It is also "let the human see whether the agent is behaving well enough to keep continuing." That matters across model tiers: a frontier model may stay on target most of the time, but cheaper or smaller models need tighter rails, clearer stop states, and a visible decision trail so users can trust, pause, resume, or stop work without guessing.
This is why the command surface includes both lifecycle controls and lightweight observability. /goal status shows the current state; /goal events shows the recent decision trail. The user does not need a dashboard or debug mode to answer the basic question: "is this still doing the right thing?"
Safety
goal_status uses trusted tool context, not model-supplied sessionKey or goalId
continue is accepted only while the current goal is already in continue
done and blocked are terminal for /goal resume; start a new goal to continue
paused stops continuation until a human resumes
clear removes the active user-facing goal state and clears any matching continuation lease
waiting_approval, clears the lease, and tells the user to start a new goal
Commands
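As a rough illustration of how the /goal subcommands and the safety rules listed above could fit together, here is a minimal TypeScript sketch of the goal lifecycle as a small state machine. It is not the PR's implementation: the names Goal, startGoal, transition, reportStatus, and handleCommand are hypothetical. The sketch also makes assumptions the PR text does not confirm: that only the continue state keeps a lease scheduled, that pause is accepted only from continue, that only paused goals can be resumed, and that every model-reported status (not just continue) is ignored unless the goal is currently in continue.

```typescript
// Illustrative sketch only: every type and function name here is hypothetical.
type GoalState = "continue" | "done" | "blocked" | "paused" | "waiting_approval";

interface Goal {
  objective: string;
  state: GoalState;
  leaseActive: boolean; // true while a continuation lease is scheduled
  events: string[]; // decision trail, as shown by /goal events
}

type CommandResult = { goal: Goal | null; reply: string };

const SUBCOMMANDS: string[] = ["help", "start", "status", "events", "pause", "resume", "done", "clear"];
const HELP = `usage: /goal <${SUBCOMMANDS.join("|")}>`;

function startGoal(objective: string): Goal {
  return { objective, state: "continue", leaseActive: true, events: ["started"] };
}

// Assumption: only the "continue" state keeps a lease scheduled.
function transition(goal: Goal, state: GoalState, event: string): Goal {
  return { ...goal, state, leaseActive: state === "continue", events: [...goal.events, event] };
}

// Model side (goal_status). The caller looks the goal up from trusted context;
// nothing here accepts a model-supplied session or goal identifier.
function reportStatus(goal: Goal, status: GoalState): Goal {
  // The PR text states this guard for "continue"; applying it to every
  // reported status is an assumption of this sketch.
  if (goal.state !== "continue") {
    return { ...goal, events: [...goal.events, `ignored model status: ${status}`] };
  }
  return transition(goal, status, `model reported ${status}`);
}

// Human side (/goal <subcommand>).
function handleCommand(goal: Goal | null, input: string): CommandResult {
  const [cmd, sub = "help", ...rest] = input.trim().split(/\s+/);
  if (cmd !== "/goal") return { goal, reply: "not a /goal command" };
  if (sub === "help" || !SUBCOMMANDS.includes(sub)) return { goal, reply: HELP };
  if (sub === "start") {
    const objective = rest.join(" ");
    if (!objective) return { goal, reply: "usage: /goal start <objective>" };
    return { goal: startGoal(objective), reply: "goal started" };
  }
  if (!goal) return { goal, reply: "no active goal" };
  switch (sub) {
    case "status":
      return { goal, reply: `state: ${goal.state}` };
    case "events":
      return { goal, reply: goal.events.join("\n") };
    case "pause":
      if (goal.state !== "continue") return { goal, reply: `cannot pause from ${goal.state}` };
      return { goal: transition(goal, "paused", "paused by user"), reply: "paused" };
    case "resume":
      // done and blocked are terminal for resume; the user starts a new goal instead.
      if (goal.state !== "paused") {
        return { goal, reply: `cannot resume from ${goal.state}; start a new goal` };
      }
      return { goal: transition(goal, "continue", "resumed by user"), reply: "resumed" };
    case "done":
      return { goal: transition(goal, "done", "marked done by user"), reply: "done" };
    case "clear":
      // Dropping the goal object also drops its lease flag.
      return { goal: null, reply: "goal cleared" };
    default:
      return { goal, reply: HELP };
  }
}

// Usage: start a goal, pause it, then show that a model "continue" is ignored.
const started = handleCommand(null, "/goal start tidy the changelog").goal as Goal;
const paused = handleCommand(started, "/goal pause").goal as Goal;
console.log(reportStatus(paused, "continue").state);
```

Keeping the model-side entry point (reportStatus) separate from the human-side one (handleCommand) mirrors the split described above: the model can only report a status for the goal it was handed, while lifecycle changes such as resume and clear stay with the user.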
Tests
Plugin guardrails also passed:
Real behavior proof (required for external PRs)
Behavior or issue addressed: Real bounded same-session goal continuation for the bundled /goal command: start a goal, schedule a continuation lease, resume the same session, visibly announce continuation output, finish as done, and clear active goal state.
Real environment tested: Isolated local canary Gateway profile using the PR checkout, bundled codex, discord, and goal plugins, separate canary Discord bot, loopback Gateway, and a private Discord proof channel. Private channel/session IDs are redacted from this PR body.
Exact steps or command run after this patch: In the canary Discord proof channel, run /goal start <objective>, allow the scheduled continuation lease to fire, let the resumed session complete, then inspect the canary goal proof artifact, canary cron run records, active goal session directory, and visible canary bot messages in the proof channel.
Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output): Copied redacted runtime evidence from the isolated canary profile and private Discord proof channel on pushed head 41ecd3a38da1d64343110cde8926b0dd46009929:
Observed result after fix: The canary run created a real session goal, scheduled a real continuation lease, resumed the same Discord session, displayed the continuation count and event log, reached final status done, and preserved visible proof of the scheduler-backed continuation path.
What was not tested: Public Discord identifiers and the raw screen recording are not embedded in this PR body because the proof channel is private. The full recording is available on request.
Proof limitations or environment constraints: The Mantis Telegram proof still did not exercise product behavior because Crabbox could not start a Telegram Desktop lease, so baseline/candidate Telegram capture was skipped. The Discord canary proof above covers the /goal continuation behavior; later current-head commits only addressed unrelated CI/harness/lint drift and did not change the /goal runtime path.
Current validation
Local/current-head validation for the source fixes included:
GitHub current-head readback on pushed head b61b6d0499c46c4e3c5f81c238d67a29733fb6f9 is pending after rebasing onto current main 8f6a2f0f6b119e8eb3e63d53800207fabe78e735. Local targeted validation passed: node scripts/run-vitest.mjs src/cron/isolated-agent/run.message-tool-policy.test.ts src/agents/embedded-agent-runner/run/attempt.test.ts src/agents/embedded-agent-runner/run/attempt-tool-construction-plan.test.ts src/plugins/contracts/scheduled-turns.contract.test.ts src/plugins/loader.test.ts extensions/goal/index.test.ts src/plugins/channel-plugin-ids.test.ts test/scripts/openclaw-e2e-instance.test.ts; pnpm plugin-sdk:api:check; and git diff --check.