
v0.22.10 fix: autopilot-cycle handler forwards job.data.phases to runCycle #521

Merged
garrytan merged 5 commits into master from fix/autopilot-cycle-phase-passthrough
Apr 30, 2026



garrytan commented Apr 29, 2026

Summary

The autopilot-cycle Minion handler in src/commands/jobs.ts now forwards job.data.phases to runCycle(). Previously the array was accepted via MinionJobInput.params but discarded before dispatch, so every cycle ran the full 6-phase pipeline regardless of caller intent. This bit hard in production: the embed phase accumulated a 17K-chunk backlog, and every job submitted by the 5-minute autopilot cron hit the 30-minute timeout.

Behavior:

  • phases: ["lint","backlinks"] → runs only those two phases (canonical order, not caller order — runCycle keys on phases.includes(...)).
  • phases: ["BAD"] → all names filtered, falls back to default 6 phases (caller's input was unrecoverable).
  • phases: [] or non-array → falls back to default 6 phases (prior behavior preserved).
  • phases: undefined → default 6 phases (unchanged).
  • Phase names validated against ALL_PHASES from src/core/cycle.ts (set lookup, no injection surface).
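A minimal sketch of the filter described above, assuming the handler sees job.data.phases as `unknown` and that ALL_PHASES is a readonly string array. `resolvePhases` is a hypothetical name; the actual 9-line change is inlined in src/commands/jobs.ts and may differ in detail. Returning `undefined` stands for "let runCycle() use its default 6 phases".

```typescript
// Hypothetical helper covering the four input shapes listed above.
// Returning undefined means "let runCycle() fall back to its default phases".
function resolvePhases(
  raw: unknown,
  allPhases: readonly string[],
): string[] | undefined {
  // undefined, strings, objects, numbers: not an array, so use the defaults.
  if (!Array.isArray(raw)) return undefined;

  // Set lookup against the known phase names: anything unknown is dropped,
  // so no arbitrary string can reach runCycle().
  const known = new Set(allPhases);
  const valid = raw.filter(
    (p): p is string => typeof p === "string" && known.has(p),
  );

  // [] and "every name was bogus" both collapse to the default cycle.
  return valid.length > 0 ? valid : undefined;
}
```

For example, `["embed", "lint", "BAD"]` resolves to `["embed", "lint"]`. The sketch keeps the caller's order because, per the first bullet, runCycle() decides execution order on its own.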

Test Coverage

  • 4 new test cases in test/handlers.test.ts under autopilot-cycle handler — phase passthrough:
    • job.data.phases restricts which phases run — valid phases forwarded
    • invalid phase names in job.data.phases are filtered out — bogus names dropped
    • empty phases array falls back to all phases — same as no phases
    • non-array phases value is ignored — string "lint" ignored
  • Source-level regression guard window in test/cycle-abort.test.ts widened (500 → 2000 chars) so it still finds signal: job.signal after the new validation block was added between worker.register('autopilot-cycle', …) and the runCycle(…) call.
  • Coverage: 100% on the new code paths.
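The source-level guard can be pictured as the sketch below: find the `worker.register('autopilot-cycle'` call in the handler source, take a fixed-size character window after it, and require `signal: job.signal` inside that window. `guardWindowContains` and the simulated source string are hypothetical, not the code in test/cycle-abort.test.ts; the point is to show how inserting a validation block between anchor and needle pushes the needle out of a 500-char window.

```typescript
// Hypothetical reconstruction of a source-level regression guard:
// true when `needle` appears within `windowChars` characters after
// the first occurrence of `anchor` in `source`.
function guardWindowContains(
  source: string,
  anchor: string,
  needle: string,
  windowChars: number,
): boolean {
  const start = source.indexOf(anchor);
  if (start === -1) return false; // anchor missing: the guard should fail
  return source.slice(start, start + windowChars).includes(needle);
}

// Illustrative string only, not the real handler: roughly 900 chars of
// "validation" sit between the register call and the signal argument.
const anchor = "worker.register('autopilot-cycle'";
const handlerSource =
  `${anchor}, async (job) => {\n` +
  "  // phase validation block\n".repeat(30) +
  "  return runCycle({ phases, signal: job.signal });\n});\n";

console.log(guardWindowContains(handlerSource, anchor, "signal: job.signal", 500));  // false
console.log(guardWindowContains(handlerSource, anchor, "signal: job.signal", 2000)); // true
```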

Pre-Landing Review

Manual review of the 9-LoC production change:

  • No SQL, no LLM trust boundary, no auth surface.
  • Filter is exhaustive across all four input shapes (array+valid, array+invalid, empty, non-array).
  • Behavior preserved for callers that don't pass phases at all.
  • No issues found.

Adversarial Review

Independent subagent review (fresh context, no checklist bias). Findings ranked:

  • Medium — Duplicate phases (["embed","embed"]) are not deduped before forwarding. Low blast radius: runCycle keys on phases.includes(...) so each phase runs at most once. Tidiness fix ([...new Set(filtered)]) is harmless but not required for v0.22.10.
  • Low — Empty-after-filter falls through to the default 6-phase cycle. A caller passing all-bad phase names silently runs the slow default. Distinguishing "filtered to empty" from "not specified" is a future enhancement; current behavior is "best effort run something."
  • Low — Two await import('../core/cycle.ts') calls in close succession. Bun's module cache makes the second a no-op lookup, but they could be combined into one destructured import for tidiness.
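To see why duplicates and caller order are harmless, here is a sketch of a loop that, as the findings describe, walks a canonical phase list and checks `phases.includes(...)`. `runPhasesInCanonicalOrder` is a hypothetical stand-in for runCycle(), and the phase list is an illustrative subset: the real ALL_PHASES has six entries and its order is not given in this PR. The Set-based dedupe is the tidiness fix the review suggests.

```typescript
// Stand-in for a runCycle-style loop: iterate the canonical list and
// run a phase only if the caller's array includes it.
function runPhasesInCanonicalOrder(
  canonical: readonly string[],
  requested: readonly string[],
): string[] {
  const ran: string[] = [];
  for (const phase of canonical) {
    if (requested.includes(phase)) ran.push(phase); // each phase visited once
  }
  return ran;
}

const CANONICAL = ["lint", "backlinks", "sync", "extract", "embed"]; // illustrative subset and order

// Duplicates and reversed order do not change what runs...
console.log(runPhasesInCanonicalOrder(CANONICAL, ["embed", "embed", "lint"]).join(",")); // lint,embed

// ...so deduping before forwarding is purely cosmetic.
const filtered = ["embed", "embed", "lint"];
const deduped = [...new Set(filtered)];
console.log(deduped.join(",")); // embed,lint
```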

None are blocking. All three are noted for future cleanup; the production fix is correct and the test suite is comprehensive.

Eval Results

No prompt-related files changed — evals skipped.

Greptile Review

No Greptile comments on the PR.

Plan Completion

No plan file — this is a direct response to a production incident. The CHANGELOG entry stands in for a plan: production observation → root cause → fix → tests.

Verification Results

bun test test/cycle-abort.test.ts test/handlers.test.ts: 15 pass, 0 fail locally.

CI on prior commit (d95f1d2):

  • test (1), test (2), test (3), test (4) — pass
  • Tier 1 (Mechanical) — pass (after one rerun; flake unrelated to this PR)
  • gitleaks — pass

TODOS

No items to mark complete (this PR is a hotfix from production observation, not a planned TODO).

Documentation

Test plan

  • bun test test/cycle-abort.test.ts test/handlers.test.ts passes (15/15)
  • CI test shards (1)–(4) green on the prior commit
  • Tier 1 (Mechanical) passes after rerun (flake)
  • VERSION + package.json synced to 0.22.10
  • CHANGELOG has v0.22.10 entry
  • CLAUDE.md annotation reflects the new behavior

🤖 Generated with Claude Code

Wintermute and others added 4 commits April 29, 2026 22:56
The autopilot-cycle handler always ran ALL_PHASES regardless of job data.
This caused production stalls when the embed phase had a large backlog
(17K+ stale chunks) that exceeded the 30-minute job timeout. Every 5-min
cycle would start, hit the embed wall, stall, and get force-killed —
creating an infinite stall loop that kept the queue perpetually unhealthy.

The fix validates job.data.phases against ALL_PHASES (preventing injection)
and forwards the selected phases to runCycle(). Callers can now submit
fast cycles (lint+backlinks+sync+extract) on a 5-min cron and run embed
separately with a longer timeout during off-peak hours.

If phases is omitted, not an array, or filters to empty, behavior is
unchanged (all phases run).

Tests: 4 new cases covering phase restriction, invalid name filtering,
empty array fallback, and non-array type safety.
The regression guard sliced the first 500 chars after `worker.register('autopilot-cycle'`
and asserted `signal: job.signal` was present. The phase-validation block added in
787ec7d pushed the signal arg past that boundary, so CI test shard 3 failed even
though the handler still propagates the signal correctly. Bump the window to 2000.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Note autopilot-cycle phases passthrough fix on the src/commands/jobs.ts
key-files annotation so future readers know the handler honors
job.data.phases (validated against ALL_PHASES) as of v0.22.10.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan changed the title from "fix: autopilot-cycle handler forwards job.data.phases to runCycle" to "v0.22.10 fix: autopilot-cycle handler forwards job.data.phases to runCycle" (Apr 30, 2026)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
garrytan merged commit 5d9dc43 into master Apr 30, 2026
7 checks passed
garrytan added a commit that referenced this pull request Apr 30, 2026
PR #506 claims v0.22.15, PR #521 claims v0.22.10, intermediate slots
(.11/.12/.13/.14) are claimed by other open PRs. v0.22.16 is the next
clean PATCH slot. v0.23.0 is claimed by PR #462 so MINOR isn't free.
This release fits the 0.22.x train; v0.23.0 lands when #462 ships.

Updates VERSION, package.json, CHANGELOG.md header, TODOS.md follow-up
labels. Code is unchanged.
garrytan added a commit that referenced this pull request Apr 30, 2026
…arness (#522)

* feat: hermeticity migration — every $GBRAIN_HOME write site honors the env override

configDir() in src/core/config.ts already implemented $GBRAIN_HOME as a
parent-dir override (returns <override>/.gbrain), but ~12 consumers built paths
from os.homedir() directly and bypassed it. Critically, loadConfig/saveConfig
themselves used a private getConfigDir() that ignored the env. Fixed.

Migrated every write site to gbrainPath() — fail-improve, validator-lint, cycle
lock, shell-audit, backpressure-audit, sync-failures, integrity logs,
integrations heartbeat, init pglite path, migrate-engine manifest, import
checkpoint, v0_13_1 rollback, v0_14_0 host-work. Read-side host-detection in
init.ts (~/.claude / ~/.openclaw probes) intentionally NOT migrated; that's a
v1.1 follow-up under a separate $GBRAIN_HOST_HOME override.

Adds gbrainPath(...segments) sugar plus path validation: $GBRAIN_HOME must be
absolute and contain no '..' segments (throws GbrainHomeInvalidError).

test/gbrain-home-isolation.test.ts proves write-isolation across all migrated
sites. test/migrations-v0_14_0.test.ts updated to use $GBRAIN_HOME instead of
the old HOME-swap pattern.

Closes part of the claw-test E2E harness preconditions (D13 + D21).
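A minimal sketch of the override contract described above, assuming $GBRAIN_HOME names a parent directory (so paths resolve under <override>/.gbrain) and falling back to os.homedir() when it is unset. The function bodies and the error class are reconstructions from this commit message, not the code in src/core/config.ts; treating an empty value as unset is an assumption.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Reconstructed from the commit text; the real class may carry more fields.
class GbrainHomeInvalidError extends Error {}

// Resolve the .gbrain directory, honoring $GBRAIN_HOME as a parent-dir override.
function configDir(env: Record<string, string | undefined> = process.env): string {
  const override = env.GBRAIN_HOME;
  if (!override) {
    // Unset (or empty, assumed to mean the same): default to the real home dir.
    return path.join(os.homedir(), ".gbrain");
  }
  if (!path.isAbsolute(override)) {
    throw new GbrainHomeInvalidError(`GBRAIN_HOME must be absolute: ${override}`);
  }
  // Check the raw string: path.join would silently normalize ".." away.
  if (override.split(/[\\/]+/).includes("..")) {
    throw new GbrainHomeInvalidError(`GBRAIN_HOME must not contain '..': ${override}`);
  }
  return path.join(override, ".gbrain");
}

// Sugar so every write site builds its path the same way.
function gbrainPath(...segments: string[]): string {
  return path.join(configDir(), ...segments);
}
```

Routing every write through one helper is what makes a single env override sufficient for the isolation test: a site that still calls os.homedir() directly would escape it.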

* feat: gbrain friction {log,render,list,summary} — agent friction reporter

Append-only JSONL writer at $GBRAIN_HOME/friction/<run-id>.jsonl. Schema is a
flat extension of StructuredAgentError (D20), one envelope shape across both
agent-emitted entries and harness-wrapped command failures. Run-id resolves
from --run-id > $GBRAIN_FRICTION_RUN_ID > 'standalone'.

Subcommands stay ≤30 LOC each; core lives in src/core/friction.ts (writer +
reader + renderer + redactor). render --redact (default for md output) strips
$HOME / $CWD to placeholders so reports paste safely in PRs/issues.

Severity: confused | error | blocker | nit. Kind: friction | delight (D7) |
phase-marker | interrupted. Readers tolerate malformed lines (skip + warn).

40 unit tests; this is the channel the claw-test harness writes to and that
agents emit through during live-mode runs.
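The run-id precedence and the malformed-line tolerance can be sketched as follows. The entry type is trimmed to a few fields and every name here is hypothetical; the real schema extends StructuredAgentError and lives in src/core/friction.ts.

```typescript
// Hypothetical, trimmed-down entry shape for illustration only.
type Severity = "confused" | "error" | "blocker" | "nit";
type Kind = "friction" | "delight" | "phase-marker" | "interrupted";
interface FrictionEntry {
  ts: string;
  kind: Kind;
  severity?: Severity;
  message: string;
}

// --run-id flag wins, then $GBRAIN_FRICTION_RUN_ID, then 'standalone'.
function resolveRunId(
  flag: string | undefined,
  env: Record<string, string | undefined>,
): string {
  return flag || env.GBRAIN_FRICTION_RUN_ID || "standalone";
}

// Parse a JSONL body, skipping (and counting) lines that are not valid JSON.
// The shape of each parsed object is not validated in this sketch.
function readFrictionLines(body: string): { entries: FrictionEntry[]; skipped: number } {
  const entries: FrictionEntry[] = [];
  let skipped = 0;
  for (const line of body.split("\n")) {
    if (line.trim() === "") continue; // trailing newline, blank lines
    try {
      entries.push(JSON.parse(line) as FrictionEntry);
    } catch {
      skipped++; // the real reader also warns; here we only count
    }
  }
  return { entries, skipped };
}
```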

* feat: gbrain claw-test — end-to-end fresh-install friction harness

Two modes: scripted (CI gate, no agent) and --live (real agent subprocess).
Phases: setup → install_brain (gbrain init --pglite) → import (--no-embed) →
query → extract all --source fs → verify (gbrain doctor --json, asserts
status==='ok' and progress.jsonl phase coverage).

AgentRunner interface + registry — interface stays narrow (detect, invoke,
optional postInstallHook). v1 ships only OpenClawRunner; the registry pattern
lets v1.1 land hermes/codex as ~50-line additions without refactoring callers.
OpenClaw invocation: 'openclaw agent --local --agent <name> --message <brief>'
matching test/e2e/skills.test.ts (NOT --prompt-file, which doesn't exist).

transcript-capture: spawns child with piped stdio, async-drains via
fs.createWriteStream + 'drain' events so 256KB+ bursts don't stall the child
(D17 backpressure). Writes <run>/transcript.jsonl with schema_version + ts +
channel + byte_offset + bytes_b64. Friction entries' transcript_offset field
references byte offsets here so render --transcripts can resolve back.
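On the write side, the backpressure pattern comes down to honoring the return value of Writable.write(): when it returns false, wait for 'drain' before writing more. A sketch, with a record shape taken from the field list above; the field types and the exact meaning of byte_offset are assumptions. Base64-encoding raw bytes is what keeps a multi-byte UTF-8 character split across two chunks intact.

```typescript
import { once } from "node:events";
import type { Writable } from "node:stream";

// Field names from the commit message; types are assumed.
interface TranscriptRecord {
  schema_version: 1;
  ts: string;
  channel: "stdout" | "stderr";
  byte_offset: number; // assumed: offset of this chunk in the channel's byte stream
  bytes_b64: string;   // raw bytes, so split UTF-8 sequences survive unchanged
}

function toRecord(
  channel: TranscriptRecord["channel"],
  byteOffset: number,
  chunk: Buffer,
): TranscriptRecord {
  return {
    schema_version: 1,
    ts: new Date().toISOString(),
    channel,
    byte_offset: byteOffset,
    bytes_b64: chunk.toString("base64"),
  };
}

// Append one JSONL line; if the stream's internal buffer is full,
// wait for 'drain' before the caller writes the next record.
async function writeLine(out: Writable, record: TranscriptRecord): Promise<void> {
  if (!out.write(JSON.stringify(record) + "\n")) {
    await once(out, "drain");
  }
}
```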

progress-tail: parses gbrain's --progress-json events out of child stderr.
Phase verification asserts each scenario.expected_phases entry (dotted names
like import.files, extract.links_fs, doctor.db_checks) saw at least one event
from the actual command — proves the COMMAND ran, not that the agent obeyed
prompts.
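Phase verification reduces to a set difference: collect the phase names seen in the child's stderr and report every expected phase with no event. This sketch assumes each progress event is a single JSON line carrying a `phase` string; the real --progress-json event schema may name its fields differently, and both function names are hypothetical.

```typescript
// Assumed event shape: one JSON object per stderr line with a `phase` field.
function seenPhases(stderr: string): Set<string> {
  const seen = new Set<string>();
  for (const line of stderr.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("{")) continue; // ordinary log output, not an event
    try {
      const ev = JSON.parse(trimmed) as { phase?: unknown };
      if (typeof ev.phase === "string") seen.add(ev.phase);
    } catch {
      // partial or malformed line: ignore it rather than fail the run
    }
  }
  return seen;
}

// Expected phases (dotted names like "import.files") that saw zero events.
function missingPhases(expected: readonly string[], stderr: string): string[] {
  const seen = seenPhases(stderr);
  return expected.filter((p) => !seen.has(p));
}
```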

seed-pglite: ~50 LOC SQL replay primitive for the upgrade-from-v0.18 scenario.
Existing migration helpers (test/e2e/helpers.ts) are Postgres-only; PGLite has
no equivalent. seedPglite opens a fresh PGLite, executes each statement
individually (errors name the failing one), then disconnects so gbrain init
can take over and walk forward.
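Executing statements one at a time needs a splitter that does not break on semicolons inside string literals. Below is a deliberately small sketch, not the splitter in the PR: it handles single-quoted strings (including doubled '' escapes) and -- line comments, but not dollar-quoted function bodies or block comments, both of which a real dump may contain.

```typescript
// Split a SQL script on top-level semicolons.
// Handles: single-quoted literals ('' escapes), "--" line comments.
// Does NOT handle: $$ dollar quotes, /* block comments */, ';' in quoted identifiers.
function splitSql(script: string): string[] {
  const out: string[] = [];
  let current = "";
  let inString = false;
  for (let i = 0; i < script.length; i++) {
    const ch = script[i];
    if (inString) {
      current += ch;
      if (ch === "'") {
        if (script[i + 1] === "'") {
          current += "'"; // doubled quote is an escaped quote; stay in the string
          i++;
        } else {
          inString = false;
        }
      }
      continue;
    }
    if (ch === "-" && script[i + 1] === "-") {
      // Skip the comment up to (not including) the newline, then keep the newline.
      while (i < script.length && script[i] !== "\n") i++;
      if (i < script.length) current += "\n";
      continue;
    }
    if (ch === "'") {
      inString = true;
      current += ch;
    } else if (ch === ";") {
      if (current.trim() !== "") out.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  if (current.trim() !== "") out.push(current.trim());
  return out;
}
```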

53 unit tests covering registry selection, runner detection, multi-byte UTF-8
chunk-boundary safety, PIPE buffer drain, scenario load+validate, progress
event parsing, and SQL splitter.

* feat: claw-test scenario fixtures + friction-protocol skills convention

Two scenarios ship in v1 — fresh-install and upgrade-from-v0.18. Each is a
self-contained directory: brain/ (markdown pages), BRIEF.md (live-mode prompt),
expected.json (scripted-mode assertions), scenario.json (kind, expected_phases,
optional from_version + seed paths). Schema is owned by src/core/claw-test/
scenarios.ts.

upgrade-from-v0.18 ships scaffolded — seed/dump.sql is the v1.1 follow-up
(needs a real v0.18-shape PGLite dump; seed/README.md documents the gen
procedure). The harness gracefully no-ops the seed phase when dump.sql is
absent.

skills/_friction-protocol.md is a cross-cutting convention skill (like
_brain-filing-rules.md). Tells agents when to call gbrain friction log and how
to choose severity. Skills the claw-test exercises will gain a > Convention:
callout pointing here in a v1.1 sweep.

13 unit tests for the scenario loader + 'shipped scenarios load cleanly' for
both.

* feat: register gbrain claw-test + gbrain friction; CLAUDE.md + llms sync

Wires both commands into src/cli.ts CLI_ONLY allow-list and adds dispatch
in handleCliOnly so neither command requires a brain engine connection.

CLAUDE.md gains entries for src/commands/{friction,claw-test}.ts +
src/core/claw-test/ + skills/_friction-protocol.md, and a Commands section
listing all 8 new gbrain claw-test ... and gbrain friction ... invocations
with the v0.23 marker. Documents the GBRAIN_HOME write-isolation contract
and the v1 caveat (read-side host-fingerprint detection deferred to v1.1).
llms.txt + llms-full.txt regenerated via 'bun run build:llms' so the
committed generator-output gate passes.

test/e2e/claw-test.test.ts is the scripted-mode E2E. Builds a tiny shim that
delegates to 'bun run src/cli.ts' (NOT bun build --compile, which doesn't bundle
PGLite's runtime assets), points the harness at it via GBRAIN_BIN_OVERRIDE,
runs --scenario fresh-install end-to-end. Asserts exit 0, zero error/blocker
friction. Includes a deliberate-break test that proves the friction signal
fires when a phase command rejects.

test/claw-test-cli.test.ts covers shipped-scenario load + agent registry +
OpenClawRunner detection (relative-path / .. / missing-bin guards) + the
GBRAIN_FRICTION_RUN_ID env handoff between harness and friction CLI.

Closes the v0.23 claw-test E2E feature.

* chore: bump version and changelog (v0.24.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(tests): typecheck failures + spawnWithCapture timeout headroom in CI

Three CI fixes after PR #522 landed:

1. test/agent-runner.test.ts:89 — UnavailableRunner.invoke() returns
   Promise<void> by default but the AgentRunner contract requires
   Promise<InvokeResult>. Annotate the throw-only invoke explicitly so tsc
   sees the contract is satisfied (the throw makes the body unreachable as
   far as the return type is concerned).

2. test/seed-pglite.test.ts — bun:test signature is test(name, fn, timeoutMs:
   number), not test(name, opts: {timeout}, fn). The {timeout: 30_000} object
   form was a guess that tsc on bun 1.3.13 rejects. Move the 30s cap to the
   trailing positional number arg on each PGLite-using test.

3. test/transcript-capture.test.ts — `spawnWithCapture > timeout fires
   SIGTERM/SIGKILL` blew the 10s outer cap on the GitHub runner. Two fixes:
   (a) use `exec sleep` so the child we spawn IS sleep — SIGTERM goes
   directly to it, no `/bin/sh` fork-vs-exec process-group ambiguity that
   could orphan the sleep and force the SIGKILL grace path. (b) bump outer
   cap to 30s for headroom even when the runner is slow and SIGKILL after
   the 5s grace is what actually ends the child.
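Fix 1 is a TypeScript inference detail: a class method whose body only throws is inferred as returning `void` (an async one as `Promise<void>`), which does not satisfy an interface that promises `Promise<InvokeResult>`. Annotating the return type resolves it. The interface and the `InvokeResult` shape below are hypothetical stand-ins for the real AgentRunner contract.

```typescript
// Hypothetical stand-ins for the real AgentRunner contract.
interface InvokeResult {
  exitCode: number;
}

interface AgentRunner {
  name: string;
  detect(): boolean;
  invoke(brief: string): Promise<InvokeResult>;
}

// Test double for "agent not installed". Without the explicit
// Promise<InvokeResult> annotation, tsc infers Promise<void> for this
// throw-only async method and rejects the `implements` clause.
class UnavailableRunner implements AgentRunner {
  name = "unavailable";

  detect(): boolean {
    return false;
  }

  async invoke(_brief: string): Promise<InvokeResult> {
    throw new Error(`${this.name}: agent binary not found`);
  }
}
```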

* chore: rebump to v0.22.16 (next free 0.22.x patch slot per queue)

PR #506 claims v0.22.15, PR #521 claims v0.22.10, intermediate slots
(.11/.12/.13/.14) are claimed by other open PRs. v0.22.16 is the next
clean PATCH slot. v0.23.0 is claimed by PR #462 so MINOR isn't free.
This release fits the 0.22.x train; v0.23.0 lands when #462 ships.

Updates VERSION, package.json, CHANGELOG.md header, TODOS.md follow-up
labels. Code is unchanged.

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
