v0.28.3 feat(recipes): restart-sweep — detect dropped Telegram messages after gateway restarts by garrytan-agents · Pull Request #675 · garrytan/gbrain

garrytan-agents · 2026-05-06T13:14:48Z

Summary

Reshape PR #675's recipes/restart-sweep/ directory into a single self-contained recipes/restart-sweep.md recipe with the (fixed) script inlined as a fenced code block. Apply 8 code-quality fixes, and port and extend the test suite to bun:test (12 ported + 14 new = 26 cases, plus 1 sentinel guard = 27 total).

Why land it as a recipe, not a default behavior: restart-sweep is specific to OpenClaw + Telegram + webhook mode. CLAUDE.md is explicit that host-specific operational tooling lives as plugin handlers in the host's own repo, not in gbrain core. So it ships as an opt-in recipe alongside twilio-voice-brain, email-to-brain, etc.: discoverable via gbrain integrations list, and only "configured" once the user sets the OpenClaw env vars. The recipe body documents the v2 upgrade path: a registered Minion handler in the openclaw repo, built against gbrain/minions (see docs/guides/plugin-handlers.md).

Commits:

feat(recipes): reshape restart-sweep into single .md recipe + harden script — the meat of the change. New recipes/restart-sweep.md with frontmatter (no expect_exit_code, schema doesn't have it) + agent-facing setup body + inlined ~325-line script with 8 fixes. New test/restart-sweep.test.ts with 27 bun:test cases anchored on a  sentinel comment. Old recipes/restart-sweep/ directory deleted.
chore: bump version and changelog (v0.28.3) — VERSION + package.json + CHANGELOG entry written in the GStack release-summary voice.
docs: sync README + CLAUDE.md for v0.28.3 restart-sweep recipe — README's recipes table gets the new row, CLAUDE.md's test inventory gets the new test annotation, llms-full.txt regenerated via bun run build:llms.
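The sentinel-anchored extraction mentioned in the first commit can be sketched as a small helper. This is a hypothetical illustration, not the code in test/restart-sweep.test.ts: the sentinel is passed in as a parameter because its exact text is not reproduced here, and "take the first fenced block after the sentinel" is an assumed matching rule. The salted filename shows why a fresh name is needed to get past the ESM import cache.

```typescript
// Built at runtime so this sketch can itself live inside a Markdown fence.
const FENCE = "`".repeat(3);

// Hypothetical: return the body of the first fenced block after `sentinel`.
function extractScript(markdown: string, sentinel: string): string {
  const at = markdown.indexOf(sentinel);
  if (at === -1) {
    // Fail loud, so a doc edit that drops the sentinel breaks the tests.
    throw new Error(`sentinel not found: ${sentinel}`);
  }
  const open = markdown.indexOf(FENCE, at);
  if (open === -1) throw new Error("no fenced block after sentinel");
  const openLineEnd = markdown.indexOf("\n", open);
  if (openLineEnd === -1) throw new Error("fence has no body");
  const bodyStart = openLineEnd + 1; // skip the opening fence line
  const close = markdown.indexOf("\n" + FENCE, openLineEnd);
  if (close === -1) throw new Error("unterminated fenced block");
  return markdown.slice(bodyStart, close + 1);
}

// ESM caches modules by URL, so importing the same temp path twice returns the
// first evaluation. A unique suffix forces a fresh module (and fresh env reads).
function saltedScriptName(base = "restart-sweep"): string {
  const salt = `${Date.now()}-${Math.random().toString(36).slice(2, 10)}`;
  return `${base}.${salt}.mjs`;
}
```

Each test would write the extracted script to a salted path and import that path, so no test sees module state left over from another.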

Test Coverage

CODE PATHS                                            STATUS
[+] recipes/restart-sweep.md (inlined script ~325 lines)
  ├── determineAlertMode (3 modes)                    ✓ 3 cases
  ├── filterTelegramSessions (3 paths)                ✓ 3 cases
  ├── detectDroppedMessages
  │   ├── abortedLastRun primary                      ✓ tested
  │   ├── topic extraction                            ✓ tested
  │   ├── malformed key fallback                      ✓ tested
  │   ├── AGGRESSIVE=unset (silent)                   ✓ NEW
  │   └── AGGRESSIVE=1 (fires)                        ✓ NEW
  ├── timing window correctness                        ✓ NEW
  ├── log timestamp regex (Gateway + OpenClaw)        ✓ 2 cases
  ├── loadAlerted (missing/corrupt/prune)             ✓ 3 NEW
  ├── saveAlerted (atomic tmp+rename)                 ✓ NEW
  ├── cooldown layer (not-in-map / suppress / expire) ✓ 3 NEW
  ├── round-trip (2nd invocation skips alerted)       ✓ NEW
  ├── alert formatting (real \n)                      ✓ NEW
  ├── execFile argv shape (no shell)                  ✓ NEW
  ├── GBRAIN_HOME path override                       ✓ NEW
  ├── constructor-time env reads                      ✓ NEW
  └── sentinel-shape guard                            ✓ NEW

COVERAGE: 27/27 paths (100%)  |  bun test test/restart-sweep.test.ts → 27 pass / 0 fail
Tests: 3902 → 3929 (+27 new)

Coverage gate: PASS (100%).
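The "alert formatting (real \n)" case guards the newline double-escape fix. A minimal sketch of the difference, assuming a hypothetical `formatAlert` helper and message layout (only the newline handling reflects the PR):

```typescript
// Hypothetical formatter; the header text and bullet layout are illustrative.
function formatAlert(suspects: { key: string; reason: string }[]): string {
  const lines = suspects.map((s) => `- ${s.key} (${s.reason})`);
  // Join with "\n", a real newline. The pre-fix script used "\\n", which
  // reaches Telegram as the two literal characters backslash + n.
  return ["Possible dropped messages after gateway restart:", ...lines].join("\n");
}

const msg = formatAlert([
  { key: "session-a", reason: "abortedLastRun" },
  { key: "session-b", reason: "timingGap" },
]);
// msg spans three lines and contains no literal backslash-n sequence.
```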

Pre-Landing Review

Already cleared via /plan-eng-review (6 issues, all 6 resolved with the recommended option) and /codex consult mode (8 findings, all 8 resolved). The plan file at ~/.claude/plans/figure-out-if-we-eager-coral.md carries the full review trace.

Codex caught 2 silent-correctness bugs the eng review missed:

C1 (idempotency key collapse): the original (sessionKey, restartTimeIso) key changes on every run when the bootstrap log is missing, because restartTime is then synthesized fresh each run, so the same stale session re-alerts forever. Fixed by adding a (sessionKey, lastAlertedAt) cooldown layer with a 6h re-alert threshold.
C2 (import-time env snapshot): the original script snapshotted env at module load, so tests that mutated process.env after import never changed the script's behavior and were semantically bogus. Fixed by moving env reads into the MessageSweepDetector constructor.
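The C1 and C2 fixes can be sketched together in one class. This is a minimal illustration, not the recipe's code: it assumes the env is passed in as a plain record (e.g. `process.env` in real use) and that the alerted state is an in-memory map; the `shouldAlert`/`recordAlert` methods and field names are hypothetical. The 6h threshold, the (sessionKey, lastAlertedAt) keying, and the env names come from this PR.

```typescript
type Env = Record<string, string | undefined>;

const COOLDOWN_MS = 6 * 60 * 60 * 1000; // 6h re-alert threshold (C1)

class MessageSweepDetector {
  readonly aggressive: boolean;
  readonly telegramGroup: string | undefined;
  // sessionKey -> epoch ms of the last alert sent for that session
  private lastAlertedAt: Map<string, number>;

  // C2: env is read here, per instance, not at module load, so a test can
  // set variables and then construct a fresh detector.
  constructor(env: Env, lastAlertedAt: Map<string, number> = new Map()) {
    this.aggressive = env.OPENCLAW_RESTART_SWEEP_AGGRESSIVE === "1";
    this.telegramGroup = env.OPENCLAW_TELEGRAM_GROUP;
    this.lastAlertedAt = lastAlertedAt;
  }

  // C1: keyed on sessionKey alone, so a restart time that is re-synthesized
  // on every run cannot defeat the cooldown.
  shouldAlert(sessionKey: string, nowMs: number): boolean {
    const last = this.lastAlertedAt.get(sessionKey);
    if (last === undefined) return true; // never alerted: not-in-map
    return nowMs - last >= COOLDOWN_MS; // suppress inside 6h, re-alert after
  }

  recordAlert(sessionKey: string, nowMs: number): void {
    this.lastAlertedAt.set(sessionKey, nowMs);
  }
}
```

The three branches of `shouldAlert` map onto the "not-in-map / suppress / expire" cooldown cases in the coverage table.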

Eval Results

No prompt-related files changed — evals skipped.

Greptile Review

No Greptile comments on the PR.

Scope Drift

CLEAN. Branch intent: reshape PR #675's recipe shape + apply the 8 code fixes + add proper bun:test coverage. Delivered: same. No files outside recipes/restart-sweep.{md,mjs}, test/restart-sweep.test.ts, or the doc-sync targets.

Plan Completion

[DONE] Single self-contained recipes/restart-sweep.md (D2)
[DONE] No expect_exit_code in command health_check (D1)
[DONE] Atomic tmp+rename write for alerted.json (D3)
[DONE] Corrupt-JSON recovery in loadAlerted (D4)
[DONE] 12 ported + 14 new test cases (D5, +1 sentinel guard = 27 total)
[DONE] AGGRESSIVE-flip recipe-body callout (D6)
[DONE] Cooldown layer for synthesized restart-time bug (C1)
[DONE] Constructor-time env reads (C2)
[DONE] D3-claim wording corrected — atomicity ≠ no-dupes (C3)
[DONE] Cron environment troubleshooting subsection (C4)
[DONE] Plugin-handler v2 upgrade-path TODO (C5)
[DONE] Sentinel-anchored test extractor + ESM-cache-bypass salting (C6)
[DONE] Recipe-listing-vs-env-presence wording fixed (C7)
[DONE] Test-runner cite includes both parallel + shard scripts (C8)

14 plan items, 14 done. 0 deferred.

Verification Results

bun test test/restart-sweep.test.ts → 27 pass / 0 fail
gbrain integrations show restart-sweep → renders cleanly
gbrain integrations test recipes/restart-sweep.md → frontmatter validates
gbrain integrations doctor (with OPENCLAW_OWNER_IDS=test OPENCLAW_TELEGRAM_GROUP=-100) → all 3 health checks pass
bun run typecheck → clean
bun run verify → all 7 pre-test gates pass (privacy, jsonb, progress, test-isolation, wasm, admin-build, typecheck)
bun run test → 3,929 pass / 0 fail across 8 parallel shards + serial pass

TODOS

No TODO items completed in this PR.

Documentation

Updated three files to sync with v0.28.3:

README.md — added Restart Sweep to the "Getting Data In" recipes table
CLAUDE.md — added test/restart-sweep.test.ts annotation to the unit-test inventory
llms-full.txt — regenerated via bun run build:llms

Test plan

bun test test/restart-sweep.test.ts (27 pass / 0 fail)
bun run verify (privacy + jsonb + progress + test-isolation + wasm + admin-build + typecheck — all pass)
bun run test (3,929 pass / 0 fail, no regressions)
gbrain integrations show restart-sweep renders cleanly
gbrain integrations test recipes/restart-sweep.md frontmatter validates
gbrain integrations doctor restart-sweep (with envs set) — all 3 health checks pass
Real cron-driven dry run on a deployed OpenClaw setup (manual, post-merge)

🤖 Generated with Claude Code

…way restarts

Adds a tool to detect Telegram messages dropped during OpenClaw gateway restarts by analyzing session state patterns.

Features:
- Detects sessions with the abortedLastRun flag set (primary heuristic)
- Identifies timing gaps (active before restart, silent after)
- Configurable alert modes (Telegram, stdout)
- Environment-based configuration
- Comprehensive test suite
- PII-scrubbed for public use

The tool addresses webhook message loss that occurs when the gateway restarts while messages are in flight. Unlike long-polling, webhooks cannot replay missed messages, which makes this detection crucial for production reliability.
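The two heuristics can be sketched as a pure function. This is an illustration under stated assumptions, not the script's `detectDroppedMessages`: the `SessionState` shape and the `windowMs` parameter are hypothetical, and it assumes the timing-gap check is the "aggressive" heuristic that the later commit gates behind OPENCLAW_RESTART_SWEEP_AGGRESSIVE=1.

```typescript
// Hypothetical session shape; the real OpenClaw session store may differ.
interface SessionState {
  key: string;
  abortedLastRun?: boolean;
  lastActivityMs: number; // epoch ms of the most recent activity
}

interface Suspect {
  key: string;
  reason: "abortedLastRun" | "timingGap";
}

function detectDroppedMessages(
  sessions: SessionState[],
  restartMs: number,
  opts: { aggressive: boolean; windowMs: number },
): Suspect[] {
  const out: Suspect[] = [];
  for (const s of sessions) {
    if (s.abortedLastRun) {
      // Primary signal: the last run was cut off, most likely by the restart.
      out.push({ key: s.key, reason: "abortedLastRun" });
      continue;
    }
    // Secondary signal (opt-in): active shortly before the restart and silent
    // since. Quiet periods make this false-positive prone, hence default off.
    const activeBefore =
      s.lastActivityMs <= restartMs && restartMs - s.lastActivityMs <= opts.windowMs;
    if (opts.aggressive && activeBefore) {
      out.push({ key: s.key, reason: "timingGap" });
    }
  }
  return out;
}
```

Keeping detection pure (sessions and restart time in, suspects out) is what makes cases like "AGGRESSIVE unset stays silent" and timing-window boundaries easy to unit-test.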

…script

Reshape the directory-shaped recipes/restart-sweep/ into a single self-contained recipes/restart-sweep.md with the (fixed) script inlined as a fenced code block. The recipe loader at integrations.ts:445-485 only discovers *.md, so the directory shape was invisible.

Eight script fixes:
1. Newline double-escape ('\\n' → '\n') at 8 sites
2. Hard-coded /tmp/ paths → ~/.gbrain/integrations/restart-sweep/ (honors GBRAIN_HOME); bootstrap-log path env-overridable via OPENCLAW_BOOTSTRAP_LOG
3. exec() of an interpolated string → execFile with an argv array (no shell)
4. Idempotency: loadAlerted/saveAlerted helpers, atomic tmp+rename, corrupt-JSON recovery, 30-day prune
5. Aggressive heuristic gated behind OPENCLAW_RESTART_SWEEP_AGGRESSIVE=1 (default OFF; false-positive prone during quiet periods)
6. Old directory shape removed
7. Env reads moved from module top-level to the constructor (fixes the import-time-snapshot bug that made tests semantically bogus)
8. Cooldown layer keyed on (sessionKey, lastAlertedAt) with a 6h re-alert threshold; prevents re-alerting forever when the bootstrap log is missing and restartTime is synthesized fresh each run

The recipe body adds a Cron environment troubleshooting section with the wrapper-script pattern (set -a; source .env; set +a; exec node ...) plus an explicit PATH= line for the cron entry. It also adds a TODO line pointing at docs/guides/plugin-handlers.md as the v2 upgrade path (registered Minion handler in the openclaw repo for queue-backed idempotency).

Tests: 27 bun:test cases (12 ported + 14 new + 1 sentinel-shape guard). The extractor anchors on  sentinel and salts the tmp filename to bypass the ESM import cache. A separate test asserts the sentinel itself is present so future doc edits that drop it fail loud.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
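Fix 4, together with the state-directory change in fix 2, can be sketched as a load/save pair. This is a minimal illustration, not the recipe's code: the JSON layout (idempotency key → epoch-ms timestamp), the `stateFile` helper, and the assumption that GBRAIN_HOME stands in for ~/.gbrain are all hypothetical. The directory, atomic tmp+rename, corrupt-JSON recovery, and 30-day prune come from the commit message.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { homedir, tmpdir } from "node:os";
import { dirname, join } from "node:path";

type Alerted = Record<string, number>; // idempotency key -> epoch ms alerted

const PRUNE_MS = 30 * 24 * 60 * 60 * 1000; // 30-day prune

// Assumption: GBRAIN_HOME, when set, replaces ~/.gbrain.
function stateFile(env: Record<string, string | undefined>): string {
  const home = env.GBRAIN_HOME ?? join(homedir(), ".gbrain");
  return join(home, "integrations", "restart-sweep", "alerted.json");
}

function loadAlerted(file: string, nowMs: number): Alerted {
  if (!existsSync(file)) return {}; // first run: nothing alerted yet
  let parsed: unknown;
  try {
    parsed = JSON.parse(readFileSync(file, "utf8"));
  } catch {
    return {}; // corrupt JSON: start over rather than crash the cron run
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) return {};
  const kept: Alerted = {};
  for (const [key, ts] of Object.entries(parsed as Record<string, unknown>)) {
    if (typeof ts === "number" && nowMs - ts < PRUNE_MS) kept[key] = ts; // prune old
  }
  return kept;
}

function saveAlerted(file: string, data: Alerted): void {
  mkdirSync(dirname(file), { recursive: true });
  // Write a sibling temp file, then rename it over the target. A same-filesystem
  // rename is atomic on POSIX, so readers never see a half-written file.
  const tmp = `${file}.${process.pid}.tmp`;
  writeFileSync(tmp, JSON.stringify(data, null, 2));
  renameSync(tmp, file);
}

// Usage: round-trip through a throwaway directory.
const dir = mkdtempSync(join(tmpdir(), "restart-sweep-"));
const file = join(dir, "alerted.json");
const now = Date.now();
saveAlerted(file, { fresh: now, stale: now - PRUNE_MS - 1 });
const reloaded = loadAlerted(file, now); // → { fresh: now }
```

As the C3 plan item notes, atomicity is not the same as no duplicates: rename only prevents torn files, and two overlapping cron runs can still both read the same map and send the same alert. That gap is part of why the v2 path points at queue-backed idempotency in a Minion handler.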

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- README.md: add restart-sweep row to "Getting Data In" recipes table
- CLAUDE.md: add test/restart-sweep.test.ts to the unit-test inventory
- llms-full.txt: regenerated via bun run build:llms

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Brings in v0.28.1 (zombie process reaping, /health timeout, engine disconnect idempotency, PR garrytan#637).

Conflicts resolved:
- VERSION → 0.28.3 (ours; newer than master's 0.28.1)
- package.json → version 0.28.3 (matches VERSION)
- CHANGELOG.md → kept v0.28.3 entry above master's v0.28.1 entry; both full entries preserved with their own ### Itemized changes sections

Post-merge actions:
- bun install (no dep changes)
- bun run build:llms (regenerated llms-full.txt to pick up master's CLAUDE.md additions for v0.28.1)
- bun run test (3,876 pass / 0 fail) + verify (clean) + typecheck (clean)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

….28.6

Master shipped three v0.28.x patch releases without the takes feature while v0.28-release was in flight:
- v0.28.1: zombie process accumulation + health endpoint timeout (#637)
- v0.28.3: restart-sweep — detect dropped Telegram messages (#675)
- v0.28.4: skillify cross-modal eval quality gate (#674)

Master's v0.28.0 slot was consumed without the takes layer ever landing, so this release ships the original takes feature as v0.28.6 (skipping v0.28.5 to leave space for any in-flight master patches). The migration orchestrator file (v0_28_0.ts) and migration skill doc (skills/migrations/v0.28.0.md) keep their original version keys; those identify the migration version, not the release version.

Conflicts resolved:
- VERSION → 0.28.6 (was 0.28.0; master had 0.28.4)
- package.json → 0.28.6 (auto-merged ai-sdk deps from master's v0.27)
- CHANGELOG.md → renamed top entry "## [0.28.0]" → "## [0.28.6]" with date 2026-05-06; rebuilt the "To take advantage of" block (was truncated by stale === markers from a prior merge); preserved master's v0.28.4/v0.28.3/v0.28.1 entries beneath
- src/cli.ts auto-merged (CLI_ONLY has providers + takes/think both)

Verified post-merge:
- bun run verify: PASS (privacy + jsonb + progress + test-isolation + wasm + admin-build + typecheck)
- 133 tests pass: migrate + apply-migrations + takes-engine + takes-fence
- migrations v37 (takes) + v38 (access_tokens_permissions) apply cleanly on top of master's v35 (auto-RLS) + v36 (subagent persistence)

garrytan changed the title from "feat(recipes): add restart-sweep — detect dropped messages after gateway restarts" to "v0.28.3 feat(recipes): add restart-sweep — detect dropped messages after gateway restarts" on May 6, 2026

garrytan and others added 3 commits May 6, 2026 11:12

chore: bump version and changelog (v0.28.3)

8b1532c

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

garrytan changed the title from "v0.28.3 feat(recipes): add restart-sweep — detect dropped messages after gateway restarts" to "v0.28.3 feat(recipes): restart-sweep — detect dropped Telegram messages after gateway restarts" on May 6, 2026

garrytan merged commit e744eda into garrytan:master May 6, 2026
7 checks passed

garrytan merged 5 commits into garrytan:master from garrytan-agents:feat/restart-sweep

garrytan-agents commented May 6, 2026 (edited by garrytan)
