v0.42.18.0 fix: sync orphan-pileup watchdog (#1633) + links-lag µs stamp (#1768) #1807
Merged
Stamp the full-microsecond updated_at (via to_char ... AT TIME ZONE UTC) instead of the millisecond-truncated JS Date, so links_extracted_at equals the DB updated_at exactly and the staleness predicate clears. Stamp SQL unchanged: version-arm backdating still works, D4 preserved, CDX-1 strengthened. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bun eval-Worker that SIGTERM->grace->SIGKILLs its own process from a separate OS thread, so a sync whose main event loop is starved (ReDoS spin) still dies. Signals SELF (no PID-reuse footgun). Empirically validated on Bun 1.3.13. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cli.ts installs the watchdog before connectEngine (bounds connect hangs); resolveSyncHardDeadline + composeAbortSignals in sync.ts; SIGINT graceful cancel on single-source + --all; withRefreshingLock timer unref'd. Non-TTY default 3600s makes cron orphan-pileup structurally impossible. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sibling workspaces claimed v0.42.13-v0.42.17; advance this branch's slot. VERSION + package.json + CHANGELOG header + CLAUDE.md annotations + llms bundles. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ngo-v4 # Conflicts: # CHANGELOG.md # CLAUDE.md # VERSION # llms-full.txt # package.json
…check:doc-history) The doc-history guard bans the bolded **v0.X release-clause marker in reference docs (history belongs in CHANGELOG + git). Rewrote the extract.ts/sync.ts additions as current-state prose and de-versioned the process-watchdog entry. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ngo-v4 # Conflicts: # CHANGELOG.md # VERSION # docs/architecture/KEY_FILES.md # package.json
mgunnin added a commit to mgunnin/gbrain that referenced this pull request on Jun 3, 2026
* upstream/master:
  v0.42.23.0 feat(jobs): --nice scheduling-priority flag for jobs work/supervisor (garrytan#1815) (garrytan#1820)
  v0.42.22.0 fix(minions): supervisor progress watchdog + worker DB self-defense — alive-but-wedged worker self-heals (garrytan#1801) (garrytan#1824)
  v0.42.21.0 fix(postgres): module-singleton ownership — canonical landing for the dream-cycle "connect() has not been called" class (garrytan#1404/garrytan#1471/garrytan#1619) (garrytan#1805)
  v0.42.20.0 fix: reliability wave — PGLite capture lock-pin + Postgres reconnect race + search embed-hang (garrytan#1762 garrytan#1745 garrytan#1775) (garrytan#1810)
  v0.42.19.0 fix(skillopt): close the last gap in the AI SDK v6 tool-loop fix (write-capture mapper + regression test) (garrytan#1809)
  v0.42.18.0 fix: sync orphan-pileup watchdog (garrytan#1633) + links-lag µs stamp (garrytan#1768) (garrytan#1807)
  v0.42.17.0 fix(sync): resumable incremental sync — killed mid-import no longer loses progress (garrytan#1794) (garrytan#1808)
  v0.42.16.0 feat(doctor): brain health as a solved problem — cause-ranked doctor + OOM-loop line + auto-drain + pool-reap (garrytan#1685) (garrytan#1802)
  v0.42.15.0 fix: decouple CLI primary output from process.stdout.isTTY (garrytan#1784) (garrytan#1806)
  v0.42.14.0 fix(zero-config): code-* readiness signal + init embedding-key validation + lock self-heal (garrytan#1780) (garrytan#1804)
  v0.42.13.0 fix(search): archive/ content findable by default, demoted not hard-excluded (garrytan#1777) (garrytan#1797)
  v0.42.12.0 feat: self-upgrading gbrain — invocation-riding update check + opt-in auto-upgrade (garrytan#1798)
  v0.42.11.0 feat(skillopt): held-out eval gate, honest receipts, ENFORCE + ablation opts (garrytan#1759)
  v0.42.10.0 feat(extract): opt-in global-basename wikilink resolution (closes garrytan#972) (garrytan#1388)
Two unrelated bugs reported from real Postgres/Supabase brains.
#1633: `gbrain sync` spins, ignores SIGTERM, orphans pile up under cron

A `sync --source <id>` could enter a busy loop that pegs a CPU core and ignores SIGTERM/`kill` (only `kill -9` stopped it). Under cron the stuck process orphaned and the next tick spawned another; one reporter found 13, 24h+ old. Root cause: a spinning sync starves its own event loop, so the SIGTERM handler and `--timeout` that gbrain already had can never run.

Fix: out-of-band hard-deadline watchdog.

- `src/core/process-watchdog.ts` spawns a Bun `worker_threads` Worker (`eval:true` so it survives `bun --compile`) on a separate OS thread; at the deadline it SIGTERMs its own process, and at deadline+grace SIGKILLs it. It fires even when the main loop is starved. Signals SELF, so no PID-reuse footgun. Empirically validated on Bun 1.3.13 (worker timer + SIGKILL killed a `while(true){}`-starved process; Codex outside-voice + a repo spike both confirmed).
- Installed in `cli.ts` before `connectEngine` so a connect-phase hang is bounded too.
- Non-TTY runs default to 3600s; set `GBRAIN_SYNC_MAX_RUNTIME_SECONDS` or `--hard-deadline <s>`, or opt out with `--no-hard-deadline`. `--timeout <s>` auto-arms the hard backstop. Interactive TTY runs stay unbounded.
- SIGINT graceful cancel on single-source and `--all` (Ctrl-C returns a clean `partial` + releases the lock via the normal finally, instead of a hard cut that leaked the lock). `withRefreshingLock` timer `unref`'d.
- `[sync-watchdog]` heartbeat + the existing `[gbrain phase]` lines pinpoint the next hang.

#1768: `links_extraction_lag` stuck at 100% on Postgres

`gbrain extract --stale` stamped every page, yet `gbrain doctor` still reported every page as needing extraction: the remediation it recommends could never satisfy its own check. Root cause: the stamp went through a JS `Date` (millisecond-truncated) while the DB `updated_at` keeps microseconds, so `updated_at > links_extracted_at` stayed permanently true. (Not a trigger: there is no `BEFORE UPDATE` trigger on `pages`.)

Fix. Both engines' `listStalePagesForExtraction` SELECT now projects a deterministic full-µs UTC string (`to_char(updated_at AT TIME ZONE 'UTC', '…US"Z"')`) as `StalePageRow.updated_at_iso`; `extractStaleFromDB` stamps that. The `markPagesExtractedBatch` SQL is unchanged, so backdated stamps (version-arm test) still work and the CDX-1 edited-since arm is strengthened to exact equality. Postgres-only symptom.

Tests

- `test/process-watchdog.test.ts` (pure decision matrix + handle contract), `test/process-watchdog.serial.test.ts` (Bun-pinned: starved process IS killed ~deadline+grace; no-watchdog control does NOT self-exit; clean dispose never kills), `test/sync-hard-deadline.test.ts` (resolution precedence + `composeAbortSignals`).
- `test/extract-stale.test.ts` (inject a µs `updated_at`, run `--stale`, assert lag → 0 and stays 0).
- `verify` 29/29; typecheck clean; targeted post-merge suites 187 pass.

Pre-existing failures (NOT from this branch)

The full suite shows 4 failures in `test/facts-classify.test.ts` (×2) and `test/mcp-eval-capture.test.ts` (×2). Verified against a clean `origin/master` checkout: the same 4 fail there too. They predate this branch and live in areas this PR never touches (facts classifier, op-layer eval-capture). This PR adds 187+ passing tests and introduces zero new failures.

Reviews

`/plan-eng-review` CLEAR (9 decisions resolved) · `/codex` outside-voice CLEAR (validated both load-bearing bets: the `to_char` µs round-trip and the Bun worker-self-kill).

🤖 Generated with Claude Code
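For readers who want a concrete picture of the #1633 watchdog, here is a minimal sketch of the SIGTERM-then-SIGKILL escalation run from a worker thread. It is not the contents of `src/core/process-watchdog.ts`: the names `watchdogAction`, `armWatchdog`, and `WatchdogHandle` are hypothetical, and it assumes that `require` and `process.kill` are usable inside an `eval: true` worker (true on Node; the PR text only states that the approach was validated on Bun 1.3.13).

```typescript
import { Worker } from "node:worker_threads";

export type WatchdogAction = "none" | "SIGTERM" | "SIGKILL";

// Pure decision helper (hypothetical name): which signal is due after `elapsedMs`?
export function watchdogAction(
  elapsedMs: number,
  deadlineMs: number,
  graceMs: number,
): WatchdogAction {
  if (elapsedMs >= deadlineMs + graceMs) return "SIGKILL";
  if (elapsedMs >= deadlineMs) return "SIGTERM";
  return "none";
}

// The worker body is passed as a string (eval: true), so no separate worker
// file has to be resolved at runtime. Its timers run on the worker's own
// thread and event loop, so a starved main loop cannot delay them.
// Assumption: require() and process.kill() are available in an eval worker.
const WORKER_SOURCE = `
  const { workerData } = require("node:worker_threads");
  const { deadlineMs, graceMs } = workerData;
  // process.pid in a worker is the pid of the whole process: signal SELF.
  setTimeout(() => process.kill(process.pid, "SIGTERM"), deadlineMs);
  setTimeout(() => process.kill(process.pid, "SIGKILL"), deadlineMs + graceMs);
`;

export interface WatchdogHandle {
  dispose(): void;
}

// Arms the watchdog; call dispose() on a clean exit so it never fires.
export function armWatchdog(deadlineMs: number, graceMs: number): WatchdogHandle {
  const worker = new Worker(WORKER_SOURCE, {
    eval: true,
    workerData: { deadlineMs, graceMs },
  });
  worker.unref(); // do not keep the process alive just for the watchdog
  return {
    dispose: () => {
      void worker.terminate();
    },
  };
}
```

Keeping the decision logic in a pure function separate from the worker spawn mirrors the split the PR's test list describes (a pure decision matrix plus a serial kill test), and lets the timing rules be checked without ever sending a real signal.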
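The #1768 root cause can be reproduced without a database. The sketch below models the two stamps as fixed-width UTC strings with six fractional digits (the timestamp value and the helper names `truncateToMillis` and `isStale` are illustrative assumptions, not code from the PR) and shows why a millisecond-truncated stamp can never clear the `updated_at > links_extracted_at` predicate, while an exact stamp does.

```typescript
// A JS Date stores whole milliseconds, so a round-trip through Date loses
// the last three fractional digits. Model that loss on a fixed-width string,
// keeping six digits so strings stay directly comparable.
function truncateToMillis(isoMicros: string): string {
  return isoMicros.replace(
    /\.(\d{3})\d{3}Z$/,
    (_match: string, millis: string) => `.${millis}000Z`,
  );
}

// The staleness predicate quoted in the PR description. Fixed-width UTC
// ISO strings sort lexicographically in time order, so `>` is sufficient.
function isStale(updatedAt: string, linksExtractedAt: string): boolean {
  return updatedAt > linksExtractedAt;
}

// Hypothetical DB value carrying microseconds.
const dbUpdatedAt = "2026-06-03T12:00:00.123456Z";

// Old behaviour: stamp derived from a millisecond-precision value.
const truncatedStamp = truncateToMillis(dbUpdatedAt);
const staleAfterOldStamp = isStale(dbUpdatedAt, truncatedStamp);

// New behaviour: stamp the exact string the SELECT projected.
const staleAfterExactStamp = isStale(dbUpdatedAt, dbUpdatedAt);

console.log({ truncatedStamp, staleAfterOldStamp, staleAfterExactStamp });
```

With the truncated stamp the page is still reported stale, because `.123456` is later than `.123000`; with the exact stamp the comparison is an equality and the predicate clears.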
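The PR names a `composeAbortSignals` helper in `sync.ts` but does not show it. One plausible shape, offered as an assumption and not the actual implementation, combines several `AbortSignal`s (for example a `--timeout` signal and a SIGINT signal) into one that aborts as soon as any input aborts.

```typescript
// Sketch only: the real signature of composeAbortSignals is not shown in the PR.
function composeAbortSignals(...signals: AbortSignal[]): AbortSignal {
  const controller = new AbortController();
  for (const signal of signals) {
    if (signal.aborted) {
      // An input was already aborted: propagate immediately and stop wiring.
      controller.abort(signal.reason);
      break;
    }
    // Forward the first abort; later calls to abort() are no-ops.
    signal.addEventListener("abort", () => controller.abort(signal.reason), {
      once: true,
    });
  }
  return controller.signal;
}

// Usage: either source cancels the combined signal.
const timeoutSource = new AbortController();
const sigintSource = new AbortController();
const combined = composeAbortSignals(timeoutSource.signal, sigintSource.signal);

const abortedBefore = combined.aborted;
sigintSource.abort(new Error("SIGINT"));
const abortedAfter = combined.aborted;
```

Cooperative cancellation like this only works while the event loop is still turning, which is why the PR pairs it with the out-of-band watchdog for the starved-loop case.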
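The hard-deadline knobs listed under #1633 imply a resolution order, which the PR says is covered by `test/sync-hard-deadline.test.ts` but does not spell out. The sketch below assumes one reasonable precedence (opt-out, then flag, then environment variable, then the TTY-dependent default); that order, the `HardDeadlineInputs` shape, and the function name are all assumptions, and the `--timeout` auto-arm behaviour is deliberately left out because the source does not say how it is computed.

```typescript
interface HardDeadlineInputs {
  noHardDeadline: boolean; // --no-hard-deadline was passed
  hardDeadlineFlag?: number; // value of --hard-deadline <s>, if passed
  envValue?: string; // raw GBRAIN_SYNC_MAX_RUNTIME_SECONDS, if set
  isTTY: boolean; // interactive terminal?
}

// From the PR text: non-TTY runs default to 3600 seconds.
const NON_TTY_DEFAULT_SECONDS = 3600;

// Returns the deadline in seconds, or null for "unbounded".
// The precedence below is an assumption, not the documented behaviour.
function resolveHardDeadlineSeconds(inputs: HardDeadlineInputs): number | null {
  if (inputs.noHardDeadline) return null;
  if (inputs.hardDeadlineFlag !== undefined) return inputs.hardDeadlineFlag;
  if (inputs.envValue !== undefined) {
    const fromEnv = Number(inputs.envValue);
    // Ignore empty, non-numeric, or non-positive values.
    if (Number.isFinite(fromEnv) && fromEnv > 0) return fromEnv;
  }
  return inputs.isTTY ? null : NON_TTY_DEFAULT_SECONDS;
}

// A cron invocation (no TTY, no flags) gets the default backstop.
const cronDeadline = resolveHardDeadlineSeconds({ noHardDeadline: false, isTTY: false });
// An interactive run with no flags stays unbounded.
const interactiveDeadline = resolveHardDeadlineSeconds({ noHardDeadline: false, isTTY: true });
```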