fix(staleness): measure sync lag relative to newest committed content (not wall-clock) #1623
Closed
garrytan-agents wants to merge 1 commit into
Source staleness previously used wall-clock (now - last_sync_at), which flagged quiet/caught-up repos as severely stale even when nothing new had been committed since the last sync. Lag is now content-relative: 0 when the newest tracked/committed content is at or before last_sync_at; otherwise the wall-clock delta. Untracked files (git status '??') are excluded — they are not part of the repo. Falls back to wall-clock when content can't be probed (non-git/unreadable path) so detection never regresses. Negative wall-clock (future last_sync_at) is surfaced for clock-skew detection. Wired into buildSyncStatusReport, computeAllSourceMetrics/isSourceStale, and checkSyncFreshness/checkCycleFreshness. Regression tests added.
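The lag rule above fits in a small pure comparator. This is a minimal sketch, not the PR's code: the name `contentRelativeLag` is hypothetical, and unlike `contentRelativeLagSeconds(localPath, lastSyncMs, nowMs)` it receives the already-probed newest-content timestamp (or `null` when probing failed) instead of a path, so the decision logic can be checked without git.

```typescript
// Hypothetical pure version of the content-relative lag rule. The PR's
// contentRelativeLagSeconds takes a local path and probes git itself; this
// sketch receives the probe result instead so the rules can be tested alone.
function contentRelativeLag(
  newestContentMs: number | null, // null: content could not be probed
  lastSyncMs: number | null,      // null: last_sync_at unknown
  nowMs: number,
): number | null {
  // Without a sync anchor there is nothing to measure against.
  if (lastSyncMs === null) return null;

  const wallClockSeconds = (nowMs - lastSyncMs) / 1000;

  // Future last_sync_at: surface the negative value for clock-skew checks.
  if (wallClockSeconds < 0) return wallClockSeconds;

  // Non-git or unreadable path: fall back to wall-clock so detection never regresses.
  if (newestContentMs === null) return wallClockSeconds;

  // Caught up: nothing committed after the last sync.
  if (newestContentMs <= lastSyncMs) return 0;

  // Behind: new content exists, so report the wall-clock delta.
  return wallClockSeconds;
}
```

In this sketch the negative check runs before the content comparison, so clock skew stays visible even for a repo that is otherwise caught up.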
Owner
Superseded by #1656, re-implemented in the base repo (garrytan-agents PRs run in a fork without secret access, so CI can't gate them — base-repo branch fixes that). Two substantive changes vs this PR:
Your root-cause analysis (untracked dirs defeating the clean-tree gate) was exactly right and is the headline fix. Thanks for it.
garrytan added a commit that referenced this pull request on May 30, 2026
…1623) (#1656)

* fix(staleness): commit-relative sync staleness (HEAD-hash local, durable column remote)

Quiet, fully-caught-up repos no longer false-alarm as SEVERELY STALE in gbrain doctor / sources status. Staleness now means "is there committed content the sync hasn't ingested?" not raw wall-clock since the last sync.

- git-head.ts: requireCleanWorkingTree gains 'ignore-untracked' mode (git status --porcelain --untracked-files=no). Untracked dirs no longer defeat the freshness short-circuit: sync's incremental path keys off the commit diff and never imports untracked files, so doctor agrees with sync.
- source-health.ts: newestCommitMs (HEAD committer time) + pure lagFromContentMs comparator; computeAllSourceMetrics {probeContent} routes local→live commit-hash, remote→stored column. Dead isSourceStale removed.
- migration v108 sources.newest_content_at + fresh-schema blobs.
- sync.ts: writeSyncAnchor stamps newest_content_at atomically with last_commit/last_sync_at; buildSyncStatusReport (remote get_status_snapshot) reads the column, with no git subprocess (v0.41.27.0 trust boundary intact).
- doctor.ts: checkSyncFreshness short-circuit ignores untracked; remote path reads the column; clock-skew check stays on raw wall-clock.

Local consumers probe live git (catch HEAD moving to an old-dated commit, which a timestamp compare would miss); remote consumers read the durable column so a remote-callable endpoint never shells out to a DB-supplied local_path.

Supersedes #1623 (re-implemented in base repo with the trust boundary preserved).

Co-Authored-By: t <t@t>

* chore(ci): offload tests to on-demand cloud runners from a local CLI

scripts/ship-remote-tests.sh pushes the branch, dispatches the test workflow, and blocks on `gh run watch --exit-status`, so a local caller (human or agent) awaits the GitHub run exactly like a local `bun run test`, with a real pass/fail exit code.

Frees a load-saturated local machine (many Conductor agents running their own bun-test suites at once → load avg 120 on 16 cores → PGLite OOM/crawl). test.yml gains workflow_dispatch so the suite can be triggered from any branch.

* chore: bump version and changelog (v0.41.32.0)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: t <t@t>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
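The `'ignore-untracked'` mode comes down to one extra `git status` flag. A minimal sketch, assuming the gate shells out to `git` through Node's `child_process`; `cleanTreeStatusArgs`, `isWorkingTreeClean`, and the `'include-untracked'` mode name are hypothetical, and only `'ignore-untracked'` and `git status --porcelain --untracked-files=no` come from the commit message. The runner is injectable so the logic can be exercised without a repository.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical mode names; only 'ignore-untracked' appears in the commit.
type UntrackedMode = "include-untracked" | "ignore-untracked";

// Runs git with the given argv and returns stdout as text.
type GitRunner = (args: string[]) => string;

const runGit: GitRunner = (args) =>
  execFileSync("git", args, { encoding: "utf8" });

// Build the argv for a porcelain cleanliness check in repoPath.
function cleanTreeStatusArgs(repoPath: string, mode: UntrackedMode): string[] {
  const args = ["-C", repoPath, "status", "--porcelain"];
  // With --untracked-files=no, git prints no "??" lines at all, so an
  // untracked directory can no longer make the tree look dirty.
  if (mode === "ignore-untracked") args.push("--untracked-files=no");
  return args;
}

// The tree counts as clean when porcelain output is empty.
function isWorkingTreeClean(
  repoPath: string,
  mode: UntrackedMode,
  run: GitRunner = runGit,
): boolean {
  return run(cleanTreeStatusArgs(repoPath, mode)).trim() === "";
}
```

Since sync's incremental path only follows the commit diff, ignoring untracked entries here makes the gate agree with what sync would actually import.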
rayers added a commit to rayers/gbrain that referenced this pull request on May 31, 2026
Resolve writeSyncAnchor signature conflict: PR garrytan#1430's pullFailed param and upstream garrytan#1623's newestContentEpochMs param both added a 5th positional arg to the same function. Merged to take both — newest_content_at (git-intrinsic HEAD committer time) stamps regardless of pull outcome; last_sync_at (the "observed upstream" freshness signal) stays gated on pullFailed. All 3 call sites pass both args. tsc --noEmit clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
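The merged rule can be sketched as a pure function that decides which freshness columns a single anchor write stamps. Everything here is hypothetical (the function name, the `Date` column types, and treating a `null` newest-content time as "leave the column alone"); only the two column names and the gating rule come from the commit message, and other columns the real `writeSyncAnchor` writes, such as `last_commit`, are omitted.

```typescript
// Hypothetical shape of the freshness columns one anchor write may stamp.
interface FreshnessColumns {
  newest_content_at?: Date; // HEAD committer time of the synced checkout
  last_sync_at?: Date;      // when upstream was last successfully observed
}

function freshnessColumnsForAnchor(
  nowMs: number,
  newestContentEpochMs: number | null,
  pullFailed: boolean,
): FreshnessColumns {
  const cols: FreshnessColumns = {};
  // Git-intrinsic: describes the checked-out commit, so it is stamped
  // whether or not the pull succeeded (assumed: null means "unknown, skip").
  if (newestContentEpochMs !== null) {
    cols.newest_content_at = new Date(newestContentEpochMs);
  }
  // "Observed upstream" signal: a failed pull must not make the source look fresh.
  if (!pullFailed) {
    cols.last_sync_at = new Date(nowMs);
  }
  return cols;
}
```

Keeping the two gates independent lets a failed pull still record what the local checkout contains without advancing `last_sync_at`.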
mgunnin added a commit to mgunnin/gbrain that referenced this pull request on Jun 3, 2026
* upstream/master:
  v0.41.36.0 feat(mcp): publish agent skills (list_skills / get_skill) for thin clients (garrytan#1661)
  v0.41.35.0 feat(guardrails): vendor-neutral content guardrail seams (supersedes garrytan#1652) (garrytan#1660)
  v0.41.34.0 feat(search): retrieval cathedral - max-pool + title + alias + evidence (garrytan#1657)
  v0.41.33.0 feat(search): intent-aware adaptive return-sizing + agent-facing query param (garrytan#1640)
  v0.41.32.0 fix(staleness): commit-relative sync staleness (supersedes garrytan#1623) (garrytan#1656)
  v0.41.31.0 feat(embed): delta-aware sync --all cost gate + real stale-embedding semantics (garrytan#1632)
  v0.41.30.0 fix(brainstorm/lsd): --save writes the advertised .md file via canonical ingestion path (garrytan#1655)

# Conflicts:
# src/core/operations.ts
Problem
Source staleness in `gbrain doctor` (and `sources status`) was measured as raw wall-clock time since the last sync (`now - last_sync_at`). This flags quiet, fully-caught-up repos as severely stale even when nothing new has been committed since the last sync. A federated source that hasn't received a commit in days is not stale for search purposes: the sync has everything the repo contains. But the wall-clock metric kept escalating, producing false `SEVERELY STALE` alerts.

Error Log (anonymized)
A daily health check repeatedly fired against a low-churn source:
Yet the underlying repo was caught up:
The newest committed content predated the last sync by ~28h, so the source was fully synced. The alert was pure wall-clock noise.
Root Cause
Three independent consumers all derived staleness from `now - last_sync_at`:

- `buildSyncStatusReport` (`sync.ts`) → `staleness_hours` / dashboard
- `computeAllSourceMetrics` + `isSourceStale` (`source-health.ts`) → `lag_seconds`, federation health
- `checkSyncFreshness` + `checkCycleFreshness` (`doctor.ts`)

None compared against what the repo actually contains.
What We Tried
- … (`?? companies/`, `?? media/`) inflated the lag. Untracked files are not part of the repo; "last committed to the repo" must mean committed/tracked state only.
- … `last_sync_at`, else the wall-clock delta. Fall back to wall-clock when content can't be probed (non-git / unreadable path) so detection never regresses.

Solution
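Excluding untracked files means dropping `??` records from `git status --porcelain -z` output before looking for the newest tracked modification. A minimal sketch of that filtering step, assuming porcelain v1 output; `trackedChangedPaths` is a hypothetical name, and a probe like `newestContentMs` would still need to stat the returned paths for their mtimes and combine the result with the HEAD commit time.

```typescript
// Hypothetical helper: list tracked paths that git status reports as changed,
// given `git status --porcelain -z` (porcelain v1) output.
function trackedChangedPaths(porcelainZ: string): string[] {
  const records = porcelainZ.split("\0");
  const paths: string[] = [];
  for (let i = 0; i < records.length; i++) {
    const rec = records[i] ?? "";
    // Each record is "XY PATH"; skip the empty record after the final NUL.
    if (rec.length < 4) continue;
    const xy = rec.slice(0, 2);
    const path = rec.slice(3);
    // Untracked ("??") and ignored ("!!") entries are not part of the repo.
    if (xy === "??" || xy === "!!") continue;
    // Renames/copies: in -z mode the original path follows as its own record.
    if (/[RC]/.test(xy)) i++;
    paths.push(path);
  }
  return paths;
}
```

The `-z` form is used because paths are emitted unquoted and NUL-terminated, so names containing spaces or newlines need no unescaping; the cost is that a rename's original path arrives as a separate record the loop has to skip.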
New helpers in `src/core/source-health.ts`:

- `newestContentMs(localPath)`: returns the newest committed/tracked mtime, combining `git log -1 --format=%ct` (HEAD commit time) with the newest tracked working-tree modification (`git status --porcelain -z`, excluding untracked `??` entries). Returns `null` for non-git/unreadable paths.
- `contentRelativeLagSeconds(localPath, lastSyncMs, nowMs)`:
  - `null` when `last_sync_at` is unknown.
  - Negative wall-clock lag (future `last_sync_at`) is surfaced as-is so clock-skew detection upstream still fires.
  - `0` if newest content ≤ last sync (caught up), else the wall-clock delta.

Wired into all three consumers. In `checkSyncFreshness`, it composes cleanly with the v0.41.27.0 `localOnly` git short-circuit: the short-circuit is a strict "definitely unchanged" gate (`HEAD === last_commit` + clean tree + chunker match) for the local CLI, while content-relative lag covers the general/remote path where the short-circuit doesn't fire. `cycle_freshness` is also made content-relative (a source with no new committed content doesn't need re-cycling).

Behavior Matrix
Results
Re-running `sources status --json` against a real low-churn source after the fix:

Other active sources continue to report correct non-zero lag.
Testing
- `tsc --noEmit`: clean.
- `test/source-health.test.ts` (quiet-repo → 0, behind-repo → wall-clock, non-git fallback, tracked-edit counts, untracked excluded) and `test/sync-all-parallel.test.ts` (caught-up → 0 staleness).
- `source-health`, `sync-all-parallel`, `doctor-cycle-freshness`, `doctor`: 115 pass / 0 fail.