links_extraction_lag never clears on Postgres: extract --stale stamps links_extracted_at, but the same UPDATE advances updated_at past it #1768

@lucha0404

Environment

  • gbrain 0.42.8.0, Postgres engine (Supabase pooler, session mode, port 5432)
  • gbrain doctor: no FAIL, embeddings 100%, connection OK

Symptom

gbrain extract --stale runs to completion (526/526 pages, reports 7 link(s) + 3 timeline entr(ies)) and does stamp links_extracted_at on every page — yet the links_extraction_lag doctor check still reports 526/526 pages (100%) have un-extracted edges, no matter how many times I re-run extract --stale. The lag never clears.

Root cause (verified against the DB)

The stamp succeeds, but the same UPDATE leaves updated_at slightly ahead of links_extracted_at (a sub-second gap, ~166µs in the query output below), so the staleness arm links_extracted_at < updated_at is permanently true.

DB ground-truth immediately after a full extract --stale:

null_wm | has_wm | total | max(links_extracted_at)    | max(updated_at)
0       | 526    | 526   | 2026-06-02 08:18:58.999+00 | 2026-06-02 08:18:58.999166+00

  • links_extracted_at IS NULL count = 0 (every page was stamped)
  • pages with links_extracted_at < updated_at = 525 / 526, all with updated_at - links_extracted_at < 1 second (sub-second / ~166µs gap)

So links_extracted_at (….999) ends up ~166µs behind updated_at (….999166) on essentially every page.
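One reading consistent with these two values is that the stamp is written at millisecond precision while updated_at is stored at microsecond precision. That is an assumption, not something confirmed from the gbrain source. A minimal sketch of the arithmetic, with timestamps modelled as integer microseconds and a hypothetical truncateToMillis helper:

```typescript
// Timestamps are modelled as integer microseconds since the Unix epoch.
type Micros = number;

// Hypothetical helper (not from gbrain): drop the sub-millisecond part, as a
// millisecond-precision clock such as a JavaScript Date would.
function truncateToMillis(ts: Micros): Micros {
  return ts - (ts % 1000);
}

// updated_at as reported above: 2026-06-02 08:18:58.999166+00.
// Date.UTC takes a zero-based month, so 5 means June.
const updatedAt: Micros = Date.UTC(2026, 5, 2, 8, 18, 58, 999) * 1000 + 166;

// ASSUMPTION: the stamp is the same instant, truncated to milliseconds,
// which reproduces the reported 08:18:58.999+00.
const linksExtractedAt: Micros = truncateToMillis(updatedAt);

// The gap is the discarded microsecond part.
const gapMicros = updatedAt - linksExtractedAt;

// The "edited since stamp" comparison is then true even though nothing was edited.
const staleByUpdatedAt = linksExtractedAt < updatedAt;

console.log({ gapMicros, staleByUpdatedAt });
```

Under this assumption the comparison is only false when the microsecond part of updated_at happens to be exactly zero, which would be rare.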

Where it comes from

  • src/core/engine.ts (~L1058-1066) — stampExtracted doc note says the stamp involves updated_at ("v0.42.7 D4 race fix … a concurrent edit landing between the SELECT and this stamp advances updated_at past the stamped …"). On Postgres the stamp's own UPDATE appears to advance the row's updated_at past the value being stamped (looks like a row-level updated_at trigger / timestamp-precision interaction).
  • src/commands/doctor.ts (~L2586): the links_extraction_lag check uses countStalePagesForExtraction({ versionTs: LINK_EXTRACTOR_VERSION_TS }); staleness = links_extracted_at IS NULL OR links_extracted_at < versionTs OR links_extracted_at < updated_at. Here the IS NULL and < versionTs arms are both false (every page is stamped, and the stamp, 2026-06-02, is later than 2026-05-31), so it is specifically the < updated_at arm that stays true.
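The three-arm predicate described above can be sketched in TypeScript as follows. This is an illustrative model, not the code in doctor.ts: the PageWatermark type, the stalenessReason function, and the reason labels are hypothetical, and the version timestamp is assumed to be midnight UTC on 2026-05-31 because the issue only gives the date.

```typescript
// Hypothetical in-memory model of one page row; times are microseconds since epoch.
interface PageWatermark {
  linksExtractedAt: number | null; // null means the page was never extracted
  updatedAt: number;
}

type StaleReason = "never_extracted" | "older_than_extractor" | "edited_since_stamp";

// Mirrors: links_extracted_at IS NULL OR < versionTs OR < updated_at.
// Returns the first arm that fires, or null when the page is fresh.
function stalenessReason(page: PageWatermark, versionTs: number): StaleReason | null {
  if (page.linksExtractedAt === null) return "never_extracted";
  if (page.linksExtractedAt < versionTs) return "older_than_extractor";
  if (page.linksExtractedAt < page.updatedAt) return "edited_since_stamp";
  return null;
}

// ASSUMPTION: the extractor version timestamp is 2026-05-31 00:00:00 UTC.
const versionTs = Date.UTC(2026, 4, 31) * 1000;

// A page as observed after extract --stale: stamp at .999, updated_at at .999166.
const stampMicros = Date.UTC(2026, 5, 2, 8, 18, 58, 999) * 1000;
const observedPage: PageWatermark = {
  linksExtractedAt: stampMicros,
  updatedAt: stampMicros + 166,
};

const observedReason = stalenessReason(observedPage, versionTs);
console.log(observedReason);
```

For the observed values the first two arms pass and only the updated_at comparison marks the page stale, which matches the diagnosis above.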

extract --stale therefore can never satisfy its own check on Postgres, because the stamping UPDATE bumps updated_at past the stamp it just wrote.

Impact

Cosmetic but persistent: extraction genuinely runs and every page is stamped, but links_extraction_lag is stuck at 100% and drags the overall health score down on every doctor run. The remediation the check itself recommends (gbrain extract --stale) cannot clear it. Likely Postgres-only (PGLite may not have the same updated_at trigger / precision behavior).

Suggested fix (any one)

  1. After the flush, stamp links_extracted_at to a value >= the post-UPDATE updated_at (or write both in one expression so they're equal); or
  2. Don't touch updated_at in the watermark UPDATE (suppress the row's updated_at trigger for this write); or
  3. Make the staleness predicate tolerant of a sub-second gap, e.g. links_extracted_at < updated_at - interval '1 second'.
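As a rough illustration of options 1 and 3, the sketch below models both on plain numbers (microseconds since epoch). The function names are hypothetical and nothing here touches the database; an actual fix would live in the stamping UPDATE or in the staleness query.

```typescript
const ONE_SECOND_MICROS = 1000000;

// Option 3 (sketch): treat the page as edited only if updated_at leads the
// stamp by more than the tolerance. A tolerance of 0 is the current behaviour.
function editedSinceStamp(
  linksExtractedAt: number,
  updatedAt: number,
  toleranceMicros: number = 0,
): boolean {
  return linksExtractedAt < updatedAt - toleranceMicros;
}

// Option 1 (sketch): never write a stamp that is behind the row's
// post-UPDATE updated_at.
function stampAtLeastUpdatedAt(candidateStamp: number, updatedAtAfterWrite: number): number {
  return Math.max(candidateStamp, updatedAtAfterWrite);
}

// Values from the report: stamp at .999, updated_at 166 microseconds later.
const reportedStamp = Date.UTC(2026, 5, 2, 8, 18, 58, 999) * 1000;
const reportedUpdatedAt = reportedStamp + 166;

const strictResult = editedSinceStamp(reportedStamp, reportedUpdatedAt);
const tolerantResult = editedSinceStamp(reportedStamp, reportedUpdatedAt, ONE_SECOND_MICROS);
const repairedStamp = stampAtLeastUpdatedAt(reportedStamp, reportedUpdatedAt);
const repairedResult = editedSinceStamp(repairedStamp, reportedUpdatedAt);

console.log({ strictResult, tolerantResult, repairedResult });
```

One trade-off to weigh: a tolerance window also hides a genuine edit that lands within that window after the stamp, which is the kind of race the v0.42.7 D4 note is concerned with. Writing a stamp that is at least the post-UPDATE updated_at keeps the strict comparison intact.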
