perf(embed): cursor-paginated stale loading + rate-limit backoff + partial index by garrytan-agents · Pull Request #987 · garrytan/gbrain

garrytan-agents · 2026-05-14T12:38:35Z

What

Three fixes for embed --stale on large brains (300K+ chunks, 48K stale):

1. Cursor-paginated `listStaleChunks`

Previous: one query pulling ALL stale rows (LIMIT 100000). On 373K-row content_chunks with 48K stale, this took >2 min and hit Supabase's statement_timeout.

Fix: keyset pagination on (page_id, chunk_index), 2000 rows per batch. Each query finishes in <1s.
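
As a rough illustration of the keyset approach, the sketch below pages through stale rows using an in-memory array in place of the database. Only `page_id`, `chunk_index`, the 2000-row batch size, and the `StaleChunkRow` name come from this PR; the `Cursor` type, the `listStaleChunksPage` and `collectAllStale` helpers, and their signatures are hypothetical and are not the actual engine interface.

```typescript
// Hypothetical row shape: only page_id and chunk_index are named in the PR.
interface StaleChunkRow {
  page_id: number;
  chunk_index: number;
  embedding: number[] | null;
}

// Assumed cursor type: the key of the last row seen, or null for the first page.
type Cursor = { page_id: number; chunk_index: number } | null;

// In-memory stand-in for a query shaped roughly like:
//   SELECT ... FROM content_chunks
//   WHERE embedding IS NULL AND (page_id, chunk_index) > ($1, $2)
//   ORDER BY page_id, chunk_index
//   LIMIT $3
function listStaleChunksPage(
  table: StaleChunkRow[],
  after: Cursor,
  limit: number,
): StaleChunkRow[] {
  return table
    .filter((r) => r.embedding === null)
    .filter(
      (r) =>
        after === null ||
        r.page_id > after.page_id ||
        (r.page_id === after.page_id && r.chunk_index > after.chunk_index),
    )
    .sort((a, b) => a.page_id - b.page_id || a.chunk_index - b.chunk_index)
    .slice(0, limit);
}

// Walk every stale row in key order, one bounded batch at a time.
function collectAllStale(table: StaleChunkRow[], batchSize = 2000): StaleChunkRow[] {
  const out: StaleChunkRow[] = [];
  let cursor: Cursor = null;
  for (;;) {
    const page = listStaleChunksPage(table, cursor, batchSize);
    if (page.length === 0) break;
    out.push(...page);
    const last = page[page.length - 1];
    // Advance past the last key seen, so no OFFSET scan is needed.
    cursor = { page_id: last.page_id, chunk_index: last.chunk_index };
  }
  return out;
}
```

Because the cursor always moves past the last key returned, the loop terminates even if a batch fails to embed and its rows stay stale.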

2. Rate-limit-aware retry (`embedBatchWithBackoff`)

Previous: OpenAI SDK's built-in retry maxes at ~4s backoff — too short for TPM limits on large pages (~90K tokens). Pages silently skipped after 3 failed attempts.

Fix: wrapper parses retry delay from 429 message (e.g. try again in 248ms), sleeps with 500ms padding, up to 5 retries. 60s conservative fallback.
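
A minimal sketch of the retry wrapper follows. The name `embedBatchWithBackoff`, the 500ms padding, the 5-retry cap, and the 60s fallback come from this PR. Everything else is an assumption for illustration: the `parseRetryDelayMs` helper, the options object, the injectable `sleep`, the `status === 429` check, support for delays given in seconds as well as milliseconds, and the choice to apply padding only to parsed delays.

```typescript
// Hypothetical helper: extract a delay such as "try again in 248ms" from an
// error message. Accepting an "s" unit as well as "ms" is an assumption.
function parseRetryDelayMs(message: string): number | null {
  const m = /try again in (\d+(?:\.\d+)?)\s*(ms|s)\b/i.exec(message);
  if (!m) return null;
  const value = parseFloat(m[1]);
  return m[2].toLowerCase() === "ms" ? value : value * 1000;
}

// Assumed error shape: a numeric HTTP status plus a message string.
interface RateLimitLike {
  status?: number;
  message?: string;
}

interface BackoffOptions {
  maxRetries?: number; // PR text: up to 5 retries
  paddingMs?: number; // PR text: 500ms padding
  fallbackMs?: number; // PR text: 60s fallback when the delay is unparseable
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

async function embedBatchWithBackoff<T>(
  embed: () => Promise<T>,
  opts: BackoffOptions = {},
): Promise<T> {
  const maxRetries = opts.maxRetries ?? 5;
  const paddingMs = opts.paddingMs ?? 500;
  const fallbackMs = opts.fallbackMs ?? 60_000;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  for (let attempt = 0; ; attempt++) {
    try {
      return await embed();
    } catch (err) {
      const e = err as RateLimitLike;
      // Rethrow anything that is not a rate limit, or once retries run out.
      if (e?.status !== 429 || attempt >= maxRetries) throw err;
      const parsed = parseRetryDelayMs(e.message ?? "");
      await sleep(parsed !== null ? parsed + paddingMs : fallbackMs);
    }
  }
}
```

Rethrowing after the last retry surfaces the failure to the caller rather than silently skipping the page.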

3. Migration v58: partial index

CREATE INDEX idx_chunks_embedding_null
  ON content_chunks (page_id, chunk_index)
  WHERE embedding IS NULL;

Makes countStaleChunks() and paginated listStaleChunks() instant.

Testing

Verified on 99K-page / 373K-chunk brain with 48K stale chunks:

Before: embed --stale hung 2+ min then timed out (0 progress)
After: loads 2K rows in <1s, embeds concurrently, pages through all stale chunks

Files changed

src/commands/embed.ts — paginated embedAllStale + embedBatchWithBackoff
src/core/engine.ts — updated listStaleChunks interface with cursor params
src/core/postgres-engine.ts — keyset-paginated query
src/core/pglite-engine.ts — matching PGLite implementation
src/core/types.ts — added page_id to StaleChunkRow
src/core/migrate.ts — v58 partial index migration

…rtial index

Three fixes for embed --stale on large brains (300K+ chunks):

## 1. Cursor-paginated listStaleChunks (embed timeout fix)

The previous implementation pulled ALL stale rows (up to 100K) in one query. On a 373K-row content_chunks table with 48K stale rows, this query took >2 min and hit Supabase's 2-min statement_timeout, causing embed --stale to silently fail with zero progress.

Fix: keyset pagination on (page_id, chunk_index) with a default batch size of 2000 rows. Each query finishes in <1s. The embedAllStale loop pages through batches, embeds each batch, then advances the cursor.

## 2. Rate-limit-aware retry (429 backoff)

The OpenAI SDK's built-in retry has a ~4s max backoff window, which is too short for TPM (tokens-per-minute) limits on large pages (~90K tokens). The embed loop would fail after 3 SDK retries and skip the page entirely.

Fix: embedBatchWithBackoff wrapper parses the retry delay from the 429 error message (e.g. 'try again in 248ms') and sleeps for that duration + 500ms padding. Up to 5 retries with parsed delays (60s fallback when unparseable).

## 3. Migration v58: partial index for NULL embeddings

`CREATE INDEX idx_chunks_embedding_null ON content_chunks (page_id, chunk_index) WHERE embedding IS NULL`

Makes countStaleChunks() and the paginated listStaleChunks() instant instead of full-table-scanning 373K rows.

## Testing

Verified on a 99K-page / 373K-chunk brain with 48K stale chunks.

Before: embed --stale hung for 2+ min then timed out (0 progress).
After: loads 2K rows in <1s, embeds concurrently, pages through all stale chunks without timeout.

garrytan · 2026-05-14T15:10:03Z

Superseded by work on garrytan/santiago-v2 — cherry-picked commit 4da1c61e over there to bypass CI issues with PRs from garrytan-agents. Same change, same diff. Closing this PR; the work will ship via the santiago-v2 PR (title + description carried over verbatim). Thanks!

Resolves migration version collision: master shipped v58 (edges_backfilled_at_v0_33_3, the v0.33.3 W0c symbol-resolution backfill watermark) ahead of this branch. The embed-perf cherry-pick from PR #987 also claimed v58 for its idx_chunks_embedding_null partial index; renumbered to v59 since master landed first. Both migrations coexist unchanged at their new slots.

No other conflicts. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

garrytan closed this May 14, 2026

garrytan mentioned this pull request May 15, 2026

v0.34.4.0 fix(embed): cursor-paginated --stale hardening wave (D2/D3/D4/D6/D7/D8 + regression test) #991

Merged

4 tasks

garrytan-agents wants to merge 1 commit into garrytan:master from garrytan-agents:fix/embed-perf
