Skip to content

v0.41.25.0 perf: batch sync deletes (supersedes #1538)#1568

Closed
garrytan wants to merge 2 commits into
masterfrom
garrytan/copenhagen-v6
Closed

v0.41.25.0 perf: batch sync deletes (supersedes #1538)#1568
garrytan wants to merge 2 commits into
masterfrom
garrytan/copenhagen-v6

Conversation

@garrytan

Copy link
Copy Markdown
Owner

Summary

Bulk deletes no longer jam sync for two days. A single commit deleting 73K files used to take ~5 hours (146K individual DB round-trips) and time out the sync cron every run. This release batches both halves — Phase 1 batched slug resolution + Phase 2 batched DELETE — so the same scenario lands in ~2 minutes. The per-batch shape inherits the v0.41.19 Supavisor withRetry cathedral so transient connection blips don't lose work.

Also closes a latent silent-no-op bug class on gbrain sync without --source: the pre-fix per-file resolver could pull a slug from the wrong source and then issue a delete against source_id='default', which would silently no-op. Files stayed in the brain after the file was gone from disk. The new path scopes lookup AND delete to the same source.

Scope honesty: this fixes the delete-specific hotspot only. The full RFC #1472 cathedral (per-source --timeout, --break-lock-if-stale, --independent mode) already landed in v0.41.15.0 (#1506). Renames, imports, extraction, embedding, and lock contention can still cause sync slowness in their own ways.

Provenance

Productionized from community PR #1538 (@garrytan-agents) per CLAUDE.md's community-PR-wave workflow. The original PR couldn't run CI with secrets because GitHub doesn't pass them to fork PRs from non-collaborators. Moving to a base-repo branch + 12 review-driven improvements layered on top via /plan-eng-review + codex outside voice:

# Decision
D1 Drop duplicate progress.finish() from PR #1538
D2 Batch BOTH sourceId-set AND no-sourceId Phase-1 lookups
D3 BATCH_SIZE=100 matches v0.12.1 addLinksBatch precedent
D4 Full test coverage: 12 unit + 3 E2E + D1/D9 regressions
D5 Codex outside-voice review (12 findings absorbed)
D6 Scope no-sourceId path to source_id='default' (kills latent bug)
D7 ORDER BY slug ASC for deterministic duplicate-source_path collapse
D8 Wire deletePages through withRetry(BULK_RETRY_OPTS) + audit
D9 Query-count regression test (the actual perf regression test)
D10 deletePages returns Promise<string[]> not Promise<number>
D11 Implement on both engines, non-optional interface (no fallback branch)
D12 CHANGELOG scoped honestly to delete hotspot

Engine surface

// BrainEngine — non-optional, no fallback branching
deletePages(slugs: string[], opts?: { sourceId?: string; signal?: AbortSignal }): Promise<string[]>

PostgresEngine wraps each batch in withRetry(BULK_RETRY_OPTS) with auditSite: 'deletePages'. PGLite runs the bare DELETE (single-writer, no Supavisor concern). BATCH_AUDIT_SITES enum extended; test/core/retry.test.ts regression pin updated.

Sync rewrite

src/commands/sync.ts delete loop replaced with the two-phase batched path:

  • Phase 1: SELECT slug, source_path FROM pages WHERE source_path = ANY($1::text[]) AND source_id = $2 ORDER BY slug ASC (BATCH_SIZE=100). ORDER BY pins deterministic Map collapse on duplicate source_paths.
  • Phase 2: engine.deletePages(batch, deleteOpts) (BATCH_SIZE=100). Returns actually-deleted slugs; caller's pagesAffected appends INPUT batch for parity with pre-fix (downstream extract/embed sweeps no-op on missing pages).

New sync.deletes.resolve progress event fires for Phase 1; existing sync.deletes covers Phase 2. Per-batch --timeout abort checks preserve the v0.41.13.0 D-V4-2 contract.

Tests

  • test/sync-batch-deletes.test.ts (NEW, 12 PGLite cases): engine-level contract (D10), missing slugs, abort prefix, source scoping (D6 + default), 250-slug internal batching, D7 deterministic ORDER BY, source_path-NULL fallback, D9 query-count regression, D1 structural source-text regression.
  • test/e2e/sync.test.ts (extended, 3 real-Postgres cases): FK CASCADE through content_chunks (100-row batch), end-to-end 250-file delete batches into ≤3 SELECT + ≤3 deletePages calls, abort mid-sync returns partial('timeout') with empty pagesAffected.
  • test/core/retry.test.ts regression-pin extended with 'deletePages'.

Docs

  • docs/ENGINES.md documents deletePages alongside deletePage.
  • docs/progress-events.md documents sync.deletes.resolve.

Closes / supersedes

Supersedes #1538 (same fix, base-repo branch so CI has secret access, plus the 12 improvements above).

Test plan

  • bun run typecheck — pass
  • bun run verify — 28/28 checks pass
  • bun test test/sync-batch-deletes.test.ts — 12/12 (PGLite)
  • bun test test/core/retry.test.ts — 37/37 (regression-pin updated)
  • bun test test/build-llms.test.ts — 7/7 (llms.txt regenerated for CHANGELOG)
  • bun run test:serial — 50/50 serial files
  • bun run test — 3105 pass / 1 fail (BATCH_AUDIT_SITES regression-pin, fixed in same commit)
  • E2E real-Postgres: bun test test/e2e/sync.test.ts against pgvector container — 17/17

Notes

  • VERSION + package.json bumped to 0.41.25.0 (master at 0.41.22.1; v0.41.23.0 and v0.41.24.0 claimed by sibling PRs).
  • llms-full.txt regenerated for the CHANGELOG entry per the CLAUDE.md mandatory-regen-after-CHANGELOG-edit rule.
  • Two follow-up TODOs filed in TODOS.md: composite (source_id, source_path) partial index for the optimal Phase-1 hot path on 4-source federated brains (P3, latent).

🤖 Generated with Claude Code

Bulk deletes no longer jam sync for two days. A single commit deleting
73K files used to take ~5 hours (146K individual DB round-trips) and
time out the sync cron every run. This release batches both halves:
~2 minutes on the same scenario, with the per-batch shape inheriting
the v0.41.19 Supavisor retry cathedral so transient connection blips
don't lose work.

Also closes a latent silent-no-op bug class: on brains that didn't pass
`--source`, the old per-file resolver could pull a slug from the wrong
source and then issue a delete against `source_id='default'`, which
would no-op. Files stayed in the brain after the file was gone from
disk. The new path scopes lookup AND delete to the same source.

Productionized from community PR #1538 (@garrytan-agents) per the
CLAUDE.md community-PR-wave workflow. 12 review-driven improvements
layered on top via /plan-eng-review + codex outside voice:

  D1  drop duplicate progress.finish() from PR #1538
  D2  batch BOTH sourceId-set and no-sourceId Phase-1 lookups
  D3  BATCH_SIZE=100 matches v0.12.1 addLinksBatch precedent
  D4  full test coverage: 12 unit + 3 E2E + D1/D9 regressions
  D5  codex outside-voice review (12 findings absorbed)
  D6  scope no-sourceId path to source_id='default' (kills latent bug)
  D7  ORDER BY slug ASC for deterministic duplicate-source_path collapse
  D8  wire deletePages through withRetry(BULK_RETRY_OPTS) + audit
  D9  query-count regression test (the actual perf regression test)
  D10 deletePages returns Promise<string[]> not Promise<number>
  D11 implement on both engines, non-optional interface (no fallback)
  D12 CHANGELOG scoped honestly to delete hotspot

Engine surface:
  BrainEngine.deletePages(slugs, opts?): Promise<string[]>
  PostgresEngine: withRetry wrap, audit 'deletePages', BATCH_SIZE=100
  PGLiteEngine:   raw DELETE ... WHERE slug = ANY($1::text[])
  BATCH_AUDIT_SITES enum + retry.test.ts regression pin updated

Sync rewrite: src/commands/sync.ts delete loop replaced with two-phase
batched path. New sync.deletes.resolve progress event (Phase 1 lookup),
existing sync.deletes covers Phase 2. Per-batch --timeout abort checks
preserve the v0.41.13.0 D-V4-2 contract.

Tests:
  test/sync-batch-deletes.test.ts (NEW, 12 PGLite cases): engine-level
    contract + sync-level integration (D6/D7/D9 regressions) + D1
    structural source-text regression
  test/e2e/sync.test.ts (extended, 3 real-Postgres cases): FK cascade
    through content_chunks, 250-file batch-shape verification, abort
    mid-sync returns partial(timeout) cleanly
  test/core/retry.test.ts BATCH_AUDIT_SITES regression-pin extended

Docs:
  docs/ENGINES.md documents deletePages alongside deletePage
  docs/progress-events.md documents sync.deletes.resolve

Co-authored-by: garrytan-agents <garrytan-agents@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Catches up to v0.41.23.0 (#1541). VERSION + package.json kept at
0.41.25.0 (ours, higher semver); CHANGELOG carries both entries with
v0.41.25.0 on top. llms-full.txt regenerated for the merged CHANGELOG.
No code conflicts in src/ — master's extract operator surfaces wave
(#1541) touches different files than this branch's batch-deletes
work in sync.ts.
@garrytan

garrytan commented Jun 8, 2026

Copy link
Copy Markdown
Owner Author

Superseded by #1566 (batched deletes + global page-generation clock), which merged as v0.41.25.0. Closing this duplicate.

@garrytan garrytan closed this Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant