
perf(age): index edge endpoints in backfill + bind anonymous RELATION targets (#335) #337

Merged
jphein merged 4 commits into main from perf/335-age-graph-followups
May 31, 2026

@jphein jphein commented May 31, 2026

Collaborator

Closes #335 — the three mempalace-side AGE graph-walk follow-ups surfaced by the Cat 7b hybrid-latency root-cause (palace-daemon docs/perf/2026-05-30-hybrid-graph-walk-latency.md, merged as palace-daemon#206). The per-entity graph-walk Cypher in searcher._graph_expand_* and the daemon's /search/age-fused lookup were the entire hybrid-vs-union latency delta.

1. Auto edge-endpoint indexes in backfill

AGE only btree-indexes a label table's own id (_ag_label_edge_pkey), never the start_id/end_id graphid columns the edge walks join on. So every per-entity lookup parallel-seq-scanned the whole edge table (MENTIONS 6.69M rows on prod, ~5.8s cold for a hot entity, × N query entities).

KnowledgeGraphAGE._ensure_edge_endpoint_indexes() (new, mirrors _ensure_drawer_unique_index) creates idx_mentions_{start,end}_id + idx_relation_{start,end}_id with CREATE INDEX IF NOT EXISTS, skipping any label whose backing table doesn't exist yet (the label is only created on first edge write). backfill_age.backfill() calls it after the edge pass so fresh palaces get the indexes automatically. Names/columns kept in sync with palace-daemon's scripts/age_graph_indexes.sql + the operator-online POST /backfill-age/indexes (CONCURRENTLY) route.
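
A minimal sketch of the skip-then-create logic described above, not the actual body of _ensure_edge_endpoint_indexes. It assumes AGE's layout of one schema per graph and one table per edge label. The helper name `edge_endpoint_index_ddl` and the graph name `mempalace` are hypothetical, and the caller is assumed to pass in the set of label tables that already exist.

```python
from typing import Iterable, List, Tuple

# (index name, edge label, column): names taken from the PR description.
EDGE_ENDPOINT_INDEXES: Tuple[Tuple[str, str, str], ...] = (
    ("idx_mentions_start_id", "MENTIONS", "start_id"),
    ("idx_mentions_end_id", "MENTIONS", "end_id"),
    ("idx_relation_start_id", "RELATION", "start_id"),
    ("idx_relation_end_id", "RELATION", "end_id"),
)


def edge_endpoint_index_ddl(graph: str, existing_tables: Iterable[str]) -> List[str]:
    """Return CREATE INDEX statements for edge labels whose table exists."""
    present = set(existing_tables)
    statements = []
    for index_name, label, column in EDGE_ENDPOINT_INDEXES:
        if label not in present:
            # The label table is only created on first edge write; skip it.
            continue
        statements.append(
            f'CREATE INDEX IF NOT EXISTS {index_name} ON "{graph}"."{label}" ({column})'
        )
    return statements


if __name__ == "__main__":
    # Hypothetical graph name; only MENTIONS exists yet, so two statements print.
    for stmt in edge_endpoint_index_ddl("mempalace", {"MENTIONS"}):
        print(stmt)
```

Because each statement uses IF NOT EXISTS and absent labels are skipped, running the generated DDL again after the missing label appears should install only what is still missing.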

2. Bind the anonymous RELATION target

MATCH (a:Entity)-[r:RELATION]->() left the far endpoint anonymous, so AGE built a Parallel Append over every vertex label (Entity + Drawer + Room + Wing + _ag_label_vertex, ~1.58M rows on prod) and nested-loop-joined to validate the endpoint exists — materializing the union and spilling to /dev/shm.

RELATION is always (Entity)->(Entity), so binding the open end to :Entity is semantically identical and collapses the Append to a single Entity scan. Verified row-for-row on a 300K-edge AGE graph: ->() and ->(:Entity) both returned 1816 rows; with the start_id index the RELATION scan became a Bitmap Index Scan (152ms → 46ms). All four expand sites updated (_graph_expand_from_seeds outbound/inbound/per-entity + _graph_expand_from_entities).
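
To make the before/after concrete, here is a small sketch of the two pattern shapes and how such a pattern is wrapped in AGE's cypher() SQL function. The helper names, the `RETURN r` clause, and the graph name are illustrative assumptions; the real queries in searcher._graph_expand_* carry additional predicates not shown here.

```python
def relation_outbound_pattern(bind_target: bool = True) -> str:
    """Build the outbound RELATION match, with or without a bound far endpoint."""
    # "()" leaves the endpoint anonymous; "(:Entity)" restricts it to one label.
    target = "(:Entity)" if bind_target else "()"
    return f"MATCH (a:Entity)-[r:RELATION]->{target} RETURN r"


def as_age_sql(graph: str, cypher: str) -> str:
    """Wrap a Cypher string in AGE's cypher() set-returning function."""
    return f"SELECT * FROM cypher('{graph}', $$ {cypher} $$) AS (r agtype);"


if __name__ == "__main__":
    # Hypothetical graph name, for illustration only.
    print(as_age_sql("mempalace", relation_outbound_pattern(bind_target=False)))
    print(as_age_sql("mempalace", relation_outbound_pattern(bind_target=True)))
```

Running EXPLAIN on the two generated statements is one way to confirm the planner difference described above on a given deployment.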

3. shm-size (deploy-side — documented, not code)

mempalace-db is built from the sibling disks repo and docker run on familiar, not provisioned from this repo, so there's no in-repo compose file to patch. The required value (--shm-size=256m, off Docker's 64MB default) is documented as an operator action in docs/operators/2026-05-31-age-graph-walk-shm-size.md, with the EXPLAIN evidence and verify step.
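
The verify step in the operator doc is not reproduced here. As a hypothetical alternative, a check like the following could be run inside the container to confirm that /dev/shm is at least the documented 256 MB; the function name and default path are assumptions.

```python
import shutil

# 256 MiB, matching --shm-size=256m from the operator doc.
REQUIRED_SHM_BYTES = 256 * 1024 * 1024


def shm_is_large_enough(path: str = "/dev/shm", required: int = REQUIRED_SHM_BYTES) -> bool:
    """Return True when the filesystem mounted at `path` is at least `required` bytes."""
    return shutil.disk_usage(path).total >= required


if __name__ == "__main__":
    print("shm ok" if shm_is_large_enough() else "shm too small")
```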

Tests

6 new, all green:

  • test_searcher_stopwords.py::TestGraphExpandBoundEndpoint: source guards that neither expand fn leaves a -[r:RELATION]->() / ()-[r:RELATION]-> anonymous endpoint (sits alongside the existing directional guards from #291, "perf(search): make graph-expand cypher directional in hybrid candidate strategy, ~100× speedup").
  • test_knowledge_graph_age.py::test_edge_endpoint_index_targets_are_the_four_we_expect: pure-unit, pins the index name/label/column set in sync with the daemon SQL.
  • test_knowledge_graph_age.py::test_ensure_edge_endpoint_indexes_skips_absent_then_installs, an @pgmark integration test against a live AGE container: skip-when-edge-tables-absent → install-all-when-present → idempotent on re-run.
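
A source guard of the kind listed above can be sketched as a regular-expression scan over query text. This is an illustration under assumptions, not the code in TestGraphExpandBoundEndpoint: the pattern and the helper name are hypothetical, and it only covers the two anonymous-endpoint shapes named in the first bullet.

```python
import re
from typing import List

# Matches a RELATION edge whose target or source vertex is the anonymous "()".
ANONYMOUS_RELATION_ENDPOINT = re.compile(
    r"-\[\w*:RELATION\]->\(\)"   # anonymous target: -[r:RELATION]->()
    r"|\(\)-\[\w*:RELATION\]->"  # anonymous source: ()-[r:RELATION]->
)


def find_anonymous_relation_endpoints(source: str) -> List[str]:
    """Return every anonymous-endpoint RELATION fragment found in `source`."""
    return ANONYMOUS_RELATION_ENDPOINT.findall(source)


if __name__ == "__main__":
    query = "MATCH (a:Entity)-[r:RELATION]->(:Entity) RETURN r"
    assert find_anonymous_relation_endpoints(query) == []
```

In an actual test, the scanned text would typically come from `inspect.getsource` applied to each expand function, with the assertion that the returned list is empty.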

Full suite green without TEST_POSTGRES_DSN (repo default); ruff (0.15.14) clean. FORK_CHANGELOG.md + README.md re-rendered from docs/fork-changes.yaml (the row-renumber is generator output, commit: HEAD resolves at ship-prep per repo convention).

🤖 Generated with Claude Code

… targets

Three AGE graph-walk follow-ups to the Cat 7b hybrid-latency root-cause
(MemPalace#335). The per-entity graph-walk Cypher in searcher._graph_expand_*
and the daemon's /search/age-fused lookup were the entire hybrid-vs-union
latency delta.

1. Auto edge-endpoint indexes. AGE only btree-indexes a label table's own id,
   never the start_id/end_id graphid columns the edge walks join on — so every
   per-entity lookup parallel-seq-scanned the whole edge table (MENTIONS 6.69M
   rows, ~5.8s cold). Add KnowledgeGraphAGE._ensure_edge_endpoint_indexes
   (idx_mentions_{start,end}_id + idx_relation_{start,end}_id, CREATE INDEX IF
   NOT EXISTS, skips labels whose table doesn't exist yet), mirroring
   _ensure_drawer_unique_index; backfill_age.backfill calls it after the edge
   pass so fresh palaces get them automatically. Names/columns kept in sync with
   palace-daemon scripts/age_graph_indexes.sql + POST /backfill-age/indexes.

2. Bind the anonymous RELATION target. MATCH (a:Entity)-[r:RELATION]->() left the
   far endpoint anonymous, so AGE built a Parallel Append over every vertex label
   (~1.58M rows) + nested-loop to validate it, spilling to /dev/shm. RELATION is
   always (Entity)->(Entity); binding the open end to :Entity collapses the
   Append to a single Entity scan and is row-for-row identical (verified on a
   300K-edge AGE graph: ->() and ->(:Entity) both return 1816 rows; with the
   start_id index the RELATION scan became a bitmap index scan, 152ms->46ms). All
   four expand sites updated.

3. shm-size (deploy-side). mempalace-db is built from the sibling disks repo, not
   provisioned here, so the required --shm-size=256m (off Docker's 64MB default)
   is documented as an operator action in
   docs/operators/2026-05-31-age-graph-walk-shm-size.md.

Tests: 6 new (4 anonymous-target source guards + edge-index-targets unit +
@pgmark integration verifying skip-when-absent -> install-when-present ->
idempotent against a live AGE container). Full suite green without
TEST_POSTGRES_DSN; ruff clean. FORK_CHANGELOG.md + README.md re-rendered from
docs/fork-changes.yaml (the row-renumber is generator output).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 31, 2026 01:18
@gemini-code-assist

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

jphein and others added 3 commits May 30, 2026 18:21
The mempalace check-docs gate (step 1) was red on main: #336 added tests
but left README at 3853 while pytest collected more. Merged origin/main
(#336) into this branch and bumped the count via the canonical method
(pytest --collect-only -q → "N/M tests collected" → N), which now includes
both #336's predicate-norm tests and #335's 6 new graph-walk tests.

check-docs step 1 verified green: README 3872 == pytest collects 3872.
render-docs --check clean; ruff clean; #336's 40 predicate-norm tests pass
on the merged branch.
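
The "N/M tests collected → N" step can be sketched as a small parser over the summary line of `pytest --collect-only -q`. The helper name is hypothetical and this is not the check-docs implementation; it assumes the summary line has the "N/M tests collected" or "N tests collected" shape.

```python
import re
from typing import Optional

# Captures N from "N/M tests collected" or from a plain "N tests collected".
_COLLECTED = re.compile(r"(\d+)(?:/\d+)? tests? collected")


def collected_test_count(pytest_output: str) -> Optional[int]:
    """Return the collected-test count N, or None when no summary line is found."""
    match = _COLLECTED.search(pytest_output)
    return int(match.group(1)) if match else None
```

The returned value is what would then be compared against the test count stated in README.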

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…link

Green the check-docs / lint / lychee gates on #337 (all mechanical):

- lint: `ruff format` knowledge_graph_age.py (the _ensure_edge_endpoint_indexes
  block — multi-line arg + split f-string the formatter rejoins). ruff check
  and ruff format --check now both clean.
- check-docs: regenerate website/public/llms-full.txt (render-llms-full.py) and
  website/reference/python-api/ (render-api-docs.py) — both were stale relative
  to the merged tree.
- lychee: the fork-changes entry had `pr: 335`, which rendered an upstream-PR
  link to github.com/MemPalace/pull/335 — that PR does not exist
  (#335 is a techempower-org fork *issue*, referenced as MemPalace#335 in the
  prose). Dropped the pr/pr_state fields (the field is for upstream PRs only).
  Also changed the commit placeholder HEAD -> TBD: commit/TBD is in lychee.toml's
  exclude list (the documented "resolved to a real SHA when the entry lands"
  placeholder), commit/HEAD is not.

check-docs all 7 sub-checks green (README 3872 == pytest 3872); ruff clean.
vale (advisory prose lint) intentionally left as-is.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jphein jphein merged commit e37978e into main May 31, 2026
9 of 11 checks passed
@jphein jphein deleted the perf/335-age-graph-followups branch May 31, 2026 01:35
jphein added a commit that referenced this pull request May 31, 2026
The only real lychee error on #337 (CI run 26700068716) was
[404] https://github.com/doobidoo/mcp-memory-service in README.md +
docs/research/2026-05-24-memory-system-benchmarks.md — a comparison
pointer whose upstream repo was deleted/renamed (confirmed 404). It's
pre-existing on main (added in 5502c1d/9f8a1f7/0aceea3, not by #335) so
lychee is red on main too; this greens it for both.

Added to lychee.toml's existing "external sites that 404 from CI but are
intentionally referenced" exclude block, matching kostadis/CampaignGenerator
etc. Verified: the pattern matches the URL, commit/TBD stays excluded, real
commit SHAs are not over-excluded, TOML parses.

(My #335 commit-link fix — HEAD→TBD — already landed in d35b8ad and shows
[EXCLUDED] in the same CI log.)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

perf: AGE graph-walk follow-ups from Cat 7b (auto-index in backfill_age, bind ->() target, raise shm-size)
