Commit 3fb9428

jphein and claude authored
fix(kg): kg_llm_extractor rewrites AGE dollar-quote tag in triples (#320)
Closes #313.

Drawers indexing palace-daemon / mempalace source code contain mp_age_q verbatim (the AGE dollar-quote tag the cypher wrapper uses to delimit its outer SQL literal). The LLM extractor was passing those tag substrings straight through to add_triple, where _cypher_literal's defensive check correctly rejected them, but the rejection meant the drawer's KG presence was incomplete and the worker logged a warning per failed write.

The error message itself says where the fix belongs: "reject upstream in the sanitizer". This change adds the rewrite at _validate (the boundary where the LLM hands triples back), replacing 'mp_age_q' with 'MP_AGE_Q_LIT'. The case-sensitive substring check in _cypher_literal no longer fires on the rewritten value (Python's `in` operator is case-sensitive). Predicate normalization happens before the rewrite, so the lowercased predicate's tag substring gets caught and the final predicate ends up with the upper-case placeholder: readable, KG-queryable, safe.

Six new tests in tests/test_kg_extractor.py cover the rewrite on each field, leave-clean-triples-unchanged, multiple-occurrence, and an end-to-end roundtrip that proves the rewritten triple survives _cypher_literal without raising.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 16c1edc commit 3fb9428
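
The rewrite described in the commit message can be illustrated with a minimal sketch. This is not the code from `mempalace/kg_llm_extractor.py`: only the two strings `mp_age_q` and `MP_AGE_Q_LIT` come from the commit message. The function names, the predicate normalization rule (lowercase, whitespace collapsed to underscores), and the `ValueError` raised by the stand-in check are all assumptions made for illustration.

```python
# Hypothetical sketch; only the two tag strings below come from the commit message.
AGE_DQ_TAG = "mp_age_q"       # dollar-quote tag that must not reach the Cypher literal
PLACEHOLDER = "MP_AGE_Q_LIT"  # upper-case replacement named in the commit message


def normalize_predicate(predicate: str) -> str:
    """Assumed normalization: lowercase and join whitespace-separated words with '_'."""
    return "_".join(predicate.lower().split())


def rewrite_tag(text: str) -> str:
    """Replace every occurrence of the tag with the placeholder."""
    return text.replace(AGE_DQ_TAG, PLACEHOLDER)


def sanitize_triple(subject: str, predicate: str, obj: str) -> tuple[str, str, str]:
    """Normalize the predicate first, then rewrite the tag in all three fields."""
    predicate = normalize_predicate(predicate)
    return rewrite_tag(subject), rewrite_tag(predicate), rewrite_tag(obj)


def check_literal(value: str) -> str:
    """Stand-in for the defensive check: a case-sensitive substring test."""
    if AGE_DQ_TAG in value:
        raise ValueError(f"literal contains the AGE dollar-quote tag {AGE_DQ_TAG!r}")
    return value


if __name__ == "__main__":
    triple = sanitize_triple("LINE 1", "Select", "* FROM cypher($1, $mp_age_q$")
    for field in triple:
        check_literal(field)  # does not raise on the rewritten fields
    print(triple)
```

The ordering matters under these assumptions: if the rewrite ran before the predicate was lowercased, the placeholder would become `mp_age_q_lit`, which contains the tag again and would trip the case-sensitive check.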

6 files changed

Lines changed: 360 additions & 166 deletions

FORK_CHANGELOG.md

Lines changed: 32 additions & 0 deletions
@@ -254,6 +254,38 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 ### Fixed


+- **kg_llm_extractor rewrites AGE dollar-quote tag in triples so drawers indexing palace source code don't fail at add_triple (#313)** ([`HEAD`](https://github.com/techempower-org/mempalace/commit/HEAD))
+Drawers indexing palace-daemon / mempalace source code contain
+``mp_age_q`` verbatim (the AGE dollar-quote tag the cypher
+wrapper uses to delimit its outer SQL literal). When the LLM
+extracted triples about those drawers — e.g. subject="LINE 1",
+predicate="select", object="* FROM cypher($1, $mp_age_q$" — the
+triple's text carried the tag substring straight to
+``_cypher_literal``, which raised ``Cypher literal contains the
+AGE dollar-quote tag 'mp_age_q'; reject upstream in the
+sanitizer.`` The drawer's KG presence was incomplete and the
+worker logged a warning per failed write.
+
+The error message itself flagged the fix location ("reject
+upstream in the sanitizer"). This change adds a substring
+rewrite at ``_validate`` — the boundary where the LLM hands
+triples back — replacing ``mp_age_q`` with ``MP_AGE_Q_LIT``.
+The case-sensitive substring check in ``_cypher_literal`` no
+longer fires on the rewritten value (Python's ``in`` operator
+is case-sensitive). Predicate-normalization happens before the
+rewrite, so the lowercased predicate's tag substring is the
+one that gets caught and the final predicate ends up with the
+upper-case placeholder — readable, KG-queryable, safe.
+
+The roundtrip test (``test_validate_output_survives_cypher_literal``)
+proves the end-to-end fix: a triple whose object is the verbatim
+``_AGE_DQ_TAG = "mp_age_q"`` source line now passes through
+``_cypher_literal`` for all three fields without raising.
+
+*Tests:* 6 — tests/test_kg_extractor.py (rewrites subject + object + predicate, leaves clean triples unchanged, handles multiple occurrences, end-to-end roundtrip survives _cypher_literal)
+*Files:* `mempalace/kg_llm_extractor.py`, `tests/test_kg_extractor.py`
+
+
 - **scripts/check-docs.sh finds pytest via main checkout when run from a worktree, fails hard instead of silently skipping test-count check (#311)** ([`1d19a8b`](https://github.com/techempower-org/mempalace/commit/1d19a8b))
 Working a fork-ahead PR in a worktree (the standard pattern per
 CLAUDE.md), ``bash scripts/check-docs.sh`` reported "docs clean"
