
feat(guardrails): vendor-neutral content guardrail seams #1652

Closed
garrytan-agents wants to merge 1 commit into garrytan:master from garrytan-agents:feat/guardrail-seams


@garrytan-agents

Contributor

What

Exposes vendor-neutral content guardrail seams at the five boundaries where external content enters GBrain's retrieval layer and where queries/tool-inputs enter the LLM gateway. Lets a content firewall (prompt-injection / RAG-poison detector, PII scrubber, etc.) be hooked in without binding GBrain to any specific vendor.

OSS ships inert — zero guardrails registered by default, every seam is a no-op until an operator registers a provider.

Why

Content poisoning (a malicious page, a booby-trapped tweet) only becomes dangerous at the moment it's ingested into the retrieval layer and made searchable. That import boundary — plus the gateway's own LLM calls — is the right place to let an external classifier observe. Rather than wire one vendor into core, this adds a generic seam that any guardrail backend implements against.

The seam

New module src/core/guardrails.ts:

  • runGuardrails({ hook, content, metadata }): Promise<void>
  • registerGuardrailProvider / unregisterGuardrailProvider
  • hasGuardrails() fast-path guard for hot paths
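
For reviewers who want a concrete picture, here is a minimal sketch of how this surface could fit together. It is not the contents of src/core/guardrails.ts: only the three function names, `hasGuardrails()`, and the `{ hook, content, metadata }` argument come from this PR. The `GuardrailProvider` interface, its `id` field, and its `inspect` method are hypothetical names chosen for illustration.

```typescript
// Sketch only. GuardrailProvider, `id`, and `inspect` are assumed names;
// the exported function names and the event fields come from the PR text.
type GuardrailHook =
  | "file_storage.markdown"
  | "file_storage.code"
  | "ai_gateway.chat"
  | "ai_gateway.expand"
  | "ai_gateway.tool_input";

interface GuardrailEvent {
  hook: GuardrailHook;
  content: string;
  metadata?: Record<string, unknown>;
}

interface GuardrailProvider {
  id: string;
  // Whatever a provider returns is treated as unknown and discarded.
  inspect(event: GuardrailEvent): unknown;
}

const providers = new Map<string, GuardrailProvider>();

function registerGuardrailProvider(provider: GuardrailProvider): void {
  providers.set(provider.id, provider); // same id twice still yields one entry
}

function unregisterGuardrailProvider(id: string): void {
  providers.delete(id);
}

function hasGuardrails(): boolean {
  return providers.size > 0;
}

async function runGuardrails(event: GuardrailEvent): Promise<void> {
  // Zero providers or blank content: nothing to observe.
  if (!hasGuardrails() || event.content.trim() === "") return;
  const snapshot = Array.from(providers.values());
  await Promise.all(
    snapshot.map(async (p) => {
      try {
        await p.inspect(event); // awaited inline; the result is dropped
      } catch {
        // Fail open: a broken provider must not break the caller.
      }
    }),
  );
}
```

Under these assumptions, a vendor package would call `registerGuardrailProvider({ id: "my-firewall", inspect })` once at init, and core code would never see what `inspect` returns.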

Hard invariants (enforced by test/guardrails.test.ts)

  • Observe-only — returns void; callers never branch on a verdict. Cannot block/rewrite/drop/retry. Enforcement, if ever added, gets its own RFC-gated seam.
  • Fail open — provider throw/reject/timeout/network-error is swallowed; a broken guardrail never breaks an ingest, query, or tool call.
  • Inline await — provider sees content at the exact pre-persist / pre-inference moment.
  • No verdict persistence — providers own their own audit trail.
  • Content boundaries — passes only the ingest/user payload; never system prompts, full history, tool output, LLM output, embeddings, or multimodal/OCR/rerank.
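
The PR states that a provider timeout is swallowed but does not show the mechanism. One common way to get "fail open" and "inline await" together is to race the provider against a timer and discard whichever result arrives. The helper below is a sketch under that assumption; `observeFailOpen` and `timeoutMs` are hypothetical names, not part of this change.

```typescript
// Hypothetical helper: races a provider call against a timer and swallows
// every failure mode, so the caller always proceeds.
async function observeFailOpen(
  inspect: () => unknown,
  timeoutMs: number,
): Promise<void> {
  let cancelTimer = () => {};
  const timeout = new Promise<void>((resolve) => {
    const handle = setTimeout(resolve, timeoutMs);
    cancelTimer = () => clearTimeout(handle);
  });
  try {
    // Promise.resolve().then(...) turns a synchronous throw into a rejection.
    // Whichever promise settles first wins; the provider's value is never read.
    await Promise.race([Promise.resolve().then(inspect), timeout]);
  } catch {
    // Throw or rejection: swallowed.
  } finally {
    cancelTimer(); // do not leave a pending timer behind
  }
}
```

A provider that finishes within the budget is fully awaited before the caller continues, which is what lets it see content at the pre-persist or pre-inference moment. A provider that hangs only delays the caller by `timeoutMs`.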

Hooks

| hook | Location | Fires |
| --- | --- | --- |
| file_storage.markdown | importFromContent | after parse + size guard, before sanity/hash/chunk/embed/write |
| file_storage.code | importCodeFile | after size guard, before hash/chunk/embed/write |
| ai_gateway.chat | chat | latest user message only, before inference |
| ai_gateway.expand | expand | query, before expansion model call |
| ai_gateway.tool_input | toolLoop | {toolName, input}, before pending-persist + execution |
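
To make the "Fires" column concrete, here is a rough sketch of where the `file_storage.markdown` call could sit inside an import function. Everything except the hook name and the `hasGuardrails` / `runGuardrails` names is hypothetical: the real importFromContent is not shown in this PR text, and the parse step, the size limit, and the `steps` log below are stand-ins.

```typescript
// Stand-ins so the sketch runs on its own; in the PR these functions live in
// src/core/guardrails.ts. Every other name here is hypothetical.
type Observer = (e: { hook: string; content: string; metadata?: object }) => Promise<void>;
const observers: Observer[] = [];
const hasGuardrails = () => observers.length > 0;
async function runGuardrails(e: Parameters<Observer>[0]): Promise<void> {
  for (const o of observers) {
    try {
      await o(e);
    } catch {
      // fail open
    }
  }
}

const MAX_CHARS = 1_000_000; // assumed limit, not taken from the PR
const steps: string[] = []; // records ordering for illustration

async function importMarkdownSketch(path: string, raw: string): Promise<void> {
  const body = raw.replace(/^---[\s\S]*?---\n/, ""); // toy parse: drop front matter
  steps.push("parse");
  if (body.length > MAX_CHARS) throw new Error("too large");
  steps.push("size-guard");
  if (hasGuardrails()) {
    // Observe the parsed body before anything is hashed or written.
    await runGuardrails({ hook: "file_storage.markdown", content: body, metadata: { path } });
  }
  steps.push("hash/chunk/embed/write"); // downstream work is elided
}
```

The point of the ordering is that oversized input is rejected before any provider sees it, and nothing is persisted until every provider has had its look.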

Tests

  • test/guardrails.test.ts — 14 tests: inert-by-default, register/unregister/idempotent, fail-open isolation, inline-await, verdict-ignored, empty/blank short-circuit, content+metadata pass-through.
  • Existing import-file.test.ts + import-file-content-sanity.test.ts — 40 tests still green (hot path undisturbed).
  • tsc --noEmit clean across the repo.

Docs

docs/guardrails.md — contract, seam table, provider-authoring guide.

Scope / non-goals

  • No enforcement. This is the observe seam only.
  • No bundled vendor. A guardrail provider lives in its own package and registers at init.

Expose observe-only guardrail seams at the five boundaries where external
content enters the retrieval layer and the LLM gateway, so a content firewall
(prompt-injection / RAG-poison detector, PII scrubber, etc.) can be hooked in
without binding GBrain to any specific vendor.

New module src/core/guardrails.ts:
  - runGuardrails({ hook, content, metadata }) -> void
  - registerGuardrailProvider / unregisterGuardrailProvider
  - hasGuardrails() fast-path guard for hot paths

Seams (all observe-only, fail-open, inline-await, inert by default):
  - file_storage.markdown  (import-file.ts importFromContent)
  - file_storage.code      (import-file.ts importCodeFile)
  - ai_gateway.chat        (gateway.ts chat, last user message only)
  - ai_gateway.expand      (gateway.ts expand)
  - ai_gateway.tool_input  (gateway.ts toolLoop, before pending-persist)

Invariants enforced by test/guardrails.test.ts (14 tests):
  - returns void; callers never branch on a verdict
  - provider throw/reject is swallowed (fail-open isolation)
  - slow async provider is awaited before resolving (inline)
  - zero providers => no-op; empty/blank content short-circuits
  - content + metadata passed through unmutated; idempotent by id

Hooks pass only the ingest/user-facing payload (md/code body, last user
message, expansion query, tool input). Never system prompts, full history,
tool output, LLM output, embeddings, or multimodal payloads.

Docs: docs/guardrails.md (contract, seam table, provider authoring guide).
OSS ships inert; vendors register a provider in their own package.
@garrytan-agents

Contributor Author

✅ Pre-merge review gate

Ran the GStack /review critical pass + tests before marking ready.

Tests: test/guardrails.test.ts — 14/14 pass. Existing import-file.test.ts + import-file-content-sanity.test.ts — 40/40 pass (hot path undisturbed). tsc --noEmit clean across the repo.

/review critical pass (5 categories):

  • SQL & Data Safety — ✅ clean. No SQL in diff; seam writes nothing to DB/vector store (verdicts explicitly not persisted).
  • Race Conditions & Concurrency — ✅ clean. providers Map is snapshotted via Array.from() before Promise.all iteration, so mid-flight register/unregister can't mutate the iteration set. No find-or-create / check-then-write.
  • LLM Output Trust Boundary — ✅ clean, and on-point: this seam observes inbound content pre-persist; it never consumes LLM output, and the provider return value is typed unknown and ignored. No path where a verdict influences a DB write, mailer, or fetch. This is the observation point for the stored-prompt-injection class, not a new instance of it.
  • Shell Injection — ✅ clean. No exec/eval/subprocess/shell.
  • Enum & Value Completeness — ✅ clean. New GuardrailHook union (5 values); traced every consumer — each value emitted by exactly one seam caller, no switch/case consumes it (providers get it as opaque metadata), so no exhaustiveness gap.
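
A small sketch of the snapshot point from the concurrency item above, assuming a Map-backed registry like the one described. The `Provider` type, `runOnce`, and the `calls` log are hypothetical and exist only to show the behaviour: a provider registered while a run is in flight is not invoked until the next run.

```typescript
// Hypothetical minimal registry, used only to illustrate the snapshot.
type Provider = { id: string; inspect: (content: string) => unknown };
const providers = new Map<string, Provider>();
const calls: string[] = [];

async function runOnce(content: string): Promise<void> {
  const snapshot = Array.from(providers.values()); // fixed view for this run
  await Promise.all(
    snapshot.map(async (p) => {
      try {
        await p.inspect(content);
      } catch {
        // fail open
      }
    }),
  );
}

providers.set("a", {
  id: "a",
  inspect: () => {
    calls.push("a");
    // Registers a second provider while the current run is still in flight.
    providers.set("b", { id: "b", inspect: () => { calls.push("b"); } });
  },
});
```

After the first `await runOnce("x")`, `calls` holds only `"a"`; provider `"b"` first appears on the second run.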

Informational: redundant hasGuardrails() check in the gateway wrapper is intentional (skips the message-array walk on every chat call in the common zero-guardrail case). Import is used. No slop.
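
The intent behind that duplicate check, sketched with hypothetical names (`ChatMessage`, `latestUserMessage`, `guardChat`, and the counters are illustrative, not taken from gateway.ts): the wrapper returns before walking the message array when nothing is registered, even though `runGuardrails` would also have been a no-op.

```typescript
// Hypothetical message shape and wrapper; only hasGuardrails, runGuardrails,
// and the "ai_gateway.chat" hook name come from the PR.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

let providerCount = 0; // stand-in for the real registry size
let walks = 0; // counts message-array walks, for illustration
const observed: string[] = [];

const hasGuardrails = () => providerCount > 0;
async function runGuardrails(e: { hook: string; content: string }): Promise<void> {
  if (!hasGuardrails()) return; // the check the wrapper duplicates
  observed.push(e.content);
}

function latestUserMessage(messages: ChatMessage[]): string | undefined {
  walks += 1;
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m?.role === "user") return m.content;
  }
  return undefined;
}

async function guardChat(messages: ChatMessage[]): Promise<void> {
  if (!hasGuardrails()) return; // fast path: skip the walk entirely
  const content = latestUserMessage(messages);
  if (content !== undefined) {
    await runGuardrails({ hook: "ai_gateway.chat", content });
  }
}
```

With zero providers, `guardChat` never touches the array. With one registered, only the most recent user turn is passed on, never the system prompt or assistant output.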

Codex review: attempted codex exec review; hung in this container's read-only sandbox (no bubblewrap) without emitting findings — environment limitation, not a code signal. /review is the authoritative gate here.

Verdict: ready to pick up.

@garrytan

Owner

Superseded by #1660. Rebased into a base-repo branch (garrytan/guardrails-seam) so CI gets secret access per the garrytan-agents workflow in CLAUDE.md, and folded in the v0.41.35.0 release bookkeeping (VERSION + CHANGELOG). Your feature commit was cherry-picked verbatim: authorship preserved, code byte-identical. Verified on the new branch: bun run verify 29/29 green, 14/14 guardrail tests, import-file hot path undisturbed. Thank you for this. The five-seam shape is exactly right, and it closes the injection-via-filesystem gap. Closing this one.

@garrytan garrytan closed this May 30, 2026
garrytan added a commit that referenced this pull request May 30, 2026
…supersedes #1652) (#1660)

* feat(guardrails): vendor-neutral content guardrail seams

* chore: bump version and changelog (v0.41.35.0)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: garrytan-agents <agent@garrytan.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
mgunnin added a commit to mgunnin/gbrain that referenced this pull request Jun 3, 2026
* upstream/master:
  v0.41.36.0 feat(mcp): publish agent skills (list_skills / get_skill) for thin clients (garrytan#1661)
  v0.41.35.0 feat(guardrails): vendor-neutral content guardrail seams (supersedes garrytan#1652) (garrytan#1660)
  v0.41.34.0 feat(search): retrieval cathedral — max-pool + title + alias + evidence (garrytan#1657)
  v0.41.33.0 feat(search): intent-aware adaptive return-sizing + agent-facing query param (garrytan#1640)
  v0.41.32.0 fix(staleness): commit-relative sync staleness (supersedes garrytan#1623) (garrytan#1656)
  v0.41.31.0 feat(embed): delta-aware sync --all cost gate + real stale-embedding semantics (garrytan#1632)
  v0.41.30.0 fix(brainstorm/lsd): --save writes the advertised .md file via canonical ingestion path (garrytan#1655)

# Conflicts:
#	src/core/operations.ts