feat(guardrails): vendor-neutral content guardrail seams #1652
garrytan-agents wants to merge 1 commit into
Expose observe-only guardrail seams at the five boundaries where external
content enters the retrieval layer and the LLM gateway, so a content firewall
(prompt-injection / RAG-poison detector, PII scrubber, etc.) can be hooked in
without binding GBrain to any specific vendor.
New module src/core/guardrails.ts:
- runGuardrails({ hook, content, metadata }) -> void
- registerGuardrailProvider / unregisterGuardrailProvider
- hasGuardrails() fast-path guard for hot paths
Seams (all observe-only, fail-open, inline-await, inert by default):
- file_storage.markdown (import-file.ts importFromContent)
- file_storage.code (import-file.ts importCodeFile)
- ai_gateway.chat (gateway.ts chat, last user message only)
- ai_gateway.expand (gateway.ts expand)
- ai_gateway.tool_input (gateway.ts toolLoop, before pending-persist)
Invariants enforced by test/guardrails.test.ts (14 tests):
- returns void; callers never branch on a verdict
- provider throw/reject is swallowed (fail-open isolation)
- slow async provider is awaited before resolving (inline)
- zero providers => no-op; empty/blank content short-circuits
- content + metadata passed through unmutated; idempotent by id
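The invariants above can be illustrated with a minimal sketch. The real src/core/guardrails.ts is not reproduced here, so the provider shape (an `id` plus an `inspect` callback), the `Map`-based registry, and the reading of "idempotent by id" as "re-registering an id replaces the earlier provider" are all assumptions made for illustration.

```typescript
// Hypothetical sketch of the seam; the actual module may differ in names and shape.
type GuardrailHook =
  | "file_storage.markdown"
  | "file_storage.code"
  | "ai_gateway.chat"
  | "ai_gateway.expand"
  | "ai_gateway.tool_input";

interface GuardrailInput {
  hook: GuardrailHook;
  content: string;
  metadata?: Record<string, unknown>;
}

// Assumed provider shape: an id plus a callback that may be sync or async.
interface GuardrailProvider {
  id: string;
  inspect(input: GuardrailInput): void | Promise<void>;
}

const providers = new Map<string, GuardrailProvider>();

function registerGuardrailProvider(provider: GuardrailProvider): void {
  // Keyed by id, so registering the same id twice never duplicates a provider.
  providers.set(provider.id, provider);
}

function unregisterGuardrailProvider(id: string): void {
  providers.delete(id);
}

function hasGuardrails(): boolean {
  return providers.size > 0;
}

async function runGuardrails(input: GuardrailInput): Promise<void> {
  // Inert by default, and empty/blank content short-circuits.
  if (providers.size === 0 || input.content.trim() === "") return;
  for (const provider of Array.from(providers.values())) {
    try {
      // Inline-await: a slow provider is awaited before this function resolves.
      await provider.inspect(input);
    } catch {
      // Fail-open: a throwing or rejecting provider never reaches the caller.
    }
  }
  // No verdict is returned, so callers have nothing to branch on.
}

// Usage: one provider throws, one records what it saw.
async function demo(): Promise<string[]> {
  const seen: string[] = [];
  registerGuardrailProvider({
    id: "throws",
    inspect: () => {
      throw new Error("provider bug");
    },
  });
  registerGuardrailProvider({
    id: "recorder",
    inspect: async ({ hook, content }) => {
      seen.push(`${hook}:${content}`);
    },
  });
  await runGuardrails({ hook: "file_storage.markdown", content: "# Title" });
  await runGuardrails({ hook: "ai_gateway.expand", content: "   " }); // blank: skipped
  unregisterGuardrailProvider("throws");
  unregisterGuardrailProvider("recorder");
  return seen;
}
```

In this sketch the throwing provider does not prevent the recorder from running, and the blank expansion query never reaches either provider.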
Hooks pass only the ingest/user-facing payload (md/code body, last user
message, expansion query, tool input). Never system prompts, full history,
tool output, LLM output, embeddings, or multimodal payloads.
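As a rough illustration of that payload scoping at the `ai_gateway.chat` seam, the sketch below forwards only the last user message and skips the work entirely when no provider is registered. The `ChatMessage` type, the `lastUserMessage` helper, and the stand-in `hasGuardrails` / `runGuardrails` are hypothetical and exist only so the example runs on its own; the real gateway.ts code may look different.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Stand-ins for the guardrails module so this sketch is self-contained.
const observed: string[] = [];
let providerCount = 1;
const hasGuardrails = (): boolean => providerCount > 0;
async function runGuardrails(input: {
  hook: string;
  content: string;
  metadata?: Record<string, unknown>;
}): Promise<void> {
  observed.push(`${input.hook}:${input.content}`);
}

// Hypothetical helper: walk backwards to find the most recent user turn.
function lastUserMessage(messages: ChatMessage[]): string | undefined {
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "user") return messages[i].content;
  }
  return undefined;
}

async function chat(messages: ChatMessage[]): Promise<string> {
  // Fast path: with zero providers the seam costs one boolean check.
  if (hasGuardrails()) {
    const content = lastUserMessage(messages);
    if (content !== undefined) {
      // System prompt and earlier history are deliberately not forwarded.
      await runGuardrails({ hook: "ai_gateway.chat", content });
    }
  }
  return "model reply (stubbed)"; // the real LLM call would go here
}
```

The call site awaits the seam but ignores its result, which keeps the gateway's behavior identical whether or not a provider is registered.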
Docs: docs/guardrails.md (contract, seam table, provider authoring guide).
OSS ships inert; vendors register a provider in their own package.
✅ Pre-merge review gate
Ran the GStack Tests:
Informational: redundant Codex review: attempted
Verdict: ready to pick up.
Superseded by #1660. Rebased into a base-repo branch (
…supersedes #1652) (#1660)
* feat(guardrails): vendor-neutral content guardrail seams
* chore: bump version and changelog (v0.41.35.0)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: garrytan-agents <agent@garrytan.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* upstream/master:
  v0.41.36.0 feat(mcp): publish agent skills (list_skills / get_skill) for thin clients (garrytan#1661)
  v0.41.35.0 feat(guardrails): vendor-neutral content guardrail seams (supersedes garrytan#1652) (garrytan#1660)
  v0.41.34.0 feat(search): retrieval cathedral — max-pool + title + alias + evidence (garrytan#1657)
  v0.41.33.0 feat(search): intent-aware adaptive return-sizing + agent-facing query param (garrytan#1640)
  v0.41.32.0 fix(staleness): commit-relative sync staleness (supersedes garrytan#1623) (garrytan#1656)
  v0.41.31.0 feat(embed): delta-aware sync --all cost gate + real stale-embedding semantics (garrytan#1632)
  v0.41.30.0 fix(brainstorm/lsd): --save writes the advertised .md file via canonical ingestion path (garrytan#1655)
# Conflicts:
#	src/core/operations.ts
What
Exposes vendor-neutral content guardrail seams at the five boundaries where external content enters GBrain's retrieval layer and where queries/tool-inputs enter the LLM gateway. Lets a content firewall (prompt-injection / RAG-poison detector, PII scrubber, etc.) be hooked in without binding GBrain to any specific vendor.
OSS ships inert: zero guardrails are registered by default, and every seam is a no-op until an operator registers a provider.
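What an operator-registered provider might look like can be sketched as follows. Because the seam returns no verdict, a provider has to report findings through its own channel. Everything here is illustrative: the `GuardrailInput` / `GuardrailProvider` shapes, the `createPhraseDetector` factory, the provider id, and the phrase list are assumptions, and the literal-phrase match is a toy stand-in for a real classifier.

```typescript
// Assumed shapes; the real types live in src/core/guardrails.ts and may differ.
interface GuardrailInput {
  hook: string;
  content: string;
  metadata?: Record<string, unknown>;
}
interface GuardrailProvider {
  id: string;
  inspect(input: GuardrailInput): void | Promise<void>;
}

interface Finding {
  hook: string;
  phrase: string;
}

// Toy detection rule: a handful of literal phrases, matched case-insensitively.
const SUSPICIOUS_PHRASES = [
  "ignore previous instructions",
  "disregard the system prompt",
];

function createPhraseDetector(report: (finding: Finding) => void): GuardrailProvider {
  return {
    id: "example-phrase-detector", // hypothetical id
    inspect({ hook, content }) {
      const lowered = content.toLowerCase();
      for (const phrase of SUSPICIOUS_PHRASES) {
        if (lowered.includes(phrase)) report({ hook, phrase });
      }
      // Nothing is returned: findings leave only through `report`.
    },
  };
}

// Usage: collect findings in memory. A vendor package would instead pass
// `detector` to registerGuardrailProvider and ship findings to its own backend.
const findings: Finding[] = [];
const detector = createPhraseDetector((finding) => findings.push(finding));
```

Keeping the reporting sink inside the provider means core never sees or acts on a classification result, which matches the observe-only contract.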
Why
Content poisoning (a malicious page, a booby-trapped tweet) only becomes dangerous at the moment it's ingested into the retrieval layer and made searchable. That import boundary — plus the gateway's own LLM calls — is the right place to let an external classifier observe. Rather than wire one vendor into core, this adds a generic seam that any guardrail backend implements against.
The seam
New module src/core/guardrails.ts:
- runGuardrails({ hook, content, metadata }): Promise<void>
- registerGuardrailProvider / unregisterGuardrailProvider
- hasGuardrails() fast-path guard for hot paths
Hard invariants (enforced by test/guardrails.test.ts)
- Observe-only: returns void; callers never branch on a verdict. Cannot block/rewrite/drop/retry. Enforcement, if ever added, gets its own RFC-gated seam.
Hooks
- file_storage.markdown (importFromContent)
- file_storage.code (importCodeFile)
- ai_gateway.chat (chat)
- ai_gateway.expand (expand)
- ai_gateway.tool_input (toolLoop): {toolName, input}, before pending-persist + execution
Tests
- test/guardrails.test.ts: 14 tests covering inert-by-default, register/unregister/idempotent, fail-open isolation, inline-await, verdict-ignored, empty/blank short-circuit, content+metadata pass-through.
- import-file.test.ts + import-file-content-sanity.test.ts: 40 tests still green (hot path undisturbed).
- tsc --noEmit clean across the repo.
Docs
- docs/guardrails.md: contract, seam table, provider-authoring guide.
Scope / non-goals