
feat: add guardrails system with system prompt injection/override/decorator (#64)

Merged
SantiagoDePolonia merged 6 commits into main from claude/add-guardrails-system-nu1RR
Feb 15, 2026

Conversation


SantiagoDePolonia (Contributor) commented Feb 10, 2026

Implement a pluggable guardrails pipeline that intercepts requests before
they reach providers. Guardrails can be executed sequentially (chained) or
in parallel (concurrent with ordered application). The pipeline wraps the
router as a RoutableProvider decorator, keeping handler code unchanged.

The first guardrail is a system prompt guardrail with three modes:

  • inject: adds a system message only if none exists
  • override: replaces all existing system messages
  • decorator: prepends configured content to the first system message

Both ChatCompletion and Responses API endpoints are supported. All
configuration is driven from config.yaml and environment variables.

https://claude.ai/code/session_011U6sHbthHcwY68FZhAEFVH
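As a sketch, the config block might look like the following. The key names (enabled, execution, system_prompt, mode, content) are taken from the yaml tags on the GuardrailsConfig struct quoted later in this thread; the exact contents of config.example.yaml are not shown here, so treat the values as illustrative:

```
guardrails:
  enabled: true
  execution: sequential        # or "parallel"
  system_prompt:
    enabled: true
    mode: inject               # inject | override | decorator
    content: "You are a helpful assistant. Follow the safety policy."
```

Each field can also be set via the corresponding environment variable (e.g. GUARDRAILS_ENABLED, GUARDRAILS_SYSTEM_PROMPT_MODE), per the env tags on the struct.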

Summary by CodeRabbit

  • New Features

    • Added a pluggable guardrails system to inspect, modify, or block requests before they reach providers.
    • Added system-prompt guardrails with modes: inject, override, decorator.
    • Guardrails can be ordered and grouped to run sequentially or in parallel and can wrap request routing when enabled.
  • Configuration

    • New guardrails config block to enable/disable and declare guardrail rules with example entries.
  • Tests

    • Comprehensive unit tests covering pipeline, provider wrapping, and system-prompt behavior.

coderabbitai bot commented Feb 10, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a pluggable guardrails subsystem: config schema and types, a Guardrail interface, an ordered/parallel Pipeline executor, a GuardedProvider wrapper that applies guardrails before routing, a SystemPromptGuardrail implementation, tests, and app wiring to enable guardrails when configured.

Changes

  • Configuration (config/config.example.yaml, config/config.go): Added guardrails config block and Guardrails field on Config; introduced GuardrailRuleConfig and SystemPromptSettings (mode, content); default config initializes empty Guardrails.
  • Guardrail API (internal/guardrails/guardrails.go): Added Message DTO and Guardrail interface with Name() and Process(ctx, msgs) to normalize and transform message sequences.
  • Pipeline + tests (internal/guardrails/pipeline.go, internal/guardrails/pipeline_test.go): Implemented Pipeline supporting ordered groups (sequential across orders, parallel within an order); tests cover ordering, parallelism, output merging, and error propagation.
  • Provider wrapper + tests (internal/guardrails/provider.go, internal/guardrails/provider_test.go): Added GuardedProvider that adapts concrete requests to normalized messages, runs the pipeline, and forwards modified requests to the inner RoutableProvider; tests validate chat/stream/responses paths, delegation, and error handling.
  • System prompt guardrail + tests (internal/guardrails/system_prompt.go, internal/guardrails/system_prompt_test.go): Added SystemPromptGuardrail with inject, override, and decorator modes, constructor validation, and processing helpers; comprehensive unit tests for modes, edge cases, and non-mutation.
  • App wiring (internal/app/app.go): Wires guardrails pipeline construction at startup; conditionally wraps the router/provider with GuardedProvider when the pipeline has entries and logs registration details and errors.

Sequence Diagram

sequenceDiagram
    participant Client
    participant GuardedProvider
    participant Pipeline
    participant SystemPromptGuardrail
    participant InnerProvider

    Client->>GuardedProvider: ChatCompletion(req)
    GuardedProvider->>Pipeline: Process(ctx, messages)
    Pipeline->>SystemPromptGuardrail: Process(ctx, messages)
    SystemPromptGuardrail-->>Pipeline: modified messages
    Pipeline-->>GuardedProvider: processed messages
    GuardedProvider->>InnerProvider: ChatCompletion(processed req)
    InnerProvider-->>GuardedProvider: response
    GuardedProvider-->>Client: response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

enhancement

Poem

🐰 I hop through configs, prompts, and lanes,

I tuck safe words gently into chatty plains.
Orders march forward, peers run in tandem,
I whisper system prompts so responses stay random (but safe!).
A little rabbit guarding every stanza.

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 28.57%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (3 passed)

  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title clearly describes the main change: introducing a guardrails system with system prompt modes (injection/override/decorator). It accurately reflects the primary additions across config, app integration, guardrails pipeline, and system prompt implementation.
  • Merge Conflict Detection ✅ Passed: No merge conflicts detected when merging into main.



coderabbitai bot left a comment

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@config/config.go`:
- Around line 41-70: Add validation during configuration loading to reject
invalid values early: in the config Load() path validate
GuardrailsConfig.Execution accepts only "sequential" or "parallel" and validate
SystemPromptGuardrailConfig.Mode accepts only "inject", "override", or
"decorator"; on invalid values return a clear error describing the field and
allowed options so operators fail fast instead of relying on downstream checks
in pipeline construction or NewSystemPromptGuardrail.

In `@internal/guardrails/pipeline.go`:
- Around line 97-121: The processChatParallel function currently runs
Guardrail.ProcessChat concurrently but then overwrites the request by setting
current = r.req in the final loop, causing "last guardrail wins" and discarding
earlier modifications; also the original req pointer is shared across goroutines
(risk of future data races) and chatResult.index is unused. Fix by either (A)
documenting that Pipeline.processChatParallel is validation-only and will not
merge modifications and explicitly copying req before dispatch to guardrails to
avoid races, or (B) implement merge semantics: have each goroutine operate on a
deep copy of the input (copy the core.ChatRequest before calling
Guardrail.ProcessChat), then in the aggregation loop sequentially apply/merge
modifications into current in registration order (use a well-defined merge
function or call a method on core.ChatRequest to apply diffs) and remove the
unused chatResult.index field; ensure Guardrail.ProcessChat is invoked with the
copied request to prevent in-place mutations from racing.

In `@internal/guardrails/provider.go`:
- Around line 19-24: Update the NewGuardedProvider constructor to validate its
inputs: check if the incoming inner (core.RoutableProvider) or pipeline
(*Pipeline) is nil and fail fast with a clear error message (e.g., panic or
log.Fatalf) instead of returning a GuardedProvider that will later panic on
method calls; modify the NewGuardedProvider function to perform these nil checks
and produce a descriptive error referencing NewGuardedProvider, inner and
pipeline so wiring bugs fail at construction time.

In `@internal/guardrails/system_prompt.go`:
- Around line 55-67: The ProcessChat implementation in SystemPromptGuardrail
currently constructs a new core.ChatRequest by manually copying fields which
will drop future fields; instead, make a value copy of the incoming req (e.g.,
copied := *req) and then set copied.Messages = g.applyToMessages(req.Messages)
and return &copied, so only Messages is replaced while all existing and future
core.ChatRequest fields are preserved (update SystemPromptGuardrail.ProcessChat
accordingly).
- Around line 71-85: ProcessResponses currently constructs a new
core.ResponsesRequest by hand and omits any future fields, risking silent drops;
change SystemPromptGuardrail.ProcessResponses to clone the incoming req (e.g.,
make a copy via copy := *req) and then set copy.Instructions =
g.applyToInstructions(req.Instructions) before returning &copy so all existing
and future fields on core.ResponsesRequest are preserved; reference
SystemPromptGuardrail.ProcessResponses, core.ResponsesRequest, and
applyToInstructions when locating the change.

Comment on lines +41 to 70
// GuardrailsConfig holds configuration for the request guardrails pipeline.
type GuardrailsConfig struct {
	// Enabled controls whether guardrails are active
	// Default: false
	Enabled bool `yaml:"enabled" env:"GUARDRAILS_ENABLED"`

	// Execution controls how guardrails are executed: "sequential" or "parallel"
	// Default: "sequential"
	Execution string `yaml:"execution" env:"GUARDRAILS_EXECUTION"`

	// SystemPrompt configures the system prompt guardrail
	SystemPrompt SystemPromptGuardrailConfig `yaml:"system_prompt"`
}

// SystemPromptGuardrailConfig holds configuration for the system prompt guardrail.
type SystemPromptGuardrailConfig struct {
	// Enabled controls whether the system prompt guardrail is active
	// Default: false
	Enabled bool `yaml:"enabled" env:"GUARDRAILS_SYSTEM_PROMPT_ENABLED"`

	// Mode controls how the system prompt is applied: "inject", "override", or "decorator"
	// - inject: adds a system message only if none exists
	// - override: replaces all existing system messages
	// - decorator: prepends to the first existing system message
	// Default: "inject"
	Mode string `yaml:"mode" env:"GUARDRAILS_SYSTEM_PROMPT_MODE"`

	// Content is the system prompt text to apply
	Content string `yaml:"content" env:"GUARDRAILS_SYSTEM_PROMPT_CONTENT"`
}

🧹 Nitpick | 🔵 Trivial

Consider validating Execution and Mode at config load time.

Both Execution (line 49) and Mode (line 66) are free-form strings with constrained valid values ("sequential"/"parallel" and "inject"/"override"/"decorator" respectively). While downstream code (the pipeline and NewSystemPromptGuardrail) validates these, catching invalid values early in Load() would give operators a clearer error at startup rather than a failure during pipeline construction.

This is optional — the current approach works since construction-time validation exists.
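A load-time check along the lines the reviewer suggests could look like this. validateGuardrails and the trimmed-down config struct are hypothetical names for illustration; the real GuardrailsConfig carries yaml/env tags and nests Mode under SystemPrompt.

```go
package main

import "fmt"

// GuardrailsConfig is a trimmed-down stand-in for the PR's config type,
// flattened here for brevity.
type GuardrailsConfig struct {
	Enabled   bool
	Execution string
	Mode      string
}

// validateGuardrails rejects invalid enum values at load time so operators
// fail fast with a clear message instead of hitting a downstream check
// during pipeline construction.
func validateGuardrails(c GuardrailsConfig) error {
	if !c.Enabled {
		return nil // nothing to validate when the pipeline is off
	}
	switch c.Execution {
	case "", "sequential", "parallel": // empty falls back to the default
	default:
		return fmt.Errorf("guardrails.execution: %q is not valid (want \"sequential\" or \"parallel\")", c.Execution)
	}
	switch c.Mode {
	case "", "inject", "override", "decorator":
	default:
		return fmt.Errorf("guardrails.system_prompt.mode: %q is not valid (want \"inject\", \"override\", or \"decorator\")", c.Mode)
	}
	return nil
}

func main() {
	// An invalid mode produces a descriptive startup error.
	err := validateGuardrails(GuardrailsConfig{Enabled: true, Execution: "parallel", Mode: "banana"})
	fmt.Println(err)
}
```

Calling this from the Load() path keeps the existing construction-time validation as a second line of defense.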


Comment on lines +97 to +121
func (p *Pipeline) processChatParallel(ctx context.Context, req *core.ChatRequest) (*core.ChatRequest, error) {
	results := make([]chatResult, len(p.guardrails))
	var wg sync.WaitGroup

	for i, g := range p.guardrails {
		wg.Add(1)
		go func(idx int, guardrail Guardrail) {
			defer wg.Done()
			modified, err := guardrail.ProcessChat(ctx, req)
			results[idx] = chatResult{index: idx, req: modified, err: err}
		}(i, g)
	}

	wg.Wait()

	// Check for errors and apply modifications in order
	current := req
	for i, r := range results {
		if r.err != nil {
			return nil, fmt.Errorf("guardrail %q: %w", p.guardrails[i].Name(), r.err)
		}
		current = r.req
	}
	return current, nil
}

⚠️ Potential issue | 🟠 Major

Parallel mode: "last guardrail wins" rather than merging modifications in order.

The loop on Lines 114-119 simply replaces current with each successive result, meaning only the last guardrail's output survives. The doc comment says "Modifications are applied sequentially in registration order" but in practice, each guardrail runs on the original request and the final loop just picks the last result — earlier guardrails' modifications are silently discarded.

This is fine when parallel mode is used purely for validation (pass/reject with no modifications), but will silently lose changes if multiple guardrails modify the request. Consider either:

  1. Documenting this limitation clearly (parallel mode is for validation-only guardrails), or
  2. Implementing actual merge semantics.

Additionally, the shared req pointer passed to concurrent goroutines (Line 105) is safe today because the system prompt guardrail creates a new request rather than mutating in-place, but future guardrails that mutate the input would cause a data race. Consider documenting this contract or defensively copying req before dispatching.

Minor: unused `index` field

The index field in chatResult (Line 89) and responsesResult (Line 125) is set but never read — the loop uses the slice index i directly.

 type chatResult struct {
-	index int
 	req   *core.ChatRequest
 	err   error
 }
 type responsesResult struct {
-	index int
 	req   *core.ResponsesRequest
 	err   error
 }

And in the goroutines:

-			results[idx] = chatResult{index: idx, req: modified, err: err}
+			results[idx] = chatResult{req: modified, err: err}

Comment on lines +19 to +24
func NewGuardedProvider(inner core.RoutableProvider, pipeline *Pipeline) *GuardedProvider {
	return &GuardedProvider{
		inner:    inner,
		pipeline: pipeline,
	}
}

⚠️ Potential issue | 🟡 Minor

Add nil guards in the constructor.

If inner or pipeline is nil (e.g., due to a wiring bug in app setup), every method will panic with a nil-pointer dereference at runtime. A defensive check here provides a clear error at construction time.

🛡️ Proposed fix
 func NewGuardedProvider(inner core.RoutableProvider, pipeline *Pipeline) *GuardedProvider {
+	if inner == nil {
+		panic("guardrails: inner provider must not be nil")
+	}
+	if pipeline == nil {
+		panic("guardrails: pipeline must not be nil")
+	}
 	return &GuardedProvider{
 		inner:    inner,
 		pipeline: pipeline,
 	}
 }

Comment on lines +71 to +85
func (g *SystemPromptGuardrail) ProcessResponses(ctx context.Context, req *core.ResponsesRequest) (*core.ResponsesRequest, error) {
	instructions := g.applyToInstructions(req.Instructions)
	return &core.ResponsesRequest{
		Model:           req.Model,
		Input:           req.Input,
		Instructions:    instructions,
		Tools:           req.Tools,
		Temperature:     req.Temperature,
		MaxOutputTokens: req.MaxOutputTokens,
		Stream:          req.Stream,
		StreamOptions:   req.StreamOptions,
		Metadata:        req.Metadata,
		Reasoning:       req.Reasoning,
	}, nil
}

⚠️ Potential issue | 🟠 Major

Same fragile copy pattern for ResponsesRequest — apply the same fix.

Identical concern as ProcessChat: any new field on core.ResponsesRequest will be silently dropped.

♻️ Proposed fix
 func (g *SystemPromptGuardrail) ProcessResponses(ctx context.Context, req *core.ResponsesRequest) (*core.ResponsesRequest, error) {
 	instructions := g.applyToInstructions(req.Instructions)
-	return &core.ResponsesRequest{
-		Model:           req.Model,
-		Input:           req.Input,
-		Instructions:    instructions,
-		Tools:           req.Tools,
-		Temperature:     req.Temperature,
-		MaxOutputTokens: req.MaxOutputTokens,
-		Stream:          req.Stream,
-		StreamOptions:   req.StreamOptions,
-		Metadata:        req.Metadata,
-		Reasoning:       req.Reasoning,
-	}, nil
+	modified := *req
+	modified.Instructions = instructions
+	return &modified, nil
 }

…d grouping

Instead of a global "sequential" or "parallel" execution mode, each
guardrail now has an "order" field. Guardrails with the same order
value run in parallel (concurrently), and groups execute sequentially
in ascending order. Single-entry groups skip goroutine overhead.

This enables mixed execution patterns like:
  order 0: [guardrail A, guardrail B] → run in parallel
  order 1: [guardrail C]              → runs after group 0
  order 2: [guardrail D, guardrail E] → run in parallel after C

Adds concurrency test proving same-order guardrails truly run in
parallel using atomic counters and barriers.

https://claude.ai/code/session_011U6sHbthHcwY68FZhAEFVH
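The order-based grouping this commit describes can be sketched as a pure function. entry and groupByOrder are illustrative names, not the PR's actual types:

```go
package main

import (
	"fmt"
	"sort"
)

// entry pairs a guardrail name with its order value; only the grouping
// logic is shown here, not the execution.
type entry struct {
	name  string
	order int
}

// groupByOrder buckets entries by ascending order value. Each bucket
// runs in parallel; buckets execute sequentially, so order 1 starts
// only after every order-0 guardrail has finished.
func groupByOrder(entries []entry) [][]entry {
	buckets := map[int][]entry{}
	for _, e := range entries {
		buckets[e.order] = append(buckets[e.order], e)
	}
	orders := make([]int, 0, len(buckets))
	for o := range buckets {
		orders = append(orders, o)
	}
	sort.Ints(orders)
	groups := make([][]entry, 0, len(orders))
	for _, o := range orders {
		groups = append(groups, buckets[o])
	}
	return groups
}

func main() {
	// Reproduces the mixed pattern from the commit message:
	// [A, B] in parallel, then C, then [D, E] in parallel.
	groups := groupByOrder([]entry{
		{"A", 0}, {"C", 1}, {"B", 0}, {"D", 2}, {"E", 2},
	})
	for i, g := range groups {
		fmt.Printf("group %d: %v\n", i, g)
	}
}
```

Within a bucket, entries keep their registration (slice) order, which matters if the executor applies modifications in that order after the group completes.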
coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/app/app.go (1)

142-148: ⚠️ Potential issue | 🟡 Minor

Router() returns the unwrapped provider, bypassing guardrails when enabled.

When guardrails are configured, server.New() (line 137) receives the guarded provider from the chain built at lines 111-125. However, Router() returns a.providers.Router directly, unguarded. Any caller using Router() will skip the guardrails pipeline. If this is intentional for internal/admin use, add a doc comment. Otherwise, store the guarded provider and return it here instead.
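One way to apply that suggestion is to store the wrapped provider and prefer it in the accessor. App, Provider, and the field names below are stand-ins for the real internal/app types, sketched under that assumption:

```go
package main

import "fmt"

// Provider is a stand-in for core.RoutableProvider.
type Provider interface{ Name() string }

type plainProvider struct{}

func (plainProvider) Name() string { return "router" }

// guardedProvider wraps an inner provider, mimicking GuardedProvider.
type guardedProvider struct{ inner Provider }

func (g guardedProvider) Name() string { return "guarded(" + g.inner.Name() + ")" }

// App holds both the raw router and, when guardrails are enabled,
// the guarded wrapper built at startup.
type App struct {
	router  Provider
	guarded Provider // nil unless guardrails are configured
}

// Router returns the guarded provider when one was wired, so callers
// can no longer bypass the guardrails pipeline by accident.
func (a *App) Router() Provider {
	if a.guarded != nil {
		return a.guarded
	}
	return a.router
}

func main() {
	inner := plainProvider{}
	a := &App{router: inner, guarded: guardedProvider{inner: inner}}
	fmt.Println(a.Router().Name()) // guarded(router)
}
```

If the unwrapped router is genuinely needed for internal or admin use, the alternative is a doc comment on Router() stating that it bypasses guardrails.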

🤖 Fix all issues with AI agents
In `@internal/guardrails/pipeline_test.go`:
- Around line 151-197: The test TestPipeline_MixedOrders_GroupsExecuteCorrectly
uses a shared slice variable trace that is appended to from a guardrail running
in a parallel group (mockGuardrail named "A"), which is race-prone; change the
test to avoid unsynchronized concurrent writes by protecting trace with a
sync.Mutex (lock/unlock around each append) or by switching to a channel-based
collector (have guardrails send their name into a buffered channel and drain it
after p.ProcessChat), and update the mockGuardrail closures (A and any future
parallel guardrails) to use the chosen synchronized mechanism; alternatively,
add a clear comment on the trace variable that concurrent writes must be
synchronized to prevent future races.
- Around line 109-149: The test TestPipeline_SameOrder_RunInParallel currently
spin-waits on started.Load() with no timeout and can hang; modify the wait after
launching ProcessChat to use a bounded wait (e.g., a select polling loop or
select on a time.After) so that if started does not reach 2 within a short
timeout the test fails instead of blocking indefinitely; update the waiting
logic that references started and barrier (and related done) to include a
timeout branch (use time.After) and call t.Fatal or break the test when the
timeout elapses so the outer select on done can still run.

In `@internal/guardrails/pipeline.go`:
- Around line 130-189: Both runChatGroupParallel and runResponsesGroupParallel
duplicate the same concurrency pattern; consolidate into a single generic helper
that launches goroutines, waits, checks errors, and returns the last successful
modification. Create a generic function (e.g., runGroupParallel[T any]) that
accepts the group []entry, initial request *T, and a processor func(ctx
context.Context, g Guardrail, req *T) (*T, error) (used to call
Guardrail.ProcessChat or Guardrail.ProcessResponses), allocate a results slice
of the generic result type, run the goroutines filling results, wait, then
iterate results to return the first error or last successful modified *T;
replace runChatGroupParallel and runResponsesGroupParallel to call this helper
and remove chatResult/responsesResult duplication.

In `@internal/guardrails/provider_test.go`:
- Around line 62-63: Tests call NewSystemPromptGuardrail and discard its error
return (e.g., g, _ := NewSystemPromptGuardrail(SystemPromptInject, "guardrail
system")); update each test to capture the error (g, err :=
NewSystemPromptGuardrail(...)) and assert it’s nil before proceeding — use the
test helper available in the package (e.g., require.NoError(t, err)) or a simple
if err != nil { t.Fatalf("NewSystemPromptGuardrail failed: %v", err) } so test
setup fails loudly if the constructor starts returning errors; apply the same
change for every occurrence (lines referencing NewSystemPromptGuardrail,
SystemPromptInject, pipeline.Add).

Comment on lines +109 to +149
func TestPipeline_SameOrder_RunInParallel(t *testing.T) {
	p := NewPipeline()

	var started atomic.Int32
	barrier := make(chan struct{})

	// Both at order 0 — they should run concurrently
	for i := range 2 {
		name := fmt.Sprintf("parallel_%d", i)
		p.Add(&mockGuardrail{
			name: name,
			chatFn: func(_ context.Context, req *core.ChatRequest) (*core.ChatRequest, error) {
				started.Add(1)
				<-barrier // wait until both have started
				return req, nil
			},
		}, 0)
	}

	req := &core.ChatRequest{Model: "gpt-4"}

	done := make(chan struct{})
	go func() {
		_, _ = p.ProcessChat(context.Background(), req)
		close(done)
	}()

	// Wait for both goroutines to start
	for started.Load() < 2 {
		time.Sleep(time.Millisecond)
	}
	// Both started concurrently — release them
	close(barrier)

	select {
	case <-done:
		// success
	case <-time.After(2 * time.Second):
		t.Fatal("timed out — guardrails did not run in parallel")
	}
}

⚠️ Potential issue | 🟡 Minor

Unbounded spin-wait can hang the test if parallelism regresses.

The for started.Load() < 2 loop at Line 137 has no timeout. If the implementation changes to run the group sequentially, the first goroutine blocks on barrier, the second never starts, and this loop spins forever — the 2-second select on Line 143 is never reached because the main goroutine is stuck.

Consider adding a timeout to the spin-wait:

Suggested fix
-	// Wait for both goroutines to start
-	for started.Load() < 2 {
-		time.Sleep(time.Millisecond)
-	}
+	// Wait for both goroutines to start (with timeout)
+	deadline := time.After(2 * time.Second)
+	for started.Load() < 2 {
+		select {
+		case <-deadline:
+			t.Fatal("timed out waiting for both guardrails to start — not running in parallel")
+		default:
+			time.Sleep(time.Millisecond)
+		}
+	}

Comment on lines +151 to +197
func TestPipeline_MixedOrders_GroupsExecuteCorrectly(t *testing.T) {
	p := NewPipeline()

	var trace []string

	// Order 0, guardrail A
	p.Add(&mockGuardrail{
		name: "A",
		chatFn: func(_ context.Context, req *core.ChatRequest) (*core.ChatRequest, error) {
			trace = append(trace, "A")
			return req, nil
		},
	}, 0)

	// Order 1, guardrail B (runs after group 0 completes)
	p.Add(&mockGuardrail{
		name: "B",
		chatFn: func(_ context.Context, req *core.ChatRequest) (*core.ChatRequest, error) {
			trace = append(trace, "B")
			return req, nil
		},
	}, 1)

	// Order 0, guardrail C (parallel with A)
	p.Add(&mockGuardrail{
		name: "C",
		chatFn: func(_ context.Context, req *core.ChatRequest) (*core.ChatRequest, error) {
			// Note: this writes to trace inside a parallel group, so we can't assert
			// exact ordering of A and C. But B must come after both.
			return req, nil
		},
	}, 0)

	req := &core.ChatRequest{Model: "gpt-4"}
	_, err := p.ProcessChat(context.Background(), req)
	if err != nil {
		t.Fatal(err)
	}

	// B must be last because it's order 1
	if len(trace) < 2 {
		t.Fatalf("expected at least 2 trace entries, got %d", len(trace))
	}
	if trace[len(trace)-1] != "B" {
		t.Errorf("expected B to run last (order 1), got trace: %v", trace)
	}
}

🧹 Nitpick | 🔵 Trivial

Race-prone pattern: trace slice shared across a parallel group.

Guardrail C intentionally avoids appending to trace (good!), but guardrail A does append at Line 160 inside the parallel group. This works today because only one writer exists, but it's fragile — anyone adding a trace write to C would introduce a data race not caught by the compiler. Consider using an atomic or channel-based approach, or adding a comment warning future maintainers.

🤖 Prompt for AI Agents
In `@internal/guardrails/pipeline_test.go` around lines 151 - 197, The test
TestPipeline_MixedOrders_GroupsExecuteCorrectly uses a shared slice variable
trace that is appended to from a guardrail running in a parallel group
(mockGuardrail named "A"), which is race-prone; change the test to avoid
unsynchronized concurrent writes by protecting trace with a sync.Mutex
(lock/unlock around each append) or by switching to a channel-based collector
(have guardrails send their name into a buffered channel and drain it after
p.ProcessChat), and update the mockGuardrail closures (A and any future parallel
guardrails) to use the chosen synchronized mechanism; alternatively, add a clear
comment on the trace variable that concurrent writes must be synchronized to
prevent future races.
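The mutex-based collector the prompt suggests can be sketched like this. It is an illustrative stand-in for the test's `trace` slice, not the PR's code; the `syncTrace` name is invented here.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// syncTrace is a mutex-guarded event log that is safe to append to from
// guardrails running concurrently in the same parallel group.
type syncTrace struct {
	mu      sync.Mutex
	entries []string
}

func (t *syncTrace) append(name string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.entries = append(t.entries, name)
}

// snapshot returns a copy so callers can assert without holding the lock.
func (t *syncTrace) snapshot() []string {
	t.mu.Lock()
	defer t.mu.Unlock()
	return append([]string(nil), t.entries...)
}

func main() {
	var trace syncTrace
	var wg sync.WaitGroup
	for _, name := range []string{"A", "C"} { // parallel group, order 0
		wg.Add(1)
		go func(n string) {
			defer wg.Done()
			trace.append(n)
		}(name)
	}
	wg.Wait()
	trace.append("B") // order 1 runs after the parallel group completes

	got := trace.snapshot()
	sort.Strings(got[:2]) // A and C may land in either order
	fmt.Println(got)      // [A C B]
}
```

With this in place, C could safely append too, and the assertion "B is last" still holds.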

Comment on lines +130 to +189
// runChatGroupParallel runs all guardrails in a group concurrently on the same input.
// If any guardrail returns an error, the whole group fails. All guardrails
// see the same input; after all complete, the result from the last entry in
// registration (slice) order wins, discarding earlier entries' modifications.
func runChatGroupParallel(ctx context.Context, group []entry, req *core.ChatRequest) (*core.ChatRequest, error) {
results := make([]chatResult, len(group))
var wg sync.WaitGroup

for i, e := range group {
wg.Add(1)
go func(idx int, g Guardrail) {
defer wg.Done()
modified, err := g.ProcessChat(ctx, req)
results[idx] = chatResult{req: modified, err: err}
}(i, e.guardrail)
}

wg.Wait()

// Check for errors and take last successful modification (registration order)
current := req
for i, r := range results {
if r.err != nil {
return nil, fmt.Errorf("guardrail %q: %w", group[i].guardrail.Name(), r.err)
}
current = r.req
}
return current, nil
}

// responsesResult holds the result of a parallel guardrail execution for responses.
type responsesResult struct {
req *core.ResponsesRequest
err error
}

// runResponsesGroupParallel runs all guardrails in a group concurrently on the same input.
func runResponsesGroupParallel(ctx context.Context, group []entry, req *core.ResponsesRequest) (*core.ResponsesRequest, error) {
results := make([]responsesResult, len(group))
var wg sync.WaitGroup

for i, e := range group {
wg.Add(1)
go func(idx int, g Guardrail) {
defer wg.Done()
modified, err := g.ProcessResponses(ctx, req)
results[idx] = responsesResult{req: modified, err: err}
}(i, e.guardrail)
}

wg.Wait()

current := req
for i, r := range results {
if r.err != nil {
return nil, fmt.Errorf("guardrail %q: %w", group[i].guardrail.Name(), r.err)
}
current = r.req
}
return current, nil
}

🧹 Nitpick | 🔵 Trivial

Duplication between runChatGroupParallel and runResponsesGroupParallel.

These two functions follow an identical pattern (launch goroutines → wait → iterate results → return last or first error). With Go generics, this could be consolidated into a single helper, reducing the maintenance surface as more request types are added.

This is optional — the current approach is readable and there are only two variants.

🤖 Prompt for AI Agents
In `@internal/guardrails/pipeline.go` around lines 130 - 189, Both
runChatGroupParallel and runResponsesGroupParallel duplicate the same
concurrency pattern; consolidate into a single generic helper that launches
goroutines, waits, checks errors, and returns the last successful modification.
Create a generic function (e.g., runGroupParallel[T any]) that accepts the group
[]entry, initial request *T, and a processor func(ctx context.Context, g
Guardrail, req *T) (*T, error) (used to call Guardrail.ProcessChat or
Guardrail.ProcessResponses), allocate a results slice of the generic result
type, run the goroutines filling results, wait, then iterate results to return
the first error or last successful modified *T; replace runChatGroupParallel and
runResponsesGroupParallel to call this helper and remove
chatResult/responsesResult duplication.

Comment on lines +62 to +63
g, _ := NewSystemPromptGuardrail(SystemPromptInject, "guardrail system")
pipeline.Add(g, 0)

🧹 Nitpick | 🔵 Trivial

Consider checking errors from NewSystemPromptGuardrail in tests.

The error return from NewSystemPromptGuardrail is discarded with _ in multiple tests (lines 62, 93, 121, 142). While the inputs are known-valid, checking the error avoids silently broken test setups if the constructor contract changes.

Example fix
-	g, _ := NewSystemPromptGuardrail(SystemPromptInject, "guardrail system")
+	g, err := NewSystemPromptGuardrail(SystemPromptInject, "guardrail system")
+	if err != nil {
+		t.Fatal(err)
+	}
🤖 Prompt for AI Agents
In `@internal/guardrails/provider_test.go` around lines 62 - 63, Tests call
NewSystemPromptGuardrail and discard its error return (e.g., g, _ :=
NewSystemPromptGuardrail(SystemPromptInject, "guardrail system")); update each
test to capture the error (g, err := NewSystemPromptGuardrail(...)) and assert
it’s nil before proceeding — use the test helper available in the package (e.g.,
require.NoError(t, err)) or a simple if err != nil {
t.Fatalf("NewSystemPromptGuardrail failed: %v", err) } so test setup fails
loudly if the constructor starts returning errors; apply the same change for
every occurrence (lines referencing NewSystemPromptGuardrail,
SystemPromptInject, pipeline.Add).

Replace the single-struct guardrail config with a list of named rule
instances. Each rule has a name, type, order, and type-specific settings.
Multiple instances of the same type (e.g. two system_prompt guardrails
with different content) are fully supported.

SystemPromptGuardrail now takes a name parameter so each instance is
independently identifiable in logs and error messages.

Config example:
  guardrails:
    enabled: true
    rules:
      - name: "safety-prompt"
        type: "system_prompt"
        order: 0
        system_prompt:
          mode: "decorator"
          content: "Always be safe."
      - name: "compliance-prompt"
        type: "system_prompt"
        order: 0
        system_prompt:
          mode: "inject"
          content: "Follow compliance rules."
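The config shape above suggests Go structs along these lines. Field and tag names here are assumptions inferred from the YAML keys shown, not the PR's actual types.

```go
package main

import "fmt"

// GuardrailsConfig mirrors the example YAML's guardrails block.
type GuardrailsConfig struct {
	Enabled bool            `yaml:"enabled"`
	Rules   []GuardrailRule `yaml:"rules"`
}

// GuardrailRule is one named rule instance; type-specific settings live in
// optional sub-structs keyed by the rule's type.
type GuardrailRule struct {
	Name         string              `yaml:"name"`
	Type         string              `yaml:"type"`
	Order        int                 `yaml:"order"`
	SystemPrompt *SystemPromptConfig `yaml:"system_prompt,omitempty"`
}

type SystemPromptConfig struct {
	Mode    string `yaml:"mode"` // inject | override | decorator
	Content string `yaml:"content"`
}

func main() {
	cfg := GuardrailsConfig{
		Enabled: true,
		Rules: []GuardrailRule{
			{
				Name: "safety-prompt", Type: "system_prompt", Order: 0,
				SystemPrompt: &SystemPromptConfig{Mode: "decorator", Content: "Always be safe."},
			},
		},
	}
	fmt.Println(cfg.Rules[0].Name, cfg.Rules[0].SystemPrompt.Mode)
}
```

Two rules of the same type simply become two slice entries, which is what makes multiple `system_prompt` instances possible.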

https://claude.ai/code/session_011U6sHbthHcwY68FZhAEFVH
Go strings natively support unicode, so guardrail names already accept
spaces, Cyrillic, CJK, accented Latin, emoji, etc. Added explicit test
coverage and config examples demonstrating this.

https://claude.ai/code/session_011U6sHbthHcwY68FZhAEFVH

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@internal/guardrails/system_prompt_test.go`:
- Around line 46-48: Replace discarded constructor errors for
NewSystemPromptGuardrail in the tests with a helper that fails fast: add a
mustNewSystemPromptGuardrail(t, name, mode, content) test helper which calls
NewSystemPromptGuardrail, checks err, calls t.Fatalf with a descriptive message
on error, and returns the guardrail; then replace all usages that currently do
"g, _ := NewSystemPromptGuardrail(...)" (e.g., in
TestSystemPrompt_Inject_NoExistingSystem and the other listed tests) with calls
to mustNewSystemPromptGuardrail so tests fail with a clear error instead of
panicking on nil.
- Around line 24-34: Refactor the table-driven tests
TestNewSystemPromptGuardrail_ValidModes and
TestNewSystemPromptGuardrail_UnicodeNames to use t.Run for each case: iterate
modes/names and call t.Run with a descriptive subtest name (e.g.,
fmt.Sprintf("%v", mode) or the Unicode string), then inside the subtest call
NewSystemPromptGuardrail and assert results (use t.Fatalf/t.Fatalf-variant for
setup errors and t.Errorf for assertions) referencing the existing symbols
SystemPromptMode, NewSystemPromptGuardrail and g.Name() so each case runs as an
isolated subtest and can be filtered/run individually.

Comment on lines +24 to +34
func TestNewSystemPromptGuardrail_ValidModes(t *testing.T) {
for _, mode := range []SystemPromptMode{SystemPromptInject, SystemPromptOverride, SystemPromptDecorator} {
g, err := NewSystemPromptGuardrail("my-guardrail", mode, "test")
if err != nil {
t.Fatalf("unexpected error for mode %q: %v", mode, err)
}
if g.Name() != "my-guardrail" {
t.Errorf("expected name 'my-guardrail', got %q", g.Name())
}
}
}

🧹 Nitpick | 🔵 Trivial

Use t.Run for table-driven subtests.

Both TestNewSystemPromptGuardrail_ValidModes and TestNewSystemPromptGuardrail_UnicodeNames iterate over cases but don't use t.Run. Without subtests, a failure only shows the loop index context you manually include, you can't run/filter individual cases with -run, and the first t.Fatalf in ValidModes aborts the remaining modes.

Example for UnicodeNames
 	for _, tc := range tests {
+		t.Run(tc.name, func(t *testing.T) {
 		g, err := NewSystemPromptGuardrail(tc.name, SystemPromptInject, "content")
 		if err != nil {
-			t.Errorf("unexpected error for name %q: %v", tc.name, err)
-			continue
+			t.Fatalf("unexpected error: %v", err)
 		}
 		if g.Name() != tc.name {
 			t.Errorf("expected name %q, got %q", tc.name, g.Name())
 		}
+		})
 	}

Also applies to: 322-343

🤖 Prompt for AI Agents
In `@internal/guardrails/system_prompt_test.go` around lines 24 - 34, Refactor the
table-driven tests TestNewSystemPromptGuardrail_ValidModes and
TestNewSystemPromptGuardrail_UnicodeNames to use t.Run for each case: iterate
modes/names and call t.Run with a descriptive subtest name (e.g.,
fmt.Sprintf("%v", mode) or the Unicode string), then inside the subtest call
NewSystemPromptGuardrail and assert results (use t.Fatalf/t.Fatalf-variant for
setup errors and t.Errorf for assertions) referencing the existing symbols
SystemPromptMode, NewSystemPromptGuardrail and g.Name() so each case runs as an
isolated subtest and can be filtered/run individually.

Comment on lines +46 to +48
func TestSystemPrompt_Inject_NoExistingSystem(t *testing.T) {
g, _ := NewSystemPromptGuardrail("test", SystemPromptInject, "injected system prompt")
req := &core.ChatRequest{

🧹 Nitpick | 🔵 Trivial

Discarded constructor errors will cause nil-pointer panics instead of clear failures.

Throughout the file, NewSystemPromptGuardrail errors are silently discarded with _. If a regression causes the constructor to return an error, g will be nil and the test panics with an opaque nil-dereference instead of a clear diagnostic. Consider a small helper:

Proposed fix

Add a test helper at the top of the file and use it in place of the bare _ discard:

func mustNewSystemPromptGuardrail(t *testing.T, name string, mode SystemPromptMode, content string) *SystemPromptGuardrail {
	t.Helper()
	g, err := NewSystemPromptGuardrail(name, mode, content)
	if err != nil {
		t.Fatalf("NewSystemPromptGuardrail(%q, %q, %q): unexpected error: %v", name, mode, content, err)
	}
	return g
}

Then replace e.g.:

-	g, _ := NewSystemPromptGuardrail("test", SystemPromptInject, "injected system prompt")
+	g := mustNewSystemPromptGuardrail(t, "test", SystemPromptInject, "injected system prompt")

Also applies to: 71-72, 94-95, 116-117, 144-145, 166-167, 190-191, 214-215, 240-241, 253-254, 266-267, 279-280, 293-294, 306-307, 345-346

🤖 Prompt for AI Agents
In `@internal/guardrails/system_prompt_test.go` around lines 46 - 48, Replace
discarded constructor errors for NewSystemPromptGuardrail in the tests with a
helper that fails fast: add a mustNewSystemPromptGuardrail(t, name, mode,
content) test helper which calls NewSystemPromptGuardrail, checks err, calls
t.Fatalf with a descriptive message on error, and returns the guardrail; then
replace all usages that currently do "g, _ := NewSystemPromptGuardrail(...)"
(e.g., in TestSystemPrompt_Inject_NoExistingSystem and the other listed tests)
with calls to mustNewSystemPromptGuardrail so tests fail with a clear error
instead of panicking on nil.

Guardrails now operate on a normalized []Message instead of concrete
ChatRequest/ResponsesRequest types. Adapters in GuardedProvider handle
conversion between API-specific requests and the message DTO.

This eliminates the N×M coupling (N guardrails × M endpoint types):
- Guardrail interface: single Process(ctx, []Message) method
- Pipeline: single Process method (was duplicated for Chat/Responses)
- SystemPromptGuardrail: one implementation (was two parallel paths)
- New endpoint types only need an adapter pair, no guardrail changes

https://claude.ai/code/session_011U6sHbthHcwY68FZhAEFVH

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@internal/guardrails/provider.go`:
- Around line 132-147: The current applyMessagesToResponses function
concatenates all system Message entries into a single Instructions string which
can be surprising given Instructions is a single-field semantic; change
applyMessagesToResponses (working on *core.ResponsesRequest) to take only the
first system message instead of joining multiple ones: iterate msgs, when you
find the first Message with Role == "system" set result.Instructions to that
Message.Content and break out of the loop (do not append others), and update any
related comment/tests to reflect the single-message behavior; ensure you
reference applyMessagesToResponses, core.ResponsesRequest, and the Instructions
field when making the change.

Comment on lines +132 to +147
// applyMessagesToResponses returns a shallow copy of req with system messages
// applied back to the Instructions field.
func applyMessagesToResponses(req *core.ResponsesRequest, msgs []Message) *core.ResponsesRequest {
result := *req
var instructions string
for _, m := range msgs {
if m.Role == "system" {
if instructions != "" {
instructions += "\n"
}
instructions += m.Content
}
}
result.Instructions = instructions
return &result
}

⚠️ Potential issue | 🟡 Minor

Multiple system messages are concatenated into a single Instructions string.

applyMessagesToResponses joins all system messages with \n into the Instructions field (Lines 137-144). If a guardrail produces multiple system messages (e.g., inject adds one while an existing one is preserved), they get concatenated. This may be surprising — consider documenting this behavior or taking only the first system message to match the single-field semantics of Instructions.

🤖 Prompt for AI Agents
In `@internal/guardrails/provider.go` around lines 132 - 147, The current
applyMessagesToResponses function concatenates all system Message entries into a
single Instructions string which can be surprising given Instructions is a
single-field semantic; change applyMessagesToResponses (working on
*core.ResponsesRequest) to take only the first system message instead of joining
multiple ones: iterate msgs, when you find the first Message with Role ==
"system" set result.Instructions to that Message.Content and break out of the
loop (do not append others), and update any related comment/tests to reflect the
single-message behavior; ensure you reference applyMessagesToResponses,
core.ResponsesRequest, and the Instructions field when making the change.
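The first-system-message variant the review asks for can be sketched like this. `ResponsesRequest` here is a minimal stand-in for `core.ResponsesRequest` with only the fields the sketch needs.

```go
package main

import "fmt"

// ResponsesRequest is a trimmed stand-in carrying just the fields used below.
type ResponsesRequest struct {
	Model        string
	Instructions string
}

type Message struct {
	Role    string
	Content string
}

// applyMessagesToResponses returns a shallow copy of req with only the
// FIRST system message written back to the single-valued Instructions
// field; later system messages are ignored rather than concatenated.
func applyMessagesToResponses(req *ResponsesRequest, msgs []Message) *ResponsesRequest {
	result := *req
	result.Instructions = ""
	for _, m := range msgs {
		if m.Role == "system" {
			result.Instructions = m.Content
			break // single-field semantics: first system message wins
		}
	}
	return &result
}

func main() {
	req := &ResponsesRequest{Model: "gpt-4"}
	out := applyMessagesToResponses(req, []Message{
		{Role: "system", Content: "first"},
		{Role: "system", Content: "second"},
	})
	fmt.Println(out.Instructions) // first
}
```

The alternative, keeping the concatenation but documenting it, is also defensible; the key point is that the choice should be explicit and covered by tests.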
