Parent Epic
Part of #257821 — Extend @kbn/evals with advanced evaluation capabilities
Summary
Add a red-team CLI command and supporting modules for adversarial testing of Agent Builder agents. This enables systematic security validation before shipping agent capabilities.
Attack Modules
Each module generates adversarial prompts targeting a specific vulnerability:
- Prompt Injection — Direct, indirect, and multi-turn injection attempts
- Privilege Escalation — Attempts to access tools/data outside authorized scope
- Information Extraction — Tries to get the model to leak system prompts or internal data
- Jailbreaking — Attempts to bypass safety guidelines
Each module:
- Generates N adversarial prompts (optionally via LLM)
- Runs them through the
ExperimentTask
- Evaluates with security evaluators from Phase 1
- Reports findings with severity classification
Guardrail Rules Engine
Configurable pattern-based rules that scan every response:
interface GuardrailRule {
name: string;
pattern: RegExp;
action: 'block' | 'warn' | 'log';
}
CLI Interface
node scripts/evals red-team --suite agent-builder --module prompt-injection
node scripts/evals red-team --suite agent-builder --all
node scripts/evals red-team --suite agent-builder --guardrails-only
Files to Create
kbn-evals/src/red_team/index.ts
kbn-evals/src/red_team/modules/prompt_injection.ts
kbn-evals/src/red_team/modules/privilege_escalation.ts
kbn-evals/src/red_team/modules/info_extraction.ts
kbn-evals/src/red_team/modules/jailbreaking.ts
kbn-evals/src/red_team/guardrails.ts
kbn-evals/src/cli/commands/red_team.ts
Dependencies
- Depends on Phase 1 (#TBD) — security evaluators are used to score attack results
Acceptance Criteria
Parent Epic
Part of #257821 — Extend @kbn/evals with advanced evaluation capabilities
Summary
Add a
red-teamCLI command and supporting modules for adversarial testing of Agent Builder agents. This enables systematic security validation before shipping agent capabilities.Attack Modules
Each module generates adversarial prompts targeting a specific vulnerability:
Each module:
ExperimentTaskGuardrail Rules Engine
Configurable pattern-based rules that scan every response:
CLI Interface
Files to Create
kbn-evals/src/red_team/index.tskbn-evals/src/red_team/modules/prompt_injection.tskbn-evals/src/red_team/modules/privilege_escalation.tskbn-evals/src/red_team/modules/info_extraction.tskbn-evals/src/red_team/modules/jailbreaking.tskbn-evals/src/red_team/guardrails.tskbn-evals/src/cli/commands/red_team.tsDependencies
Acceptance Criteria
node scripts/evals red-teamruns all attack modules against a suite