Source
Agent-Testing Agent (ATA): Meta-Agent for Adversarial Behavioral Testing
https://arxiv.org/abs/2508.17393 — August 2025
Summary
ATA is a meta-agent that combines static analysis, designer interrogation, and persona-driven adversarial test generation with adaptive difficulty controlled by an LLM-as-judge scoring rubric. It generates behavioral test cases for conversational agents rather than relying on hand-written scenarios.
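The adaptive-difficulty core of ATA can be sketched as a simple loop: run a scenario, have a judge score the response, and escalate scenarios the agent handles well. In this minimal sketch `run_agent`, `judge`, and `escalate` are stand-ins (a real judge would be a separate LLM applying the scoring rubric, and escalation would ask the generator LLM for a harder variant):

```python
# Minimal sketch of ATA-style adaptive difficulty. A high judge score
# means the agent handled the scenario well, so that scenario gets a
# harder variant in the next round; failures stay at the same level.

def run_agent(prompt: str) -> str:
    # Stub: a real harness would send the prompt to the agent under test.
    return f"response to: {prompt}"

def judge(prompt: str, response: str) -> float:
    # Stub rubric on a 0-1 scale: a real judge would be a separate LLM
    # scoring the response against the rubric.
    return 0.9 if "level-1" in prompt else 0.4

def escalate(prompt: str, level: int) -> str:
    # Derive a harder variant; real ATA would ask the generator LLM.
    return prompt.replace(f"level-{level}", f"level-{level + 1}")

def adaptive_round(seeds: list[str], threshold: float = 0.8) -> list[str]:
    """Run one round; scenarios the agent passes easily are escalated."""
    next_round = []
    for prompt in seeds:
        score = judge(prompt, run_agent(prompt))
        if score >= threshold:
            next_round.append(escalate(prompt, 1))  # passed: harder variant
        else:
            next_round.append(prompt)  # failed: keep probing at this level
    return next_round

print(adaptive_round(["recall an expired memory (level-1)"]))
```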
Applicability to Zeph
HIGH. Zeph's continuous improvement protocol (.claude/rules/continuous-improvement.md) explicitly requires live agent testing but currently relies entirely on manual scenario crafting. The gap between CI unit tests and real behavioral testing is the #1 bottleneck in the CI cycle.
Proposed integration
Build an ATA-style harness on top of AgentTestHarness (already in the codebase from ARCH-08):
- Catalog introspection: load Zeph's skill registry + tool definitions to seed scenario generation
- Scenario generation: use a separate LLM (e.g., summary_model) to generate adversarial prompts targeting:
  - Memory recall boundary conditions (just-expired memories, conflicting facts)
  - Tool invocation edge cases (large output → overflow, permission denial, tool chaining)
  - Skill matching precision (ambiguous queries that should/shouldn't match)
  - Security injection attempts (prompt injection in tool results, web scrape content)
- Adaptive difficulty: an LLM judge scores agent responses; scenarios that score high are escalated with harder variants
- Output: structured test cases in regressions.md format with expected behavior labels
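The generation and output steps above can be sketched as a small pipeline. The catalog shape, the templated prompt, and the regressions.md rendering here are all assumptions for illustration (the real generator would prompt an LLM such as summary_model with the catalog as context, and the actual regressions.md schema may differ):

```python
# Sketch: seed adversarial cases from a tool catalog, then render them
# as regressions.md-style entries with expected-behavior labels.

def seed_prompts(catalog: dict[str, list[str]]) -> list[dict]:
    # Stand-in for the generator LLM: template one adversarial prompt per
    # tool targeting the large-output overflow edge case.
    cases = []
    for tool in catalog.get("tools", []):
        cases.append({
            "category": "tools",
            "prompt": f"Call {tool} with output large enough to overflow the context",
            "expected": "truncates or summarizes tool output, no crash",
        })
    return cases

def to_regression_md(cases: list[dict]) -> str:
    # Hypothetical regressions.md-style rendering: one bullet per case,
    # with an indented expected-behavior label.
    lines = []
    for c in cases:
        lines.append(f"- [{c['category']}] {c['prompt']}")
        lines.append(f"  expected: {c['expected']}")
    return "\n".join(lines)

md = to_regression_md(seed_prompts({"tools": ["web_scrape"]}))
print(md)
```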
Location
- New binary or subcommand: zeph test-gen (or --test-gen)
- Stores generated scenarios in .local/testing/playbooks/generated/
- Integrates with AgentTestHarness for execution and response capture
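The execution step could look like the sketch below. AgentTestHarness's actual interface is not shown in this note, so the `run(prompt)` method and the one-JSON-file-per-case layout under .local/testing/playbooks/generated/ are assumptions:

```python
# Sketch: load generated playbooks from disk and execute them through an
# assumed AgentTestHarness.run(prompt) interface, capturing responses
# alongside the expected-behavior label for later judging.
import json
from pathlib import Path

class AgentTestHarness:  # stand-in for the ARCH-08 harness in the codebase
    def run(self, prompt: str) -> str:
        return f"captured response for: {prompt}"

def execute_generated(playbook_dir: Path, harness: AgentTestHarness) -> list[dict]:
    results = []
    for case_file in sorted(playbook_dir.glob("*.json")):
        case = json.loads(case_file.read_text())
        results.append({
            "case": case_file.name,
            "response": harness.run(case["prompt"]),
            "expected": case["expected"],
        })
    return results

# Usage, with a temp dir standing in for .local/testing/playbooks/generated/:
import tempfile
with tempfile.TemporaryDirectory() as d:
    case_path = Path(d) / "case-001.json"
    case_path.write_text(json.dumps({"prompt": "recall an expired memory",
                                     "expected": "declines gracefully"}))
    print(execute_generated(Path(d), AgentTestHarness()))
```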
Related
- AgentTestHarness (ARCH-08) — execution substrate
- regressions.md — generated adversarial prompts extend the regression catalog