Agentic Design System is a set of installable skills, markdown templates, checks, and examples for coding agents that build UI.
It is for Claude Code, Codex, OpenClaw, Hermes, Cursor, and similar agent shells. The point is simple: before an agent declares a screen done, it should define the user-facing intent, read the project baseline, judge the artifact against a task-specific rubric, attach evidence, and revise when the result misses.
This is repo-local control flow, not a hosted design agent. Some parts are runnable skills and scripts. Some parts are templates the agent fills in. Some parts are dogfooded patterns that make the work inspectable until they become tighter automation.
Status: early public package. The skills and templates are usable now; the grader loop is still workflow-driven, not a hosted service.
Intent -> baseline -> rubric -> build with evidence -> grade and revise.
| Step | What ADS gives the agent | Status |
|---|---|---|
| 1. Define intent / outcome | templates/outcome-template.md: user, situation, accomplish, notice, operational state, stop condition | template |
| 2. Capture background / baseline | Project Knowledge Intake, DESIGN.md-shaped project identity, presets, references, routes, screenshots, prior decisions | skill + template |
| 3. Write the review lens | templates/grader-report-template.md: fixed quality rows plus task-specific criteria | template + pattern |
| 4. Build with evidence | Routed skills, deterministic checks, changed files, screenshots or preview, known risks, and run report | skill + script + template |
| 5. Grade and revise | Separate grader context when available, returning satisfied, needs_revision, max_iterations, or failed | template + pattern |
That loop is the product. Presets, checks, and archived fixtures are support machinery.
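The control flow behind step 5 can be sketched in a few lines of shell. This is a minimal illustration, not the ADS implementation: `grade_step` and `run_loop` are hypothetical names, and the stub grader simply pretends the artifact passes on the third pass. Only the four verdict strings come from the table above.

```shell
#!/usr/bin/env bash
# Sketch only: grade_step is a hypothetical stand-in for a separate grader context.
# It prints one verdict: satisfied, needs_revision, or failed.
grade_step() {
  # Stub behavior (an assumption): the artifact passes on the third pass.
  if [ "$1" -ge 3 ]; then
    echo "satisfied"
  else
    echo "needs_revision"
  fi
}

# Run up to $1 build-and-grade passes and print the final verdict.
run_loop() {
  local max="$1" pass=1 verdict
  while [ "$pass" -le "$max" ]; do
    # A real loop would build or revise the artifact and attach evidence here.
    verdict="$(grade_step "$pass")"
    case "$verdict" in
      satisfied | failed) echo "$verdict"; return 0 ;;
      needs_revision) pass=$((pass + 1)) ;;
      *) echo "failed"; return 0 ;; # unknown verdicts are treated as failures
    esac
  done
  # The pass budget ran out before the grader was satisfied.
  echo "max_iterations"
}

run_loop 3 # prints: satisfied
run_loop 2 # prints: max_iterations
```

The design point is that the loop always ends in exactly one named verdict, so a run that exhausts its budget is reported as max_iterations rather than silently accepted.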
If you just want to use ADS, start at workflows/create-design-workflow.md. It's the default "help me design or review something" entrypoint and routes you into the right path by intent. Each runbook in workflows/ is decision-shaped: when to use it, what to read, what to run, what evidence is required, what report to produce, and when to stop.
| Workflow | Use it to |
|---|---|
| create-design-workflow | route any design/review task to the right profile (start here) |
| mobile-review | review a mobile/responsive/app/PWA screen (judgment vs defects, two passes) |
| adversarial-design-review | critique finished UI from a context that didn't build it |
| install-usability-smoke | verify ADS installs and the bundled skills/templates are present |
| readme-docs-critique | judge whether the docs let a newcomer onboard |
| cold-agent-usage-test | test whether a brand-new agent can use ADS unaided |
The most exact path installs from a local clone of the version you are reviewing:
git clone https://github.com/aa-on-ai/agentic-design-system.git
cd agentic-design-system
npx skills add . --yes
If you trust the repository default branch and want the shorthand:
npx skills add aa-on-ai/agentic-design-system --yes
Both paths assume a skills-compatible CLI. If your agent tool does not support npx skills, use the no-CLI install below and copy the repo's skills/ directory into the location your agent reads. The full repo also includes presets, templates, integration docs, smoke tests, and archived early eval fixtures for reference.
Default path for a project:
- Paste templates/agents-snippet.md into the agent instruction file for your tool.
- Pick or create a baseline for the project.
- For substantial UI work, start from templates/outcome-template.md and grade with templates/grader-report-template.md.
- Prompt normally, then require the report/evidence before accepting the work.
Day one file to paste: templates/agents-snippet.md.
The agent needs something to judge against. Use the lightest baseline that fits the task.
| Baseline | Use when | Artifact |
|---|---|---|
| Existing project context | The repo already has design docs, components, tokens, screenshots, or prior decisions | Agent reads the source files directly |
| Project Knowledge Intake | The project needs shared taste/context before UI work | templates/project-identity-template.md or DESIGN.md |
| Reference Intake | A screenshot, site, CodePen, "make it feel like...", or prior miss matters | templates/reference-intake-contract.md |
No project context yet? See presets/ for utilitarian, dashboard, or editorial starters. Replace them with real project context as soon as you have it.
First-class docs:
| Tool | Put the instructions here | Notes |
|---|---|---|
| Claude Code | CLAUDE.md or AGENTS.md | Paste the snippet, add a baseline, prompt normally. |
| Codex CLI | AGENTS.md or codex.md | Large context helps with full-chain review and reference comparisons. |
| Cursor | Rules or agent instructions | Keep the skills readable and paste the snippet. |
Local/experimental docs also exist for OpenClaw and Hermes. They follow the same snippet/gate pattern, but they depend more on the local agent runtime.
- Custom rubric generation is template-driven. The agent fills in task-specific criteria from the outcome and baseline.
- A separate grader is recommended when the host workflow supports it. ADS does not yet run a hosted grader service.
- Screenshot review depends on the project and agent environment. The templates require evidence; the runner is still your toolchain.
- The system raises the floor and makes misses inspectable. It does not replace taste or product judgment.
Agents are better at checking UI against explicit criteria than spontaneously holding every design constraint in mind while generating. ADS exploits that asymmetry, but starts with context instead of generic polish.
Define intent, gather the baseline, calibrate references when needed, build with routed skills, attach evidence, then grade the result. The report is the difference between "my agent got better" and "here is what changed, what passed, and what still needs human judgment."
git clone https://github.com/aa-on-ai/agentic-design-system.git
# Global, if your agent reads shared skills
cp -r agentic-design-system/skills ~/.claude/skills/
# Or project-level
cp -r agentic-design-system/skills your-project/skills/
To verify the install, run the smoke test from the repo root:
testing/install-smoke.sh
The smoke test installs from the local repo into a temporary project and verifies all 9 skills, the bundled outcome/grader templates, and the workflow runbooks bundled under the orchestrator skill. It also checks that those runbooks stay byte-identical to the canonical top-level workflows/. Success ends with install smoke passed: 9 skills, bundled outcome/grader templates, and 6 workflow runbooks (in sync).
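After a manual copy, a quick count can confirm that the skills landed where your agent reads them. The sketch below assumes each skill is a subdirectory containing a SKILL.md file, which is an assumption about the layout rather than something this README states; `count_skills` is a hypothetical helper, not part of ADS.

```shell
#!/usr/bin/env bash
# Assumption: each skill lives in its own subdirectory with a SKILL.md file.
# count_skills is a hypothetical helper, not an ADS script.
count_skills() {
  local dir="$1"
  # A missing directory counts as zero skills instead of raising a find error.
  [ -d "$dir" ] || { echo 0; return 0; }
  # Look exactly one level down so nested copies are not counted by accident.
  find "$dir" -mindepth 2 -maxdepth 2 -type f -name SKILL.md | wc -l | tr -d ' '
}

# Default to the global location used in the copy command; pass a path to override.
skills_dir="${1:-$HOME/.claude/skills}"
echo "found $(count_skills "$skills_dir") skills in $skills_dir"
```

Under that layout assumption, the expected count is 9, matching the smoke test. A count of 0 may mean the copy nested the directory one level deeper, which `cp -r` does when the destination directory already exists.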
docs/loop-demo/ is a real run of the executable loop (workflows/new-page-component.mjs) on an Orders screen gated at 390 / 768 / 1280px (preserved as a sample — generated evidence/ is otherwise ignored and regenerated on demand). It took three passes to converge: iter1 failed with 12 axe violations and 114 sub-44px touch targets measured from the rendered DOM; iter2 cleared most of the touch targets (114 → 12); iter3 closed the rest and every axe violation (→ 0), and only then did the independent grader return satisfied. The verdict rests on rendered evidence, not source — which is exactly why a comment cannot satisfy it, and why the same loop returns failed rather than ship a screen that still misses a hard gate.
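The touch-target gate described above comes down to a counting step over measured sizes. This is a minimal sketch, assuming your toolchain has already written one "width height" pair per interactive element to stdin. The input format, the function names, and the rule that either dimension under 44px counts as a miss are assumptions for illustration, not the behavior of workflows/new-page-component.mjs.

```shell
#!/usr/bin/env bash
# Hypothetical gate sketch; the measurement itself is left to your toolchain.
MIN_TARGET_PX=44

# Reads "width height" pairs (one target per line) on stdin and prints how
# many have either dimension under the minimum.
count_small_targets() {
  awk -v min="$MIN_TARGET_PX" \
    'NF >= 2 && ($1 < min || $2 < min) { n++ } END { print n + 0 }'
}

# Maps a miss count to a gate result: zero misses passes, anything else misses.
gate_result() {
  if [ "$1" -eq 0 ]; then
    echo "pass"
  else
    echo "miss"
  fi
}

# Example input: three measured targets, two of them undersized.
small="$(printf '48 48\n30 48\n44 20\n' | count_small_targets)"
echo "$small sub-${MIN_TARGET_PX}px targets: $(gate_result "$small")"
```

Because the count comes from measured sizes rather than source text, a code comment or a renamed class cannot change the result; only a rendered change can.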
- Depends on agents actually following skill instructions. Works best with frontier models.
- Verification scripts catch structural issues, not aesthetic ones. Reference Intake and screenshot review are the craft layer.
- Creative passes can over-steer utilitarian UI. That is why they are opt-in.
- Separate grader context is a workflow recommendation, not a hosted service.
- Custom rubric generation is not fully automated yet.
- Intent Engineering - intent before output: user, situation, accomplish, notice, operational state
- Anthropic Managed Agents: Define outcomes - outcome, rubric, separate grader, iteration loop
- Agentic Rubrics as Contextual Verifiers for SWE Agents - repository-grounded rubrics and verifier signals for agent patches
- Karpathy autoresearch - fixed runtime, single editable target, metric, experiment log, and repeated agent loop; this is the closest reference for how ADS defines its loop structure
- DESIGN.md - structured project identity for agents
- make-interfaces-feel-better - micro-detail heuristics
- PHILOSOPHY.md - design philosophy behind the system
- docs/influences.md - what ADS borrows from each source and what it does not copy
- docs/archive/ - pre-spine eval fixtures and earlier control-plane notes, kept for provenance
If you find a recurring anti-pattern, a better routing rule, or a skill that should exist, open a PR.