Agentic Design System

Agentic Design System is a set of installable skills, markdown templates, checks, and examples for coding agents that build UI.

It is for Claude Code, Codex, OpenClaw, Hermes, Cursor, and similar agent shells. The point is simple: before an agent declares a screen done, it should define the user-facing intent, read the project baseline, judge the artifact against a task-specific rubric, attach evidence, and revise when the result misses.

This is repo-local control flow, not a hosted design agent. Some parts are runnable skills and scripts. Some parts are templates the agent fills in. Some parts are dogfooded patterns that make the work inspectable until they become tighter automation.

Status: early public package. The skills and templates are usable now; the grader loop is still workflow-driven, not a hosted service.

How it works

Intent -> baseline -> rubric -> build with evidence -> grade and revise.

Step	What ADS gives the agent	Status
1. Define intent / outcome	`templates/outcome-template.md`: user, situation, accomplish, notice, operational state, stop condition	template
2. Capture background / baseline	Project Knowledge Intake, `DESIGN.md`-shaped project identity, presets, references, routes, screenshots, prior decisions	skill + template
3. Write the review lens	`templates/grader-report-template.md`: fixed quality rows plus task-specific criteria	template + pattern
4. Build with evidence	Routed skills, deterministic checks, changed files, screenshots or preview, known risks, and run report	skill + script + template
5. Grade and revise	Separate grader context when available, returning `satisfied`, `needs_revision`, `max_iterations`, or `failed`	template + pattern
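Step 1 only helps if the outcome is actually filled in before building starts. The sketch below mirrors the outcome fields from the table as a typed record and lists any field that is still blank. It is a minimal illustration: the `Outcome` interface and the `missingOutcomeFields` helper are hypothetical names, and the field list comes from the table row, not from parsing templates/outcome-template.md.

```typescript
// Hypothetical record mirroring the fields listed for templates/outcome-template.md.
interface Outcome {
  user: string;
  situation: string;
  accomplish: string;
  notice: string;
  operationalState: string;
  stopCondition: string;
}

const OUTCOME_FIELDS: (keyof Outcome)[] = [
  "user",
  "situation",
  "accomplish",
  "notice",
  "operationalState",
  "stopCondition",
];

// Returns the fields that are absent or contain only whitespace.
function missingOutcomeFields(draft: Partial<Outcome>): (keyof Outcome)[] {
  return OUTCOME_FIELDS.filter((key) => (draft[key] ?? "").trim() === "");
}

// Illustrative draft for an orders screen; two fields are still blank.
const draft: Partial<Outcome> = {
  user: "Ops lead checking orders on a phone",
  situation: "Between meetings, one hand free",
  accomplish: "Find failed orders and retry them",
  notice: "Failed orders stand out without scrolling",
  operationalState: "",
};

console.log(missingOutcomeFields(draft)); // lists operationalState and stopCondition
```

An agent (or a pre-build hook) could refuse to start step 4 until this list is empty.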

That loop is the product. Presets, checks, and archived fixtures are support machinery.
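The grade-and-revise step can be pictured as a bounded loop. The sketch below is illustrative: only the four verdict strings come from ADS. The `build` and `grade` callbacks are hypothetical stand-ins for the builder agent and the separate grader context, and the rule that separates `failed` from `max_iterations` (a hard gate still failing when the budget runs out) is an assumption.

```typescript
// Verdict names come from ADS; everything else here is a hypothetical sketch.
type Verdict = "satisfied" | "needs_revision" | "max_iterations" | "failed";

interface GradeResult {
  verdict: "satisfied" | "needs_revision";
  hardGateFailed: boolean; // a rendered-evidence check that no grader note can waive
  notes: string[]; // revision feedback for the next build
}

function runLoop(
  build: (iteration: number, feedback: string[]) => void,
  grade: (iteration: number) => GradeResult,
  maxIterations: number,
): { verdict: Verdict; iterations: number } {
  let feedback: string[] = [];
  let lastHardGateFailed = false;
  for (let i = 1; i <= maxIterations; i++) {
    build(i, feedback);
    const result = grade(i);
    lastHardGateFailed = result.hardGateFailed;
    // Accept only when the grader is satisfied AND every hard gate passes.
    if (result.verdict === "satisfied" && !result.hardGateFailed) {
      return { verdict: "satisfied", iterations: i };
    }
    feedback = result.notes; // the next build revises against these notes
  }
  // Assumption: out of budget with a hard gate still failing is "failed";
  // out of budget with only softer misses is "max_iterations".
  return {
    verdict: lastHardGateFailed ? "failed" : "max_iterations",
    iterations: maxIterations,
  };
}

// Illustrative grader that is satisfied on the third pass.
const demo = runLoop(
  () => {},
  (i) => ({
    verdict: i < 3 ? "needs_revision" : "satisfied",
    hardGateFailed: i < 3,
    notes: i < 3 ? [`pass ${i}: fix remaining gate failures`] : [],
  }),
  3,
);
console.log(demo.verdict, demo.iterations); // satisfied 3
```

Keeping the hard-gate check outside the grader's own verdict means a persuasive grader note can never turn a failing gate into `satisfied`.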

Workflows

If you just want to use ADS, start at workflows/create-design-workflow.md. It is the default "help me design or review something" entrypoint and routes you to the right path based on your intent. Each runbook in workflows/ is decision-shaped: it states when to use it, what to read, what to run, what evidence is required, what report to produce, and when to stop.

Workflow	Use it to
create-design-workflow	route any design/review task to the right profile (start here)
mobile-review	review a mobile/responsive/app/PWA screen (judgment vs defects, two passes)
adversarial-design-review	critique finished UI from a context that didn't build it
install-usability-smoke	verify ADS installs and the bundled skills/templates are present
readme-docs-critique	judge whether the docs let a newcomer onboard
cold-agent-usage-test	test whether a brand-new agent can use ADS unaided
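In ADS the routing decision lives in the create-design-workflow runbook and is made by the agent reading it. The keyword table below is only a hypothetical illustration of the same idea: it maps a task description to one of the runbook names listed above and falls back to create-design-workflow. The keywords and their ordering are assumptions, not rules taken from the runbook.

```typescript
// Hypothetical keyword routing; the runbook names match workflows/, the keywords do not come from ADS.
// Order matters: the first matching pattern wins.
const routes: [string, RegExp][] = [
  ["cold-agent-usage-test", /\b(new|cold|fresh) agent\b/i],
  ["install-usability-smoke", /\binstall/i],
  ["readme-docs-critique", /\b(readme|docs|onboard)/i],
  ["mobile-review", /\b(mobile|responsive|pwa|phone)\b/i],
  ["adversarial-design-review", /\b(critique|adversarial|finished ui)\b/i],
];

function routeTask(task: string): string {
  for (const [workflow, pattern] of routes) {
    if (pattern.test(task)) return workflow;
  }
  return "create-design-workflow"; // default entrypoint when nothing more specific fits
}

console.log(routeTask("Review the checkout screen on mobile")); // mobile-review
console.log(routeTask("Design a pricing page")); // create-design-workflow
```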

Install

Most precise path, installing exactly the version you are reviewing:

git clone https://github.com/aa-on-ai/agentic-design-system.git
cd agentic-design-system
npx skills add . --yes

If you trust the repository's default branch and want the shorthand:

npx skills add aa-on-ai/agentic-design-system --yes

Both paths assume a skills-compatible CLI. If your agent tool does not support npx skills, use the no-CLI install below and copy the repo's skills/ directory into the location your agent reads. The full repo also includes presets, templates, integration docs, smoke tests, and archived early eval fixtures for reference.

Default path for a project:

1. Paste templates/agents-snippet.md into the agent instruction file for your tool.
2. Pick or create a baseline for the project.
3. For substantial UI work, start from templates/outcome-template.md and grade with templates/grader-report-template.md.
4. Prompt normally, then require the report/evidence before accepting the work.
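Requiring "the report/evidence" works best when acceptance is a checklist rather than a feeling. A minimal sketch, assuming a hypothetical `RunReport` object whose fields follow the evidence named in step 4 of "How it works" (changed files, deterministic checks, screenshots or preview, known risks) plus the grader verdict. The real run report is markdown produced from the ADS templates; this shape is illustrative only.

```typescript
// Hypothetical report shape; the real run report is markdown from the ADS templates.
interface RunReport {
  changedFiles: string[];
  checksRun: string[];
  screenshots: string[]; // screenshot paths or a preview URL
  knownRisks: string[] | null; // null: section missing; []: "no known risks" stated
  verdict: "satisfied" | "needs_revision" | "max_iterations" | "failed";
}

// Returns reasons to reject the work; an empty list means it can be accepted.
function acceptanceProblems(report: RunReport): string[] {
  const problems: string[] = [];
  if (report.changedFiles.length === 0) problems.push("no changed files listed");
  if (report.checksRun.length === 0) problems.push("no deterministic checks recorded");
  if (report.screenshots.length === 0) problems.push("no screenshots or preview");
  if (report.knownRisks === null) problems.push("known risks section missing");
  if (report.verdict !== "satisfied") problems.push(`verdict is ${report.verdict}`);
  return problems;
}

// Illustrative report that claims success but skipped visual evidence.
const report: RunReport = {
  changedFiles: ["app/orders/page.tsx"],
  checksRun: ["axe", "touch-target audit"],
  screenshots: [],
  knownRisks: null,
  verdict: "satisfied",
};
console.log(acceptanceProblems(report)); // flags missing screenshots and the missing risks section
```

Distinguishing an empty risks list from a missing section keeps "I found no risks" separate from "I never looked".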

Day one file to paste: templates/agents-snippet.md.

Choose a baseline

The agent needs something to judge against. Use the lightest baseline that fits the task.

Baseline	Use when	Artifact
Existing project context	The repo already has design docs, components, tokens, screenshots, or prior decisions	Agent reads the source files directly
Project Knowledge Intake	The project needs shared taste/context before UI work	`templates/project-identity-template.md` or `DESIGN.md`
Reference Intake	A screenshot, site, CodePen, "make it feel like...", or prior miss matters	`templates/reference-intake-contract.md`

No project context yet? See presets/ for utilitarian, dashboard, or editorial starters. Replace them with real project context as soon as you have it.
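"Use the lightest baseline that fits" can be read as a small decision procedure over the table above. The sketch below is one hypothetical reading: existing context beats a new intake, a preset is the fallback only when nothing else applies, and (an assumption, not stated by ADS) Reference Intake is layered on top of whichever baseline is chosen when a reference matters. `ProjectSignals` and `chooseBaselines` are illustrative names.

```typescript
// Hypothetical signals an agent might gather before UI work.
interface ProjectSignals {
  hasDesignContext: boolean; // design docs, components, tokens, screenshots, prior decisions
  needsSharedTaste: boolean; // project needs shared taste/context before UI work
  referenceMatters: boolean; // screenshot, site, CodePen, "make it feel like...", prior miss
}

type Baseline =
  | "existing-project-context"
  | "project-knowledge-intake"
  | "reference-intake"
  | "preset";

function chooseBaselines(s: ProjectSignals): Baseline[] {
  const chosen: Baseline[] = [];
  // Prefer what the repo already has; only run an intake when it is missing.
  if (s.hasDesignContext) chosen.push("existing-project-context");
  else if (s.needsSharedTaste) chosen.push("project-knowledge-intake");
  // Assumption: a reference is calibrated in addition to the main baseline.
  if (s.referenceMatters) chosen.push("reference-intake");
  // Presets are a stopgap; replace them with real project context as soon as possible.
  if (chosen.length === 0) chosen.push("preset");
  return chosen;
}

console.log(chooseBaselines({ hasDesignContext: false, needsSharedTaste: true, referenceMatters: true }));
```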

Integration paths

First-class docs:

Tool	Put the instructions here	Notes
Claude Code	`CLAUDE.md` or `AGENTS.md`	Paste the snippet, add a baseline, prompt normally.
Codex CLI	`AGENTS.md` or `codex.md`	Large context helps with full-chain review and reference comparisons.
Cursor	Rules or agent instructions	Keep the skills directory where Cursor can read it, then paste the snippet.

Local/experimental docs also exist for OpenClaw and Hermes. They follow the same snippet/gate pattern, but they depend more on the local agent runtime.
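Whichever instruction file a tool reads, pasting the snippet should be idempotent so that re-running setup does not stack duplicate copies. A sketch under stated assumptions: the begin/end marker comments are hypothetical (ADS does not define them), and the function works on strings so it can sit behind whatever file read/write your setup uses.

```typescript
// Hypothetical markers; ADS does not define them.
const BEGIN = "<!-- ads:agents-snippet:begin -->";
const END = "<!-- ads:agents-snippet:end -->";

// Adds the snippet once, or replaces a previously pasted copy in place.
function upsertSnippet(instructions: string, snippet: string): string {
  const block = `${BEGIN}\n${snippet.trim()}\n${END}`;
  const start = instructions.indexOf(BEGIN);
  const end = start === -1 ? -1 : instructions.indexOf(END, start);
  if (start !== -1 && end !== -1) {
    return instructions.slice(0, start) + block + instructions.slice(end + END.length);
  }
  // Separate the new block from existing content with exactly one blank line.
  let separator = "";
  if (instructions.length > 0 && instructions.slice(-2) !== "\n\n") {
    separator = instructions.slice(-1) === "\n" ? "\n" : "\n\n";
  }
  return instructions + separator + block + "\n";
}

// Placeholder text stands in for the contents of templates/agents-snippet.md.
const agentsMd = upsertSnippet("# Project rules\n", "<contents of templates/agents-snippet.md>");
console.log(agentsMd);
```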

What is still a pattern

Custom rubric generation is template-driven. The agent fills in task-specific criteria from the outcome and baseline.
A separate grader is recommended when the host workflow supports it. ADS does not yet run a hosted grader service.
Screenshot review depends on the project and agent environment. The templates require evidence; the runner is still your toolchain.
The system raises the floor and makes misses inspectable. It does not replace taste or product judgment.
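Because rubric generation is template-driven, the assembly step is easy to picture: the fixed quality rows from the grader report template, followed by task-specific criteria the agent derives from the outcome and baseline. A sketch with hypothetical names (`Criterion`, `buildRubric`); `fixedRows` stands in for whatever rows templates/grader-report-template.md actually contains, and the phrasing of the derived criteria is illustrative.

```typescript
interface Criterion {
  text: string;
  source: "fixed" | "outcome" | "baseline";
}

// fixedRows stands in for the fixed quality rows in templates/grader-report-template.md.
function buildRubric(
  fixedRows: string[],
  outcome: { notice: string; stopCondition: string },
  baselineRules: string[],
): Criterion[] {
  const rubric: Criterion[] = fixedRows.map((text): Criterion => ({ text, source: "fixed" }));
  // Task-specific criteria derived from the outcome (hypothetical phrasing).
  rubric.push({ text: `The user notices: ${outcome.notice}`, source: "outcome" });
  rubric.push({ text: `Done when: ${outcome.stopCondition}`, source: "outcome" });
  for (const rule of baselineRules) {
    rubric.push({ text: rule, source: "baseline" });
  }
  // Drop exact duplicates so a baseline rule restating a fixed row is graded once.
  const seen: string[] = [];
  return rubric.filter((c) => {
    if (seen.indexOf(c.text) !== -1) return false;
    seen.push(c.text);
    return true;
  });
}

const example = buildRubric(
  ["Primary action reachable at 390px"],
  { notice: "Failed orders stand out", stopCondition: "Grader returns satisfied" },
  ["Primary action reachable at 390px", "Use spacing tokens from DESIGN.md"],
);
console.log(example.map((c) => c.source).join(", ")); // fixed, outcome, outcome, baseline
```

Tagging each criterion with its source makes it visible in the grader report which checks are generic and which came from this task.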

Why this works

Agents are better at checking UI against explicit criteria than spontaneously holding every design constraint in mind while generating. ADS exploits that asymmetry, but starts with context instead of generic polish.

Define intent, gather the baseline, calibrate references when needed, build with routed skills, attach evidence, then grade the result. The report is the difference between "my agent got better" and "here is what changed, what passed, and what still needs human judgment."

No-CLI install

git clone https://github.com/aa-on-ai/agentic-design-system.git

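# Global, if your agent reads shared skills
# (the trailing /. copies the skill folders themselves, so an existing
# target does not end up with a nested skills/skills/ directory)
mkdir -p ~/.claude/skills
cp -r agentic-design-system/skills/. ~/.claude/skills/

# Or project-level
mkdir -p your-project/skills
cp -r agentic-design-system/skills/. your-project/skills/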

Verify the package

testing/install-smoke.sh

The smoke test installs from the local repo into a temporary project and verifies all 9 skills, the bundled outcome/grader templates, and the workflow runbooks bundled under the orchestrator skill, including that those runbooks stay byte-identical to the canonical top-level workflows/. A successful run ends with:

install smoke passed: 9 skills, bundled outcome/grader templates, and 6 workflow runbooks (in sync)
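The "(in sync)" part of the smoke test is a byte-identity comparison between two directory trees. The sketch below shows that comparison in plain TypeScript. It assumes each tree has already been read into an object mapping relative path to file bytes (in Node you might fill it with `readFileSync`), and it is an illustration of the idea, not the logic inside testing/install-smoke.sh.

```typescript
// Relative path -> file bytes for one directory tree.
type Tree = Record<string, Uint8Array>;

const has = (tree: Tree, path: string): boolean =>
  Object.prototype.hasOwnProperty.call(tree, path);

function sameBytes(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Lists every runbook that is missing, extra, or changed in the bundled copy.
function runbookDrift(canonical: Tree, bundled: Tree): string[] {
  const drift: string[] = [];
  for (const path of Object.keys(canonical)) {
    if (!has(bundled, path)) drift.push(`missing: ${path}`);
    else if (!sameBytes(canonical[path], bundled[path])) drift.push(`differs: ${path}`);
  }
  for (const path of Object.keys(bundled)) {
    if (!has(canonical, path)) drift.push(`extra: ${path}`);
  }
  return drift.sort();
}

// Illustrative trees: one runbook edited in place, one stale file left behind.
const canonical: Tree = { "mobile-review.md": new Uint8Array([1, 2, 3]) };
const bundled: Tree = {
  "mobile-review.md": new Uint8Array([1, 2, 4]),
  "old.md": new Uint8Array([9]),
};
console.log(runbookDrift(canonical, bundled)); // reports mobile-review.md as differing and old.md as extra
```

Checking extra files as well as missing ones catches a runbook that was deleted from workflows/ but never removed from the bundled copy.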

A worked example

docs/loop-demo/ is a real run of the executable loop (workflows/new-page-component.mjs) on an Orders screen gated at 390, 768, and 1280px. It is preserved as a sample; generated evidence/ is otherwise ignored and regenerated on demand. The run took three passes to converge: iter1 failed with 12 axe violations and 114 sub-44px touch targets, measured from the rendered DOM; iter2 cleared most of the touch targets (114 → 12); iter3 brought both the remaining touch targets and the axe violations to 0, and only then did the independent grader return satisfied. The verdict rests on rendered evidence, not source. That is why a comment in the code cannot satisfy it, and why the same loop returns failed rather than ship a screen that still misses a hard gate.
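Once the evidence exists, the hard gate in this run is mechanical: zero axe violations and zero interactive targets under 44px. A sketch under assumptions: `Evidence`, `countSmallTargets`, and `passesHardGate` are hypothetical names, the target rule is simplified to "either side under 44 CSS pixels", and in a browser run the boxes would typically come from `getBoundingClientRect()`. The two evidence records use the iter1 and iter3 totals quoted above; iter2's axe count is not reported, so it is left out rather than guessed.

```typescript
interface Rect {
  width: number;
  height: number;
}

const MIN_TARGET_PX = 44;

// Counts interactive elements whose rendered box is under 44px on either side.
function countSmallTargets(targets: Rect[]): number {
  return targets.filter((r) => r.width < MIN_TARGET_PX || r.height < MIN_TARGET_PX).length;
}

interface Evidence {
  axeViolations: number;
  smallTouchTargets: number;
}

// Both counts must be zero; no grader note can override this.
function passesHardGate(e: Evidence): boolean {
  return e.axeViolations === 0 && e.smallTouchTargets === 0;
}

// Totals reported for the Orders run.
const iter1: Evidence = { axeViolations: 12, smallTouchTargets: 114 };
const iter3: Evidence = { axeViolations: 0, smallTouchTargets: 0 };
console.log(passesHardGate(iter1), passesHardGate(iter3)); // false true
```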

Limitations

Depends on agents actually following skill instructions. Works best with frontier models.
Verification scripts catch structural issues, not aesthetic ones. Reference Intake and screenshot review are the craft layer.
Creative passes can over-steer utilitarian UI. That is why they are opt-in.
Separate grader context is a workflow recommendation, not a hosted service.
Custom rubric generation is not fully automated yet.

Influences and prior art

Intent Engineering - intent before output: user, situation, accomplish, notice, operational state
Anthropic Managed Agents: Define outcomes - outcome, rubric, separate grader, iteration loop
Agentic Rubrics as Contextual Verifiers for SWE Agents - repository-grounded rubrics and verifier signals for agent patches
Karpathy autoresearch - fixed runtime, single editable target, metric, experiment log, and repeated agent loop; this is the closest reference for how ADS defines its loop structure
DESIGN.md - structured project identity for agents
make-interfaces-feel-better - micro-detail heuristics

Contributing

If you find a recurring anti-pattern, a better routing rule, or a skill that should exist, open a PR.

License

MIT
