Engineering rigor for AI assistants

Ship with confidence.

17 skills that turn AI coding assistants into careful senior engineers — catching bugs, validating changes, and maintaining code health with concrete, repo-grounded evidence.

Works with
  • Claude
  • Codex
  • Cursor
  • Gemini CLI
  • GitHub Copilot
  • OpenClaw
01

The upgrade

From vague suggestions to evidence-backed action.

Before

Generic AI Output

  • Looks fine to me, maybe add some tests.
  • Consider refactoring this function.
  • You might want to check for edge cases.
  • This could have performance implications.
  • Make sure to update the docs.

After

With swe-skills

  • Missing null check at lib/auth.ts:34 — will crash on expired tokens.
  • Migration 20260401.sql adds NOT NULL with no default — will fail on existing rows.
  • Bus factor 1 on billing/ — only contributor inactive 4 months.
  • p99 latency +180ms since PR #231 merged — N+1 query in fetchOrders.
  • README install section references removed --legacy flag.

Concrete. Grounded. Actionable.

02

The framework

17 skills across the full engineering lifecycle.

Run one at a time, schedule them to run continuously, or let your AI assistant pick the right one.

Understand

2 skills

Map unfamiliar repos and capture implicit knowledge before making changes.

Validate

4 skills

Review PRs for risk, shepherd live review cycles, hunt recent regressions, and plan validation paths.

Maintain

7 skills

Monitor deploys, audit dependencies and docs, hunt performance and observability gaps, and track ownership and incident follow-up risk.

Improve

4 skills

Close test gaps, set repo-local defaults, find refactor opportunities, and build new skills.

03

The skill set

Every skill, with example prompts and output.

Understand

swe:capture-knowledge

Convert implicit repo patterns into explicit agent-facing guidance.

Example prompt

What conventions should agents know before working here?

Sample output

Missing No AGENTS.md — 3 implicit conventions found

Convention All API routes use zod validation middleware

Convention DB migrations must have matching rollback

Convention Feature flags via LaunchDarkly, not env vars

Draft 3 entries ready for review → approve to write

Validate

swe:pr-risk-review

Review open PRs for engineering risk before merge — missing validation, hidden coupling, rollout gaps.

Example prompt

Review PR #247 for risk before I merge it.

Sample output

Scope 12 files across api/ and lib/

Risk No input validation on POST /users (api/users.ts:47)

Risk Migration adds NOT NULL, no default — will fail

Note Feature flag present but no rollback documented

Action Add zod schema + write rollback migration
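The suggested fix could take roughly this shape. The repo's convention is a zod schema; this sketch uses a dependency-free guard in the same shape so it stands alone, and names like `handleCreateUser` are illustrative, not taken from the repo:

```typescript
// Hypothetical request-body guard for POST /users, in the shape the review
// suggests. A real fix would express this as a zod schema per repo convention;
// a hand-rolled check is used here only to keep the sketch self-contained.
type NewUser = { email: string; name: string };

function parseNewUser(body: unknown): NewUser | null {
  if (typeof body !== "object" || body === null) return null;
  const b = body as Record<string, unknown>;
  if (typeof b.email !== "string" || !b.email.includes("@")) return null;
  if (typeof b.name !== "string" || b.name.length === 0) return null;
  return { email: b.email, name: b.name };
}

// Route-handler shape: reject malformed input before touching the database.
function handleCreateUser(body: unknown): { status: number } {
  const user = parseNewUser(body);
  if (user === null) return { status: 400 }; // previously unvalidated
  // ... insert user, return created resource
  return { status: 201 };
}
```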

swe:babysit-pr

Watch one PR in a live loop and handle comments, review requests, scores, and CI until it is ready to merge.

Example prompt

Babysit PR #247 until reviewers are happy and CI is green.

Sample output

Loop Polling PR #247 every 60s

Ack Reacted to new review thread with eyes

Fix Pushed follow-up for failing auth test + reviewer note

Score 4/5 → 5/5 after second revision

Ready CI green, no unresolved feedback, safe to merge

swe:change-validation-planner

Plan the narrowest trustworthy validation path for a scoped code change.

Example prompt

What's the minimum I should run to validate this diff?

Sample output

Changed api/users.ts, lib/validate.ts, 1 migration

Step 1 npm test -- users.test.ts (proves route logic)

Step 2 npm run lint (catches import errors)

Step 3 npm run migrate:dry-run (proves migration)

Unverified No e2e test covers POST /users flow

swe:recent-commit-bug-hunt

Scan recent commits for likely bugs using concrete repo evidence.

Example prompt

Scan the last 3 days of commits for regressions.

Sample output

Strong PR #231 removed null guard at lib/auth.ts:34

  Will throw TypeError on expired tokens in prod

Moderate PR #228 changed status enum values

  No migration for 12k existing rows with old values

Fix Restore guard + add enum migration script
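The restored guard might look like this minimal sketch; the session type and function name are illustrative, since the real lib/auth.ts is not shown in the report:

```typescript
// Illustrative session shape for the expired-token scenario.
type Session = { userId: string; expiresAt: number } | null;

function userIdFromSession(session: Session, now: number): string | null {
  // The guard the finding says was removed: null or expired sessions return
  // early here instead of reaching the property access below and throwing
  // a TypeError in production.
  if (session === null || session.expiresAt <= now) return null;
  return session.userId;
}
```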

Maintain

swe:merged-pr-monitoring

Check whether merged PRs reached production and compare pre/post-deploy signals.

Example prompt

Did yesterday's merges cause any production issues?

Sample output

PR #244 Deployed 14:32 UTC — no impact observed

PR #245 Deployed 15:01 UTC — p99 latency +180ms

  GET /orders: 220ms → 400ms post-deploy

  Likely N+1 in fetchOrders (added in this PR)

PR #246 Not yet deployed — queued behind staging

swe:security-audit

Audit dependencies for vulnerabilities, outdated packages, and license issues.

Example prompt

Run a security audit across all services.

Sample output

Critical jsonwebtoken@8.5.1 — CVE-2022-23529

  api/package.json — upgrade to >=9.0.0

High express@4.17.1 — 3 known vulns, 2 years old

License worker uses AGPL dep (pdf-parse) — review

Clean shared-lib — 0 findings, lockfile present

swe:observability-gap-hunt

Find missing logs, metrics, traces, alerts, and dashboards.

Example prompt

What telemetry gaps do we have in the payment service?

Sample output

Blind No error logging in payment/retry.ts

  Silent failures on Stripe webhook retries

Blind No latency metric on /checkout endpoint

Weak Alert on error rate but threshold is 10% (too high)

Covered Traces present on charge + refund flows
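Closing a "silent failures" gap like the first finding usually means logging every failed attempt. A minimal sketch, assuming nothing about the real payment/retry.ts (`retrySync` and `errorLog` are illustrative names):

```typescript
// Retry helper that records every failed attempt instead of swallowing it.
const errorLog: string[] = [];

function retrySync<T>(label: string, fn: () => T, attempts: number): T | null {
  for (let i = 1; i <= attempts; i++) {
    try {
      return fn();
    } catch (err) {
      // The audited code had no line like this, so retries failed invisibly.
      errorLog.push(`${label}: attempt ${i} failed: ${String(err)}`);
    }
  }
  return null; // exhausted: callers can now alert on errorLog instead of guessing
}
```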

swe:ownership-risk-map

Map bus factor, orphaned files, and ownership gaps from git evidence.

Example prompt

Which areas of this repo have dangerous bus factor?

Sample output

Bus 1 billing/ — sole contributor left 4 months ago

  47 files, 0 CODEOWNERS, 12% test coverage

Bus 1 infra/terraform/ — 1 active contributor

Moderate worker/ — 2 contributors, low recent activity

Healthy api/src/routes/ — 6 contributors, active

swe:performance-hunt

Find real performance bottlenecks with profiler data and measured evidence.

Example prompt

Why is the /orders endpoint slow? Profile it.

Sample output

Bottleneck N+1 query in fetchOrders — 340ms of 420ms

  1 query per line item × avg 18 items = 306 queries

Secondary JSON serialization — 55ms (large payload)

Fix Add eager load: include: { lineItems: true }

Expected p50: 420ms → ~80ms after fix
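The N+1 shape and its batched fix can be sketched with a toy in-memory store that makes the query counts visible (the data model is illustrative; an ORM eager load like `include: { lineItems: true }` achieves the batched shape for you):

```typescript
// Toy "database": line items keyed by order id.
const lineItemsByOrder = new Map<number, string[]>([
  [1, ["a", "b"]],
  [2, ["c"]],
]);
let queryCount = 0;

// N+1 shape: one round trip per order for its line items.
function fetchOrdersNPlusOne(orderIds: number[]): string[][] {
  return orderIds.map((id) => {
    queryCount++; // each iteration is a separate query
    return lineItemsByOrder.get(id) ?? [];
  });
}

// Batched shape: a single query serves the whole set.
function fetchOrdersBatched(orderIds: number[]): string[][] {
  queryCount++; // one round trip total
  return orderIds.map((id) => lineItemsByOrder.get(id) ?? []);
}
```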

swe:docs-drift-audit

Find documentation that drifted from recent code, config, or interface changes.

Example prompt

Are our docs still accurate after last week's refactor?

Sample output

Stale README install section references --legacy flag

  Flag removed in PR #239 — delete from step 3

Stale API docs list /v1/users (renamed to /v2/users)

Missing No runbook for new payment retry flow

Current CONTRIBUTING.md — matches repo conventions

swe:incident-followup-audit

Verify post-incident engineering follow-through — tests, monitors, runbooks, tickets.

Example prompt

Did we finish all follow-up from the auth outage?

Sample output

Done Regression test added (auth/token-expiry.test.ts)

Done Alert threshold lowered from 10% to 2% error rate

Missing No runbook for token rotation procedure

Missing JIRA-1847 still open — rate limiter not shipped

Partial Monitoring added but no dashboard link in wiki

Improve

swe:test-gap-hunt

Incrementally close the highest-value test coverage gaps.

Example prompt

Find the weakest tests and biggest coverage gaps.

Sample output

Gap billing/charge.ts — 0% coverage, 14 recent PRs

Gap worker/process.ts — 0% coverage, error-prone

Weak auth/login.test.ts — happy path only, no edge cases

Plan 4 tests to add, narrowest first, est. 25 min

Backlog 6 more opportunities ranked for next pass
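The kind of edge-case coverage the report asks for can be sketched like this; `login` here is a stand-in, not the repo's real auth/login.ts:

```typescript
// Stand-in for the code under test (illustrative rules, not the real ones).
function login(user: string, password: string): boolean {
  return user.length > 0 && password.length >= 8;
}

// Happy path: what the weak test already covered.
if (!login("alice", "correct-horse")) throw new Error("happy path failed");

// Edge cases the report flags as missing.
if (login("", "correct-horse")) throw new Error("empty user accepted");
if (login("alice", "short")) throw new Error("short password accepted");
```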

swe:init

Create a small repo-local .ai/swe.json so that later swe: skill runs match how you like to work.

Example prompt

Run swe:init --quick --gitignore for this repo.

Sample output

Wrote .ai/swe.json with quick defaults

Ignored Added .ai/swe.json to .gitignore

Mode quick --gitignore

Saved Only non-default overrides

Rule Explicit user requests still outrank local prefs
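The file's actual schema isn't documented here; a hypothetical quick-mode file, with every key invented for illustration, might look like:

```json
{
  "mode": "quick",
  "testCommand": "npm test",
  "riskTolerance": "low"
}
```

Only non-default overrides are saved, and explicit user requests still outrank anything in this file.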

swe:refactor-opportunities

Find small, parallelizable refactor tickets with clear boundaries.

Example prompt

Give me 5 low-risk refactors I can hand to agents.

Sample output

#1 Extract shared validation into lib/validate.ts

  3 routes duplicate the same 40-line schema — low risk

#2 Remove dead feature flag ENABLE_V1_AUTH

  Flag always true in prod for 6 months — 4 files

#3 Collapse OrderStatus enum (3 unused values since v2)

Extensible

swe:create-skill

Author new swe: skills with matching eval suites. Build your own.

Example prompt

Create a new swe: skill for auditing API contract changes.

Sample output

Created skills/api-contract-audit/SKILL.md

Created evals/api-contract-audit/cases.json (6 cases)

Created evals/api-contract-audit/rubric.md

Triggers "check API contracts", "breaking change review"

Non-goals Runtime testing, load testing, docs generation

04

Suggested schedules

Run once, or run continuously.

Every PR
  • swe:pr-risk-review
  • swe:babysit-pr
  • swe:change-validation-planner

Catch risk early, then shepherd the PR across the finish line.

Daily
  • swe:recent-commit-bug-hunt
  • swe:merged-pr-monitoring
  • swe:test-gap-hunt
  • swe:docs-drift-audit
  • swe:security-audit
  • swe:observability-gap-hunt
  • swe:performance-hunt
  • swe:refactor-opportunities
  • swe:capture-knowledge
  • swe:incident-followup-audit

Catch regressions, gaps, and drift while context is fresh.

Weekly
  • swe:ownership-risk-map
  • swe:repo-introspection

Structural checks that don't change day-to-day.

05

Get started

One command. Every major AI harness.

Install the full SWE skills framework, then start with swe:repo-introspection to understand your codebase or swe:pr-risk-review on your next PR.

  • Works with Claude, Codex, Cursor, Gemini CLI, GitHub Copilot, and OpenClaw-compatible setups
  • Language and framework agnostic — works on any codebase
  • Evidence-led: every finding cites files, lines, commits, or metrics
  • Designed for recurring use, not one-off runs

Install

npx skills install ckorhonen/swe-skills

Installs all 17 skills into your project. Works with any agent that supports the skills standard.

06

Frequently asked

For engineers moving from curiosity to practice.

Who is this for?

Software engineers, tech leads, and platform teams who want their AI coding assistants to be more rigorous — catching real bugs, validating changes with evidence, and maintaining code health systematically.

Does it work with any language or framework?

Yes. Skills are language-agnostic and adapt to whatever tooling your repo already uses — npm, cargo, pip, bundler, go modules, or anything else. They read your repo, not a config file.

How is this different from linters or CI checks?

Linters check syntax and style. CI runs predefined tests. These skills reason about your code — finding bugs linters miss, validating changes holistically, and producing actionable engineering judgment rather than pass/fail signals.

Can I run these on a schedule?

Absolutely. Skills like swe:recent-commit-bug-hunt, swe:test-gap-hunt, and swe:security-audit are designed to run repeatedly. Output formats support comparison across runs.

Where should I start?

Run swe:repo-introspection to understand your codebase, then try swe:pr-risk-review on your next pull request. Both produce immediate value.