Introduction

Your AI agent is only as good as your repo. AgentLint finds what's broken — file structure, instruction quality, build setup, session continuity, security posture — and fixes it. 51 deterministic core checks (6 dimensions) + 7 opt-in AI-native analyzers, every one backed by data. Works across Claude Code, Codex, Cursor, Copilot, Gemini CLI, Windsurf, and Cline.

We analyzed 265 versions of Anthropic's Claude Code system prompt, documented the hard limits, audited thousands of real repos, and reviewed the academic research. The result: a single command that tells you exactly what your AI agent is struggling with and why.

Install

npm install -g agentlint-ai

Then in any git repo:

agentlint check

In Claude Code: run /al for the interactive scan-fix-report flow.

Using an AI coding agent? Point it at INSTALL.md — it's written to be read once and acted on.

CLI commands

Once installed, you can also use AgentLint from any shell:

agentlint check                # diagnose (51 core checks, 6 dimensions)
agentlint fix W11              # fix a specific check directly
agentlint setup --lang ts .    # bootstrap CI/hooks/templates

check finds what's broken — file structure, instruction quality, build setup, continuity, security. fix applies the fixes automatically. setup installs the full AI-native stack: 12 CI workflows, git hooks, a CLAUDE.md template, plan format, and compliance tests.

GitHub Action

Create .github/workflows/agentlint.yml:

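A minimal sketch of that workflow, assuming the scan is invoked directly through npx (the workflow name, triggers, and step names are illustrative; if the project publishes a dedicated action, prefer its documented usage):

name: AgentLint
on:
  pull_request:
  push:
    branches: [main]

jobs:
  agentlint:
    runs-on: ubuntu-latest
    steps:
      # Check out the repository so the scanner can inspect its structure
      - uses: actions/checkout@v4
      # Node.js 20+ is required on every platform
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # Default local-only core scan; see the note below about hard quality gates
      - name: Run AgentLint core scan
        run: npx agentlint-ai check
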
This runs the default local-only core scan and does not fail the build on score thresholds. Add a fail-below threshold after the first passing run if you want a hard quality gate.

SARIF integration

To get AgentLint findings in your repo's Security tab and as inline PR annotations, enable SARIF upload:

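A hedged sketch of the extra steps, appended to the job above. The --sarif flag and the output file name are assumptions about the CLI (consult the CLI help for the exact option); the upload itself uses GitHub's standard upload-sarif action and requires permissions: security-events: write on the job:

      # Hypothetical flag: write findings to a SARIF file (confirm the exact option)
      - name: Run AgentLint with SARIF output
        run: npx agentlint-ai check --sarif agentlint.sarif
      # Standard GitHub action that pushes SARIF results into the Security tab
      - name: Upload SARIF to code scanning
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: agentlint.sarif
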
Note: SARIF upload requires code scanning to be enabled (free for public repos; GitHub Advanced Security for private ones). Inline PR annotations via ::warning workflow commands work on all repos regardless.

Platform requirements

The scanner is a bash script, so the host needs a POSIX shell:

| Platform | Requirement |
| --- | --- |
| macOS | Works out of the box (system bash). |
| Linux | Works out of the box. jq and git must be on PATH. |
| Windows | Requires Git Bash (from Git for Windows) or WSL (see the install guide). Run npx agentlint-ai from inside the bash shell; installing from cmd.exe or PowerShell exits with a message pointing to one of those two options. |

Node.js 20+ is required on every platform.

Supported AI coding agents

AgentLint auto-detects the entry file for major AI coding agents. Claude-specific checks skip gracefully for other platforms so they aren't penalized unfairly.

| Agent | Entry file | Notes |
| --- | --- | --- |
| Claude Code | CLAUDE.md | Full check coverage, including F7 @include and C5 CLAUDE.local.md |
| OpenAI Codex / Agents | AGENTS.md | Core checks apply |
| Cursor | .cursorrules or .cursor/rules/*.mdc | Core checks apply |
| GitHub Copilot | .github/copilot-instructions.md | Core checks apply |
| Google Gemini CLI | GEMINI.md | Core checks apply |
| Windsurf | .windsurfrules | Core checks apply |
| Cline | .clinerules | Core checks apply |

If multiple entry files exist, priority order is CLAUDE.md → AGENTS.md → .cursorrules → copilot-instructions.md → GEMINI.md → .windsurfrules → .clinerules → .cursor/rules/*.mdc. The winning file is reported in F1's measured_value along with all detected files.

What you get

The HTML report shows a segmented gauge, expandable dimension breakdowns with per-check detail, and a prioritized issues list. A before/after comparison is included when fixes are applied.

Why this matters

AI coding agents read your repo structure, docs, CI config, and handoff notes. They git push, trigger pipelines, and write files. A well-structured repo gets dramatically better AI output. A poorly structured one wastes tokens, ignores rules, repeats mistakes, and may expose secrets.

AgentLint is built on data most developers never see:

  • 265 versions of Anthropic's Claude Code system prompt — every word added, deleted, and rewritten

  • Claude Code internals — hard limits (40K char max, 256KB file read limit, pre-commit hook behavior) that silently break your setup

  • Production security audits across open-source codebases — the gaps AI agents walk into

  • 4,533-repo corpus analysis — hook/permission anti-patterns across 739 hooks and 1,562 settings.json files

  • 6 academic papers on instruction-following, context files, and documentation decay

How scoring works

Each check produces a 0-1 score, weighted by dimension, scaled to 100. The total is averaged only over dimensions whose checks actually ran — dimensions with no evidence show n/a, not 0/10, and don't drag the total down.

Core dimensions (always run)

These 6 dimensions are evaluated by the deterministic scanner. They run by default in the CLI, the GitHub Action, and Claude Code /al. The weights below sum to 100%.

| Dimension | Weight | Why? |
| --- | --- | --- |
| Instructions | 25% | Unique value. No other tool checks CLAUDE.md quality |
| Findability | 20% | AI can't follow rules it can't find |
| Workability | 18% | Can AI actually run your code? |
| Safety | 15% | Is AI working without exposing secrets or triggering vulnerabilities? |
| Continuity | 12% | Does knowledge survive across sessions? |
| Harness | 10% | Are your Claude Code hooks/permissions actually configured correctly? |

The default total line reads: Score: NN/100 (core).

Extended dimensions (opt-in)

These two require runtime conditions that CI and a plain CLI can't provide: Deep needs an AI sub-agent, Session needs local Claude Code logs. They are opt-in via Claude Code /al. When they run, the total header shows (core+extended) and includes their contributions.

| Dimension | When it runs | What it adds |
| --- | --- | --- |
| Deep | /al with "AI Deep Analysis" selected | Contradictions, dead-weight rules, vague decision boundaries |
| Session | /al with "Session Analysis" selected | Patterns from your Claude Code session logs |

Scores are measurements, not judgments. Reference values come from Anthropic's own data. You decide what to fix.

Evidence

Every check cites its source. Full citations in standards/evidence.json.

| Source | Type |
| --- | --- |
| corpus-4533 (analysis of 4,533 Claude Code repos) | First-party data |
| Claude Code internals | Hard limits and observed behavior |
| Instruction compliance at scale | Academic research |
| Do context files help coding agents? | Academic research |
| Stale content as #1 failure mode | Academic research |
| Concrete vs abstract effectiveness | Academic research |
