prompt-lint

A fast, dependency-free prompt injection and skill presence detector for agentic pipelines.

Intent — What This Is For

prompt-lint is not primarily about detecting malicious intent. A document can be completely benign in motivation and still pose an injection risk to a sub-agent that processes it.

The core threat model: a research sub-agent fetching content from the web, email, documents, or tool outputs may encounter text that contains agent instructions, skill-file syntax, or behavioral overrides — whether placed there maliciously or not. An unhardened agent that processes this content without filtering may execute those instructions.

The correct defense is not intent detection — it is presence detection. Any skill or instruction set that the agent wasn't explicitly given should be treated as a potential injection, regardless of whether it looks "malicious." The only skills an agent should execute are those on a pre-approved allowlist, ideally with cryptographic verification of their integrity and provenance. Everything else — however innocent it may appear — should be quarantined before reaching the agent.

prompt-lint implements the detection side of this: scanning external content for the presence of instruction-like patterns before a sub-agent sees them.

Why

Sub-agents delegated to research tasks are typically less hardened against prompt injection than a primary assistant. They may have access to tools, credentials, or actions that an attacker could exploit by embedding instructions in fetched content.

A lint pass on external content before an agent processes it is a lightweight but meaningful safety layer. It is not a complete solution — cryptographic skill verification and strict allowlisting are also required — but it is a practical first line of defense deployable today.

Usage

# Scan a file
python3 prompt_lint.py suspicious_page.md

# Scan stdin
cat fetched_content.md | python3 prompt_lint.py -

# JSON output for pipeline integration
python3 prompt_lint.py page.md --format json

# Only report high/critical findings (lower false positives)
python3 prompt_lint.py page.md --threshold high

# Exit code 1 if flagged (for shell pipelines)
python3 prompt_lint.py page.md --threshold high --exit-code

Risk Levels

Score	Level	Recommended action
0	CLEAN	Safe to process
1–4	LOW	Review if sensitive context
5–14	MEDIUM	Summarize rather than pass raw to agent
15–29	HIGH	Quarantine; human review recommended
30+	CRITICAL	Block; likely active injection attempt

Detection Categories

Category	Examples
`INSTRUCTION_OVERRIDE`	"Ignore previous instructions", system prompt replacement
`ROLE_HIJACK`	"You are now DAN", persona replacement, jailbreak patterns
`PERMISSION_ESCALATION`	Claimed grants, "developer mode", safety-disable claims
`EXFILTRATION`	System prompt extraction, file/credential leaks
`TOOL_ABUSE`	Direct tool invocation, embedded code execution
`CONTEXT_SPOOF`	Fake Human:/Assistant: turns, false document boundaries
`URGENCY_OVERRIDE`	IMPORTANT: override framing, hidden instruction headers
`SKILL_INJECTION`	Skill file mimicry, agent behavior override via skill syntax

Pipeline Integration

import subprocess, json

def lint_before_processing(filepath: str, threshold: str = "high") -> dict:
    result = subprocess.run(
        ["python3", "/path/to/prompt_lint.py", filepath,
         "--format", "json", "--threshold", threshold],
        capture_output=True, text=True
    )
    report = json.loads(result.stdout)
    if report["risk_level"] in ("HIGH", "CRITICAL"):
        # Quarantine: summarize instead of passing raw content to agent
        raise ValueError(f"Injection risk {report['risk_level']} in {filepath}")
    return report

Statistical Layer

In addition to pattern matching, prompt-lint includes a statistical scoring layer (n-gram LLR model) trained on a corpus of 11 malicious and 161 benign documents. The statistical score is additive — it complements rule-based findings without replacing them.

Retrain the model after expanding the corpus:

python3 corpus_analysis.py --output model.json

Limitations

Pattern-based; a sufficiently novel or obfuscated attack may evade detection
Does not detect purely semantic injection (paraphrased attacks without keywords)
Unicode homoglyphs, Base64 encoding, and split-token attacks not yet covered
Not a substitute for allowlisting and cryptographic skill verification

Related Work

See POSITION.md for a broader treatment of the skill trust problem: how skills should be allowlisted, cryptographically verified, and isolated, and how prompt-lint fits into that security model.

Contributing

Patterns live in the PATTERNS list in prompt_lint.py. The test suite is in tests/ with labeled benign, malicious, and ambiguous examples.

Run tests: python3 tests/run_tests.py

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
__pycache__		__pycache__
tests		tests
POSITION.md		POSITION.md
README.md		README.md
build_benign_corpus.py		build_benign_corpus.py
corpus_analysis.py		corpus_analysis.py
model.json		model.json
prompt_lint.py		prompt_lint.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

prompt-lint

Intent — What This Is For

Why

Usage

Risk Levels

Detection Categories

Pipeline Integration

Statistical Layer

Limitations

Related Work

Contributing

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

prompt-lint

Intent — What This Is For

Why

Usage

Risk Levels

Detection Categories

Pipeline Integration

Statistical Layer

Limitations

Related Work

Contributing

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages