Releases: blackwell-systems/mcp-assert
v0.12.3
v0.12.3: Lint False Positive Fix
Fixed
E105/E107 false positive reduction (#23)
On servers with many tools sharing path/ID/content params (e.g., 65-tool code intelligence servers), E105 and E107 were generating thousands of unusable findings. This release eliminates the noise:
- E105 now skips params that are unconstrained by design: filesystem paths (*_path, *_file, *_dir), identifiers (*_id, *_uri), and free-text content (new_text, query, filter, command, description, scope, etc.)
- E107 collapses cycle variants to one finding per unique cycle (was reporting every path permutation)
- E107 downgraded from error to warning (circular deps are advisory)
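The E105 skip list can be pictured as a glob match on parameter names. The following is a minimal sketch, assuming the patterns are applied as shell-style globs; `is_unconstrained_by_design` and both collections are hypothetical names for illustration, and the real list is longer than the one quoted in the note.

```python
from fnmatch import fnmatch

# Patterns and names copied from the release note. The note ends with "etc.",
# so both collections are illustrative, not the complete built-in list.
SKIP_PATTERNS = ["*_path", "*_file", "*_dir", "*_id", "*_uri"]
SKIP_NAMES = {"new_text", "query", "filter", "command", "description", "scope"}


def is_unconstrained_by_design(param_name):
    """Return True if E105 should ignore this parameter (hypothetical helper)."""
    name = param_name.lower()
    if name in SKIP_NAMES:
        return True
    return any(fnmatch(name, pattern) for pattern in SKIP_PATTERNS)


for name in ["file_path", "symbol_id", "query", "language"]:
    print(name, is_unconstrained_by_design(name))
```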
Before (65-tool server): 2,784 E105 errors + 751 E107 errors = 3,535 unusable findings
After: 0 E105 errors, 2 E107 warnings
Real issues (missing descriptions, missing examples, schema drift) are no longer buried.
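One way to get "one finding per unique cycle" is to rotate every reported cycle into a canonical form and deduplicate on that. This is a sketch of the general technique, not necessarily how E107 does it; it assumes each cycle is a list of tool names without the start node repeated at the end, and the helper names are made up for the example.

```python
def canonical_cycle(cycle):
    """Rotate a cycle so it starts at its alphabetically smallest node."""
    start = cycle.index(min(cycle))
    return tuple(cycle[start:] + cycle[:start])


def unique_cycles(cycles):
    """Keep one representative per cycle, ignoring the starting point."""
    seen = set()
    result = []
    for cycle in cycles:
        key = canonical_cycle(cycle)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


reported = [
    ["a", "b", "c"],  # a -> b -> c -> a
    ["b", "c", "a"],  # same cycle, entered at b
    ["c", "a", "b"],  # same cycle, entered at c
    ["a", "c", "b"],  # different cycle (opposite direction)
]
print(unique_cycles(reported))
```

Direction is preserved on purpose: in a directed dependency graph, a -> b -> c and a -> c -> b are different cycles.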
Upgrade Notes
If you pinned --threshold to a high number to work around E105/E107 noise, you can lower it now. --threshold 0 should work for well-documented servers.
Full Changelog: v0.12.2...v0.12.3
v0.12.2
Fix npm publish for scoped plugin packages. Scope plugins to @blackwell-systems org.
v0.12.1
Fixed
- E112 false positives: token_budget, max_tokens, page_token, and similar non-sensitive token parameters no longer flagged as secrets. Added allowlist for common non-auth "token" params.
- E107 false positives: Circular dependency detection now requires strong field name match (>=0.8 similarity). Description cross-references ("see also blast_radius") no longer create dependency edges.
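The E112 allowlist amounts to checking known-harmless names before applying the substring match. A rough sketch, assuming substring matching on the markers named in the E112 rule description; `looks_sensitive` and both constants are hypothetical, and the shipped allowlist covers more names than the three quoted here.

```python
# Markers from the E112 rule description (password, api_key, token).
SENSITIVE_MARKERS = ("password", "api_key", "token")

# Names from the release note; the real allowlist also covers "similar" params.
NON_AUTH_TOKEN_PARAMS = {"token_budget", "max_tokens", "page_token"}


def looks_sensitive(param_name):
    """Return True if a parameter name would be flagged as a secret."""
    name = param_name.lower()
    if name in NON_AUTH_TOKEN_PARAMS:
        return False  # allowlisted: contains "token" but carries no credential
    return any(marker in name for marker in SENSITIVE_MARKERS)


for name in ["access_token", "max_tokens", "api_key", "page_token"]:
    print(name, looks_sensitive(name))
```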
Added
- --skip-rules flag: Suppress specific rule codes in CI. Example: mcp-assert lint --server "..." --skip-rules E107,E112
Full Changelog: v0.12.0...v0.12.1
v0.12.0
v0.12.0: Static Analysis Engine + Auto-Fix
24 Lint Rules (was 10)
The lint command now has 24 rules that catch agent-breaking schema issues before runtime. 14 new rules added this release:
| Code | What it catches |
|---|---|
| E105 | Unconstrained strings flowing between tools (free text propagation) |
| E107 | Circular dependencies in tool graph (agents loop forever) |
| E112 | Sensitive parameters exposed (password, api_key, token) |
| E113 | Duplicate tool names |
| W107 | Non-deterministic output (same input, different results) |
| W108 | Hidden side effects (name says "create" but description doesn't acknowledge) |
| W109 | Missing examples on user-facing params (query, email, url) |
| W110 | Schema-description drift (>50% params not mentioned in description) |
| W111 | Description too short (<20 chars) or too long (>500 chars) |
| W112 | Server exposes >20 tools (LLM accuracy degrades) |
| W114 | Input schema nested >3 levels (LLMs struggle with deep nesting) |
| W115 | Single tool consumes >1000 tokens of context |
| W116 | Description doesn't mention what tool returns |
| (overloaded) | >3 action verbs in description (tool does too many things) |
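To make two of the thresholds in the table concrete, here is a small sketch of how W111 (description length) and W110 (schema-description drift) could be evaluated. It assumes a parameter counts as "mentioned" when its name appears verbatim in the description; `lint_description` is a hypothetical helper, not the tool's API.

```python
def lint_description(description, param_names):
    """Return the W110/W111 codes that would fire, using the table's thresholds."""
    codes = []
    # W111: description shorter than 20 or longer than 500 characters.
    if len(description) < 20 or len(description) > 500:
        codes.append("W111")
    # W110: more than 50% of params are not mentioned in the description.
    if param_names:
        text = description.lower()
        unmentioned = [p for p in param_names if p.lower() not in text]
        if len(unmentioned) / len(param_names) > 0.5:
            codes.append("W110")
    return codes


print(lint_description("Search.", ["query", "limit"]))
print(lint_description(
    "Search nodes by query and return up to limit matches.",
    ["query", "limit"],
))
```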
Auto-Fix (--fix)
Generate schema improvement suggestions automatically:
mcp-assert lint --server "npx my-server" --fix
memory-server: 9 tools, 25 findings, 23 auto-fixable
E103 create_entities Add description: "The entities value (array)"
W109 search_nodes Add examples to "query": [search term]
W116 read_graph Append: "Returns the graph data as JSON."
23 fixes generated.
Infers descriptions from tool names, formats from param patterns (email, uuid, date-time), examples from common names, and return clauses from verbs. JSON output with --fix --json.
--strict Mode
Promote all warnings to errors for CI gates:
mcp-assert lint --server "..." --strict
# 16 error(s), 0 warning(s)
--detect-nondeterminism
Calls each tool 3x with identical inputs, compares output hashes. Flags tools that produce different results across runs.
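The check boils down to hashing each result and counting distinct hashes. A minimal sketch, assuming results are JSON-serializable and that SHA-256 over a key-sorted dump is an acceptable fingerprint (the hash actually used is not stated); `call_tool` stands in for a real MCP call.

```python
import hashlib
import itertools
import json


def output_hash(result):
    """Fingerprint a tool result; sort_keys keeps dict ordering from mattering."""
    canonical = json.dumps(result, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_nondeterministic(call_tool, arguments, runs=3):
    """Call the tool `runs` times with identical inputs and compare hashes."""
    hashes = {output_hash(call_tool(arguments)) for _ in range(runs)}
    return len(hashes) > 1


# Two stand-in tools: one pure, one that leaks a per-call counter.
counter = itertools.count()


def stable_tool(args):
    return {"echo": args["text"]}


def unstable_tool(args):
    return {"echo": args["text"], "request_id": next(counter)}


print(is_nondeterministic(stable_tool, {"text": "hi"}))
print(is_nondeterministic(unstable_tool, {"text": "hi"}))
```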
Tool Dependency Graph
Infers data-flow dependencies between tools by matching parameter names, types, and description tokens. Powers E105 (free text propagation) and E107 (circular dependency). Generic parameters excluded to prevent false positives.
Unified Error Taxonomy
All commands now share a single error code registry. Audit output shows structured codes:
✓ read_query 1ms [E000] responds, returns content
✗ create_table 0ms [E201] internal error: panic: nil pointer...
Scorecard Validation
Tested on 6 servers: memory (92% fix rate), filesystem (72%), sqlite (94%), time (60%), antvis-chart, fetch (75%).
Full Changelog: v0.11.0...v0.12.0
v0.11.0
v0.11.0: Server Reuse
--reuse-server flag
Assertions with the same server config now share a single server process and fixture copy. One cold start instead of N.
mcp-assert ci --suite evals/ --reuse-server
On agent-lsp's 87-test suite: 12 minutes to ~2.5 minutes locally.
How it works:
- ServerKey() hashes the server config (command, args, env, transport) for grouping
- Assertions in the same group share one MCP client and fixture copy
- Stateful tools (rename_symbol, apply_edit, restart_lsp, activate_skill) are auto-detected and run isolated
- Trajectory assertions and serverless tests are excluded from sharing automatically
- Panic recovery provides defense-in-depth if a shared server dies unexpectedly
Available on run and ci commands. Opt-in, default off.
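The grouping step can be illustrated in a few lines. This is a sketch under assumptions: the hash function and serialization are guesses, the assertion layout (`tool`, `server` keys) is invented for the example, and the stateful-tool set is hard-coded here whereas the release auto-detects it.

```python
import hashlib
import json
from collections import defaultdict

# Hard-coded for illustration; the release detects stateful tools automatically.
STATEFUL_TOOLS = {"rename_symbol", "apply_edit", "restart_lsp", "activate_skill"}


def server_key(config):
    """Hash the fields that identify a server process (illustrative ServerKey)."""
    fields = {k: config.get(k) for k in ("command", "args", "env", "transport")}
    canonical = json.dumps(fields, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def group_assertions(assertions):
    """Split assertions into shared groups (by server key) and isolated runs."""
    shared = defaultdict(list)
    isolated = []
    for assertion in assertions:
        if assertion["tool"] in STATEFUL_TOOLS:
            isolated.append(assertion)
        else:
            shared[server_key(assertion["server"])].append(assertion)
    return dict(shared), isolated


lsp = {"command": "agent-lsp", "args": ["--stdio"], "env": {}, "transport": "stdio"}
suite = [
    {"tool": "hover", "server": lsp},
    {"tool": "go_to_definition", "server": lsp},
    {"tool": "rename_symbol", "server": lsp},
]
shared, isolated = group_assertions(suite)
print(len(shared), [len(group) for group in shared.values()], len(isolated))
```

Each shared group needs one cold start, which is where the time saving comes from.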
Fixes
- Multi-content MCP response handling. json_path and min_max_results checkers now handle responses with multiple content items (e.g., a JSON result in Content[0] and a hint in Content[1]). Previously the concatenated text broke JSON parsing.
- Panic recovery in intercept goroutines. Added defer recover() to intercept proxy goroutines. 38 new tests.
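For the multi-content fix, the idea is to parse content items one at a time instead of concatenating their text. A minimal sketch, assuming text content items shaped like `{"type": "text", "text": ...}` and a "first item that parses wins" policy; the selection rule the checkers actually use is not described in the note.

```python
import json


def first_json_content(content_items):
    """Return the first text item that parses as JSON, or None if none does."""
    for item in content_items:
        if item.get("type") != "text":
            continue
        try:
            return json.loads(item["text"])
        except (json.JSONDecodeError, KeyError):
            continue  # not JSON (e.g. a human-readable hint); try the next item
    return None


content = [
    {"type": "text", "text": '{"results": [1, 2]}'},
    {"type": "text", "text": "Hint: narrow the query for fewer results."},
]

# Concatenating the items produces text that is not valid JSON.
try:
    json.loads("".join(item["text"] for item in content))
    concatenated_parses = True
except json.JSONDecodeError:
    concatenated_parses = False

print(concatenated_parses, first_json_content(content))
```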
Also
- Updated social preview logo
v0.10.0
What's new in lint
Three new lint rules that catch issues no other MCP testing tool detects:
W104: Generic parameter names
Flags parameters named data, value, input, payload, options, etc. that have no description. These names give agents zero signal about what to pass. Only fires when the name is generic AND the description is missing.
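The "generic AND undescribed" condition looks roughly like this. A sketch only: the name set is the subset quoted above (the note says "etc."), the tool dict follows the `inputSchema` shape returned by tools/list, and `w104_findings` is a hypothetical helper.

```python
# Subset of generic names quoted in the release note; the real list is longer.
GENERIC_NAMES = {"data", "value", "input", "payload", "options"}


def w104_findings(tool):
    """Return generic-named parameters that have no description."""
    properties = tool.get("inputSchema", {}).get("properties", {})
    return [
        name
        for name, schema in properties.items()
        if name.lower() in GENERIC_NAMES
        and not schema.get("description", "").strip()
    ]


tool = {
    "name": "store",
    "inputSchema": {
        "type": "object",
        "properties": {
            "data": {"type": "object"},  # generic and undescribed: flagged
            "payload": {"type": "string", "description": "Raw CSV text to import"},
            "table": {"type": "string"},  # undescribed but not generic
        },
    },
}
print(w104_findings(tool))
```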
W105: Tool similarity detection
Compares all tool descriptions pairwise using Dice coefficient bigram similarity. Flags pairs with >80% overlap that agents will confuse.
W105 list_directory Tool "list_directory" and "list_directory_with_sizes" have 94% similar descriptions.
If two tools have nearly identical descriptions, agents pick between them randomly. This catches a class of bug that only manifests in production when agents make wrong tool choices.
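For reference, the Dice coefficient over bigrams is 2 x (shared bigrams) / (bigrams in A + bigrams in B). The sketch below assumes character bigrams on lowercased text; whether W105 uses character or word bigrams, or any extra normalization, is not stated.

```python
from collections import Counter


def bigrams(text):
    """Character bigrams of the lowercased text."""
    s = text.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]


def dice_similarity(a, b):
    """Dice coefficient over bigram multisets, in the range 0.0 to 1.0."""
    x, y = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        return 1.0  # both inputs too short to have bigrams; treat as identical
    overlap = sum((x & y).values())  # multiset intersection
    return 2 * overlap / total


def too_similar(desc_a, desc_b, threshold=0.8):
    """W105-style check: flag pairs with more than 80% overlap."""
    return dice_similarity(desc_a, desc_b) > threshold


print(dice_similarity("night", "nacht"))
```

A full pass compares every pair of tools, so cost grows quadratically with the tool count.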
W106: Schema bloat
Warns when the total tools/list response exceeds 8K tokens (~32KB JSON). Large schemas consume a significant chunk of the agent's context window before any work begins.
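The "8K tokens (~32KB JSON)" figure implies roughly 4 bytes per token. A back-of-the-envelope sketch of that estimate follows; it assumes 8K means 8,000 and that byte length divided by 4 is good enough, and it does not use a real tokenizer.

```python
import json

BYTES_PER_TOKEN = 4   # implied by "8K tokens (~32KB JSON)"
TOKEN_LIMIT = 8_000   # assumption: "8K" read as 8,000


def estimate_tokens(tools_list_response):
    """Approximate token count of a tools/list response from its JSON size."""
    payload = json.dumps(tools_list_response).encode("utf-8")
    return len(payload) // BYTES_PER_TOKEN


def exceeds_schema_budget(tools_list_response):
    """W106-style check against the token limit."""
    return estimate_tokens(tools_list_response) > TOKEN_LIMIT


small = {"tools": [{"name": "ping", "description": "Check that the server is alive."}]}
large = {"tools": [{"name": f"tool_{i}", "description": "x" * 400} for i in range(100)]}
print(exceeds_schema_budget(small), exceeds_schema_budget(large))
```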
Lint codes summary (now 10 total)
| Code | Severity | What it catches |
|---|---|---|
| E101 | Error | Tool has no description |
| E102 | Error | Parameter has no type |
| E103 | Error | Required parameter has no description |
| E301 | Error | Response exceeds size limit (with --call-tools) |
| W101 | Warning | Description too vague |
| W102 | Warning | Optional parameter undescribed |
| W103 | Warning | Required string with no enum/pattern/example |
| W104 | Warning | Generic parameter name with no description |
| W105 | Warning | Tool descriptions >80% similar |
| W106 | Warning | Schema exceeds 8K tokens |
Full changelog
https://github.com/blackwell-systems/mcp-assert/blob/main/CHANGELOG.md
v0.9.0
What's new
lint command
Static schema analysis for agent usability. Connects to any MCP server, calls tools/list, and checks each tool's schema for issues that cause agents to misuse tools.
mcp-assert lint --server "npx -y @modelcontextprotocol/server-filesystem /tmp"
7 lint codes covering missing descriptions, untyped parameters, free-text strings without constraints, and oversized responses:
| Code | Severity | What it catches |
|---|---|---|
| E101 | Error | Tool has no description |
| E102 | Error | Parameter has no type |
| E103 | Error | Required parameter has no description |
| E301 | Error | Response exceeds size limit (with --call-tools) |
| W101 | Warning | Description too vague |
| W102 | Warning | Optional parameter undescribed |
| W103 | Warning | Required string with no enum/pattern/example |
Results so far: 254 findings across 11 servers. The official filesystem server has 16 undescribed required parameters. The GitHub MCP server has 112 schema quality issues.
Internal improvements
- Shared server flags across audit, fuzz, and lint (no flag drift)
- Shared connectAndInitialize() connection logic
- Expanded source comments across 11 files
Full changelog
https://github.com/blackwell-systems/mcp-assert/blob/main/CHANGELOG.md
v0.8.0
mcp-assert fuzz
New command: adversarial input testing for MCP servers. Zero setup, no YAML.
mcp-assert fuzz --server "npx my-mcp-server"
Generates category-based adversarial inputs from each tool's JSON Schema:
empty strings, null values, wrong types, missing required fields, boundary
numbers, injection payloads, deeply nested objects, and seeded random
mutations. Reports crashes, hangs, and protocol errors.
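Category-based generation from a JSON Schema might look like the following. This is an illustrative sketch covering a few of the categories listed above; the category labels, baseline values, and `fuzz_cases` itself are invented for the example and are not the tool's internals.

```python
import random
import string


def fuzz_cases(schema, seed=0):
    """Build (category, arguments) pairs from a tool's JSON Schema."""
    rng = random.Random(seed)  # fixed seed makes the random case reproducible
    props = schema.get("properties", {})
    # Plausible baseline arguments, so each case breaks exactly one thing.
    base = {n: ("x" if p.get("type") == "string" else 1) for n, p in props.items()}
    cases = [("null_arguments", None)]
    for name in schema.get("required", []):
        cases.append(("missing_required", {k: v for k, v in base.items() if k != name}))
    for name, prop in props.items():
        cases.append(("null_value", {**base, name: None}))
        if prop.get("type") == "string":
            cases.append(("empty_string", {**base, name: ""}))
            cases.append(("wrong_type", {**base, name: 12345}))
        elif prop.get("type") in ("integer", "number"):
            cases.append(("wrong_type", {**base, name: "not-a-number"}))
            for boundary in (0, -1, 2**63):
                cases.append(("boundary_number", {**base, name: boundary}))
    if props:
        target = rng.choice(sorted(props))
        noise = "".join(rng.choice(string.printable) for _ in range(rng.randint(1, 32)))
        cases.append(("random_mutation", {**base, target: noise}))
    return cases


schema = {
    "type": "object",
    "properties": {"query": {"type": "string"}, "limit": {"type": "integer"}},
    "required": ["query"],
}
for category, arguments in fuzz_cases(schema, seed=42):
    print(category, arguments)
```

Seeding a private `random.Random` instance rather than the global generator keeps reruns with the same seed identical.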
First run found a bug in the MCP TypeScript SDK
On its very first test against the official reference server, fuzz
discovered that every server built on @modelcontextprotocol/sdk (12K stars)
crashes on null arguments with -32603 (InternalError) instead of accepting
them. Fix submitted: modelcontextprotocol/typescript-sdk#2013.
Fuzz sweep results
12 servers fuzzed, 93 tools, 930 adversarial inputs:
- Python SDK servers: 8 servers, 535 runs, zero crashes
- TypeScript SDK servers: all hit the null args SDK bug
- puppeteer: 15 additional crashes (unvalidated URLs) + 2 hangs
CI integration
mcp-assert fuzz --server "..." --junit results.xml --markdown summary.md
JUnit XML and GitHub Step Summary output, same as audit/ci/run.
Auto-detects $GITHUB_STEP_SUMMARY. Reproducible via --seed.
Progression
mcp-assert audit --server "..." # 1 call per tool, happy path
mcp-assert fuzz --server "..." # 50 calls per tool, adversarial
mcp-assert run --suite evals/ # custom assertions per tool
Other changes
Added
- phpunit-mcp-assert: PHPUnit plugin (Packagist)
- bun-mcp-assert: Bun test plugin (npm)
- assert_notifications block type
- Dynamic download stats SVG (hides zero-count channels)
Fixed
- Intercept race condition on cmd variable (channel-based handoff)
- CI threshold counts skipped assertions in denominator
- Notification assertions missing delivery window
- Audit stderr output in --json mode
- JSON marshal error handling in audit/fuzz
- Intercept process leak on timeout
- SSE client leak, JSONPath ordering, file_contains substitution,
snapshot nil dereference, watch mode formatting, Go plugin recursion
v0.7.3
Changelog
- 391a5ab chore: stamp v0.7.3 in changelog
- c26fd4c chore: update download stats [skip ci]
- dd42546 chore: update download stats [skip ci]
- 87bec37 fix: address all inspector findings (3 major, 5 minor, 2 suggestions)
- 1a7a112 fix: cache logic now accepts fresh values when cache is empty or invalid
- d4b7ca2 fix: second inspector pass — duration_ms contract, matrix ordering, markdown Close
v0.7.2
Changelog
- 95c08c1 chore: stamp v0.7.2 in changelog
- aa7ddf1 chore: update download stats (6,621 total)
- ca73230 chore: update download stats (6,621 total) [skip ci]
- 6b3c4e3 chore: update download stats [skip ci]
- 723e025 chore: update download stats [skip ci]
- bd0c2bd chore: update download stats [skip ci]
- 1569965 chore: update download stats [skip ci]
- adb434a chore: update download stats [skip ci]
- d0bfba9 chore: update download stats [skip ci]
- 6e55754 chore: update download stats [skip ci]
- 03d5bff chore: update download stats [skip ci]
- 5cb0925 chore: update download stats [skip ci]
- d72bc90 chore: update download stats frequency to hourly
- 0390bf4 feat: add Docker distribution (GHCR + Docker Hub) and docker pull stats
- d9da4b5 feat: add Snap and Docker distribution channels
- 9e85554 feat: add jest-mcp-assert plugin, Docker/Snap distribution, download stats updates
- bc38459 feat: add mcpassert Go test plugin
- 757dd03 fix: add high-water mark cache to download stats
- 81f69e5 fix: update Dockerfile to Go 1.25 (matches go.mod)