Releases · blackwell-systems/mcp-assert

v0.12.3: Lint False Positive Fix

Fixed

E105/E107 false positive reduction (#23)

On servers with many tools sharing path/ID/content params (e.g., 65-tool code intelligence servers), E105 and E107 were generating thousands of unusable findings. This release eliminates the noise:

E105 now skips params that are unconstrained by design: filesystem paths (*_path, *_file, *_dir), identifiers (*_id, *_uri), and free-text content (new_text, query, filter, command, description, scope, etc.)
E107 collapses cycle variants to one finding per unique cycle (was reporting every path permutation)
E107 downgraded from error to warning (circular deps are advisory)
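
The cycle collapsing described above can be illustrated with a small sketch. It assumes each cycle is reported as a list of distinct tool names and that variants of one cycle differ only in where the traversal starts; the function names are hypothetical and the real implementation may deduplicate differently.

```python
def canonical_cycle(cycle):
    """Rotate a cycle so it starts at its alphabetically smallest tool name."""
    start = cycle.index(min(cycle))
    return tuple(cycle[start:] + cycle[:start])


def unique_cycles(cycles):
    """Keep one entry per unique cycle, in first-seen order."""
    seen = {}
    for cycle in cycles:
        seen.setdefault(canonical_cycle(cycle), cycle)
    return list(seen)


# Three rotations of the same cycle collapse to a single finding.
variants = [["b", "c", "a"], ["c", "a", "b"], ["a", "b", "c"]]
print(unique_cycles(variants))
```

Rotating to a fixed starting point gives every variant of a cycle the same key, so a dictionary lookup is enough to report it once.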

Before (65-tool server): 2,784 E105 errors + 751 E107 errors = 3,535 unusable findings
After: 0 E105 errors, 2 E107 warnings

Real issues (missing descriptions, missing examples, schema drift) are no longer buried.

Upgrade Notes

If you pinned --threshold to a high number to work around E105/E107 noise, you can lower it now. --threshold 0 should work for well-documented servers.

Full Changelog: v0.12.2...v0.12.3

v0.12.2

Fix npm publish for scoped plugin packages. Scope plugins to @blackwell-systems org.

v0.12.1

Fixed

E112 false positives: token_budget, max_tokens, page_token, and similar non-sensitive token parameters no longer flagged as secrets. Added allowlist for common non-auth "token" params.
E107 false positives: Circular dependency detection now requires strong field name match (>=0.8 similarity). Description cross-references ("see also blast_radius") no longer create dependency edges.
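
As a rough illustration of the E112 allowlist idea, the sketch below checks a parameter name against an allowlist before applying substring matching. The marker list, the allowlist contents beyond the three names quoted above, and the function name are assumptions, not the tool's actual rule.

```python
# Substrings that suggest a secret (taken from the E112 rule description).
SENSITIVE_MARKERS = ("password", "api_key", "token")

# Non-auth "token" parameters; only the names quoted in the notes are listed.
NON_AUTH_TOKEN_PARAMS = {"token_budget", "max_tokens", "page_token"}


def looks_sensitive(param_name):
    """Return True if the name looks like a secret and is not allowlisted."""
    name = param_name.lower()
    if name in NON_AUTH_TOKEN_PARAMS:
        return False
    return any(marker in name for marker in SENSITIVE_MARKERS)


print(looks_sensitive("max_tokens"))  # allowlisted, not flagged
print(looks_sensitive("auth_token"))  # still flagged
```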

Added

--skip-rules flag: Suppress specific rule codes in CI. Example: mcp-assert lint --server "..." --skip-rules E107,E112

Full Changelog: v0.12.0...v0.12.1

v0.12.0: Static Analysis Engine + Auto-Fix

24 Lint Rules (was 10)

The lint command now has 24 rules that catch agent-breaking schema issues before runtime. 14 new rules added this release:

Code	What it catches
E105	Unconstrained strings flowing between tools (free text propagation)
E107	Circular dependencies in tool graph (agents loop forever)
E112	Sensitive parameters exposed (password, api_key, token)
E113	Duplicate tool names
W107	Non-deterministic output (same input, different results)
W108	Hidden side effects (name says "create" but description doesn't acknowledge)
W109	Missing examples on user-facing params (query, email, url)
W110	Schema-description drift (>50% params not mentioned in description)
W111	Description too short (<20 chars) or too long (>500 chars)
W112	Server exposes >20 tools (LLM accuracy degrades)
W114	Input schema nested >3 levels (LLMs struggle with deep nesting)
W115	Single tool consumes >1000 tokens of context
W116	Description doesn't mention what tool returns
(overloaded)	>3 action verbs in description (tool does too many things)

Auto-Fix (`--fix`)

Generate schema improvement suggestions automatically:

mcp-assert lint --server "npx my-server" --fix

memory-server: 9 tools, 25 findings, 23 auto-fixable

  E103   create_entities   Add description: "The entities value (array)"
  W109   search_nodes      Add examples to "query": [search term]
  W116   read_graph        Append: "Returns the graph data as JSON."

23 fixes generated.

Fixes are inferred: descriptions from tool names, formats from param patterns (email, uuid, date-time), examples from common names, and return clauses from verbs. For JSON output, use --fix --json.

`--strict` Mode

Promote all warnings to errors for CI gates:

mcp-assert lint --server "..." --strict
# 16 error(s), 0 warning(s)

`--detect-nondeterminism`

Calls each tool three times with identical inputs and compares output hashes. Flags tools that produce different results across runs.
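
A minimal sketch of this hash comparison, assuming tool results are JSON-serializable and that `call_tool` is a hypothetical callable standing in for a real MCP tool call:

```python
import hashlib
import json


def output_hash(result):
    """Hash a result after serializing it with a stable key order."""
    canonical = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_nondeterministic(call_tool, arguments, runs=3):
    """Call the tool `runs` times with the same arguments and compare hashes."""
    hashes = {output_hash(call_tool(arguments)) for _ in range(runs)}
    return len(hashes) > 1


# A pure function of its arguments yields one distinct hash.
print(is_nondeterministic(lambda args: {"sum": args["a"] + args["b"]}, {"a": 1, "b": 2}))
```

Serializing with sorted keys keeps a mere change in key order from being mistaken for a different result.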

Tool Dependency Graph

Infers data-flow dependencies between tools by matching parameter names, types, and description tokens. Powers E105 (free text propagation) and E107 (circular dependency). Generic parameters are excluded to prevent false positives.

Unified Error Taxonomy

All commands now share a single error code registry. Audit output shows structured codes:

✓ read_query      1ms  [E000] responds, returns content
✗ create_table    0ms  [E201] internal error: panic: nil pointer...

Scorecard Validation

Tested on 6 servers: memory (92% fix rate), filesystem (72%), sqlite (94%), time (60%), antvis-chart, fetch (75%).

Full Changelog: v0.11.0...v0.12.0

v0.11.0: Server Reuse

--reuse-server flag

Assertions with the same server config now share a single server process and fixture copy. One cold start instead of N.

mcp-assert ci --suite evals/ --reuse-server

On agent-lsp's 87-test suite, a local run dropped from 12 minutes to ~2.5 minutes.

How it works:

ServerKey() hashes the server config (command, args, env, transport) for grouping
Assertions in the same group share one MCP client and fixture copy
Stateful tools (rename_symbol, apply_edit, restart_lsp, activate_skill) are auto-detected and run isolated
Trajectory assertions and serverless tests are excluded from sharing automatically
Panic recovery provides defense-in-depth if a shared server dies unexpectedly
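
The grouping step can be sketched as follows. This is an illustration only: the dictionary layout of an assertion, the helper names, and the fixed set of stateful tools are assumptions (the notes say stateful tools are auto-detected, which this sketch does not attempt).

```python
import hashlib
import json
from collections import defaultdict

# Hypothetical stand-in for auto-detection: the tool names quoted in the notes.
STATEFUL_TOOLS = {"rename_symbol", "apply_edit", "restart_lsp", "activate_skill"}


def server_key(config):
    """Hash the fields that identify a server process."""
    fields = {
        "command": config.get("command"),
        "args": config.get("args", []),
        "env": config.get("env", {}),
        "transport": config.get("transport"),
    }
    canonical = json.dumps(fields, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def group_assertions(assertions):
    """Split assertions into shared groups (by server key) and isolated ones."""
    shared = defaultdict(list)
    isolated = []
    for assertion in assertions:
        if assertion["tool"] in STATEFUL_TOOLS:
            isolated.append(assertion)
        else:
            shared[server_key(assertion["server"])].append(assertion)
    return dict(shared), isolated
```

Each shared group would then get one server start, while isolated assertions keep a server of their own.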

Available on run and ci commands. Opt-in, default off.

Fixes

Multi-content MCP response handling. json_path and min_max_results checkers now handle responses with multiple content items (e.g., a JSON result in Content[0] and a hint in Content[1]). Previously the concatenated text broke JSON parsing.
Panic recovery in intercept goroutines. Added defer recover() to intercept proxy goroutines. 38 new tests.
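
To illustrate the multi-content case, here is a small sketch that parses each text content item on its own instead of concatenating them first. It assumes content items are dictionaries with "type" and "text" fields; the helper name is hypothetical and the checkers may be implemented differently.

```python
import json


def first_json_content(content_items):
    """Return the first text item that parses as JSON, or None if none does."""
    for item in content_items:
        if item.get("type") != "text":
            continue
        try:
            return json.loads(item["text"])
        except json.JSONDecodeError:
            continue
    return None


content = [
    {"type": "text", "text": '{"results": [1, 2]}'},
    {"type": "text", "text": "Hint: narrow your query"},
]
print(first_json_content(content))
```

Concatenating the two items above would produce text that is not valid JSON, which is the failure mode the fix addresses.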

Also

Updated social preview logo

v0.10.0

What's new in lint

Three new lint rules that catch issues no other MCP testing tool detects:

W104: Generic parameter names

Flags parameters named data, value, input, payload, options, etc. that have no description. These names give agents zero signal about what to pass. Only fires when the name is generic AND the description is missing.

W105: Tool similarity detection

Compares all tool descriptions pairwise using Dice coefficient bigram similarity. Flags pairs with >80% overlap that agents will confuse.

W105  list_directory  Tool "list_directory" and "list_directory_with_sizes" have 94% similar descriptions.

If two tools have nearly identical descriptions, agents pick between them randomly. This catches a class of bug that only manifests in production when agents make wrong tool choices.
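
For readers unfamiliar with the metric, the following sketch computes a Dice coefficient over bigrams and applies the 80% threshold pairwise. It assumes character bigrams on lowercased text; whether the tool uses character or word bigrams is not stated here, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def bigrams(text):
    """Return the character bigrams of a lowercased string."""
    s = text.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]


def dice_similarity(a, b):
    """Dice coefficient: 2 * shared bigrams / total bigrams in both strings."""
    ca, cb = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(ca.values()) + sum(cb.values())
    if total == 0:
        return 0.0
    shared = sum((ca & cb).values())
    return 2.0 * shared / total


def similar_pairs(descriptions, threshold=0.8):
    """Return (tool_a, tool_b, score) for description pairs above the threshold."""
    pairs = []
    for a, b in combinations(sorted(descriptions), 2):
        score = dice_similarity(descriptions[a], descriptions[b])
        if score > threshold:
            pairs.append((a, b, score))
    return pairs
```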

W106: Schema bloat

Warns when the total tools/list response exceeds 8K tokens (~32KB JSON). Large schemas consume a significant chunk of the agent's context window before any work begins.
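
A back-of-the-envelope version of this check, assuming the roughly 4 bytes per token implied by the "8K tokens (~32KB JSON)" figure; the tool's actual token estimate may differ, and the names below are hypothetical.

```python
import json

TOKEN_LIMIT = 8_000
BYTES_PER_TOKEN = 4  # assumption derived from 8K tokens ~ 32KB


def estimate_tokens(tools_list_response):
    """Estimate tokens from the compact JSON size of a tools/list response."""
    payload = json.dumps(tools_list_response, separators=(",", ":"))
    return len(payload.encode("utf-8")) // BYTES_PER_TOKEN


def is_bloated(tools_list_response):
    """Return True when the estimate exceeds the 8K token limit."""
    return estimate_tokens(tools_list_response) > TOKEN_LIMIT
```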

Lint codes summary (now 10 total)

Code	Severity	What it catches
E101	Error	Tool has no description
E102	Error	Parameter has no type
E103	Error	Required parameter has no description
E301	Error	Response exceeds size limit (with --call-tools)
W101	Warning	Description too vague
W102	Warning	Optional parameter undescribed
W103	Warning	Required string with no enum/pattern/example
W104	Warning	Generic parameter name with no description
W105	Warning	Tool descriptions >80% similar
W106	Warning	Schema exceeds 8K tokens

Full changelog

https://github.com/blackwell-systems/mcp-assert/blob/main/CHANGELOG.md

v0.9.0

What's new

`lint` command

Static schema analysis for agent usability. Connects to any MCP server, calls tools/list, and checks each tool's schema for issues that cause agents to misuse tools.

mcp-assert lint --server "npx -y @modelcontextprotocol/server-filesystem /tmp"

7 lint codes covering missing descriptions, untyped parameters, free-text strings without constraints, and oversized responses:

Code	Severity	What it catches
E101	Error	Tool has no description
E102	Error	Parameter has no type
E103	Error	Required parameter has no description
E301	Error	Response exceeds size limit (with --call-tools)
W101	Warning	Description too vague
W102	Warning	Optional parameter undescribed
W103	Warning	Required string with no enum/pattern/example

Results so far: 254 findings across 11 servers. The official filesystem server has 16 undescribed required parameters. The GitHub MCP server has 112 schema quality issues.

Internal improvements

Shared server flags across audit, fuzz, and lint (no flag drift)
Shared connectAndInitialize() connection logic
Expanded source comments across 11 files

Full changelog

https://github.com/blackwell-systems/mcp-assert/blob/main/CHANGELOG.md

v0.8.0

mcp-assert fuzz

New command: adversarial input testing for MCP servers. Zero setup, no YAML.

mcp-assert fuzz --server "npx my-mcp-server"

Generates category-based adversarial inputs from each tool's JSON Schema:
empty strings, null values, wrong types, missing required fields, boundary
numbers, injection payloads, deeply nested objects, and seeded random
mutations. Reports crashes, hangs, and protocol errors.
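
To make the idea of category-based generation concrete, here is a simplified sketch that derives a few adversarial cases from a tool's JSON Schema. The category labels, helper names, and chosen values are assumptions for illustration; it covers only some of the categories listed above and is not the fuzzer's actual generator.

```python
import random
import string


def sample_value(spec):
    """Return a plausible valid value for a JSON Schema property."""
    defaults = {"string": "x", "integer": 1, "number": 1.0,
                "boolean": True, "array": [], "object": {}}
    return defaults.get(spec.get("type"))


def adversarial_inputs(schema, seed=0):
    """Build (category, arguments) cases by breaking one field at a time."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    props = schema.get("properties", {})
    base = {name: sample_value(spec) for name, spec in props.items()}
    cases = []
    for name in schema.get("required", []):
        case = dict(base)
        case.pop(name, None)
        cases.append(("missing_required", case))
    for name, spec in props.items():
        cases.append(("null_value", {**base, name: None}))
        kind = spec.get("type")
        if kind == "string":
            noise = "".join(rng.choice(string.printable)
                            for _ in range(rng.randint(1, 32)))
            cases.append(("empty_string", {**base, name: ""}))
            cases.append(("wrong_type", {**base, name: 12345}))
            cases.append(("random_mutation", {**base, name: noise}))
        elif kind in ("integer", "number"):
            cases.append(("boundary_number", {**base, name: 2 ** 63}))
            cases.append(("wrong_type", {**base, name: "not-a-number"}))
    return cases
```

Each case changes a single field of an otherwise valid call, which makes it easier to attribute a crash to one specific input.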

First run found a bug in the MCP TypeScript SDK

On its very first test against the official reference server, fuzz
discovered that every server built on @modelcontextprotocol/sdk (12K stars)
crashes on null arguments with -32603 (InternalError) instead of accepting
them. Fix submitted: modelcontextprotocol/typescript-sdk#2013.

Fuzz sweep results

12 servers fuzzed, 93 tools, 930 adversarial inputs:

Python SDK servers: 8 servers, 535 runs, zero crashes
TypeScript SDK servers: all hit the null args SDK bug
puppeteer: 15 additional crashes (unvalidated URLs) + 2 hangs

CI integration

mcp-assert fuzz --server "..." --junit results.xml --markdown summary.md

JUnit XML and GitHub Step Summary output, same as audit/ci/run.
Auto-detects $GITHUB_STEP_SUMMARY. Reproducible via --seed.

Progression

mcp-assert audit --server "..."     # 1 call per tool, happy path
mcp-assert fuzz  --server "..."     # 50 calls per tool, adversarial
mcp-assert run   --suite evals/     # custom assertions per tool

Other changes

Added

phpunit-mcp-assert: PHPUnit plugin (Packagist)
bun-mcp-assert: Bun test plugin (npm)
assert_notifications block type
Dynamic download stats SVG (hides zero-count channels)

Fixed

Intercept race condition on cmd variable (channel-based handoff)
CI threshold counts skipped assertions in denominator
Notification assertions missing delivery window
Audit stderr output in --json mode
JSON marshal error handling in audit/fuzz
Intercept process leak on timeout
SSE client leak, JSONPath ordering, file_contains substitution,
snapshot nil dereference, watch mode formatting, Go plugin recursion

v0.7.3

Changelog

391a5ab chore: stamp v0.7.3 in changelog
c26fd4c chore: update download stats [skip ci]
dd42546 chore: update download stats [skip ci]
87bec37 fix: address all inspector findings (3 major, 5 minor, 2 suggestions)
1a7a112 fix: cache logic now accepts fresh values when cache is empty or invalid
d4b7ca2 fix: second inspector pass — duration_ms contract, matrix ordering, markdown Close

v0.7.2

Changelog

95c08c1 chore: stamp v0.7.2 in changelog
aa7ddf1 chore: update download stats (6,621 total)
ca73230 chore: update download stats (6,621 total) [skip ci]
6b3c4e3 chore: update download stats [skip ci]
723e025 chore: update download stats [skip ci]
bd0c2bd chore: update download stats [skip ci]
1569965 chore: update download stats [skip ci]
adb434a chore: update download stats [skip ci]
d0bfba9 chore: update download stats [skip ci]
6e55754 chore: update download stats [skip ci]
03d5bff chore: update download stats [skip ci]
5cb0925 chore: update download stats [skip ci]
d72bc90 chore: update download stats frequency to hourly
0390bf4 feat: add Docker distribution (GHCR + Docker Hub) and docker pull stats
d9da4b5 feat: add Snap and Docker distribution channels
9e85554 feat: add jest-mcp-assert plugin, Docker/Snap distribution, download stats updates
bc38459 feat: add mcpassert Go test plugin
757dd03 fix: add high-water mark cache to download stats
81f69e5 fix: update Dockerfile to Go 1.25 (matches go.mod)
