
test: align MCP plugin codegen eval with plugin-mcp tools config API#16895

Closed
GermanJablo wants to merge 1 commit into payloadcms:main from GermanJablo:fix/eval-mcp-collection-tools-shape



GermanJablo commented Jun 5, 2026


The `plugins/official/codegen/mcp` eval expected the per-collection shape `{ enabled: { find, create } }`, but `@payloadcms/plugin-mcp` configures per-collection tools via `{ tools: { find, create } }`. The `enabled` key was removed in the plugin refactor (#16726), so the rubric was penalizing output that correctly follows the current API. This updates the expected shape to `tools`.
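To make the shape difference concrete, here is a minimal sketch of the two per-collection shapes and a rubric-style check. The types are local stand-ins, not imports from `@payloadcms/plugin-mcp`: `ToolFlags`, `CollectionToolsConfig`, `LegacyCollectionConfig` and `usesCurrentToolsShape` are hypothetical names, and only the `find`/`create` keys mentioned above are modeled.

```typescript
// Hypothetical stand-ins for the per-collection shapes described above.
// The real type lives in @payloadcms/plugin-mcp and may have more keys.
type ToolFlags = { find?: boolean; create?: boolean }

type CollectionToolsConfig = { tools: ToolFlags } // current shape
type LegacyCollectionConfig = { enabled: ToolFlags } // removed in #16726

// Hypothetical rubric check: accept `tools`, reject the old `enabled` key.
function usesCurrentToolsShape(config: Record<string, unknown>): boolean {
  return 'tools' in config && !('enabled' in config)
}

const current: CollectionToolsConfig = { tools: { find: true, create: true } }
const legacy: LegacyCollectionConfig = { enabled: { find: true, create: true } }

console.log(usesCurrentToolsShape(current)) // true
console.log(usesCurrentToolsShape(legacy)) // false
```

A real rubric would grade generated code rather than a runtime object; the check above only illustrates which key the eval now treats as correct.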

Open question

How should evals (and skills) be maintained across versions, given v4 is still in beta while most users run 3.x? Some options:

  1. Track latest only (v4, even in beta). Single source of truth and GA-ready, but most users are on 3.x today, so answers/evals serve a version almost nobody runs in prod yet.
  2. Track 3.x now, switch to v4 at GA. Matches what most users run today, but doesn't help v4 early adopters and needs a coordinated switch at GA.
  3. Version-aware breaking-changes layer. Feed a breaking-changes registry from the official v3 -> v4 migration guide and per-release notes (rather than maintaining it by hand); on each question the agent checks it and, if an API changed, warns that earlier versions may differ.
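A minimal sketch of what option 3's lookup could look like, assuming the registry is a generated list of entries keyed by an API identifier. `BreakingChange`, `warningsFor` and the `exampleConfig.*` entry are hypothetical placeholders, not real Payload APIs or real migration-guide entries; the version number is illustrative only.

```typescript
// Hypothetical registry entry. In option 3 these would be generated from the
// official v3 -> v4 migration guide and per-release notes, not hand-written.
interface BreakingChange {
  api: string // identifier the agent can match against the APIs a question touches
  sinceMajor: number // first major version with the new behavior (illustrative)
  note: string
}

// Return one warning per registry entry that matches an API the answer uses.
function warningsFor(apis: string[], registry: BreakingChange[]): string[] {
  return registry
    .filter((change) => apis.includes(change.api))
    .map(
      (change) =>
        `${change.api}: ${change.note} Versions before ${change.sinceMajor}.x may differ.`,
    )
}

// Placeholder entry for illustration only.
const registry: BreakingChange[] = [
  { api: 'exampleConfig.legacyKey', sinceMajor: 4, note: 'Renamed to exampleConfig.newKey.' },
]

const warnings = warningsFor(['exampleConfig.legacyKey', 'other.api'], registry)
for (const warning of warnings) console.log(warning)
```

Keeping the registry as data lets it be regenerated from release notes on each release, while the agent-side check stays a simple lookup.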

My preference is option 3, but that can be implemented in a later PR. This fix is needed to unblock CI for the Payload Agent, which runs these evals.

The plugins/official/codegen/mcp eval expected the per-collection config
shape `{ enabled: { find, create } }`, but `@payloadcms/plugin-mcp` exposes
per-collection tools via `{ tools: { find, create } }` (see
`MCPPluginCollectionConfig` in packages/plugin-mcp/src/types.ts and the
examples in docs/plugins/mcp.mdx). There is no `enabled` key, so the rubric
was penalizing output that correctly follows the documented API. Update the
expected shape to `tools`.
GermanJablo closed this by deleting the head repository Jun 5, 2026
GermanJablo added a commit that referenced this pull request Jun 8, 2026
…16896)

The `plugins/official/codegen/mcp` eval expected the per-collection
shape `{ enabled: { find, create } }`, but `@payloadcms/plugin-mcp`
configures per-collection tools via `{ tools: { find, create } }`. The
`enabled` key was removed in the plugin refactor (#16726), so the rubric
was penalizing output that correctly follows the current API. This
updates the expected shape to `tools`.

> Re-opened directly on the repo (previous PR #16895 was closed when the
source fork was deleted).

## Open question

How should evals (and skills) be maintained across versions, given v4 is
still in beta while most users run 3.x? Some options:

1. **Track latest only (v4, even in beta).** Single source of truth and
GA-ready, but most users are on 3.x today, so answers/evals serve a
version almost nobody runs in prod yet.
2. **Track 3.x now, switch to v4 at GA.** Matches what most users run
today, but doesn't help v4 early adopters and needs a coordinated switch
at GA.
3. **Version-aware breaking-changes layer.** Feed a breaking-changes
registry from the official v3 -> v4 migration guide and per-release
notes (rather than maintaining it by hand); on each question the agent
checks it and, if an API changed, warns that earlier versions may
differ.

My preference is option 3, but that can be implemented in a later PR.
This fix is needed to unblock CI for the Payload Agent, which runs these
evals.