test: align MCP plugin codegen eval with plugin-mcp tools config API by GermanJablo · Pull Request #16895 · payloadcms/payload

GermanJablo · 2026-06-05T09:38:51Z

The `plugins/official/codegen/mcp` eval expected the per-collection shape `{ enabled: { find, create } }`, but `@payloadcms/plugin-mcp` configures per-collection tools via `{ tools: { find, create } }`. The `enabled` key was removed in the plugin refactor (#16726), so the rubric was penalizing output that correctly follows the current API. This updates the expected shape to `tools`.
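For reference, a minimal sketch of the two shapes and the kind of check the rubric needs. The types and the `usesCurrentShape` helper are hypothetical and written only for illustration; the real type is `MCPPluginCollectionConfig` in the plugin and may differ, and boolean values for `find` / `create` are an assumption.

```typescript
// Hypothetical local types for illustration only. The real type is
// MCPPluginCollectionConfig in @payloadcms/plugin-mcp and may differ.
// Boolean flag values are an assumption.
type ToolFlags = { find?: boolean; create?: boolean }
type CollectionConfig = { tools?: ToolFlags }

// Shape the eval previously expected (the `enabled` key was removed).
const legacyShape = { enabled: { find: true, create: true } }

// Shape the eval expects after this change.
const currentShape: CollectionConfig = { tools: { find: true, create: true } }

// Hypothetical rubric helper: passes only when per-collection tools are
// configured under `tools` and the removed `enabled` key is absent.
function usesCurrentShape(config: Record<string, unknown>): boolean {
  const hasTools = typeof config.tools === 'object' && config.tools !== null
  return hasTools && !('enabled' in config)
}
```

With a check along these lines, `currentShape` passes and `legacyShape` fails, which is the behavior the updated rubric is meant to have.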

Open question

How should evals (and skills) be maintained across versions, given v4 is still in beta while most users run 3.x? Some options:

1. Track latest only (v4, even in beta). Single source of truth and GA-ready, but most users are on 3.x today, so answers/evals serve a version almost nobody runs in prod yet.
2. Track 3.x now, switch to v4 at GA. Matches what most users run today, but doesn't help v4 early adopters and needs a coordinated switch at GA.
3. Version-aware breaking-changes layer. Feed a breaking-changes registry from the official v3 -> v4 migration guide and per-release notes (rather than maintaining it by hand); on each question the agent checks it and, if an API changed, warns that earlier versions may differ.

My preference is option 3, but that can be implemented in a later PR. This is necessary to unlock the CI of the Payload Agent that runs these evaluations.
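A rough sketch of what the option 3 lookup could look like. Everything here is hypothetical: the `BreakingChange` fields, the `versionWarnings` helper, and the `changedIn` value are placeholders, and in practice the registry would be generated from the migration guide and release notes rather than written by hand.

```typescript
// Hypothetical registry entry; field names are illustrative only.
type BreakingChange = {
  api: string // identifier to look for in a question
  changedIn: string // version where the change landed (placeholder value below)
  note: string // short description of what changed
}

// Hand-written here for the sketch; the proposal is to generate this
// from the v3 -> v4 migration guide and per-release notes.
const registry: BreakingChange[] = [
  {
    api: '@payloadcms/plugin-mcp',
    changedIn: 'v4', // assumption for illustration
    note: 'Per-collection tools are configured via `tools`; the `enabled` key was removed.',
  },
]

// Returns one warning per registry entry whose API is mentioned in the question.
function versionWarnings(question: string, entries: BreakingChange[]): string[] {
  const q = question.toLowerCase()
  return entries
    .filter((entry) => q.includes(entry.api.toLowerCase()))
    .map(
      (entry) =>
        `${entry.api} changed in ${entry.changedIn}: ${entry.note} Earlier versions may differ.`,
    )
}
```

Plain substring matching keeps the sketch short; a real implementation would likely need something more robust to decide which APIs a question touches.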

The plugins/official/codegen/mcp eval expected the per-collection config shape `{ enabled: { find, create } }`, but `@payloadcms/plugin-mcp` exposes per-collection tools via `{ tools: { find, create } }` (see `MCPPluginCollectionConfig` in packages/plugin-mcp/src/types.ts and the examples in docs/plugins/mcp.mdx). There is no `enabled` key, so the rubric was penalizing output that correctly follows the documented API. Update the expected shape to `tools`.

…16896)

The `plugins/official/codegen/mcp` eval expected the per-collection shape `{ enabled: { find, create } }`, but `@payloadcms/plugin-mcp` configures per-collection tools via `{ tools: { find, create } }`. The `enabled` key was removed in the plugin refactor (#16726), so the rubric was penalizing output that correctly follows the current API. This updates the expected shape to `tools`.

> Re-opened directly on the repo (previous PR #16895 was closed when the source fork was deleted).

## Open question

How should evals (and skills) be maintained across versions, given v4 is still in beta while most users run 3.x? Some options:

1. **Track latest only (v4, even in beta).** Single source of truth and GA-ready, but most users are on 3.x today, so answers/evals serve a version almost nobody runs in prod yet.
2. **Track 3.x now, switch to v4 at GA.** Matches what most users run today, but doesn't help v4 early adopters and needs a coordinated switch at GA.
3. **Version-aware breaking-changes layer.** Feed a breaking-changes registry from the official v3 -> v4 migration guide and per-release notes (rather than maintaining it by hand); on each question the agent checks it and, if an API changed, warns that earlier versions may differ.

My preference is option 3, but that can be implemented in a later PR. This is necessary to unlock the CI of the Payload Agent that runs these evaluations.

github-actions Bot added the created-by: Payload team label Jun 5, 2026

GermanJablo requested a review from denolfe June 5, 2026 09:51

GermanJablo assigned denolfe Jun 5, 2026

denolfe approved these changes Jun 5, 2026

GermanJablo closed this by deleting the head repository Jun 5, 2026

GermanJablo mentioned this pull request Jun 5, 2026

test: align MCP plugin codegen eval with plugin-mcp tools config API #16896

Merged

GermanJablo wants to merge 1 commit into payloadcms:main from GermanJablo:fix/eval-mcp-collection-tools-shape
