ci(e2e): stabilize MCP/CLI flows and cancel stale main runs#4039
Conversation
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
419efe5 to
e4959fe
Compare
There was a problem hiding this comment.
Pull request overview
This pull request updates the MCP message-flow E2E to validate the specific MCP add tool call/result pairing by tool_use_id, while no longer assuming the MCP call must be the first tool-related event in the assistant stream (to allow tool_search / discovery to occur earlier).
Changes:
- Makes the prompt explicitly require calling the
addtool to reduce “no-tool-call” model variance. - Locates the MCP
addtool_useby name and records itstool_use_id. - Verifies a matching non-error
tool_resultexists for that sametool_use_id, instead of asserting tool ordering.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
e4959fe to
42bb496
Compare
The model consistently picks list_directory over run_shell_command for file-listing prompts. Make the prompt explicit about which tool to use, matching the approach taken for the MCP tool flow test.
tanzhenxin
left a comment
There was a problem hiding this comment.
Review
Small, focused, and the diagnosis matches the fix in each case. The MCP test correctly addresses the documented race where ToolSearch may legitimately fire before the MCP call, and the monitor test's Promise.all pattern is leak-free because waitForToolCall resolves to false on timeout. Workflow concurrency traced through all four trigger paths cleanly.
Verdict
APPROVE — clean.
pomelo-nwu
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — deepseek-v4-pro via Qwen Code /review
) * test(e2e): stabilize MCP tool message flow * ci(e2e): cancel stale main E2E runs * test(e2e): accept paired MCP tool results * test(e2e): stabilize monitor tool check * test(e2e): stabilize run_shell_command file-listing assertion The model consistently picks list_directory over run_shell_command for file-listing prompts. Make the prompt explicit about which tool to use, matching the approach taken for the MCP tool flow test.
* test(e2e): stabilize MCP tool message flow * ci(e2e): cancel stale main E2E runs * test(e2e): accept paired MCP tool results * test(e2e): stabilize monitor tool check * test(e2e): stabilize run_shell_command file-listing assertion The model consistently picks list_directory over run_shell_command for file-listing prompts. Make the prompt explicit about which tool to use, matching the approach taken for the MCP tool flow test.
Summary
main.tool_usemust bemcp__test-math-server__add, but MCP tools are deferred and ToolSearch can legitimately appear first. The CLImonitortest could miss the tool call when the model finished beforewaitForToolCallstarted, and the CLIrun_shell_commandtest let the model picklist_directoryon some runs and was POSIX-only. Separately, consecutive pushes tomaincurrently keep superseded E2E runs alive, wasting runner time.tool_useand matchingtool_resultbytool_use_id;monitor.test.tsawaitsrig.runandwaitForToolCallin parallel with a 60s timeout;run_shell_command.test.tspins the prompt torun_shell_commandand usesdiron Windows; the workflow concurrency cancels only stalemainpush E2E runs and leavesfeat/e2e/**plusmerge_groupruns independent.Validation
Use the add tool to calculate 2 + 3. You must call the tool. Just give me the result.tool_searchappears before the MCP call.mainE2E pushes cancel older in-progressmainE2E workflow runs, while branch and merge queue E2E runs remain independent.git diff --check,npm run build, YAML parsing, and actionlint passed locally.[API Error: 无效的api key]before any MCP call.npx tsc --noEmit --project integration-tests/tsconfig.jsonfails on existing unrelated integration-test type errors outside this PR.isError === falsewas too strict for a message-flow test, so the test now validates the target call/result pairing without asserting tool execution success in this specific flow test.576bd8e0andcb7059f5were both in progress concurrently before the concurrency change.Scope / Risk
mainE2E runs may be canceled before completion when a newer main commit arrives; this is intentional because the newer run supersedes the older one.Testing Matrix
Testing matrix notes:
npm run buildpassed locally.npx vitestreached the MCP test but was blocked by local model/API key configuration before any MCP tool call.Linked Issues / Bugs