Skip to content

fix: clean stale MCP servers on install/update/uninstall and prevent .claude folder creation#201

Merged
danielmeppiel merged 11 commits intomicrosoft:mainfrom
sergio-sisternes-epam:fix/review-install-update
Mar 9, 2026
Merged

fix: clean stale MCP servers on install/update/uninstall and prevent .claude folder creation#201
danielmeppiel merged 11 commits intomicrosoft:mainfrom
sergio-sisternes-epam:fix/review-install-update

Conversation

@sergio-sisternes-epam
Copy link
Collaborator

@sergio-sisternes-epam sergio-sisternes-epam commented Mar 8, 2026

Summary

Fixes two bugs in the install/update/uninstall lifecycle in complex package scenarios:

  1. Stale MCP servers persist after updates or uninstalls — When an upstream package renames or removes an MCP server, the old entry was left behind in .vscode/mcp.json and Copilot CLI config.
  2. .claude/ folder created in VS Code-only projects — During uninstall, the re-integration phase called Claude-specific integrators unconditionally, creating .claude/commands/ and .claude/agents/ directories even when the project only targets VS Code.

Root Causes & Fixes

Stale MCP servers (3 sub-issues)

Sub-issue Root cause Fix
No tracking No record of which MCP servers APM previously installed Added mcp_servers field to lockfile to track managed server names
Parse cache stale on --update _apm_yml_cache in apm_package.py returned old parsed data after --update overwrote module files Call clear_apm_yml_cache() after _install_apm_dependencies when update=True
Wrong read ordering old_mcp_servers was read from lockfile after _install_apm_dependencies regenerated it (losing the mcp_servers field) Moved the read to before _install_apm_dependencies

.claude folder in VS Code projects

Root cause Fix
uninstall re-integration called command_integrator.integrate_package_commands(), hook_integrator.integrate_package_hooks_claude(), and agent_integrator.integrate_package_agents_claude() without checking the integration target Added detect_target / should_integrate_claude guard before the re-integration loop; Claude-specific calls now only run when integrate_claude=True

Changes

  • src/apm_cli/cli.py (+162 lines)
    • New helpers: _get_mcp_dep_names(), _remove_stale_mcp_servers(), _update_lockfile_mcp_servers()
    • install(): reads old MCP set before dep install, clears parse cache on update, computes diff, removes stale servers
    • uninstall(): target detection before re-integration, MCP recomputation and stale cleanup
  • src/apm_cli/deps/lockfile.py (+4 lines)
    • Added mcp_servers: List[str] field to LockFile dataclass with serialization support

Testing

Full end-to-end test cycle verified:

Step Result
Fresh install (14 deps, 4 MCP servers) ✅ No .claude/, correct lockfile
Update with MCP rename (githubgithub-http) ✅ Stale github removed
Update with MCP revert (github-httpgithub) ✅ Stale github-http removed
Incremental install (xxxxx xxxxx, +17 deps, +3 MCP) ✅ 31 deps, 7 MCP servers
Uninstall (xxxxx) ✅ Back to 14 deps, 4 MCP servers, 3 stale removed
No .claude/ at any step

….claude folder creation

- Track managed MCP servers in lockfile (mcp_servers field) to detect stale entries
- Remove stale MCP servers from .vscode/mcp.json and Copilot CLI config on update
- Clear apm_yml parse cache after --update to avoid stale MCP dependency names
- Read old MCP server set before lockfile regeneration to preserve diff baseline
- Guard Claude-specific re-integration in uninstall with target detection
- Recompute MCP deps from remaining packages on uninstall and clean stale servers

Fixes: .claude folder appearing in VS Code-only projects after uninstall,
stale MCP servers persisting after package updates or removals.
@sergio-sisternes-epam sergio-sisternes-epam marked this pull request as ready for review March 8, 2026 10:47
Copilot AI review requested due to automatic review settings March 8, 2026 10:47
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses install/update/uninstall lifecycle bugs by tracking APM-managed MCP servers in apm.lock, removing stale MCP server entries after dependency changes, and preventing Claude-specific re-integration from creating .claude/ artifacts in VS Code-only projects.

Changes:

  • Extend apm.lock schema with mcp_servers and wire it into YAML serialization/deserialization.
  • Update install to capture prior MCP state, clear apm.yml parse cache on --update, compute stale MCP server diffs, remove stale entries, and persist the new set.
  • Update uninstall re-integration to guard Claude-specific integration via target detection, and add best-effort stale MCP cleanup after uninstall.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
src/apm_cli/deps/lockfile.py Adds mcp_servers tracking to the lockfile format to support stale MCP cleanup.
src/apm_cli/cli.py Implements MCP stale removal + lockfile updates during install/uninstall, and guards Claude re-integration based on detected target.

You can also share your feedback on Copilot code review. Take the survey.

@sergio-sisternes-epam sergio-sisternes-epam self-assigned this Mar 8, 2026
- Clarify stale_names docstring: short server names, not full refs
- Make _remove_stale_mcp_servers runtime-aware (respect --runtime/--exclude)
- Add Codex CLI (~/.codex/config.toml) stale server cleanup
- Add lockfile mcp_servers round-trip unit tests (3 new tests)
- Document mcp_servers field in manifest-schema.md and dependencies.md
- Preserve mcp_servers in lockfile when --only=apm is used
@sergio-sisternes-epam sergio-sisternes-epam marked this pull request as draft March 8, 2026 11:22
…ckages

When running 'apm install <pkg>', the lockfile was regenerated from only
the filtered package set, wiping all other dependencies and mcp_servers.

Now merges new entries into the existing lockfile via LockFile.read() +
add_dependency() when only_packages is set.
@sergio-sisternes-epam
Copy link
Collaborator Author

sergio-sisternes-epam commented Mar 8, 2026

Additional fix: lockfile wipe when installing individual packages (fbfd627)

During end-to-end testing I discovered a pre-existing bug: running apm install <pkg> (with a specific package argument) would wipe the entire apm.lock — all other dependencies and mcp_servers entries were lost.

Root cause: _install_apm_dependencies() filters deps_to_install down to only the requested package(s) when only_packages is set. Then LockFile.from_installed_packages() regenerates the lockfile from that single result, discarding everything else.

Fix: When only_packages is set, the new code reads the existing lockfile and merges new/updated entries into it via LockFile.read() + add_dependency(), preserving all pre-existing dependencies and mcp_servers.

Validated: Ran apm install --trust-transitive-mcp .../general/xxxxxx on a project with 14 dependencies + 7 MCP servers — all entries preserved after install.

…ame collision

The bare set() call was invoking Click's 'config set' command instead of
creating a Python set, causing 'Got unexpected extra argument' errors
after MCP server installation.
@sergio-sisternes-epam
Copy link
Collaborator Author

sergio-sisternes-epam commented Mar 8, 2026

Fix: builtins.set() collision in _remove_stale_mcp_servers (8abe356)

When implementing review comment 2 (make _remove_stale_mcp_servers runtime-aware), we introduced a bare set(all_runtimes) call at line 2791. In this codebase, set is shadowed by Click's config set subcommand, so the call was dispatching into Click instead of creating a Python set — producing:

Usage: apm [OPTIONS] KEY VALUE
Error: Got unexpected extra argument (copilot)

Every other set() in the file already uses builtins.set() to avoid this collision. Fixed the one we missed.

@danielmeppiel danielmeppiel marked this pull request as ready for review March 8, 2026 13:15
@sergio-sisternes-epam sergio-sisternes-epam marked this pull request as draft March 8, 2026 19:32
Thorough integration testing uncovered that `apm install <package>`
failed to collect MCP servers (and other primitives) from packages
deeper than one level in the dependency tree.

Root cause: the only_packages filter in _install_apm_dependencies()
kept only the explicitly named package, discarding all its transitive
dependencies from deps_to_install. Because the lockfile is built from
installed_packages, those transitive deps were never recorded and
_collect_transitive_mcp_deps() only scans packages present in the
lockfile, so their MCP server definitions were silently skipped.

Fix: after building the identity set from user-supplied specs, walk
the dependency tree to expand the set with all transitive descendants
of the requested packages before filtering deps_to_install.

Validation:
- 436 related unit tests pass (install, dependency, transitive, mcp,
  lockfile keywords)
- Manual integration test: installing a squad package that depends on
  an infrastructure package with MCP servers now correctly collects
  and configures those servers (7 vs 4 previously)
- Uninstall correctly removes the transitive MCP servers when the
  parent package is removed
@sergio-sisternes-epam
Copy link
Collaborator Author

@microsoft-github-policy-service agree

@sergio-sisternes-epam
Copy link
Collaborator Author

Additional fix: transitive deps excluded by only_packages filter

During thorough integration testing of the install/uninstall cycle, I uncovered an additional bug in the same code path.

Problem: When running apm install <package>, if the target package declares mcp: [] but depends on another package that defines MCP servers, those transitive MCP servers were never collected or configured.

Root cause: The only_packages filter in _install_apm_dependencies() kept only the explicitly named package in deps_to_install, discarding all its transitive dependencies. Since the lockfile is built from installed_packages, those transitive deps were never recorded — and _collect_transitive_mcp_deps() only scans packages present in the lockfile, so their MCP server definitions were silently skipped.

Fix: After building the identity set from user-supplied specs, walk the dependency tree to expand the set with all transitive descendants of the requested packages before applying the filter. (+18 lines in _install_apm_dependencies)

Validation:

  • 436 related unit tests pass (install, dependency, transitive, mcp, lockfile)
  • Integration test: installing a squad package that depends on an infrastructure package with MCP servers now correctly collects and configures those servers (7 vs 4 previously)
  • Uninstall correctly removes the transitive MCP servers when the parent package is removed

Pushed as f644317.

Cover 7 scenarios with 15 tests exercising the full MCP chain:
- Selective install with transitive MCP dep collection
- Orphan package exclusion via lockfile scoping
- Stale server removal from .vscode/mcp.json after uninstall
- Lockfile mcp_servers list update and clearance
- MCP server rename detection (stale old + new present)
- MCP removal detection and cleanup
- Root vs transitive deduplication (root wins)
- Virtual-path package MCP collection
- Self-defined server trust_private gating

All use synthetic package names (no private identifiers).
…ckage install

Coverage hardening for the transitive MCP fix:
- Unit: deep chain (A→B→C→D), diamond dependency, MCP at every level
- Integration: deep chain, diamond, multi-package selective install
  through CLI CliRunner

All 27 MCP lifecycle tests pass (19 unit + 8 integration).
@sergio-sisternes-epam
Copy link
Collaborator Author

Test coverage hardening (938a388)

Added edge-case tests to close the gaps that motivated this PR:

Unit tests (tests/unit/test_mcp_lifecycle_e2e.py — 4 new tests):

Scenario What it validates
TestDeepTransitiveChainMCP A→B→C→D chain: MCP at depth 4 is collected; MCP at every level is collected
TestDiamondDependencyMCP A→{B,C}→D diamond: shared leaf MCP deduplicates correctly; branch + leaf MCP all present

Integration tests (tests/integration/test_selective_install_mcp.py — 8 tests total, 3 new):

Scenario What it validates
TestDeepChainIntegration CLI apm install acme/pkg-a with 4-level chain records depth-4 MCP in lockfile
TestDiamondDependencyIntegration CLI diamond install: shared MCP appears once, no duplicates in lockfile
TestMultiPackageSelectiveInstallIntegration CLI apm install pkg-x pkg-y: both transitive MCP trees merged correctly

These specifically exercise the _collect_descendants / only_identities expansion logic added in the fix — without the fix, all three integration tests fail because transitive deps get filtered out of deps_to_install.

Full suite: 27 MCP lifecycle tests pass (19 unit + 8 integration), 1707 total pass with no regressions.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.


You can also share your feedback on Copilot code review. Take the survey.

Address Copilot review: use single quotes inside f-strings for
getattr(result, 'stderr', '') to improve readability and
compatibility with Python < 3.12.
…nd uninstall race

- docs: update mcp_servers field description from 'short names' to
  'MCP dependency references' in manifest-schema.md and dependencies.md
- fix: _remove_stale_mcp_servers now expands stale names to include
  last-segment variants so Copilot CLI and Codex config keys (derived
  from server_url.split('/')[-1]) are matched correctly
- fix: uninstall captures old_mcp_servers before lockfile mutation so
  stale-MCP cleanup works even when all deps are removed and the
  lockfile is deleted
- test: add TestStaleCleanupKeyNormalization with 3 tests covering
  last-segment matching, full-ref matching, and short-name fallback
Copy link
Collaborator

@danielmeppiel danielmeppiel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid work, Sergio. The bugs are real, the root-cause analysis is thorough, the fixes are correct, and the test coverage is impressive (27 MCP lifecycle tests covering deep chains, diamonds, dedup, stale cleanup, --only=apm preservation, and key normalization).

A few non-blocking observations for future hardening — I've opened #209 to track them:

  • Silent except Exception: pass — The best-effort pattern makes sense for cleanup, but logger.debug() in those blocks would help when debugging issues in the field.
  • _collect_descendants cycle guard — A visited set would be cheap insurance against stack overflow if a cycle ever slips through the resolver.
  • Lockfile double read-modify-write_update_lockfile_mcp_servers re-reads the lockfile that _install_apm_dependencies just wrote. Ordering is correct today but fragile; worth a comment or passing the object through.
  • cli.py size — The three new helpers are good candidates to extract into mcp_lifecycle.py as MCP features grow.

None of these block the merge. CI is green, all review comments addressed. Thanks for the diligent E2E testing and the clear commit-by-commit narrative.

@danielmeppiel danielmeppiel merged commit 0e23d2e into microsoft:main Mar 9, 2026
7 checks passed
@sergio-sisternes-epam sergio-sisternes-epam deleted the fix/review-install-update branch March 9, 2026 16:46
@sergio-sisternes-epam sergio-sisternes-epam restored the fix/review-install-update branch March 9, 2026 16:46
@sergio-sisternes-epam sergio-sisternes-epam deleted the fix/review-install-update branch March 9, 2026 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants