fix(cli): pass --clear to uv venv in rebuild_venv to avoid Windows brick #37895
jackjin1997 wants to merge 1 commit into
On Windows, `hermes update` can leave a half-deleted venv that bricks the install. `rebuild_venv` calls `shutil.rmtree(venv_dir, ignore_errors=True)` on the venv that hosts the currently running interpreter. `python.exe` is locked by Windows, so `Scripts/python.exe` survives while `pyvenv.cfg` is deleted. `uv venv` is then called without `--clear` and aborts with "A directory already exists", leaving the venv unusable (no `pyvenv.cfg`) and every subsequent `uv pip install -e .` failing with "No virtual environment found". The CLI itself dies with `ModuleNotFoundError: hermes_cli`.

Add `--clear` to the `uv venv` invocation so uv replaces the leftover directory itself. `shutil.rmtree(ignore_errors=True)` is retained as a defense-in-depth fast path for POSIX systems where it succeeds outright.

Fixes NousResearch#37881
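The bricked state described above (directory present, `pyvenv.cfg` gone) can be recognized on disk. The following is a minimal sketch, not the project's code: `venv_state`, `uv_venv_command`, and `demo_states` are hypothetical names introduced here for illustration, and the command simply mirrors the `uv venv <dir> --python 3.11` invocation with `--clear` appended.

```python
import tempfile
from pathlib import Path


def venv_state(venv_dir: Path) -> str:
    """Classify a venv directory by what survived on disk (hypothetical helper)."""
    if not venv_dir.exists():
        return "absent"
    if (venv_dir / "pyvenv.cfg").is_file():
        return "intact"
    # Directory is present but pyvenv.cfg is gone: the half-deleted state.
    return "half-deleted"


def uv_venv_command(venv_dir: Path, python: str = "3.11") -> list:
    """Build the uv invocation; --clear lets uv replace an existing directory."""
    return ["uv", "venv", str(venv_dir), "--python", python, "--clear"]


def demo_states() -> dict:
    """Create throwaway directories that mimic each state and classify them."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        intact = root / "intact"
        intact.mkdir()
        (intact / "pyvenv.cfg").write_text("home = /usr/bin\n")
        # Mimic what a partial rmtree leaves: Scripts/python.exe, no pyvenv.cfg.
        bricked = root / "bricked"
        (bricked / "Scripts").mkdir(parents=True)
        (bricked / "Scripts" / "python.exe").write_bytes(b"")
        return {
            "missing": venv_state(root / "missing"),
            "intact": venv_state(intact),
            "bricked": venv_state(bricked),
        }
```

Checking for `pyvenv.cfg` rather than for the directory itself matters here, because in the failure mode the directory still exists.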
This one-line change (adding --clear) still bricks it. → Rebuilding venv (old Python may lack FTS5)...
`hermes update` can brick a Windows install. cmd_update guards against
concurrent hermes.exe processes and exits, but `hermes update --force` skips
that guard (and the guard's own message tells the user to pass --force).
Forced past it, rebuild_venv runs while the venv is still in use:

    shutil.rmtree(venv_dir, ignore_errors=True)  # deletes site-packages + certifi,
                                                 # fails on the locked python.exe
                                                 # -> half-gutted venv
    uv venv ... (no --clear)                     # aborts "directory already exists"
                                                 # -> never recreated

The venv is left with no pyvenv.cfg; every later HTTPS call dies with
FileNotFoundError (missing cacert.pem) and there is no recovery.
Move the old venv aside atomically with os.replace instead of deleting it in
place, pass --clear, and restore the moved-aside venv if the rebuild fails.
Verified on Windows 11: os.replace of a venv whose interpreter is live
succeeds (Windows tracks a running .exe by handle), so the rebuild actually
completes while the running gateway keeps using the moved-aside copy until it
restarts -- the update succeeds instead of bricking. If the venv genuinely
cannot be moved, the rebuild aborts cleanly and leaves it fully intact.
Root-cause fix for NousResearch#37881. The --clear-only approaches in NousResearch#37895 / NousResearch#38051
don't cover the real lock scenario: when the locked interpreter is inside the
venv being rebuilt, neither rmtree nor `uv venv --clear` can delete it.

- hermes_cli/managed_uv.py: atomic move-aside + --clear + restore-on-failure
- tests/hermes_cli/test_managed_uv.py: in-use venv left intact with no rebuild
  attempted; failed rebuild restored; success path asserts --clear
- website/docs/getting-started/updating.md: document --force venv-rebuild safety

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
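
The move-aside strategy in the commit message above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of hermes_cli/managed_uv.py: `rebuild_with_move_aside`, the `create_venv` callback (standing in for the `uv venv --clear` call), the `.old` backup name, and `demo` are all hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def rebuild_with_move_aside(venv_dir: Path, create_venv: Callable[[Path], bool]) -> bool:
    """Move the old venv aside, rebuild, and restore the old one on failure."""
    backup = venv_dir.with_name(venv_dir.name + ".old")  # hypothetical naming
    moved = False
    if venv_dir.exists():
        shutil.rmtree(backup, ignore_errors=True)  # drop a stale backup, if any
        try:
            os.replace(venv_dir, backup)  # one rename instead of in-place deletion
            moved = True
        except OSError:
            return False  # cannot move: abort and leave the venv untouched
    try:
        ok = create_venv(venv_dir)
    except Exception:
        ok = False
    if ok:
        if moved:
            shutil.rmtree(backup, ignore_errors=True)  # best-effort cleanup
        return True
    # Rebuild failed: discard any partial result and put the old venv back.
    # os.replace may raise OSError here if the partial venv could not be removed.
    shutil.rmtree(venv_dir, ignore_errors=True)
    if moved:
        os.replace(backup, venv_dir)
    return False


def demo() -> dict:
    """Run one failing and one succeeding rebuild in a throwaway directory."""
    with tempfile.TemporaryDirectory() as tmp:
        venv = Path(tmp) / "venv"
        venv.mkdir()
        (venv / "pyvenv.cfg").write_text("version = old\n")

        def failing(path: Path) -> bool:
            path.mkdir()  # leave a partial directory behind, then report failure
            return False

        def succeeding(path: Path) -> bool:
            path.mkdir()
            (path / "pyvenv.cfg").write_text("version = new\n")
            return True

        failed_result = rebuild_with_move_aside(venv, failing)
        after_failure = (venv / "pyvenv.cfg").read_text()
        succeeded_result = rebuild_with_move_aside(venv, succeeding)
        after_success = (venv / "pyvenv.cfg").read_text()
        return {
            "failed_result": failed_result,
            "after_failure": after_failure,
            "succeeded_result": succeeded_result,
            "after_success": after_success,
            "backup_left": (Path(tmp) / "venv.old").exists(),
        }
```

The early return when `os.replace` raises corresponds to the "aborts cleanly and leaves it fully intact" behavior described above; whether a live interpreter's venv can be moved on a given system is the commit author's Windows 11 observation, not something this sketch verifies.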
Closing as already fixed on The Windows venv brick (#37881) was fixed in c136eb4 ("fix(update): harden venv rebuild + verify core deps after install"), which uses the same Thanks for the clean fix and the precise root-cause writeup — you correctly diagnosed the locked-
What does this PR do?
On Windows, `hermes update` can permanently brick the install. The fix is a one-line change to `rebuild_venv`: pass `--clear` to `uv venv` so it cleans up a residual venv directory that `shutil.rmtree(ignore_errors=True)` couldn't fully delete because the running interpreter has its files locked.
Related Issue
Fixes #37881
Type of Change
Changes Made
- `hermes_cli/managed_uv.py`: Add `--clear` to the `uv venv` invocation in `rebuild_venv` (line ~114). The leading `shutil.rmtree(venv_dir, ignore_errors=True)` is retained as a fast path for POSIX systems where it succeeds outright; `--clear` is the Windows-safe guarantee that uv will replace the directory regardless of what rmtree left behind.
- `tests/hermes_cli/test_managed_uv.py`: Add 2 regression tests to `TestRebuildVenv`:
  - `test_uv_venv_called_with_clear_flag`: verifies the subprocess command list contains `--clear`.
  - `test_rebuild_succeeds_when_rmtree_leaves_residue`: models the Windows lock scenario. `shutil.rmtree` is a no-op (file lock eaten by `ignore_errors=True`), the venv directory and residual files survive, but `rebuild_venv` still returns `True` because `--clear` lets uv replace the directory.
How to Test
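The two regression checks described under Changes Made could look roughly like the sketch below. It is not the PR's test code: `rebuild_venv_sketch` is a hypothetical stand-in for `rebuild_venv` (the real function's signature is not shown in this PR), and `check_clear_flag_with_residue` is an illustrative helper. `shutil.rmtree` is mocked to a no-op to model the Windows lock, and `subprocess.run` is mocked so uv is never actually invoked.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from unittest import mock


def rebuild_venv_sketch(uv_bin: str, venv_dir: Path) -> bool:
    """Hypothetical stand-in for rebuild_venv: rmtree fast path, then uv venv --clear."""
    shutil.rmtree(venv_dir, ignore_errors=True)
    cmd = [uv_bin, "venv", str(venv_dir), "--python", "3.11", "--clear"]
    result = subprocess.run(cmd, check=False)
    return result.returncode == 0


def check_clear_flag_with_residue() -> dict:
    """Model the lock scenario: rmtree removes nothing, uv is faked as succeeding."""
    with tempfile.TemporaryDirectory() as tmp:
        venv_dir = Path(tmp) / "venv"
        locked = venv_dir / "Scripts" / "python.exe"
        locked.parent.mkdir(parents=True)
        locked.write_bytes(b"")
        with mock.patch("shutil.rmtree"), mock.patch("subprocess.run") as fake_run:
            # The mocked rmtree is a no-op, so the residue stays on disk.
            fake_run.return_value = subprocess.CompletedProcess(args=[], returncode=0)
            ok = rebuild_venv_sketch("uv", venv_dir)
            cmd = list(fake_run.call_args.args[0])
        return {"ok": ok, "residue_survived": locked.exists(), "cmd": cmd}
```

Because uv is mocked, a check like this only shows that the flag is passed and that the function reports success; it does not exercise what uv itself does with a locked file.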
All 17 tests pass (15 existing + 2 new). The new regression tests directly verify the `--clear` flag is present and that the rebuild succeeds even when the directory cannot be fully removed beforehand.
Reproducing the original brick scenario requires Windows with an existing venv whose Python predates managed uv (so `fresh_bootstrap = True` triggers the `rebuild_venv` path).
Root Cause (from issue #37881)
1. `rebuild_venv` calls `shutil.rmtree(venv_dir, ignore_errors=True)` on the venv containing the running interpreter.
2. `Scripts\python.exe` (the process driving `hermes update`) is locked → `shutil.rmtree` partially succeeds (`pyvenv.cfg` deleted) but the directory and `python.exe` survive.
3. The error is silently swallowed by `ignore_errors=True`.
4. `uv venv <dir> --python 3.11` is called without `--clear` and aborts: "A directory already exists at: …\venv". uv's own hint even suggests "Use the --clear flag or set UV_VENV_CLEAR=1".
5. Subsequent `uv pip install -e .` calls (git path and ZIP fallback) fail with "No virtual environment found" because `pyvenv.cfg` is gone.
6. The CLI dies with `ModuleNotFoundError: No module named 'hermes_cli'`. Re-running `hermes update` repeats step 5 indefinitely.
Checklist
Code
- `hermes_cli/managed_uv.py`
- No `simple_term_menu`, no `\033[K`, no cross-toolset schema refs, no cache-breaking, no hardcoded `~/.hermes` paths
Tests
- `TestRebuildVenv` / `TestUpdateManagedUv` still pass
AI Disclosure
This fix was diagnosed and implemented with AI assistance based on the reporter's detailed root cause analysis.