fix(cli): pass --clear to uv venv in rebuild_venv to avoid Windows brick #37895

Closed
jackjin1997 wants to merge 1 commit into NousResearch:main from jackjin1997:fix/managed-uv-rebuild-clear

Conversation

@jackjin1997

Contributor

What does this PR do?

On Windows, hermes update can permanently brick the install. The fix is a one-line change to rebuild_venv: pass --clear to uv venv so it cleans up a residual venv directory that shutil.rmtree(ignore_errors=True) couldn't fully delete because the running interpreter has its files locked.

Related Issue

Fixes #37881

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

  • hermes_cli/managed_uv.py: Add --clear to the uv venv invocation in rebuild_venv (line ~114). The leading shutil.rmtree(venv_dir, ignore_errors=True) is retained as a fast path for POSIX systems where it succeeds outright; --clear is the Windows-safe guarantee that uv will replace the directory regardless of what rmtree left behind.
  • tests/hermes_cli/test_managed_uv.py: Add 2 regression tests to TestRebuildVenv:
    • test_uv_venv_called_with_clear_flag — verifies the subprocess command list contains --clear
    • test_rebuild_succeeds_when_rmtree_leaves_residue — models the Windows lock scenario: shutil.rmtree is a no-op (file lock eaten by ignore_errors=True), venv directory and residual files survive, but rebuild_venv still returns True because --clear lets uv replace the directory.
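The change and its first regression test can be sketched as follows. This is a minimal illustration, not the code in hermes_cli/managed_uv.py: the names `build_uv_venv_cmd` and `rebuild_venv_sketch` and the injectable `run` parameter are assumptions made so the command can be inspected without invoking uv. Only the `uv venv <dir> --python 3.11` shape and the `--clear` flag come from the PR text.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable


def build_uv_venv_cmd(uv_bin: str, venv_dir: Path, python: str = "3.11") -> list[str]:
    """Build the `uv venv` command line, always including --clear."""
    return [uv_bin, "venv", str(venv_dir), "--python", python, "--clear"]


def rebuild_venv_sketch(
    uv_bin: str,
    venv_dir: Path,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Hypothetical stand-in for rebuild_venv: rmtree fast path, then uv venv --clear."""
    # Fast path: removes everything when nothing holds the files open.
    shutil.rmtree(venv_dir, ignore_errors=True)
    # --clear asks uv to replace whatever rmtree could not delete.
    result = run(build_uv_venv_cmd(uv_bin, venv_dir), capture_output=True, text=True)
    return result.returncode == 0
```

A test in the spirit of test_uv_venv_called_with_clear_flag would pass a fake `run` that records the command list and assert that `--clear` is in it. Note that a later comment in this thread reports uv itself failing with "Access is denied" when `Scripts` is locked, so the flag alone may not be sufficient on Windows.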

How to Test

pytest tests/hermes_cli/test_managed_uv.py -v

All 17 tests pass (15 existing + 2 new). The new regression tests directly verify the --clear flag is present and that the rebuild succeeds even when the directory cannot be fully removed beforehand.

Reproducing the original brick scenario requires Windows with an existing venv whose Python predates managed uv (so fresh_bootstrap = True triggers the rebuild_venv path).

Root Cause (from issue #37881)

  1. rebuild_venv calls shutil.rmtree(venv_dir, ignore_errors=True) on the venv containing the running interpreter.
  2. On Windows, Scripts\python.exe (the process driving hermes update) is locked → shutil.rmtree partially succeeds (pyvenv.cfg deleted) but the directory and python.exe survive. The error is silently swallowed by ignore_errors=True.
  3. uv venv <dir> --python 3.11 is called without --clear and aborts: "A directory already exists at: …\venv". uv's own hint even suggests "Use the --clear flag or set UV_VENV_CLEAR=1".
  4. Subsequent uv pip install -e . calls (git path and ZIP fallback) fail with "No virtual environment found" because pyvenv.cfg is gone.
  5. The CLI is dead: ModuleNotFoundError: No module named 'hermes_cli'. Re-running hermes update fails the same way indefinitely.
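The chain above depends on two facts: `ignore_errors=True` hides a partial delete, and a directory without pyvenv.cfg is no longer found as a virtual environment. A minimal sketch of how a caller could detect that state is shown below. Both helper names are hypothetical and do not appear in hermes_cli.

```python
import shutil
from pathlib import Path


def residue_after_rmtree(venv_dir: Path) -> list[Path]:
    """Delete venv_dir the way step 1 does, then report what survived."""
    shutil.rmtree(venv_dir, ignore_errors=True)  # any error is swallowed here
    if not venv_dir.exists():
        return []
    # Anything listed here was left behind by a partial delete.
    return [venv_dir, *sorted(venv_dir.rglob("*"))]


def is_usable_venv(venv_dir: Path) -> bool:
    """Treat a directory as a venv only if its pyvenv.cfg is still present."""
    return (venv_dir / "pyvenv.cfg").is_file()
```

On a system where nothing locks the files, `residue_after_rmtree` returns an empty list. In the Windows scenario described above it would be expected to return the directory plus the locked interpreter, while `is_usable_venv` returns False.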

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits
  • I searched for existing PRs covering this issue (none found at PR creation time)
  • My code follows the existing style in hermes_cli/managed_uv.py
  • No simple_term_menu, no \033[K, no cross-toolset schema refs, no cache-breaking, no hardcoded ~/.hermes paths
  • Tool handlers (N/A — this is CLI plumbing) still return JSON strings

Tests

  • Added 2 regression tests covering the fix
  • Existing 15 tests in TestRebuildVenv / TestUpdateManagedUv still pass

AI Disclosure

This fix was diagnosed and implemented with AI assistance based on the reporter's detailed root cause analysis.

On Windows, `hermes update` can leave a half-deleted venv that bricks the
install. `rebuild_venv` calls `shutil.rmtree(venv_dir, ignore_errors=True)`
on the venv that hosts the currently running interpreter — `python.exe` is
locked by Windows, so `Scripts/python.exe` survives while `pyvenv.cfg` is
deleted. `uv venv` is then called without `--clear` and aborts with
"A directory already exists", leaving the venv unusable (no `pyvenv.cfg`)
and every subsequent `uv pip install -e .` failing with "No virtual
environment found". The CLI itself dies with `ModuleNotFoundError:
hermes_cli`.

Add `--clear` to the `uv venv` invocation so uv replaces the leftover
directory itself. `shutil.rmtree(ignore_errors=True)` is retained as a
defense-in-depth fast path for POSIX systems where it succeeds outright.

Fixes NousResearch#37881
@alt-glitch added labels on Jun 3, 2026: type/bug (Something isn't working), comp/cli (CLI entry point, hermes_cli/, setup wizard), area/config (Config system, migrations, profiles), P1 (High: major feature broken, no workaround)
@rdnot

rdnot commented Jun 3, 2026


This one-line change (adding --clear) still bricks it.

→ Rebuilding venv (old Python may lack FTS5)...
✗ venv rebuild failed: Using CPython 3.11.15
Creating virtual environment at: AppData\Local\hermes\hermes-agent\venv
warning: The --clear option will remove the existing directory at AppData\Local\hermes\hermes-agent\venv even though it is not a virtual environment. This will become an error in a future release. Use --force to suppress this warning, or --preview-features venv-safe-clear to error on this now.
error: Failed to create virtual environment
Caused by: failed to remove directory \\?\C:\Users\xxxx\AppData\Local\hermes\hermes-agent\venv\Scripts: Access is denied. (os error 5)

lEWFkRAD pushed a commit to lEWFkRAD/hermes-agent that referenced this pull request Jun 3, 2026
`hermes update` can brick a Windows install. cmd_update guards against
concurrent hermes.exe processes and exits, but `hermes update --force` skips
that guard (and the guard's own message tells the user to pass --force).
Forced past it, rebuild_venv runs while the venv is still in use:

    shutil.rmtree(venv_dir, ignore_errors=True)   # deletes site-packages +
                                                  # certifi, fails on the
                                                  # locked python.exe ->
                                                  # half-gutted venv
    uv venv ...            (no --clear)           # aborts "directory already
                                                  # exists" -> never recreated

The venv is left with no pyvenv.cfg; every later HTTPS call dies with
FileNotFoundError (missing cacert.pem) and there is no recovery.

Move the old venv aside atomically with os.replace instead of deleting it in
place, pass --clear, and restore the moved-aside venv if the rebuild fails.
Verified on Windows 11: os.replace of a venv whose interpreter is live
succeeds (Windows tracks a running .exe by handle), so the rebuild actually
completes while the running gateway keeps using the moved-aside copy until it
restarts -- the update succeeds instead of bricking. If the venv genuinely
cannot be moved, the rebuild aborts cleanly and leaves it fully intact.

Root-cause fix for NousResearch#37881. The --clear-only approaches in NousResearch#37895 / NousResearch#38051
don't cover the real lock scenario: when the locked interpreter is inside the
venv being rebuilt, neither rmtree nor `uv venv --clear` can delete it.

- hermes_cli/managed_uv.py: atomic move-aside + --clear + restore-on-failure
- tests/hermes_cli/test_managed_uv.py: in-use venv left intact with no rebuild
  attempted; failed rebuild restored; success path asserts --clear
- website/docs/getting-started/updating.md: document --force venv-rebuild safety

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
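The move-aside approach described in this commit message can be sketched as below. It is an illustration under stated assumptions, not the committed code: the function name, the `.old` backup suffix, and the `create` callback (standing in for the `uv venv --clear` call) are all hypothetical. Whether `os.replace` succeeds on a venv with a live interpreter is the commit author's reported observation on Windows 11 and is not verified here.

```python
import os
import shutil
from pathlib import Path
from typing import Callable


def rebuild_with_move_aside(venv_dir: Path, create: Callable[[Path], bool]) -> bool:
    """Move venv_dir aside, call create(venv_dir), and restore the old venv on failure."""
    backup = venv_dir.with_name(venv_dir.name + ".old")  # hypothetical naming
    shutil.rmtree(backup, ignore_errors=True)  # drop a stale backup from an earlier run
    if venv_dir.exists():
        try:
            os.replace(venv_dir, backup)  # single rename instead of in-place delete
        except OSError:
            return False  # cannot move it: abort and leave the venv intact
    try:
        ok = create(venv_dir)
    except Exception:
        ok = False
    if ok:
        # Best effort: the backup may linger while a running process still uses it.
        shutil.rmtree(backup, ignore_errors=True)
        return True
    # Rebuild failed: discard the partial result and put the old venv back.
    shutil.rmtree(venv_dir, ignore_errors=True)
    if backup.exists():
        try:
            os.replace(backup, venv_dir)
        except OSError:
            pass  # restore failed; the old venv is still available at `backup`
    return False
```

The point of the design is that the original directory is never deleted piecemeal: it is either fully in place, fully moved aside, or fully restored.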
@teknium1

teknium1 commented Jun 4, 2026

Contributor

Closing as already fixed on main.

The Windows venv brick (#37881) was fixed in c136eb4 ("fix(update): harden venv rebuild + verify core deps after install"), which uses the same --clear mechanism you identified — implemented as a conditional retry (only retries with --clear when uv's stderr reports "already exists", so it doesn't mask real failures like disk-full or interpreter-download errors), plus a post-install dependency-verification pass as belt-and-suspenders.

Thanks for the clean fix and the precise root-cause writeup — you correctly diagnosed the locked-python.exe / rmtree interaction. Credit to you for independently landing on the right mechanism.
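The conditional retry described in this comment might look roughly like the sketch below. This is not the code from c136eb4: the function name and the injectable `run` parameter are assumptions, and the only details taken from the comment are that `--clear` is added on a retry and that the retry is triggered by "already exists" in uv's stderr.

```python
import subprocess
from typing import Callable, Sequence


def create_venv_with_conditional_clear(
    base_cmd: Sequence[str],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> subprocess.CompletedProcess:
    """Run `uv venv`; retry once with --clear only if uv reports an existing directory."""
    first = run(list(base_cmd), capture_output=True, text=True)
    if first.returncode == 0:
        return first
    if "already exists" not in (first.stderr or ""):
        return first  # a different failure (for example disk full): surface it unchanged
    return run([*base_cmd, "--clear"], capture_output=True, text=True)
```

Compared with passing `--clear` unconditionally, this keeps the first attempt non-destructive and lets unrelated failures reach the caller with their original error text.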

@teknium1 teknium1 closed this Jun 4, 2026
Development

Successfully merging this pull request may close these issues.

[Bug]: hermes update bricks the install on Windows — venv rebuild leaves a venv with no pyvenv.cfg, then ModuleNotFoundError: hermes_cli
