Skip to content

fix(update): abort after managed uv venv rebuild failure#38511

Closed
f3rs3n wants to merge 1 commit into
NousResearch:mainfrom
f3rs3n:fix/managed-uv-venv-rebuild
Closed

fix(update): abort after managed uv venv rebuild failure#38511
f3rs3n wants to merge 1 commit into
NousResearch:mainfrom
f3rs3n:fix/managed-uv-venv-rebuild

Conversation

@f3rs3n

@f3rs3n f3rs3n commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Summary

  • recreate managed-uv venvs with uv venv --clear so partial/existing venv dirs are replaced safely
  • verify the rebuilt venv interpreter exists before reporting success
  • abort both update dependency-install paths if the fresh managed-uv venv rebuild fails

Relationship to existing PRs

There are already related open PRs for #37881:

This PR overlaps on the --clear piece, but adds two guards that are still useful with either approach:

  1. rebuild_venv() must not report success unless the expected venv interpreter exists.
  2. update must abort before dependency installation when rebuild_venv() returns False.

If maintainers prefer #38397 as the main fix, these call-site/interpreter checks can be folded into that PR instead.

Test Plan

  • venv/bin/python -m pytest tests/hermes_cli/test_managed_uv.py -q
  • venv/bin/python -m pytest tests/hermes_cli/test_update_autostash.py::test_cmd_update_aborts_when_fresh_managed_uv_rebuild_fails tests/hermes_cli/test_update_autostash.py::test_cmd_update_retries_optional_extras_individually_when_all_fails tests/hermes_cli/test_update_autostash.py::test_cmd_update_succeeds_with_extras -q
  • venv/bin/python -m pytest tests/hermes_cli/test_cmd_update.py tests/hermes_cli/test_uv_tool_update.py tests/hermes_cli/test_update_autostash.py -q

Note: I also tried the full tests/hermes_cli -q; it currently has unrelated environment/order/config-sensitive failures on this machine, while the targeted updater suites above pass.

@alt-glitch alt-glitch added type/bug Something isn't working comp/cli CLI entry point, hermes_cli/, setup wizard python:uv Pull requests that update python:uv code P1 High — major feature broken, no workaround labels Jun 3, 2026
lEWFkRAD pushed a commit to lEWFkRAD/hermes-agent that referenced this pull request Jun 3, 2026
Fold in the call-site guards from NousResearch#38511 (@f3rs3n) so they compose with the
move-aside fix from the previous commit:

- rebuild_venv() no longer reports success if `uv venv` exits 0 without
  producing an interpreter; it restores the moved-aside venv and returns False.
- both `hermes update` venv-rebuild call sites now abort (RuntimeError) instead
  of silently continuing into dependency install when rebuild_venv() returns
  False.

Tests: rebuild_venv returns False when the rebuilt interpreter is missing;
cmd_update raises and skips pip install when the fresh managed-uv rebuild fails.

Co-Authored-By: f3rs3n <32328813+f3rs3n@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@teknium1

teknium1 commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Thanks — the interpreter-exists guard and abort-on-failure call-site change were the right instincts.

Already fixed on main by #38887 ("fix(update): harden venv rebuild + verify core deps after install"): it adds the --clear retry in rebuild_venv() and a post-install _verify_core_dependencies check that aborts on partial installs. Closing as redundant. Thanks for contributing!

@teknium1 teknium1 closed this Jun 4, 2026
ethernet8023 pushed a commit that referenced this pull request Jun 4, 2026
…build

hermes update can brick a Windows install. When 'hermes update --force' runs
past the concurrent-process guard, rebuild_venv runs while the venv is still in
use: shutil.rmtree(ignore_errors=True) deletes site-packages + certifi's cert
bundle but can't remove the locked python.exe, leaving a half-gutted venv that
uv venv then refuses to overwrite. Every later HTTPS call dies with
FileNotFoundError for the missing cacert and there is no recovery.

--clear alone (the c136eb4 retry path) does not fix the real lock case: when
the locked interpreter is *inside* the venv being rebuilt, neither rmtree nor
uv venv --clear can delete it. os.replace of the parent directory *is* allowed
on Windows (a running .exe is tracked by handle, not path), so we move the old
venv aside atomically to <venv>.old, rebuild with --clear in its place, and the
still-running gateway/desktop keep using the moved-aside copy until they
restart. If the venv genuinely can't be moved, we abort cleanly and leave it
fully intact; if the rebuild fails, we restore the moved-aside copy.

Folds in the call-site guards from #38511 (@f3rs3n):
- rebuild_venv() returns False (and restores the backup) if uv exits 0 without
  producing an interpreter.
- both hermes update venv-rebuild call sites abort with RuntimeError instead of
  continuing into dependency install when rebuild_venv() returns False.

Also gitignore /venv.old/ so the update autostash (git stash --include-untracked)
doesn't sweep the moved-aside venv on every run.

Root-cause fix for #37881. Supersedes the --clear-only retry from c136eb4.

Co-authored-by: f3rs3n <32328813+f3rs3n@users.noreply.github.com>
davidgut1982 pushed a commit to davidgut1982/hermes-agent that referenced this pull request Jun 5, 2026
…build

hermes update can brick a Windows install. When 'hermes update --force' runs
past the concurrent-process guard, rebuild_venv runs while the venv is still in
use: shutil.rmtree(ignore_errors=True) deletes site-packages + certifi's cert
bundle but can't remove the locked python.exe, leaving a half-gutted venv that
uv venv then refuses to overwrite. Every later HTTPS call dies with
FileNotFoundError for the missing cacert and there is no recovery.

--clear alone (the c136eb4 retry path) does not fix the real lock case: when
the locked interpreter is *inside* the venv being rebuilt, neither rmtree nor
uv venv --clear can delete it. os.replace of the parent directory *is* allowed
on Windows (a running .exe is tracked by handle, not path), so we move the old
venv aside atomically to <venv>.old, rebuild with --clear in its place, and the
still-running gateway/desktop keep using the moved-aside copy until they
restart. If the venv genuinely can't be moved, we abort cleanly and leave it
fully intact; if the rebuild fails, we restore the moved-aside copy.

Folds in the call-site guards from NousResearch#38511 (@f3rs3n):
- rebuild_venv() returns False (and restores the backup) if uv exits 0 without
  producing an interpreter.
- both hermes update venv-rebuild call sites abort with RuntimeError instead of
  continuing into dependency install when rebuild_venv() returns False.

Also gitignore /venv.old/ so the update autostash (git stash --include-untracked)
doesn't sweep the moved-aside venv on every run.

Root-cause fix for NousResearch#37881. Supersedes the --clear-only retry from c136eb4.

Co-authored-by: f3rs3n <32328813+f3rs3n@users.noreply.github.com>
changman pushed a commit to changman/hermes-agent that referenced this pull request Jun 10, 2026
…build

hermes update can brick a Windows install. When 'hermes update --force' runs
past the concurrent-process guard, rebuild_venv runs while the venv is still in
use: shutil.rmtree(ignore_errors=True) deletes site-packages + certifi's cert
bundle but can't remove the locked python.exe, leaving a half-gutted venv that
uv venv then refuses to overwrite. Every later HTTPS call dies with
FileNotFoundError for the missing cacert and there is no recovery.

--clear alone (the c136eb4 retry path) does not fix the real lock case: when
the locked interpreter is *inside* the venv being rebuilt, neither rmtree nor
uv venv --clear can delete it. os.replace of the parent directory *is* allowed
on Windows (a running .exe is tracked by handle, not path), so we move the old
venv aside atomically to <venv>.old, rebuild with --clear in its place, and the
still-running gateway/desktop keep using the moved-aside copy until they
restart. If the venv genuinely can't be moved, we abort cleanly and leave it
fully intact; if the rebuild fails, we restore the moved-aside copy.

Folds in the call-site guards from NousResearch#38511 (@f3rs3n):
- rebuild_venv() returns False (and restores the backup) if uv exits 0 without
  producing an interpreter.
- both hermes update venv-rebuild call sites abort with RuntimeError instead of
  continuing into dependency install when rebuild_venv() returns False.

Also gitignore /venv.old/ so the update autostash (git stash --include-untracked)
doesn't sweep the moved-aside venv on every run.

Root-cause fix for NousResearch#37881. Supersedes the --clear-only retry from c136eb4.

Co-authored-by: f3rs3n <32328813+f3rs3n@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/cli CLI entry point, hermes_cli/, setup wizard P1 High — major feature broken, no workaround python:uv Pull requests that update python:uv code type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants