fix(update): abort after managed uv venv rebuild failure#38511
Closed
f3rs3n wants to merge 1 commit into
Closed
Conversation
lEWFkRAD
pushed a commit
to lEWFkRAD/hermes-agent
that referenced
this pull request
Jun 3, 2026
Fold in the call-site guards from NousResearch#38511 (@f3rs3n) so they compose with the move-aside fix from the previous commit: - rebuild_venv() no longer reports success if `uv venv` exits 0 without producing an interpreter; it restores the moved-aside venv and returns False. - both `hermes update` venv-rebuild call sites now abort (RuntimeError) instead of silently continuing into dependency install when rebuild_venv() returns False. Tests: rebuild_venv returns False when the rebuilt interpreter is missing; cmd_update raises and skips pip install when the fresh managed-uv rebuild fails. Co-Authored-By: f3rs3n <32328813+f3rs3n@users.noreply.github.com> Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
13 tasks
Contributor
|
Thanks — the interpreter-exists guard and abort-on-failure call-site change were the right instincts. Already fixed on |
ethernet8023
pushed a commit
that referenced
this pull request
Jun 4, 2026
…build hermes update can brick a Windows install. When 'hermes update --force' runs past the concurrent-process guard, rebuild_venv runs while the venv is still in use: shutil.rmtree(ignore_errors=True) deletes site-packages + certifi's cert bundle but can't remove the locked python.exe, leaving a half-gutted venv that uv venv then refuses to overwrite. Every later HTTPS call dies with FileNotFoundError for the missing cacert and there is no recovery. --clear alone (the c136eb4 retry path) does not fix the real lock case: when the locked interpreter is *inside* the venv being rebuilt, neither rmtree nor uv venv --clear can delete it. os.replace of the parent directory *is* allowed on Windows (a running .exe is tracked by handle, not path), so we move the old venv aside atomically to <venv>.old, rebuild with --clear in its place, and the still-running gateway/desktop keep using the moved-aside copy until they restart. If the venv genuinely can't be moved, we abort cleanly and leave it fully intact; if the rebuild fails, we restore the moved-aside copy. Folds in the call-site guards from #38511 (@f3rs3n): - rebuild_venv() returns False (and restores the backup) if uv exits 0 without producing an interpreter. - both hermes update venv-rebuild call sites abort with RuntimeError instead of continuing into dependency install when rebuild_venv() returns False. Also gitignore /venv.old/ so the update autostash (git stash --include-untracked) doesn't sweep the moved-aside venv on every run. Root-cause fix for #37881. Supersedes the --clear-only retry from c136eb4. Co-authored-by: f3rs3n <32328813+f3rs3n@users.noreply.github.com>
davidgut1982
pushed a commit
to davidgut1982/hermes-agent
that referenced
this pull request
Jun 5, 2026
…build hermes update can brick a Windows install. When 'hermes update --force' runs past the concurrent-process guard, rebuild_venv runs while the venv is still in use: shutil.rmtree(ignore_errors=True) deletes site-packages + certifi's cert bundle but can't remove the locked python.exe, leaving a half-gutted venv that uv venv then refuses to overwrite. Every later HTTPS call dies with FileNotFoundError for the missing cacert and there is no recovery. --clear alone (the c136eb4 retry path) does not fix the real lock case: when the locked interpreter is *inside* the venv being rebuilt, neither rmtree nor uv venv --clear can delete it. os.replace of the parent directory *is* allowed on Windows (a running .exe is tracked by handle, not path), so we move the old venv aside atomically to <venv>.old, rebuild with --clear in its place, and the still-running gateway/desktop keep using the moved-aside copy until they restart. If the venv genuinely can't be moved, we abort cleanly and leave it fully intact; if the rebuild fails, we restore the moved-aside copy. Folds in the call-site guards from NousResearch#38511 (@f3rs3n): - rebuild_venv() returns False (and restores the backup) if uv exits 0 without producing an interpreter. - both hermes update venv-rebuild call sites abort with RuntimeError instead of continuing into dependency install when rebuild_venv() returns False. Also gitignore /venv.old/ so the update autostash (git stash --include-untracked) doesn't sweep the moved-aside venv on every run. Root-cause fix for NousResearch#37881. Supersedes the --clear-only retry from c136eb4. Co-authored-by: f3rs3n <32328813+f3rs3n@users.noreply.github.com>
changman
pushed a commit
to changman/hermes-agent
that referenced
this pull request
Jun 10, 2026
…build hermes update can brick a Windows install. When 'hermes update --force' runs past the concurrent-process guard, rebuild_venv runs while the venv is still in use: shutil.rmtree(ignore_errors=True) deletes site-packages + certifi's cert bundle but can't remove the locked python.exe, leaving a half-gutted venv that uv venv then refuses to overwrite. Every later HTTPS call dies with FileNotFoundError for the missing cacert and there is no recovery. --clear alone (the c136eb4 retry path) does not fix the real lock case: when the locked interpreter is *inside* the venv being rebuilt, neither rmtree nor uv venv --clear can delete it. os.replace of the parent directory *is* allowed on Windows (a running .exe is tracked by handle, not path), so we move the old venv aside atomically to <venv>.old, rebuild with --clear in its place, and the still-running gateway/desktop keep using the moved-aside copy until they restart. If the venv genuinely can't be moved, we abort cleanly and leave it fully intact; if the rebuild fails, we restore the moved-aside copy. Folds in the call-site guards from NousResearch#38511 (@f3rs3n): - rebuild_venv() returns False (and restores the backup) if uv exits 0 without producing an interpreter. - both hermes update venv-rebuild call sites abort with RuntimeError instead of continuing into dependency install when rebuild_venv() returns False. Also gitignore /venv.old/ so the update autostash (git stash --include-untracked) doesn't sweep the moved-aside venv on every run. Root-cause fix for NousResearch#37881. Supersedes the --clear-only retry from c136eb4. Co-authored-by: f3rs3n <32328813+f3rs3n@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
uv venv --clearso partial/existing venv dirs are replaced safelyRelationship to existing PRs
There are already related open PRs for #37881:
--cleartouv venvThis PR overlaps on the
--clearpiece, but adds two guards that are still useful with either approach:rebuild_venv()must not report success unless the expected venv interpreter exists.rebuild_venv()returnsFalse.If maintainers prefer #38397 as the main fix, these call-site/interpreter checks can be folded into that PR instead.
Test Plan
venv/bin/python -m pytest tests/hermes_cli/test_managed_uv.py -qvenv/bin/python -m pytest tests/hermes_cli/test_update_autostash.py::test_cmd_update_aborts_when_fresh_managed_uv_rebuild_fails tests/hermes_cli/test_update_autostash.py::test_cmd_update_retries_optional_extras_individually_when_all_fails tests/hermes_cli/test_update_autostash.py::test_cmd_update_succeeds_with_extras -qvenv/bin/python -m pytest tests/hermes_cli/test_cmd_update.py tests/hermes_cli/test_uv_tool_update.py tests/hermes_cli/test_update_autostash.py -qNote: I also tried the full
tests/hermes_cli -q; it currently has unrelated environment/order/config-sensitive failures on this machine, while the targeted updater suites above pass.