convert install.ps1 to new bootstrap protocol#27224

Merged: teknium1 merged 4 commits into main from jq/install-ps1-stage-protocol on May 17, 2026

@jquesnelle jquesnelle commented May 17, 2026

Collaborator

Adds an opt-in stage protocol that lets programmatic drivers (the desktop GUI's onboarding wizard, CI, future install.sh parity) drive install.ps1 one step at a time with structured JSON results. Default invocation (irm | iex one-liner) behaves unchanged.

Entry points

    | Invocation       | Purpose                                                             |
    |------------------|---------------------------------------------------------------------|
    | install.ps1      | Today's interactive install (unchanged)                             |
    | -ProtocolVersion | Emit protocol version integer                                       |
    | -Manifest        | Emit JSON manifest of available stages                              |
    | -Stage <name>    | Run one stage, emit JSON result                                     |
    | -NonInteractive  | Suppress Read-Host prompts (skips setup wizard + gateway autostart) |
    | -Json            | Machine-readable completion frame                                   |

Manifest exposes 14 stages across prereqs/install/finalize/post-install categories. Two stages (configure, gateway) flag needs_user_input=true so GUI drivers can skip them and handle the equivalent UX themselves.
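
A driver might consume the manifest roughly as follows. This is a minimal sketch, not the shipped driver: the manifest shape (a `stages` array whose entries carry `name` and `needs_user_input`) and the helper name `Select-DriverStages` are assumptions for illustration. Only the `needs_user_input` flag and the stage names come from this PR.

```powershell
# Hypothetical helper: pick the stages a GUI driver would run itself.
function Select-DriverStages {
    param([Parameter(Mandatory)][string]$ManifestJson)

    $manifest = $ManifestJson | ConvertFrom-Json
    # Skip stages that need interactive input; the driver supplies its own UX.
    $manifest.stages |
        Where-Object { -not $_.needs_user_input } |
        ForEach-Object { $_.name }
}

# Sample manifest in the ASSUMED shape (the real schema may differ).
$sample = @'
{"stages":[
  {"name":"uv","needs_user_input":false},
  {"name":"python","needs_user_input":false},
  {"name":"configure","needs_user_input":true}
]}
'@

$toRun = @(Select-DriverStages -ManifestJson $sample)
$toRun -join ','
```

In a real driver, `$sample` would presumably come from `powershell -NoProfile -File install.ps1 -Manifest`, and each selected name would be passed to `-Stage <name>` in a fresh process.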

@alt-glitch alt-glitch added the type/refactor (Code restructuring, no behavior change), P2 (Medium: degraded but workaround exists), and comp/cli (CLI entry point, hermes_cli/, setup wizard) labels May 17, 2026

Along the way, clean-VM testing on stock Windows 10/11 surfaced a
series of latent install.ps1 bugs that were never exercised by
developer machines. Fixed in the same commit:

* Encoding: file is now pure ASCII with no BOM. Windows PowerShell
  5.1 reads BOM-less files as Windows-1252 and chokes on em-dashes
  (and other UTF-8 sequences), while iex chokes on a leading U+FEFF.
  Pure-ASCII satisfies both invocation paths.

* EAP=Stop + native `2>&1` captures: PowerShell wraps stderr lines
  from native commands as ErrorRecord objects under EAP=Stop and
  throws even when the command exits 0. Relaxed to EAP=Continue
  around the astral.sh uv installer, `uv python install`, `npm
  install`, `npx playwright install`, the venv import probes, and
  the Node winget fallback. Check $LASTEXITCODE for the real signal.

* Cross-process state: each `-Stage <name>` invocation spawns a
  fresh powershell child. $script:UvCmd set by Stage-Uv was invisible
  to Stage-Python; PATH updated by Stage-Git/Stage-Node was invisible
  to subsequent stages spawned by the driver shell. Added Resolve-UvCmd
  helper called at the top of every stage that needs uv, and a
  Sync-EnvPath helper called at the top of Invoke-Stage to refresh
  PATH from the registry.

* UAC avoidance: `winget install OpenJS.NodeJS.LTS` triggers a UAC
  prompt that often appears minimized in the taskbar -- looks like a
  hang. Switched Test-Node to prefer the official portable Node zip
  dropped into %LOCALAPPDATA%\hermes\node\ (mirrors the PortableGit
  pattern Install-Git already uses). winget kept as fallback.

* npx hangs on confirmation: `npx playwright install chromium` blocks
  on stdin waiting for "Need to install playwright@X.Y.Z (y/N)" when
  playwright isn't in local node_modules. Tee-Object pipelines
  disconnect stdin from the user's TTY so the install hangs forever.
  Pass `--yes` to auto-accept.

* Silent long-running installs: `*> $logPath` redirected every stream
  to disk and left the user staring at a frozen "Installing..." line
  for the 5-10 minutes Playwright Chromium takes to download. Switched
  to `2>&1 | ForEach-Object { "$_" } | Tee-Object -FilePath $log` so
  output streams live to the console AND captures to log for failure
  diagnostics. ForEach-Object coercion strips PowerShell's red
  NativeCommandError formatter from stderr items.

* Console encoding: forced [Console]::OutputEncoding to UTF-8 so
  playwright/git/npm progress bars, box-drawing, and check marks render
  correctly instead of as IBM437/Windows-1252 mojibake.

* Performance: set $ProgressPreference = "SilentlyContinue" so
  Invoke-WebRequest doesn't paint its per-chunk progress bar. The
  PS 5.1 progress UI throttles downloads by 10-100x (a 57MB PortableGit
  grab takes 5 minutes with the bar on vs ~20 seconds with it off,
  same network). Affects PortableGit, Node portable zip, and the
  Hermes repo zip fallback.
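
The EAP relaxation and the live-output pipeline described in the bullets above can be combined in one wrapper. The following is a minimal sketch under stated assumptions: `Invoke-NativeLogged` is a hypothetical helper name, not a function from install.ps1, and the real script applies the pattern inline at each call site.

```powershell
# Hypothetical wrapper: run a native command with EAP relaxed, stream its
# output live, capture it to a log, and return the real exit code.
function Invoke-NativeLogged {
    param(
        [Parameter(Mandatory)][string]$FilePath,
        [string[]]$ArgumentList = @(),
        [Parameter(Mandatory)][string]$LogPath
    )

    # Capture the previous preference BEFORE the try so the restore is always valid.
    $prevEAP = $ErrorActionPreference
    try {
        # Under 'Stop', stderr lines merged via 2>&1 can surface as terminating errors.
        $ErrorActionPreference = 'Continue'

        & $FilePath @ArgumentList 2>&1 |
            ForEach-Object { "$_" } |          # coerce ErrorRecords to plain strings
            Tee-Object -FilePath $LogPath |    # write to the log file...
            Out-Host                           # ...and show live on the console

        # The exit code, not stderr content, is the success signal.
        return $LASTEXITCODE
    } finally {
        $ErrorActionPreference = $prevEAP
    }
}

# Usage (hypothetical): stream an npm install and check the exit code.
# $code = Invoke-NativeLogged -FilePath 'npm' -ArgumentList @('install') -LogPath "$env:TEMP\npm.log"
# if ($code -ne 0) { throw "npm install failed with exit code $code" }
```

Piping to `Out-Host` keeps the streamed lines out of the function's return value, so the caller receives only the exit code.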

Tests: scripts/tests/test-install-ps1-stage-protocol.ps1 provides 19
metadata-only assertions covering -ProtocolVersion, -Manifest schema,
and unknown -Stage error frame. No install side effects.

End-to-end validated on a clean Windows 10 VM via:
  1. `irm <branch>/scripts/install.ps1 | iex` (canonical CLI path)
  2. `powershell -File install.ps1 -Stage X` iterated through every
     stage (GUI driver path, exercises cross-process fixes)
@jquesnelle jquesnelle force-pushed the jq/install-ps1-stage-protocol branch from 60c881e to 5165819 Compare May 17, 2026 04:47
@jquesnelle jquesnelle marked this pull request as ready for review May 17, 2026 05:05
@OutThisLife OutThisLife requested a review from Copilot May 17, 2026 05:09

Copilot AI left a comment

Contributor

Pull request overview

Adds an opt-in "stage protocol" to scripts/install.ps1 so programmatic drivers (the desktop GUI's onboarding wizard, CI, future install.sh) can drive the Windows installer one step at a time and receive structured JSON results, while the default interactive irm | iex flow is unchanged. The PR also replaces non-ASCII glyphs in console output with ASCII equivalents (for PS 5.1 parser/codepage robustness), forces UTF-8 console output for child commands, silences Invoke-WebRequest progress bars, demotes the winget Node install behind the portable-zip path, and adds several $ErrorActionPreference relaxations around native commands whose stderr would otherwise be wrapped as terminating errors.

Changes:

  • New stage protocol surface in install.ps1: -ProtocolVersion, -Manifest, -Stage <name>, -NonInteractive, -Json, plus a 14-stage $InstallStages table, per-stage workers, Invoke-Stage/Invoke-AllStages, and Resolve-UvCmd/Sync-EnvPath helpers for cross-process driver mode.
  • Operational hardening: UTF-8 console encoding, $ProgressPreference = SilentlyContinue, EAP relaxations around uv, winget, python -c, npm install, and playwright install, Tee-Object live output for npm/playwright, persisted User-PATH entry for the portable Node install.
  • New PowerShell smoke test (scripts/tests/test-install-ps1-stage-protocol.ps1) covering -ProtocolVersion, -Manifest shape and required stage names, and unknown-stage error framing.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

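    | File                                              | Description                                                                                                                                                                                         |
    |---------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
    | scripts/install.ps1                               | Adds the stage-protocol parameters, dispatch block, stage table/workers, helper functions, and various EAP/UTF-8/Tee operational changes; replaces non-ASCII banner/log glyphs with ASCII.          |
    | scripts/tests/test-install-ps1-stage-protocol.ps1 | New metadata-only smoke test that runs -ProtocolVersion, -Manifest, and an unknown -Stage and asserts exit codes plus JSON shape.                                                                   |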


Comment thread scripts/install.ps1
Comment on lines +1977 to +1989
    } catch {
        $result.ok = $false
        $result.reason = "$_"
        throw
    } finally {
        $result.duration_ms = [int]([DateTime]::UtcNow - $start).TotalMilliseconds
        if ($Json -or $Stage) {
            # In stage-driver mode every stage emits a JSON line so the
            # caller can stream progress. In default interactive mode we
            # stay silent here (the worker already wrote human output).
            $result | ConvertTo-Json -Compress | Write-Output
        }
    }
Comment thread scripts/install.ps1
Comment on lines +2063 to +2074
    if ($Json -or $Stage) {
        # Stage-driver mode: caller wants JSON they can parse. Emit a
        # structured error frame and exit non-zero.
        $err = @{
            ok = $false
            stage = if ($Stage) { $Stage } else { $null }
            reason = "$_"
        }
        $err | ConvertTo-Json -Compress | Write-Output
        exit 1
    }

Comment thread scripts/install.ps1
Comment on lines +183 to +185
    # Restore EAP in case the try block threw before the assignment
    if ($prevEAP) { $ErrorActionPreference = $prevEAP }
    Write-Err "Failed to install uv: $_"
alt-glitch

This comment was marked as resolved.

@alt-glitch alt-glitch left a comment

Collaborator

Inline comments on the two bugs found during Windows E2E testing.

Three issues flagged by the Copilot review on this PR:

1. Double JSON emit on stage failure (Copilot #1, #2). When -Stage <name>
   ran a worker that threw, Invoke-Stage's finally emitted a JSON result
   frame AND the entry-point catch emitted a second error frame --
   producing two concatenated JSON objects on stdout and breaking the
   one-line-per-invocation contract that drivers parse against. Same
   issue applied to -Json mode on a full install (every stage's finally
   plus a final error frame missing duration_ms/skipped).

   Fix: Invoke-Stage's finally now sets $script:_StageEmittedErrorFrame
   when it emits a failure frame; the entry-point catch checks the flag
   and skips its own emit, still exit 1.

2. $prevEAP uninitialized on early try-block throw (Copilot #3). In
   Install-Uv, Test-Python, Test-Node's winget fallback,
   _Run-NpmInstall, and the playwright block, '$prevEAP =
   $ErrorActionPreference' lived as the first statement INSIDE the
   try. If anything between 'try {' and that line threw (Write-Info on
   an unusual host, the npx-finding loop, etc.), the catch's
   'if ($prevEAP) { ... }' restore was a no-op and EAP could remain
   relaxed.

   Fix: hoist '$prevEAP = $ErrorActionPreference' to the line
   immediately before 'try {' in all five sites. Catch's restore is
   now always meaningful regardless of where in the try the throw
   originated.

No change to Invoke-Stage's success path or to the four lint-clean EAP
sites (Test-Node was the only winget-related catch). All 19 metadata
smoke tests still pass.
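
The single-frame guard described in item 1 can be sketched in isolation. This is an illustrative reduction, not the code from install.ps1: `Invoke-StageSketch` and `Invoke-EntryPointSketch` are hypothetical names, the worker is passed as a script block, and the sketch returns instead of calling `exit 1`. Only the `$script:_StageEmittedErrorFrame` flag and the frame fields are taken from the PR.

```powershell
$script:_StageEmittedErrorFrame = $false

# Hypothetical reduction of Invoke-Stage: always emits exactly one result frame.
function Invoke-StageSketch {
    param([string]$Name, [scriptblock]$Worker)

    $result = [ordered]@{ stage = $Name; ok = $true; skipped = $false; reason = $null; duration_ms = 0 }
    $start = [DateTime]::UtcNow
    try {
        & $Worker | Out-Null
    } catch {
        $result.ok = $false
        $result.reason = "$_"
        throw                      # still propagate so the caller can exit non-zero
    } finally {
        $result.duration_ms = [int]([DateTime]::UtcNow - $start).TotalMilliseconds
        # Record that a failure frame is about to be written.
        if (-not $result.ok) { $script:_StageEmittedErrorFrame = $true }
        $result | ConvertTo-Json -Compress | Write-Output
    }
}

# Hypothetical reduction of the entry-point catch.
function Invoke-EntryPointSketch {
    param([string]$Name, [scriptblock]$Worker)

    $script:_StageEmittedErrorFrame = $false
    try {
        Invoke-StageSketch -Name $Name -Worker $Worker
    } catch {
        # Emit a fallback frame only if the stage did not already emit one.
        if (-not $script:_StageEmittedErrorFrame) {
            @{ ok = $false; stage = $Name; reason = "$_" } | ConvertTo-Json -Compress | Write-Output
        }
        # The real script would 'exit 1' here.
    }
}

$frames = @(Invoke-EntryPointSketch -Name 'venv' -Worker { throw 'Cannot find path' })
```

With the flag in place, a failing worker should yield one frame rather than two, which is what a line-oriented driver expects.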

@alt-glitch alt-glitch left a comment

Collaborator

🐛 Request Changes: Double JSON on failed stages

Tested on Windows 11 PS 5.1.26100 via SSH. The stage protocol metadata surface is clean (19/19 smoke tests pass, cross-process driver works for uv/python/git/node), but there's a protocol contract bug that will bite programmatic drivers:

When a stage fails, the caller gets two JSON lines instead of one:

{"skipped":false,"ok":false,"reason":"Cannot find path ...","duration_ms":46,"stage":"venv"}
{"ok":false,"reason":"Cannot find path ...","stage":"venv"}

Root cause: Invoke-Stage emits JSON in its finally block (line 1987), then re-throws (line 1980). The outer catch at line 2062 also emits a JSON error frame. Both fire for the same failure.

Suggested fix — suppress the re-throw when running a single stage, since the JSON frame from finally is already the structured result:

# In Invoke-Stage, line 1980:
    } catch {
        $result.ok = $false
        $result.reason = "$_"
        if (-not $Stage) { throw }  # Only re-throw in full-install mode
    } finally {

The full-install path (Main -> Invoke-AllStages) still needs the re-throw so the outer catch can surface the error to the user. Single-stage mode (-Stage venv) doesn't: the JSON frame with ok=false is the contract, and the exit code from the finally -> catch chain already propagates correctly.

Also two cosmetic nits (not blocking):

  • Completion banner line 1756 is 62 chars wide vs 59-char borders ([OK] is 3 chars wider than the checkmark it replaced, and the trailing spaces were not adjusted)
  • Test file line 8 has an em-dash, the only non-ASCII character across both files

@alt-glitch alt-glitch left a comment

Collaborator

Adversarial review round 1 — two new bugs found (in addition to the double-JSON already flagged).

@NousResearch NousResearch deleted a comment from github-actions Bot May 17, 2026
@NousResearch NousResearch deleted a comment from cardtest15-coder May 17, 2026
@NousResearch NousResearch deleted a comment from OutThisLife May 17, 2026
@alt-glitch alt-glitch dismissed their stale review May 17, 2026 05:36

Consolidating into single review

@alt-glitch alt-glitch left a comment

Collaborator

Review — PR #27224: install.ps1 stage protocol + Windows clean-VM hardening

Tested on real Windows 11 PS 5.1.26100 via SSH (sidbin@vespyr), plus two parallel adversarial Claude Code Sonnet reviewers with independent focus areas.

Testing performed

    | Test                                                                            | Result |
    |---------------------------------------------------------------------------------|--------|
    | Smoke tests (19 assertions: -ProtocolVersion, -Manifest schema, unknown -Stage) | Pass   |
    | Cross-process stage driver (uv/python/git/node as separate PS processes)        | Pass   |
    | -NonInteractive suppresses Read-Host (configure stage returns immediately)      | Pass   |
    | Encoding: pure ASCII, no BOM (main had 1,409 non-ASCII bytes)                   | Pass   |
    | Existing pytest suite (test_windows_native_support.py, 58 tests)                | Pass   |
    | Failed stage JSON framing                                                       | Bug    |
    | Stage-Node success reporting                                                    | Bug    |
    | Empty -Stage "" dispatch                                                        | Bug    |

What's good

The stage protocol design is clean — single source of truth, thin stage workers, Resolve-UvCmd + Sync-EnvPath for cross-process state. The hardening fixes (EAP guards, ProgressPreference 10-100x speedup, portable Node over winget UAC, npx --yes, Tee-Object streaming, console encoding) are all correct and well-documented. The commit message is excellent — every fix traced to a specific clean-VM failure.

Bugs — see inline comments

  1. Double JSON on failed stages (HIGH) — line 1980
  2. Stage-Node reports ok=true when Node install fails (MEDIUM) — line 1916
  3. -Stage "" runs full install instead of erroring (MEDIUM) — line 2043
  4. Completion banner misalignment (LOW) — line 1756

Adversarial findings rejected (verified false on PS 5.1)

  • ConvertTo-Json -Compress multi-line → single-line confirmed
  • Move-Item fails with spaces → works fine
  • PATH duplication in Sync-EnvPath → cosmetic, safe
  • Concurrent stage PATH race → stages are sequential by design

Ready to ship once bugs 1-3 are addressed.

Comment thread scripts/install.ps1
    } catch {
        $result.ok = $false
        $result.reason = "$_"
        throw

Collaborator

Bug 1: Double JSON on failed stages.

Invoke-Stage emits JSON in the finally block (line 1987), then re-throws here. The outer catch at line 2062 also emits a JSON error frame. Drivers get two lines for one failure:

{"skipped":false,"ok":false,"reason":"...","duration_ms":46,"stage":"venv"}
{"ok":false,"reason":"...","stage":"venv"}

Reproduced on PS 5.1.26100 with -Stage venv (no repo at default path).

Fix — suppress re-throw in single-stage mode:

if (-not $Stage) { throw }  # Only re-throw in full-install mode

Comment thread scripts/install.ps1 Outdated
function Stage-Uv { if (-not (Install-Uv)) { throw "uv installation failed" } }
function Stage-Python { Resolve-UvCmd; if (-not (Test-Python)) { throw "Python $PythonVersion not available" } }
function Stage-Git { if (-not (Install-Git)) { throw "Git not available and auto-install failed -- install from https://git-scm.com/download/win then re-run" } }
function Stage-Node { [void](Test-Node) }

Collaborator

Bug 2: Stage-Node always reports ok=true even when Node install fails.

Test-Node returns $true unconditionally (line 684). Stage-Node does [void](Test-Node) — never throws. In stage-driver mode the JSON says ok=true when Node isn't actually installed.

The default-install path works because $script:HasNode gates downstream behavior, but the stage protocol contract lies to the GUI driver.

Fix — surface the failure:

function Stage-Node { [void](Test-Node); if (-not $script:HasNode) { throw "Node.js not available (optional -- browser tools disabled)" } }

Or if the GUI should treat this as a soft skip, set $result.skipped = $true instead of throwing.

Comment thread scripts/install.ps1 Outdated
exit 0
}

if ($Stage) {

Collaborator

Bug 3: -Stage "" silently runs a full install.

PS treats "" as falsy, so if ($Stage) is false and dispatch falls through to Main. A GUI driver passing empty string gets an entire interactive install instead of an error.

Verified on PS 5.1.26100: "" = falsy, " " = truthy.

Fix — use $PSBoundParameters instead of truthy test:

if ($PSBoundParameters.ContainsKey('Stage')) {
    if ([string]::IsNullOrWhiteSpace($Stage)) {
        @{ ok = $false; stage = $Stage; reason = "Stage name cannot be empty." } | ConvertTo-Json -Compress | Write-Output
        exit 2
    }
    # ... existing dispatch
}

Comment thread scripts/install.ps1 Outdated
Write-Host " Installation Complete! " -ForegroundColor Green
Write-Host "└─────────────────────────────────────────────────────────┘" -ForegroundColor Green
Write-Host "+---------------------------------------------------------+" -ForegroundColor Green
Write-Host "| [OK] Installation Complete! |" -ForegroundColor Green

Collaborator

Nit: banner misalignment. [OK] is 3 chars wider than the checkmark it replaced, but trailing spaces weren't reduced. This line is 62 chars, borders are 59.

+---------------------------------------------------------+   (59)
|              [OK] Installation Complete!                   |   (62)
+---------------------------------------------------------+   (59)

Address the two cosmetic items from review:

- Completion banner middle line was 62 chars vs 59-char top/bottom borders
  (replacing the 1-char checkmark with [OK] added width that wasn't
  reflected in the trailing whitespace).  Drop 3 trailing spaces.
- Smoke test file had a single em-dash in a comment -- the only
  non-ASCII byte across both files.  Replace with -- for consistency
  with install.ps1's pure-ASCII goal.
@teknium1 teknium1 dismissed alt-glitch’s stale review May 17, 2026 05:52

Bug #1 (double JSON) was already addressed in this PR via the $script:_StageEmittedErrorFrame guard at lines 2005-2007 and 2091 — reviewer was looking at an older revision. Bug #2 (banner width) and Nit #1 (em-dash in test) just fixed in 9eb9bee.

@github-actions

github-actions Bot commented May 17, 2026

Contributor

🔎 Lint report: jq/install-ps1-stage-protocol vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8350 on HEAD, 8350 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 4366 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

Two protocol-correctness gaps from review:

1. Stage-Node used [void](Test-Node) which discarded Test-Node's return
   value, so the JSON frame always reported ok=true even when Node
   install fully failed.  A GUI driver consuming the manifest couldn't
   tell 'node ready' from 'node missing'.  Wire a soft-skip channel
   ($script:_StageSkippedReason) that workers can populate to surface
   'ran, but the thing it was supposed to set up is not available' as
   skipped=true with a reason in the JSON, without aborting the install
   (Node is optional -- browser tools degrade gracefully, matches
   Write-Completion's existing 'Note: Node.js could not be installed'
   behavior).  Reset before each stage so a prior reason can't leak.

2. The -Stage dispatch used 'if ($Stage)' which is falsy for empty
   string, so 'install.ps1 -Stage ""' fell through to Main and silently
   kicked off a full destructive install.  Switch to
   PSBoundParameters.ContainsKey('Stage') so an explicit empty value
   surfaces as unknown-stage exit 2 with a structured JSON frame, the
   way every other bad stage name does.
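
The soft-skip channel from item 1 might look like the following in reduced form. This is a sketch under assumptions: `Stage-NodeSketch` and `Invoke-StageWithSkip` are hypothetical names, the `-NodeAvailable` switch stands in for the real Node probe, and keeping `ok=true` on a skipped stage is an assumption (the commit message only states that `skipped=true` and a reason are reported without aborting).

```powershell
$script:_StageSkippedReason = $null

# Hypothetical worker: records a soft-skip reason instead of throwing.
function Stage-NodeSketch {
    param([bool]$NodeAvailable)
    if (-not $NodeAvailable) {
        $script:_StageSkippedReason = 'Node.js could not be installed -- browser tools disabled'
    }
}

# Hypothetical runner: resets the channel, runs the worker, builds the frame.
function Invoke-StageWithSkip {
    param([string]$Name, [scriptblock]$Worker)

    # Reset before each stage so a prior reason cannot leak into this frame.
    $script:_StageSkippedReason = $null
    & $Worker | Out-Null

    $result = [ordered]@{ stage = $Name; ok = $true; skipped = $false; reason = $null }
    if ($script:_StageSkippedReason) {
        $result.skipped = $true
        $result.reason  = $script:_StageSkippedReason
    }
    $result | ConvertTo-Json -Compress
}

$missing = Invoke-StageWithSkip -Name 'node' -Worker { Stage-NodeSketch -NodeAvailable $false } | ConvertFrom-Json
$present = Invoke-StageWithSkip -Name 'node' -Worker { Stage-NodeSketch -NodeAvailable $true }  | ConvertFrom-Json
```

A driver can then distinguish "node ready" from "node missing" by reading `skipped` and `reason` rather than relying on `ok` alone.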
@teknium1 teknium1 merged commit fb138d9 into main May 17, 2026
10 checks passed
teknium1 pushed a commit that referenced this pull request May 17, 2026
@teknium1 teknium1 deleted the jq/install-ps1-stage-protocol branch May 17, 2026 05:55
alt-glitch added a commit that referenced this pull request May 18, 2026
…sh,ps1}

Eliminates 687 lines of duplicated browser bootstrap code by routing all
bootstrap paths through dep_ensure.py -> install.{sh,ps1} --ensure.

install.sh:
- New ensure_browser() with agent-browser + camofox install, system browser
  detection + .env writing, per-distro Playwright deps (apt/arch/fedora/suse)
- macOS app-bundle paths added to find_system_browser()
- configure_browser_env_from_system_browser() creates .env if missing
- postinstall_mode() uses ensure_browser() instead of inline duplication

install.ps1:
- New -Ensure and -PostInstall params (coexists with stage protocol)
- New functions: Resolve-NpmCmd, Resolve-NpxCmd, Find-SystemBrowser,
  Write-BrowserEnv, Install-AgentBrowser (with -SkipPlaywright)
- Invoke-EnsureMode dispatches node/browser/ripgrep/ffmpeg
- Invoke-PostInstallMode runs full post-pip-install bootstrap
- ErrorActionPreference guards on all native command calls
- ASCII-only convention maintained (no Unicode)
- Mutual exclusion guard: -Ensure + -Stage = error

dep_ensure.py:
- Windows-aware: _IS_WINDOWS, _find_install_script returns (path, shell) tuple
- PowerShell invocation with powershell/pwsh guard + -ExecutionPolicy Bypass
- _has_hermes_agent_browser() checks platform-correct paths
- _has_system_browser() checks Windows browser names (chrome, msedge, chromium)
- env_extra parameter for forwarding install flags

config.py:
- stamp_install_method() writes ~/.hermes/.install_method
- detect_install_method() checks stamp first (before heuristics)

acp_adapter:
- _run_setup_browser() rewritten: ensure_dependency('node') + ensure_dependency('browser')
- acp_adapter/bootstrap/ deleted (399 + 288 lines)

Rebased onto main -- drops #26620 dependency (upstream stage protocol merged
via #27224). Closes follow-up from #26593.
jquesnelle added a commit that referenced this pull request May 18, 2026
The canonical install flow

    irm https://raw.githubusercontent.com/.../scripts/install.ps1 | iex

fails on PowerShell 5.1 with a cascade of 'The assignment expression
is not valid' errors at every param() default value:

    [string]$Branch = 'main',
                      ~~~~~~
    The assignment expression is not valid. The input to an assignment
    operator must be an object that is able to accept assignments...

Root cause: scripts/install.ps1 carries a UTF-8 BOM (0xEF 0xBB 0xBF)
as its first three bytes. 'irm' returns the response body as a string;
on PS 5.1 the BOM survives into that string as a leading \ufeff
character. 'iex' then evaluates the string and PS's parser chokes
on the invisible character before param() -- error recovery proceeds
into the body but every assignment is reported as broken.

This was the exact failure mode the install.ps1 hardening pass (PR
#27224) deliberately fixed by stripping the BOM and ensuring the
file body is pure ASCII. Commit 4279da4 ('fix(windows): make
PowerShell installer parse in 5.1') re-introduced the BOM later,
unintentionally undoing the irm|iex compatibility fix; the merge
that brought it into bb/gui carried it forward.

Fix: strip the three BOM bytes. File body is verified pure ASCII
(any-byte > 127 returns false), so PS 5.1 with no BOM falls back to
Windows-1252 decoding which is identical to ASCII for our content.
Both install paths now work:
  - 'irm ... | iex' (canonical CLI)
  - 'powershell -File install.ps1' (programmatic / desktop bootstrap)
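
The "no BOM, pure ASCII" property is easy to check mechanically. A minimal sketch, assuming a hypothetical helper name `Test-AsciiNoBom` (the PR does not show how the verification was actually performed):

```powershell
# Hypothetical checker: reports whether a script is BOM-free and pure ASCII.
function Test-AsciiNoBom {
    param([Parameter(Mandatory)][string]$Path)

    # Resolve first: .NET file APIs use the process directory, not the PS location.
    $full  = (Resolve-Path -LiteralPath $Path).ProviderPath
    $bytes = [System.IO.File]::ReadAllBytes($full)

    # UTF-8 BOM is the three-byte prefix EF BB BF.
    $hasBom = ($bytes.Length -ge 3) -and
              ($bytes[0] -eq 0xEF) -and ($bytes[1] -eq 0xBB) -and ($bytes[2] -eq 0xBF)

    # Count bytes outside the 7-bit ASCII range (BOM bytes are included in this count).
    $nonAscii = @($bytes | Where-Object { $_ -gt 127 }).Count

    [pscustomobject]@{
        HasBom        = $hasBom
        NonAsciiBytes = $nonAscii
        Ok            = (-not $hasBom) -and ($nonAscii -eq 0)
    }
}

# Usage (path is illustrative):
# Test-AsciiNoBom -Path .\scripts\install.ps1
```

Running a check like this in CI would be one way to catch a re-introduced BOM before it reaches the `irm | iex` path.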
GleidsonSilva pushed a commit to GleidsonSilva/hermes-agent that referenced this pull request May 19, 2026
… ops

Three install.ps1 improvements pulled from the thin-installer work on
bb/gui (PR NousResearch#27822) that benefit the canonical CLI install flow on main:

1. Strip UTF-8 BOM from scripts/install.ps1.

   The canonical 'irm <raw URL> | iex' install flow has been broken
   since commit 4279da4 re-introduced a UTF-8 BOM that PR NousResearch#27224
   had explicitly stripped. PowerShell 5.1's 'irm' returns the
   response body as a string with the BOM surviving as a leading
   \ufeff character; 'iex' then evaluates that string and the parser
   chokes on the invisible character before param(), surfacing as a
   cascade of 'The assignment expression is not valid' errors at
   every param default value.

   File body is verified pure ASCII (no character above byte 127),
   so PS 5.1 with no BOM falls back to Windows-1252 decoding which
   is identical to ASCII for our content. Both install paths work:
     - 'irm ... | iex' (canonical one-liner)
     - 'powershell -File install.ps1' (programmatic / desktop bootstrap)

2. New -Commit and -Tag string params for reproducible pinning.

   Higher-precedence variants of -Branch. When set, the repository
   stage clones $Branch (fast partial fetch) and then 'git checkout's
   the exact ref. Precedence: Commit > Tag > Branch. Honoured by all
   three code paths:
     - Update path (existing valid checkout): fetch + checkout
       --detach <commit|tag> instead of checkout + pull.
     - Fresh clone: clone --branch $Branch, then post-clone
       'git checkout --detach' to the requested ref.
     - ZIP fallback: pick archive URL for the most-specific ref
       (commit -> archive/<sha>.zip, tag -> archive/refs/tags/
       <tag>.zip, else archive/refs/heads/<branch>.zip).

   Used by the Hermes desktop's first-launch bootstrap to pin the
   .exe to the exact commit it was built against, so the cloned
   Hermes Agent tree always matches what the .exe was tested with.
   Also enables release-bundle pinning (e.g. Microsoft Store builds
   pinning to a release tag) and CI reproducibility.

3. EAP=Continue wrap around the new pin-step git invocations.

   'git fetch origin <commit>' writes the routine 'From <url>' info
   line to stderr. Under the script's global $ErrorActionPreference
   = 'Stop' that stderr line is wrapped as an ErrorRecord and
   terminates the script even though fetch+checkout actually succeed.
   Same EAP=Stop + native-stderr footgun we hit during the install.ps1
   hardening pass in Install-Uv, Test-Python, _Run-NpmInstall.

   Wrap both the update-path fetch/checkout block AND the post-clone
   pin block in $ErrorActionPreference = 'Continue' (restored in
   finally). Real failures still caught by $LASTEXITCODE checks.
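
The ref precedence for the ZIP fallback in item 2 can be expressed as a small pure function. This is a sketch only: `Resolve-ArchiveUrl` is a hypothetical name and the repository URL in the usage lines is a placeholder. The `Commit > Tag > Branch` ordering and the three archive path patterns are taken from the commit message.

```powershell
# Hypothetical helper: choose the archive URL for the most specific ref.
function Resolve-ArchiveUrl {
    param(
        [Parameter(Mandatory)][string]$RepoUrl,
        [string]$Branch = 'main',
        [string]$Tag,
        [string]$Commit
    )

    $base = $RepoUrl.TrimEnd('/')

    # Precedence: Commit > Tag > Branch.
    if ($Commit) { return "$base/archive/${Commit}.zip" }
    if ($Tag)    { return "$base/archive/refs/tags/${Tag}.zip" }
    return "$base/archive/refs/heads/${Branch}.zip"
}

# Placeholder repository URL, for illustration only.
$repo = 'https://github.com/example-org/example-repo'
$byCommit = Resolve-ArchiveUrl -RepoUrl $repo -Branch 'main' -Tag 'v1.0' -Commit 'abc123'
$byTag    = Resolve-ArchiveUrl -RepoUrl $repo -Branch 'main' -Tag 'v1.0'
$byBranch = Resolve-ArchiveUrl -RepoUrl $repo
```

Keeping the selection in one function means the update, fresh-clone, and ZIP paths can share a single definition of which ref wins.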
dimavrem22 pushed a commit to inkbox-ai/hermes-agent that referenced this pull request May 20, 2026
* fix(acp): treat polished tool error payloads as failed

* fix(acp): also mark raised-exception tool results as failed

Extends #26573 to also catch the case the original PR deliberately left
out: when a tool raises an exception, the agent's tool executor wraps it
in a canonical 'Error executing tool '<name>': ...' string prefix (see
agent/tool_executor.py around the try/except). That prefix is unique to
the wrapper and cannot legitimately appear in well-behaved tool output,
so it is a safe signal that the tool blew up.

Without this, the canonical 'tool raised' case still rendered as a green
'completed' row in Zed despite being a runtime failure — exactly the
class of bug #26573 set out to fix.

Adds a positive test (raised-exception prefix -> failed) and a negative
test (bare 'Error:' word in legit tool output stays completed) so a
future contributor doesn't accidentally widen the rule to false-positive
on compiler/linter diagnostics.

* fix(acp): refresh session info after auto-title

* fix(acp): use refresh moment as updated_at on session info push

Follow-up to #26543. The sessions table does not have an updated_at
column (see hermes_state.py — only started_at/ended_at), so
row.get('updated_at') always returned None and the str() coercion was
dead code. Use datetime.now(UTC).isoformat() instead, which reflects
exactly what the field means here: 'the title was refreshed at this
moment'. Drop the dead coercion.

* feat(acp): enrich permission request cards

* feat(web): mobile dashboard UX polish (#28127)

* feat(web): mobile dashboard UX polish

Bottom sheets for sidebar theme/language pickers on narrow viewports with
enter/exit animation and drag-to-close; inline header badges beside titles;
bottom padding on the route outlet for scroll clearance; profiles loading uses a
unicode braille spinner; align profile/cron card actions to the top; viewport-fit
cover and supporting layout tweaks across dashboard pages.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Fix Nix web npm hash and mobile sheet accessibility.

Align fetchNpmDeps in nix/web.nix with web/package-lock.json for CI. Improve BottomPickSheet backdrop labeling, avoid aria-hidden on the dialog during exit animation, and wire theme/language sheets with listbox semantics and localized dismiss labels.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(install.ps1): strip BOM, add -Commit/-Tag pin params, harden git ops

Three install.ps1 improvements pulled from the thin-installer work on
bb/gui (PR #27822) that benefit the canonical CLI install flow on main:

1. Strip UTF-8 BOM from scripts/install.ps1.

   The canonical 'irm <raw URL> | iex' install flow has been broken
   since commit 4279da4db re-introduced a UTF-8 BOM that PR #27224
   had explicitly stripped. PowerShell 5.1's 'irm' returns the
   response body as a string with the BOM surviving as a leading
   \ufeff character; 'iex' then evaluates that string and the parser
   chokes on the invisible character before param(), surfacing as a
   cascade of 'The assignment expression is not valid' errors at
   every param default value.

   The file body is verified pure ASCII (no byte above 127), so
   PS 5.1, finding no BOM, falls back to Windows-1252 decoding, which
   is identical to ASCII for our content. Both install paths work:
     - 'irm ... | iex' (canonical one-liner)
     - 'powershell -File install.ps1' (programmatic / desktop bootstrap)

2. New -Commit and -Tag string params for reproducible pinning.

   Higher-precedence variants of -Branch. When set, the repository
   stage clones $Branch (fast partial fetch) and then 'git checkout's
   the exact ref. Precedence: Commit > Tag > Branch. Honoured by all
   three code paths:
     - Update path (existing valid checkout): fetch + checkout
       --detach <commit|tag> instead of checkout + pull.
     - Fresh clone: clone --branch $Branch, then post-clone
       'git checkout --detach' to the requested ref.
     - ZIP fallback: pick archive URL for the most-specific ref
       (commit -> archive/<sha>.zip, tag -> archive/refs/tags/
       <tag>.zip, else archive/refs/heads/<branch>.zip).

   Used by the Hermes desktop's first-launch bootstrap to pin the
   .exe to the exact commit it was built against, so the cloned
   Hermes Agent tree always matches what the .exe was tested with.
   Also enables release-bundle pinning (e.g. Microsoft Store builds
   pinning to a release tag) and CI reproducibility.

3. EAP=Continue wrap around the new pin-step git invocations.

   'git fetch origin <commit>' writes the routine 'From <url>' info
   line to stderr. Under the script's global $ErrorActionPreference
   = 'Stop' that stderr line is wrapped as an ErrorRecord and
   terminates the script even though fetch+checkout actually succeed.
   This is the same EAP=Stop + native-stderr footgun we hit during the
   install.ps1 hardening pass in Install-Uv, Test-Python, and
   _Run-NpmInstall.

   Wrap both the update-path fetch/checkout block AND the post-clone
   pin block in $ErrorActionPreference = 'Continue' (restored in
   finally). Real failures still caught by $LASTEXITCODE checks.
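
The Commit > Tag > Branch precedence for the ZIP fallback in item 2 can be sketched as follows. This is an illustration in Python rather than the PowerShell in install.ps1; `archive_url` and its parameters are hypothetical names, and only the three archive path shapes come from the description above.

```python
from typing import Optional


def archive_url(repo_url: str, branch: str,
                tag: Optional[str] = None,
                commit: Optional[str] = None) -> str:
    """Pick the archive URL for the most specific ref that was supplied."""
    if commit:
        # Highest precedence: an exact commit SHA.
        return f"{repo_url}/archive/{commit}.zip"
    if tag:
        # Next: a release tag.
        return f"{repo_url}/archive/refs/tags/{tag}.zip"
    # Fallback: the branch head.
    return f"{repo_url}/archive/refs/heads/{branch}.zip"
```

Passing both a tag and a commit returns the commit archive, which is the behaviour a pinned desktop build would rely on.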

* fix: add default base_url_override for ollama-cloud provider

* chore(release): add AUTHOR_MAP entry for falasi

* feat(cli): add /update slash command to CLI and TUI (#23854)

* feat: add /update slash command to CLI and TUI

* test(cli): add Python tests for /update slash command

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(cli): address Copilot review for /update slash command

Route classic CLI /update through prompt_toolkit modal confirmation and
defer relaunch to the main-thread cleanup path after app.exit(). Tighten
Y/n semantics, add Python wrapper and catalog coverage tests, and assert
/update stays visible in the TUI command catalog.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(cli): address review feedback on /update command

- Replace raw input() with _prompt_text_input_modal in _handle_update_command
  to avoid EOF/hang/keystroke-leak races with prompt_toolkit's stdin ownership
- Fix confirmation logic: only proceed on recognized affirmative aliases
  (y/yes/1/ok); cancel on everything else including empty string, typos,
  and unrecognized input — matches all other [Y/n] prompts in the codebase
- Route relaunch through main-thread shutdown path: set _pending_relaunch
  and return False from process_command so process_loop triggers app.exit();
  run() then calls relaunch() after prompt_toolkit has restored terminal modes
  and after cleanup — safe on both POSIX (execvp) and Windows (subprocess+exit)
- Fix misleading docstring in test_update_command.py: the Vitest only covers
  the TypeScript slash handler that emits code 42, not the Python wrapper
  branch that acts on it
- Rewrite tests to use SimpleNamespace pattern (like test_destructive_slash_confirm)
  so _prompt_text_input_modal can be stubbed directly
- Add Python test for _launch_tui exit-code-42 → relaunch branch in main.py

Agent-Logs-Url: https://github.com/NousResearch/hermes-agent/sessions/f6da68cf-e7b1-4b7a-aed6-3d4b0f523bdb

Co-authored-by: austinpickett <260188+austinpickett@users.noreply.github.com>

* fix(cli): polish test fixtures for /update command

- Remove unused _prompt_text_input from SimpleNamespace stub
- Use pytest.fail sentinel in managed-install guard test to catch unexpected modal invocations

Agent-Logs-Url: https://github.com/NousResearch/hermes-agent/sessions/f6da68cf-e7b1-4b7a-aed6-3d4b0f523bdb

Co-authored-by: austinpickett <260188+austinpickett@users.noreply.github.com>

* chore: re-trigger CI after Copilot review fixes

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: austinpickett <260188+austinpickett@users.noreply.github.com>

* feat(skills): add baoyu-article-illustrator skill

* feat(skills): adapt baoyu-article-illustrator for Hermes

Adapts the upstream baoyu-article-illustrator skill (verbatim-copied in
the previous commit) to Hermes' tool ecosystem, matching the pattern
used by baoyu-infographic.

- Metadata: openclaw → hermes; add author, license, tags, category
- Triggering: slash command + CLI flags → natural language
- User config: remove EXTEND.md, first-time-setup, preferences-schema
- User prompts: AskUserQuestion (batched) → clarify (one at a time)
- Image gen: baoyu-imagine → image_generate (describe refs in prompt text)
- Platform: drop Windows/PowerShell; Linux/macOS only
- File ops: switch to write_file / read_file
- Watermark: opt-in per-article instead of EXTEND.md-driven
- Add PORT_NOTES.md describing the adaptation and sync procedure

Style, palette, and prompt/system.md reference files are verbatim copies
and are the sync points with upstream.

* fix(skills): align article-illustrator with real Hermes tool capabilities

Addresses review feedback on #13193:

1. Reference-image flow no longer assumes write_file/read_file handle
   binaries. vision_analyze produces a textual description; the binary
   is optionally copied via terminal (cp/curl). The description is what
   gets embedded in prompts.

2. image_generate's URL-only return is now explicit. Step 6 downloads
   the returned URL to local disk via terminal (curl -sSL -o ...), then
   verifies non-zero size before proceeding.

3. Removed "Please use nano banana pro..." line from prompts/system.md —
   the backend is user-configured and not agent-selectable, so routing
   hints in the prompt are misleading.

PORT_NOTES.md updated: prompts/system.md is no longer verbatim, and the
file-ops/backend-selection rows now reflect Hermes' actual tool surface
(write_file/read_file for text, terminal for binaries and URL downloads,
vision_analyze for reading images).

* chore(skills/baoyu-article-illustrator): tighten description, add platforms, regen docs

* chore(release): map Jack Yang contributor email

Adds the contributor email mapping for Jack Yang (@0xjackyang) so future
release-note generation attributes commits correctly.

Salvage of #27964 by @0xjackyang.

* chore(release): pre-stage AUTHOR_MAP for May 2026 LHF batch group 7

Pre-stages AUTHOR_MAP entries for 5 new contributors whose PRs are being
salvaged in the May 2026 low-hanging-fruit batch (group 7). Lands ahead
of the per-PR salvage PRs so they don't get blocked by AUTHOR_MAP CI.

Contributors:
- 02356abc (#28286 — wecom WSMsgType.CLOSING)
- burjorjee (#28201 — inline-shell timeout guard)
- oseftg (#28168 — natural response ending: emoji + caret)
- rudi193-cmd (#28241 — empty credential pool entries)
- sadiksaifi (#27982 — kanban horizontal scroll)

Per references/batch-pr-salvage-may14-additions.md.

* fix(wecom): handle WSMsgType.CLOSING to prevent CPU spin

The WeCom adapter's _read_events() loop only handled CLOSE, CLOSED,
and ERROR websocket message types. When the server initiates a graceful
shutdown, aiohttp returns WSMsgType.CLOSING before the connection is
fully closed. This message type was not handled, causing the receive()
call to return immediately in a tight loop while self._ws.closed
remained False. The result was 100% CPU usage on the asyncio event loop.

Add WSMsgType.CLOSING to the set of terminal message types that raise
RuntimeError("WeCom websocket closed"), allowing _listen_loop() to
enter its normal reconnect backoff path.
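
A sketch of the terminal-type check, under the assumption that the read loop inspects each message type before processing it. The `WSMsgType` enum below is a local stand-in so the example runs without aiohttp installed, and `check_message_type` is a hypothetical helper, not the adapter's method.

```python
from enum import Enum, auto


class WSMsgType(Enum):
    # Local stand-in mirroring the relevant members of aiohttp.WSMsgType.
    TEXT = auto()
    CLOSE = auto()
    CLOSING = auto()
    CLOSED = auto()
    ERROR = auto()


# CLOSING is included so a server-initiated graceful shutdown ends the
# read loop instead of spinning on receive().
TERMINAL_TYPES = frozenset({
    WSMsgType.CLOSE,
    WSMsgType.CLOSING,
    WSMsgType.CLOSED,
    WSMsgType.ERROR,
})


def check_message_type(msg_type: WSMsgType) -> None:
    """Raise so the caller can fall into its reconnect backoff path."""
    if msg_type in TERMINAL_TYPES:
        raise RuntimeError("WeCom websocket closed")
```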

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(auth): treat empty credential pool entries as unauthenticated

Fixes #28140

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: include hermes_plugins in gateway.log component filter

gateway.log uses a _ComponentFilter that only passes records from
loggers starting with ('gateway',). Plugin modules are loaded under
the hermes_plugins.* namespace, so all plugin log output is silently
dropped from gateway.log.

This makes plugin registration — which directly affects gateway hooks
(pre_gateway_dispatch, transform_llm_output, etc.) — invisible in
the gateway-specific log. Operators debugging gateway behavior check
gateway.log and see no plugin activity, even when plugins are working
correctly.

Add 'hermes_plugins' to the gateway component prefixes tuple so
plugin log messages appear in gateway.log.
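
The effect of the prefix tuple can be sketched with a plain `logging.Filter`. `ComponentFilter` and `GATEWAY_PREFIXES` here are simplified, hypothetical versions of the real `_ComponentFilter` setup and are assumed only to match on logger-name prefixes.

```python
import logging


class ComponentFilter(logging.Filter):
    """Pass only records whose logger name starts with an allowed prefix."""

    def __init__(self, prefixes):
        super().__init__()
        self.prefixes = tuple(prefixes)

    def filter(self, record: logging.LogRecord) -> bool:
        # str.startswith accepts a tuple of candidate prefixes.
        return record.name.startswith(self.prefixes)


# With 'hermes_plugins' added, plugin records are no longer dropped.
GATEWAY_PREFIXES = ("gateway", "hermes_plugins")
```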

Closes #28138

* fix(gateway): align kanban artifact _IMAGE_EXTS with response dispatch

_deliver_kanban_artifacts used a broader _IMAGE_EXTS that included
.bmp, .tiff, and .svg. These three extensions are absent from the
equivalent set in _deliver_media_from_response (line 10661), which
intentionally routes them through send_document rather than
send_multiple_images (comment near line 10522 notes that Telegram
sendPhoto recompresses and rejects non-raster formats).

Routing .svg (XML text), .bmp, or .tiff through the photo API causes
send_multiple_images to raise on most platforms; the exception is caught
and logged as a warning, silently dropping the artifact. Aligning the
two sets ensures kanban deliverables with these extensions follow the
same send_document path as regular agent responses.

No behaviour change for .png/.jpg/.jpeg/.gif/.webp.

* fix(process-registry): detach stdin from background subprocesses to prevent keyboard freeze

Background process non-PTY path used stdin=subprocess.PIPE unconditionally,
creating an orphan pipe that was never written to and never closed. Child
processes that read stdin would block indefinitely, competing with the
parent's prompt_toolkit event loop for terminal ownership and causing
complete keyboard lockout.

Change to stdin=subprocess.DEVNULL so children get immediate EOF on stdin
reads instead of blocking forever. For interactive stdin, the PTY path
(which has its own independent PTY via ptyprocess.PtyProcess.spawn) should
be used instead.
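
The difference is easy to observe in isolation. In this sketch (the helper name `read_child_stdin` is made up for the example) a child process reads all of stdin; with `stdin=subprocess.DEVNULL` it sees end-of-file immediately rather than waiting on a pipe nobody writes to.

```python
import subprocess
import sys


def read_child_stdin() -> str:
    """Spawn a child that reads stdin to EOF and reports what it read."""
    child_code = "import sys; sys.stdout.write(repr(sys.stdin.read()))"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        stdin=subprocess.DEVNULL,  # child gets immediate EOF
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.stdout
```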

Fixes #17959

* chore(release): alias stale-ID salvage commit for @LifeJiggy (#28317)

* fix(process-registry): detach stdin from background subprocesses to prevent keyboard freeze

Background process non-PTY path used stdin=subprocess.PIPE unconditionally,
creating an orphan pipe that was never written to and never closed. Child
processes that read stdin would block indefinitely, competing with the
parent's prompt_toolkit event loop for terminal ownership and causing
complete keyboard lockout.

Change to stdin=subprocess.DEVNULL so children get immediate EOF on stdin
reads instead of blocking forever. For interactive stdin, the PTY path
(which has its own independent PTY via ptyprocess.PtyProcess.spawn) should
be used instead.

Fixes #17959

* chore(release): alias stale-ID salvage commit for LifeJiggy

PR #28315 was salvaged with a wrong noreply numeric ID (192385615 vs
the correct 141562589). The commit on main is correctly authored to
LifeJiggy by username, but the noreply email doesn't match AUTHOR_MAP.
Adds an alias so release-notes generation maps both forms to the same
contributor.

---------

Co-authored-by: LifeJiggy <192385615+LifeJiggy@users.noreply.github.com>

* fix: elevate plugin discovery failures from debug to warning

Plugin discovery exceptions in gateway startup (gateway/run.py) and
CLI startup (hermes_cli/main.py) are caught and logged at DEBUG
level, making them invisible at the default INFO log level.

If any plugin import fails — syntax error, missing dependency, import
cycle — operators get zero indication unless they bump the log level
to DEBUG. This makes broken plugins appear enabled but silently
non-functional.

Change both locations to logger.warning() so failures are visible at
production log levels.

Closes #28137

* fix: treat inline-shell timeout guard as timeout

* fix(acp): resolve /tmp symlink before workspace auto-approve check on macOS

Path.resolve() follows the /tmp -> /private/tmp symlink on macOS, so
str(path).startswith("/tmp/") is always False for temp-dir paths.
The "Accept Edits" (workspace_session) mode silently refused to
auto-approve every /tmp write on macOS, breaking the documented
behaviour and making the existing test fail on this platform.

Fix: keep the raw expanded path (pre-resolve) for the /tmp prefix
check and continue using the resolved form only for the cwd
relative_to() call where symlink resolution is correct behaviour.
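
A simplified sketch of the pre-resolve check, assuming POSIX-style paths. `is_tmp_target` is a hypothetical name, and the lexical `normpath` step is an addition of this example (it collapses `..` without following symlinks), not something stated in the fix.

```python
import posixpath


def is_tmp_target(raw_path: str) -> bool:
    """Check the /tmp/ prefix on the expanded path, before symlink resolution."""
    expanded = posixpath.expanduser(raw_path)
    # normpath is purely lexical: /tmp stays /tmp even where it is a
    # symlink to /private/tmp, which resolve() would rewrite.
    normalized = posixpath.normpath(expanded)
    return normalized.startswith("/tmp/")
```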

* fix(kanban): single-row horizontal scroll for board columns

Switch .hermes-kanban-columns from auto-fit CSS grid to a flex row with
overflow-x: auto and a hidden scrollbar (scrollbar-width / ::-webkit-
scrollbar), and pin .hermes-kanban-column to flex: 0 0 280px so columns
sit side-by-side at a fixed width instead of wrapping into a 2xN grid.

Page vertical scroll is unaffected: each column already caps at
max-height: calc(100vh - 220px), so the container never grows tall
enough to introduce its own vertical scrollbar.

* fix(approval): surface pending-approval state with explicit marker visible to LLM

When a tool call requires user approval in the non-blocking gateway path,
the LLM previously received a result that was indistinguishable from a
failed tool call (exit_code=-1, error=message). The LLM could not tell
whether the tool was pending approval, had returned empty results, or had
failed silently — causing it to burn context on wrong hypotheses.

The fix changes the result format to include:
- status: pending_approval (clear state name)
- approval_pending: True (explicit boolean for LLMs to detect)
- error: cleared to empty string (removes misleading error signal)

This lets the LLM reason about approval latency vs actual errors,
short-circuiting the previous silent failure mode.
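
For illustration, a result in the new shape might be built as below. The three keys `status`, `approval_pending`, and `error` come from the list above; the `message` key and the function name are assumptions of this sketch.

```python
def pending_approval_result(message: str) -> dict:
    """Build a tool result that signals 'waiting for approval', not failure."""
    return {
        "status": "pending_approval",  # clear state name
        "approval_pending": True,      # explicit boolean for the LLM
        "error": "",                   # no misleading error signal
        "message": message,            # hypothetical human-readable note
    }
```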

Fixes #14806

* fix: recognize emoji and caret as natural response endings

GLM models via Ollama report finish_reason='stop' even when the
response was truncated by max_tokens. The continuation mechanism
uses _has_natural_response_ending() as one of the heuristics to
detect whether the response was genuinely finished.

Currently only ASCII punctuation and CJK punctuation are recognized.
This means any response ending with an emoji (e.g. ⚡, 👍) or the
caret character ^ (common in French ^^ smiley) is not recognized as
naturally ended, triggering a false-positive continuation where the
model receives 'Continue where you left off' and produces garbled
output.

Add:
- ^ (caret) to the punctuation set
- Unicode emoji range (codepoint >= 0x1F300) as natural ending

This only affects GLM/Ollama users but the fix is safe for all
backends since _has_natural_response_ending() is only consulted
inside the continuation flow.
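
A minimal sketch of the heuristic as described, not the actual `_has_natural_response_ending()` body. The punctuation set is a hypothetical ASCII subset plus the caret, and the emoji test is the single threshold quoted above.

```python
# Hypothetical subset of ending punctuation, with the caret added.
ENDING_PUNCTUATION = set(".!?:;\"')]}" + "^")


def has_natural_ending(text: str) -> bool:
    """Return True if the last visible character looks like a finished reply."""
    stripped = text.rstrip()
    if not stripped:
        return False
    last = stripped[-1]
    # Threshold from the commit message: codepoint >= 0x1F300.
    return last in ENDING_PUNCTUATION or ord(last) >= 0x1F300
```

Note that ⚡ is U+26A1, below the 0x1F300 threshold, so this sketch on its own does not treat it as an ending.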

* chore(release): pre-stage AUTHOR_MAP for May 2026 LHF batch group 8 (#28328)

Pre-stages AUTHOR_MAP entries for 10 new contributors whose PRs are being
salvaged in the May 2026 low-hanging-fruit batch (group 8). Lands ahead
of the per-PR salvage PRs so they don't get blocked by AUTHOR_MAP CI.

Contributors:
- AceWattGit (#28159 — _pool_may_recover_from_rate_limit NameError)
- YuanHanzhong (#28032 — x.com/status fallbacks link-like)
- colin-chang (#28245, #28249, #28251 — gateway + mattermost fixes)
- felix-windsor (#28019 — preserve cron asterisks in strip mode)
- houenyang-momo (#28205 — charizard completion menu contrast)
- iqdoctor (#28095 — windows installer docs)
- joe102084 (#28151 — whitespace-only cron responses)
- jvinals (#27936 — Slack U-IDs → DM channel)
- maxmilian (#28267 — ModelPickerDialog portal)
- samggggflynn (#27952 — dingtalk pre_start)

Per references/batch-pr-salvage-may14-additions.md.

* fix: add pre_start() to _IncomingHandler for dingtalk SDK compatibility

The dingtalk-stream SDK calls pre_start() on every registered handler
before opening the WebSocket connection. Without this method, the SDK
raises AttributeError and kills the stream connection, so DingTalk
cannot connect via Stream Mode.

* fix(windows): handle redirected stdout in _cprint fallback

Wraps _pt_print in try/except with a print() fallback. When a
kanban worker's stdout is piped to a log file, prompt_toolkit
raises NoConsoleScreenBufferError (Windows) or OSError (other)
because there is no real console buffer. The fallback keeps
worker output flowing instead of crashing.

* chore(release): alias stale-ID salvage commit for @Grogger (#28334)

PR #28330 was salvaged with a wrong noreply numeric ID (18091625 vs
the correct 7065068). The commit on main is correctly authored to
Grogger by username, but neither noreply form was in AUTHOR_MAP.
Adds both so release-notes generation maps them to @Grogger.

* fix(aux): remove stale session_search model menu entry

* fix(tui): keep x status citation fallbacks link-like

* fix(xai-oauth): quarantine dead tokens on terminal refresh failure

resolve_xai_oauth_runtime_credentials() called _refresh_xai_oauth_tokens()
with no try/except. A terminal refresh failure (HTTP 400/401/403 —
invalid_grant, token revoked) propagated without clearing the dead
access_token / refresh_token from auth.json, causing every subsequent
session to retry the same doomed network request.

Add a try/except around the refresh call that mirrors the existing
credential_pool.py quarantine: when _is_terminal_xai_oauth_refresh_error
identifies a non-retryable failure, clear the dead token fields from
auth.json and write a last_auth_error diagnostic marker so future calls
fail fast with a clear relogin_required error instead of hitting the
network.

active_provider is preserved (set_active=False) so multi-provider users
whose chosen provider is not xai-oauth are unaffected.

Tests: two new cases in test_auth_xai_oauth_provider.py cover terminal
quarantine and transient pass-through.

* feat(bg-review): add bundled/pinned skill protection rules to review prompts (#27644)

The background review prompts (_SKILL_REVIEW_PROMPT and
_COMBINED_REVIEW_PROMPT) now include explicit protection rules
for bundled, hub-installed, and pinned skills — aligning with
the curator's existing policy at curator.py L345/350.

Before this change, bg-review could freely rewrite bundled skills
like 'hermes-agent' or pinned skills, while the 7-day curator
explicitly skips them.

The review agent now sees:
  • Bundled skills (shipped with Hermes)
  • Hub-installed skills (installed via hermes skills install)
  • Pinned skills (marked via hermes curator pin)
If only protected skills need updating, the review says
'Nothing to save.' and stops.

Fixes #27644

* fix(web): portal Change Model modal so it renders above the app sidebar

The dashboard's main column is `relative z-2` (App.tsx), which creates a
stacking context that traps fixed descendants below the app sidebar
(`z-50`). `ModelPickerDialog` renders `fixed inset-0 z-[100]` inline,
so its z-100 is scoped to z-2 and the sidebar covers its left edge.

The bug is visible across all themes but only obvious in the Large theme
variants (Hermes Teal (Large), etc.) where the larger root font widens
the dialog into the sidebar's column. Toast.tsx already documents the
same trap and uses the same `createPortal(..., document.body)` escape.

This commit ports the picker; the same pattern affects other inline
z-[100] modals in the dashboard (OAuthLoginModal, Cron / Models /
Profiles page modals) and is left for a follow-up — keeping this PR
scoped to the reporter's specific case.

Fixes #28103

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gateway): exit code 75 on service restart so launchd relaunches

When the gateway receives SIGUSR1 (graceful restart via launchd_restart),
the SIGUSR1 handler calls request_restart(via_service=True) and the
gateway shuts down cleanly with exit code 0.

However, the generated launchd plist uses KeepAlive → SuccessfulExit →
false, meaning launchd only relaunches on *non-zero* exit codes.  A
clean exit(0) is treated as "successful, don't restart", so the
gateway stays down after /restart, /update, or SIGUSR1.

The systemd unit template already uses RestartForceExitStatus=75 for the
same scenario.  Mirror that convention: when _restart_via_service is
True, raise SystemExit(75) so launchd's SuccessfulExit=false policy
triggers a relaunch.

Closes #28135

* fix: guard json.loads() against invalid TTS and skill_view responses

Two code paths call json.loads() on output from external tools without
catching JSONDecodeError. If the tool returns a non-JSON string (error
message, empty string, or None), the entire call path crashes.

1. gateway/run.py — text_to_speech_tool() result in voice reply path.
   A TTS failure that returns an error string instead of JSON crashes
   the voice reply handler, killing the message response entirely.

2. cron/scheduler.py — skill_view() result when loading skills for
   cron jobs. A corrupted or missing skill file that returns an error
   string instead of JSON crashes the cron tick, preventing all jobs
   from executing that cycle.

Both fixes catch (json.JSONDecodeError, TypeError), log a warning,
and gracefully skip the failed operation instead of crashing.
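
The guard can be sketched as a small helper. `parse_tool_json` is a hypothetical name, and returning `None` to signal "skip this operation" is an assumption of the example.

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_tool_json(raw):
    """Parse a tool's JSON output, returning None when it is not valid JSON."""
    try:
        return json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        # JSONDecodeError covers error strings and empty output;
        # TypeError covers a None result.
        logger.warning("Tool returned non-JSON output: %r", raw)
        return None
```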

* fix(gateway): bridge gateway_restart_notification from YAML platform sections

Two related bugs in gateway/config.py prevented per-platform
gateway_restart_notification from working through config.yaml:

1. The shared-key bridging loop (load_gateway_config) omitted
   'gateway_restart_notification', so the key never landed in
   platform_data['extra'] even when set under e.g. 'discord:' or
   'mattermost:' sections.

2. PlatformConfig.from_dict() only read gateway_restart_notification
   from the top-level data dict, ignoring the 'extra' sub-dict where
   bridged keys are stored.

Fix: add the key to the bridging loop, and add an 'extra' fallback
in from_dict() so that round-tripped values (YAML → bridged → extra
→ from_dict) resolve correctly.

Impact: users can now set gateway_restart_notification: false per
platform in config.yaml instead of relying on env vars or the
global platforms: block.

* feat(kanban): add auto_promote_children config toggle

When the kanban auto-decomposer fans a triage task into child tasks,
recompute_ready() immediately promotes parent-free children to 'ready'
so the dispatcher picks them up. Some users want a manual workflow
where children stay in 'todo' for review before dispatch.

Add 'kanban.auto_promote_children' config key (default: true):
- false: children stay in 'todo' after decomposition
- true: existing behavior (auto-promote to 'ready')

Changes:
- kanban_db.py: decompose_triage_task() gains auto_promote param
- kanban_decompose.py: reads auto_promote_children from config
- kanban dashboard API: exposes the new setting in GET/PUT /orchestration

Closes #28016

* fix: wrap _pool_may_recover_from_rate_limit call through run_agent namespace

conversation_loop.py references _pool_may_recover_from_rate_limit, which
is defined in run_agent.py. After the conversation-loop extraction refactor,
the helper was no longer in the same module scope, so the call raised
NameError. Wrap the call as _ra()._pool_may_recover_from_rate_limit() to
route through the run_agent monkeypatch namespace where the helper is
available.

Adds regression test in test_gemini_fast_fallback.py.

Fixes: MAILROOM Email Triage NameError, OPS Execution Monitor NameError.

* fix(tui): improve charizard completion menu contrast

* docs(windows): avoid piping installer directly into iex

* fix(agent): add qwen and deepseek to TOOL_USE_ENFORCEMENT_MODELS

Qwen3.x and DeepSeek-V3.x default to chatty/hallucinatory tool use without
enforcement steering — agents narrate "calling tool X" without actually
emitting a tool call, or run partial loops. Both model families fit the
same failure pattern TOOL_USE_ENFORCEMENT_GUIDANCE was already injected
for (gpt, codex, gemini, gemma, grok, glm).

Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com>

Squashed salvage of:
- 403e567ce fix(agent): add qwen and deepseek to TOOL_USE_ENFORCEMENT_MODELS
- 9433eabe7 test(agent): use realistic qwen-plus identifier in enforcement test

Fixes #28079.

* fix(send_message): resolve Slack user IDs to DM channel IDs

The _SLACK_TARGET_RE regex only matched IDs starting with C (channel),
G (group), or D (direct message). Slack user IDs start with U, causing
'Could not resolve' errors when trying to send DMs to specific users.

Changes:
- Expand _SLACK_TARGET_RE to accept U-prefixed IDs (user IDs)
- Add conversations.open fallback to resolve user IDs to DM channel
  IDs before sending, since chat.postMessage requires a conversation ID

Fixes #ISSUE_NUMBER

* fix(gateway): tighten MEDIA extraction regex + silent skip on file-not-found

Three related fixes for the MEDIA:<path> extraction pipeline that
caused 'file not found' noise in platform channels:

1. run.py — tighten tool-result MEDIA regex from \S+ (any non-
   whitespace) to require a path pattern with known extensions.
   Prevents LLM-generated placeholder paths like
   'MEDIA:/path/to/example.mp4' from being captured as real media.

2. base.py — remove the |\S+ fallback in extract_media() that
   catches anything non-whitespace as a potential MEDIA path.
   This was the primary cause of false positives — strings like
   '' in tool output were captured as MEDIA: paths.

3. mattermost.py — replace the file-not-found error message sent
   to the channel with a silent logger.warning() skip. When a
   path extracted by MEDIA doesn't exist on disk, the channel
   no longer gets a noisy '(file not found: ...)' message.

Impact: eliminates the persistent 'file not found' spam in
Mattermost channels caused by over-broad MEDIA regex patterns
matching non-path text in tool output.

* fix(xai-oauth): split 403 (tier/entitlement) from 400/401 in token endpoint

xAI's token endpoint returns HTTP 403 to the OAuth grant when the
account isn't on the allowlist for API access (e.g. standard
SuperGrok subscribers — see #26847). Treating it like a stale-token
400/401 made ``format_auth_error`` append "Run ``hermes model`` to
re-authenticate", which is misleading because re-login can't change
xAI's tier decision.

Split 403 off in both ``refresh_xai_oauth_pure`` and the loopback
login token exchange:

* New error code ``xai_oauth_tier_denied`` with ``relogin_required=False``
* Message explains the entitlement gate and points at the
  ``XAI_API_KEY`` + ``provider: xai`` fallback
* 400/401 still set ``relogin_required=True`` as before
* 5xx still set ``relogin_required=False`` as before
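
The status split above could be sketched as follows. Only the `xai_oauth_tier_denied` code and the `relogin_required` values for 403, 400/401, and 5xx come from this message; the `TokenError` type, the `None` placeholder codes, and the treatment of other statuses are assumptions.

```python
from typing import NamedTuple, Optional


class TokenError(NamedTuple):
    code: Optional[str]        # None where the real code is not stated here
    relogin_required: bool


def classify_token_status(status: int) -> TokenError:
    """Map a token-endpoint HTTP status to a relogin decision."""
    if status == 403:
        # Entitlement gate: logging in again cannot change the tier.
        return TokenError("xai_oauth_tier_denied", False)
    if status in (400, 401):
        # Stale or revoked token: a fresh login can help.
        return TokenError(None, True)
    # 5xx is a server-side failure; other statuses are treated the same
    # way in this sketch (an assumption).
    return TokenError(None, False)
```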

* fix(run-agent): treat any 403 on xai-oauth as entitlement to stop refresh-loop

The existing ``_is_entitlement_failure`` heuristic only fires when
the response body contains specific substrings ("do not have an
active Grok subscription", etc.). xAI has been seen to 403 standard
SuperGrok subscribers with a terser body that doesn't match those
keywords (#26847), and the recovery path would then mint a fresh
token, get a fresh 403, and loop until Ctrl+C.

Add a defense-in-depth check at the recovery call site: any 403 on
``provider == "xai-oauth"`` short-circuits ``try_refresh_current``
so the error surfaces immediately with the friendly hint from
``_summarize_api_error``. Keeps the existing keyword path for all
other providers untouched.

* test(xai-oauth): pin tier-denied 403 behavior + docs warning for #26847

Tests:

* ``test_refresh_xai_oauth_pure_403_marked_tier_denied_not_relogin`` —
  refresh-403 raises ``xai_oauth_tier_denied`` with
  ``relogin_required=False`` and the API-key fallback hint in body.
* ``test_format_auth_error_tier_denied_does_not_suggest_relogin`` —
  the renderer does not append "Run ``hermes model``" for the new
  code.
* ``test_recover_with_credential_pool_skips_refresh_on_bare_403_for_xai_oauth`` —
  bare ``{"reason":"forbidden","message":"Forbidden"}`` body (which
  does not match the existing keyword heuristic) still short-circuits
  ``try_refresh_current`` on xai-oauth.

Docs:

* Drop the "(any active tier)" claim from the xai-grok-oauth guide,
  add a top-of-page warning callout, and a Troubleshooting section
  for the 403-after-login case pointing at ``XAI_API_KEY`` +
  ``provider: xai`` as the documented fallback.

* fix: handle whitespace-only cron responses

* fix(cli): preserve cron asterisks in strip mode

* fix(mattermost): resolve thread root_id and route progress to threads

Two Mattermost thread-related bugs:

1. _resolve_root_id() — Mattermost CRT requires root_id to be the
   thread root post. Using any reply's own ID as root_id causes
   '400 Invalid RootId'. Add _resolve_root_id() that walks up the
   post chain via API to find the actual root, and apply it in
   send(), _send_url_as_file(), and _send_local_file().

2. _progress_reply_to — The condition in run.py only checked
   Platform.FEISHU, missing Mattermost entirely. This caused tool
   progress messages to always land in the main channel instead of
   the thread. Add Platform.MATTERMOST to the condition so
   progress messages are routed to threads when reply_mode=thread.

Impact: Tool progress messages now appear in Mattermost threads
instead of flooding the main channel; thread replies no longer
fail with Invalid RootId when the reply target is itself a reply.

* feat(kanban): archive --rm to hard-delete archived tasks

Salvages #19964 by @Beandon13. Adds `hermes kanban archive --rm` to
permanently remove already-archived tasks with cascading cleanup of
links, comments, events, runs, and notify-subs. Safety guard: only
archived tasks can be deleted; active/blocked/done must be archived
first.

Cherry-picked from #19964 onto current main (severe stale base, applied
manually to preserve substance only).

* feat(proxy): add xai upstream adapter for Grok via OAuth

* chore(release): map @yannsunn for PR #28064 xai proxy adapter salvage

* docs(skill): align kanban dispatcher failure_limit text with current default

* fix(oauth): add manual-paste fallback for browser-only remote consoles

xAI Grok OAuth (and Spotify) use a loopback redirect to
``http://127.0.0.1:<port>/callback`` to capture the authorization
code. That works when the browser and Hermes run on the same
machine, and the SSH tunnel recipe handles the regular remote
case. It breaks completely on **browser-only remote consoles**
(GCP Cloud Shell, GitHub Codespaces, AWS EC2 Instance Connect,
Gitpod, Replit, …) where the user has a browser but no real SSH
client to forward a port — the redirect to 127.0.0.1 on the
remote VM simply isn't reachable from the laptop, and there's
nothing the existing flow can do about it (#26923).

This commit adds the foundation for a manual-paste fallback:

* ``_is_remote_session`` now also recognises Cloud Shell,
  Codespaces, Gitpod, Replit, StackBlitz (in addition to SSH),
  so the existing tunnel hint at least fires in those
  environments.
* ``_parse_pasted_callback`` accepts any of: a full
  ``http(s)://...?code=...&state=...`` URL, a bare ``?code=...``
  query string, a bare ``code=...&state=...`` fragment, or a
  bare opaque code value.  Returns the same dict shape the HTTP
  callback handler produces, so the caller's state / error
  validation works unchanged (no CSRF bypass).
* ``_prompt_manual_callback_paste`` reads stdin with a clear
  multi-line explanation of what's happening and what to paste.
* ``_xai_oauth_loopback_login`` gains a ``manual_paste`` kwarg
  that skips the HTTP listener entirely.  The redirect_uri,
  PKCE verifier, state, and nonce are byte-identical to the
  loopback path so xAI's token endpoint can't tell the
  difference at the protocol level.
* ``_print_loopback_ssh_hint`` now also mentions
  ``--manual-paste`` so users without a real SSH client see a
  path forward instead of a dead-end tunnel recipe.
* ``_login_xai_oauth`` threads ``args.manual_paste`` into the
  loopback helper.
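
The four paste forms accepted by ``_parse_pasted_callback`` could be handled along these lines. This is a sketch under assumptions, not the shipped helper: the returned keys (``code``, ``state``, ``error``) are guessed from the description above, and treating any input that contains ``=`` as a query string is a simplification that would misparse an opaque code containing ``=``.

```python
from typing import Dict
from urllib.parse import parse_qs, urlsplit


def parse_pasted_callback(pasted: str) -> Dict[str, str]:
    """Normalise a pasted callback into a flat dict of query parameters."""
    text = pasted.strip()
    if not text:
        return {}  # empty or whitespace-only paste
    if "://" in text:
        query = urlsplit(text).query  # full http(s) URL
    elif text.startswith("?"):
        query = text[1:]  # bare "?code=..." query string
    elif "=" in text:
        query = text  # bare "code=...&state=..." fragment
    else:
        return {"code": text}  # bare opaque code value
    params = parse_qs(query)
    # Keep the first value of each parameter.
    return {key: values[0] for key, values in params.items()}


print(parse_pasted_callback("http://127.0.0.1:8000/callback?code=abc&state=xyz"))
```

Because the caller still compares ``state`` against the value it generated, a paste with the wrong state would be rejected in the same way as a wrong loopback callback.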

* feat(cli): wire --manual-paste into ``hermes auth add`` and ``hermes model``

Register the new ``--manual-paste`` flag on both entry points and
thread it through to the xAI loopback login:

* ``hermes auth add xai-oauth --manual-paste`` — pool-add path,
  forwarded inside ``auth_commands.handle_auth_add``.
* ``hermes model --manual-paste`` — model-picker path, forwarded
  by ``_model_flow_xai_oauth`` into the synthetic ``argparse.Namespace``
  it passes to ``_login_xai_oauth``.  The picker also now forwards
  ``--no-browser`` and ``--timeout`` for consistency (previously
  hardcoded to defaults regardless of CLI flags).

Help text on both flags points at #26923 and names the
browser-only remote consoles (Cloud Shell, Codespaces, EC2
Instance Connect) so users searching ``hermes --help`` can find
the workaround.

* test+docs(oauth): pin manual-paste semantics and document browser-only path (#26923)

Tests (``tests/hermes_cli/test_auth_manual_paste.py``):

* 9 parametrised + scalar cases for ``_is_remote_session`` covering
  the new Cloud Shell / Codespaces / Gitpod / Replit / StackBlitz
  env vars (plus the existing SSH ones).
* 9 cases for ``_parse_pasted_callback`` covering every paste form
  (full URL, https URL with extra params, bare ``?code=...``, bare
  ``code=...`` fragment, bare opaque value, error+description,
  empty, whitespace-only, malformed URL).
* 3 cases for ``_prompt_manual_callback_paste`` (happy path, EOF,
  Ctrl-C).
* 3 end-to-end ``_xai_oauth_loopback_login(manual_paste=True)``
  cases: the HTTP server MUST NOT be started (asserted via a
  callable that raises if invoked), wrong state still rejected
  with ``xai_state_mismatch`` (no CSRF bypass), and empty paste
  surfaces ``xai_code_missing``.
* SSH-hint mention test ensures the ``--manual-paste`` instruction
  is printed in the remote-session hint.

Docs:

* ``oauth-over-ssh.md`` — new "Browser-only remote (Cloud Shell /
  Codespaces / EC2 Instance Connect)" section with the
  ``--manual-paste`` recipe, plus a TL;DR note for the new flag.
* ``xai-grok-oauth.md`` — short subsection pointing at the same
  recipe and the OAuth-over-SSH guide anchor.

* docs(kanban): document max-retries task override

* docs(kanban): document inline create shortcuts

* test(kanban): cover default board dashboard pin

* docs: ignore box diagrams in ascii guard

Wrap existing box-drawing diagrams with ascii-guard markers so docs-site checks pass when website docs are touched.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat: per-task model override for kanban workers

- Add model_override field to Task class and tasks schema
- Add migration for existing databases
- Spawn worker with -m model when model_override is set

* test(kanban-dashboard): cover _task_dict task_age fallback

The fix in 061a1830 added an outer try/except in plugin_api._task_dict
so that a future failure mode in kanban_db.task_age (anything _safe_int
doesn't already absorb) cannot 500 the GET /board response. The
_safe_int / task_age corruption paths got regression coverage in
tests/hermes_cli/test_kanban_db.py, but the OUTER fallback contract
remained untested -- meaning a refactor that drops the try/except would
not be caught by CI.

Pin that contract from both consumers of _task_dict:
- GET /board returns 200 with the literal fallback age dict for the
  affected card (other cards continue to render via the same path)
- GET /tasks/:id (drawer view) returns 200 with the same fallback,
  so a single corrupt task can't block its own drawer

Both tests force task_age to raise RuntimeError rather than ValueError
on '%s', because ValueError is absorbed by _safe_int and never reaches
the outer try/except -- testing that path would only re-cover what
test_kanban_db.py already pins.

Manually verified the regression discipline:
  git checkout 061a1830^ -- plugins/kanban/dashboard/plugin_api.py
  pytest -k task_age_exception        # both FAIL with 500
  git checkout HEAD -- plugins/kanban/dashboard/plugin_api.py
  pytest -k task_age_exception        # both PASS

* fix(kanban): clear _INITIALIZED_PATHS in remove_board so recycled DBs re-init schema

Archiving or deleting a board via remove_board() leaves the path's
"schema already initialized" entry in the module-level cache. A
concurrent connect(board=<slug>) call (e.g. the dashboard event-stream
poll loop) then:

  1. resolves the same kanban.db path,
  2. recreates the directory + an empty sqlite file because
     connect() does mkdir(parents=True, exist_ok=True),
  3. skips the CREATE TABLE pass because the cache entry says the
     schema is already in place,
  4. errors on the next read with `no such table: task_events`.

Drop the cache entry before mutating the filesystem so the fresh file
gets a proper schema init on next connect(). Applies to both
archive=True (rename) and archive=False (rmtree) branches.

Fixes #23833.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(web): add Cache-Control: no-store to plugin static file serving

Prevents browser caching of stale dashboard plugin JS files that may
contain bugs already fixed upstream (e.g. COLUMN_LABEL undefined).

* fix(kanban): seed bundled skills (e.g. kanban-worker) on kanban init

Closes #23725

* fix(kanban): ignore stale HERMES_KANBAN_BOARD for removed boards

* fix(kanban): keep board-management commands independent from board override

* fix(kanban): preserve notifier_profile for dashboard home subscriptions

* fix(kanban): promote dependents when a parent is archived

* fix(cli): make kanban specify max_tokens configurable

* fix(kanban): sync slash subcommands with live parser

* fix(kanban): promote blocked tasks when parent dependencies complete

recompute_ready only scanned 'todo' tasks for promotion, ignoring
'blocked' tasks entirely. When a task was blocked (e.g. by the circuit
breaker) and its parent dependencies later completed, the task stayed
stuck in 'blocked' forever unless manually unblocked.

Now recompute_ready also scans 'blocked' tasks. When all parents are
done/archived, the blocked task is promoted to 'ready' with failure
counters reset — equivalent to an automatic unblock.

Includes a regression test for the blocked-parent-done promotion path.

* fix(kanban): use 'is not None' check for max_runtime_seconds in create_task

max_runtime_seconds=0 was being silently coerced to None due to a falsy
check (if max_runtime_seconds). Zero is a valid value that causes the
dispatcher to immediately time out a task. The adjacent max_retries
parameter already used the correct 'is not None' pattern.

Fixes the inconsistency by aligning max_runtime_seconds with max_retries.
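
The difference between the two checks is easy to show in isolation. The function names here are hypothetical and exist only to contrast the patterns; they are not taken from create_task.

```python
from typing import Optional


def normalize_falsy(max_runtime_seconds: Optional[int]) -> Optional[int]:
    # Buggy pattern: 0 is falsy, so it is coerced to None.
    return int(max_runtime_seconds) if max_runtime_seconds else None


def normalize_is_not_none(max_runtime_seconds: Optional[int]) -> Optional[int]:
    # Fixed pattern: only a missing value maps to None; 0 is preserved.
    return int(max_runtime_seconds) if max_runtime_seconds is not None else None


print(normalize_falsy(0), normalize_is_not_none(0))
```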

* fix(kanban): reset failure counters on unblock_task

When a task is manually unblocked (blocked → ready/todo), the
consecutive_failures counter and last_failure_error were left intact.
The next failure would immediately re-trip the circuit breaker because
the counter was still at or above the failure limit.

Reset both fields on unblock so the task gets a fresh retry budget.

Includes a regression test that verifies counters are zeroed.

* fix(kanban): fingerprint crash errors to prevent fleet-wide retry exhaustion

When a systemic failure (provider outage, auth expiry, OOM) crashes
multiple workers simultaneously, detect_crashed_workers increments
each task's failure counter independently, so the circuit breaker
only trips after N × failure_limit retries across the fleet.

Fingerprint crash errors by normalizing host-specific details (PIDs,
timestamps). When 3+ tasks crash with the same fingerprint in a
single detection cycle, immediately trip the circuit breaker
(failure_limit=1) instead of waiting for repeated failures.

Isolated crashes (unique fingerprints) retain their normal retry
budget. Protocol violations continue to trip immediately.

Includes regression tests for systemic and isolated crash paths.
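
A rough sketch of the fingerprinting idea follows. The regular expressions, the ``SYSTEMIC_THRESHOLD`` name, and both function names are illustrative assumptions; the commit only states that PIDs and timestamps are normalized and that 3+ matching crashes in one cycle count as systemic.

```python
import re
from collections import Counter
from typing import Dict, Set

SYSTEMIC_THRESHOLD = 3  # "3+ tasks" from the commit message


def fingerprint(error: str) -> str:
    """Normalise host-specific details so equivalent crashes compare equal."""
    text = re.sub(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?", "<ts>", error)
    text = re.sub(r"\bpid[ =:]*\d+", "pid=<n>", text, flags=re.IGNORECASE)
    return text.strip().lower()


def systemic_task_ids(crashes: Dict[str, str]) -> Set[str]:
    """Return task ids whose fingerprint is shared by >= threshold crashes."""
    prints = {task_id: fingerprint(err) for task_id, err in crashes.items()}
    counts = Counter(prints.values())
    return {tid for tid, fp in prints.items() if counts[fp] >= SYSTEMIC_THRESHOLD}


crashes = {
    "t1": "worker pid=101 died: 401 auth expired at 2026-05-10T15:00:00Z",
    "t2": "worker pid=202 died: 401 auth expired at 2026-05-10T15:00:03Z",
    "t3": "worker pid=303 died: 401 auth expired at 2026-05-10T15:00:07Z",
    "t4": "worker pid=404 died: out of disk",
}
print(sorted(systemic_task_ids(crashes)))
```

Tasks returned by ``systemic_task_ids`` would be the ones tripped immediately; ``t4`` keeps its normal retry budget because its fingerprint is unique.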

* fix(kanban): align board_exists with board discovery rules

* fix(kanban): demote ready children when a parent is reopened

* fix(kanban): serialize DB initialization

* fix(kanban): task_age() tolerates ISO-8601 timestamps

Prevents ValueError crash in dashboard get_board() when a task has
an ISO timestamp (e.g. "2026-05-10T15:00:00Z") instead of a unix epoch
int. Adds _to_epoch() helper that normalises both formats.
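
A helper of this kind might look like the sketch below. It is not the shipped ``_to_epoch()``; in particular, treating timezone-less ISO strings as UTC is an assumption made here for illustration.

```python
from datetime import datetime, timezone
from typing import Union


def to_epoch(value: Union[int, float, str]) -> int:
    """Accept a unix epoch (number or numeric string) or an ISO-8601 string."""
    if isinstance(value, (int, float)):
        return int(value)
    text = str(value).strip()
    try:
        return int(float(text))  # numeric string such as "1700000000"
    except (ValueError, OverflowError):
        pass
    if text.endswith("Z"):
        # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return int(parsed.timestamp())


print(to_epoch("2026-05-10T15:00:00Z"))
```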

* Fix Kanban dashboard initial board selection

* fix(kanban): persist worker session metadata on completion

Salvages #25579 by @wesleysimplicio. Stamps task_runs.metadata.worker_session_id
from HERMES_SESSION_ID on kanban_complete. Cherry-picked the substantive
commit (not the AUTHOR_MAP fixup tip) onto current main.

* fix(kanban): make claim ttl configurable

Co-Authored-By: Paperclip <noreply@paperclip.ing>

* fix(kanban): pass accept-hooks to worker chat subprocess

* feat(kanban): add board-level default workdir (#25430)

* docs(kanban-worker): document notification routing configuration

* fix(kanban): preserve worker tools with restricted toolsets

* fix(kanban): make legacy task migration idempotent

(cherry picked from commit 293f1c3a7241b0117669e049d9aa746c9645ac90)

* fix: harden Kanban worker Hermes command resolution

* feat(kanban): allow trimmed task comments

SS-1647 live SHIP validation: real code + tests for kanban comment --max-len.

* fix: show scheduled kanban tasks in dashboard

* fix: assign single-task kanban decompositions

* fix(kanban-dashboard): make Orchestration mode checkbox label static

The checkbox label echoed its state ("Auto (default)" / "Manual") instead
of describing the action, so a checked box reading "Auto" parsed as a
status indicator rather than a control. The accompanying sub-description
was also static and started with "When on, ...", which read awkwardly
when the box was unchecked.

Replace the dynamic label with a static action label
("Auto-decompose triage tasks") and flip the sub-description between the
two modes so it stays accurate either way. The top-of-page Orchestration
pill is unchanged — that one is intentionally a status badge / toggle.

Fixes #28178

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(env): add HERMES_KANBAN_DISPATCH_IN_GATEWAY override (#21956)

Salvages the env-vars docs portion of #21956 by @Bartok9.
The ascii-guard-ignore tags from the original PR already landed on main.

* fix(kanban): close sqlite connection on init failure to prevent fd leak

Salvages #28301 by @Ade5954. If WAL setup, PRAGMA application, or schema
init raises after sqlite3.connect() succeeds, the new connection was
leaking. Wrap the body in try/except so the connection is closed before
the exception propagates.
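
The close-on-failure pattern described above is roughly the following. ``connect_with_init`` and the ``init_schema`` callback are hypothetical names for this sketch; the real connect() applies its own PRAGMAs and schema.

```python
import sqlite3
from typing import Callable


def connect_with_init(
    path: str, init_schema: Callable[[sqlite3.Connection], None]
) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    try:
        conn.execute("PRAGMA journal_mode=WAL")
        init_schema(conn)
    except Exception:
        conn.close()  # release the file descriptor before re-raising
        raise
    return conn


def failing_init(conn: sqlite3.Connection) -> None:
    raise RuntimeError("schema init failed")


try:
    connect_with_init(":memory:", failing_init)
except RuntimeError as exc:
    print("propagated:", exc)
```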

* fix(kanban): don't crash dispatched workers when kanban-worker skill is absent

Salvages #27372 by @oemtalks. The dispatcher unconditionally injected
`--skills kanban-worker` into every worker spawn, but worker profiles
sometimes don't have that bundled skill in their skills dir, which is
fatal at CLI startup (`ValueError: Unknown skill(s): kanban-worker`).

Adds `_kanban_worker_skill_available(hermes_home)` and only injects the
flag when the skill resolves. The MANDATORY lifecycle still ships via
KANBAN_GUIDANCE in the system prompt, so omitting the flag is safe.

* fix(packaging): ship dashboard plugin assets in wheel

Salvages #23737 by @LeonSGP43. Adds plugins/* manifest.json and dist/
glob entries to setuptools package-data so wheel installs ship the
bundled dashboard plugin assets (kanban, achievements, etc.). Without
these, /api/dashboard/plugins can't discover plugin assets outside a
source checkout.

* docs(kanban): document worker protocol auto-blocks

Salvages #21585 by @helix4u. Documents the protocol_violation event
(worker exits successfully while task is still running), adds
--max-retries to the create flag list and --failure-limit to dispatch.

* fix(oneshot): pass fallback_providers from profile config to AIAgent

Salvages #23368 by @uzunkuyruk. Oneshot workers (e.g. kanban workers
spawned via 'hermes -p <profile> chat -q ...') were not honouring the
profile's fallback_providers / fallback_model chain because oneshot.py
never read the config and never passed fallback_model= to AIAgent.

Reads cfg.get('fallback_providers') (new list format) or
cfg.get('fallback_model') (legacy single-dict) with the same
normalization cli.py applies, then forwards as fallback_model=_fb.

* fix(kanban): reject direct running transitions in dashboard bulk updates

Salvages #24050 by @kronexoi. The single-task PATCH already rejects
direct status='running' since it bypasses the dispatcher/claim invariant,
but the bulk-update endpoint still accepted it. Aligns bulk with single
by emitting an error result row for any 'running' entry.

* feat(kanban): add initial-status for human-ops cards

Salvages #27526 by @shunsuke-hikiyama. Adds an --initial-status flag
(running|blocked, default running) to 'kanban create', threaded through
kanban_db.create_task() and the kanban_create tool schema. 'blocked'
parks the task directly in the blocked column for R3 human-ops review,
skipping the brief running-to-blocked transition.

Dropped the unrelated 'add' alias, WIFEXITED Windows compat, and
slash-handler error formatting changes that were bundled in the
original PR — those should ship as their own focused changes if still
wanted.

* fix(kanban): release scratch workspace and tmux session on task completion

Salvages #27369 by @LeonJS. complete_task() now calls _cleanup_workspace()
and _cleanup_worker_tmux() after marking a task complete.

Scratch workspaces (used by swarm agents) accumulate on disk — hundreds
of MB per task, never released. Stale tmux sessions from completed
agents also persist indefinitely.

Both gates are safe:
- workspace_kind == 'scratch' gate preserves user worktree/dir workspaces
- tmux #{pane_dead} == 1 gate only kills sessions where the worker has
  already exited
- best-effort: cleanup failures never block task completion

* fix(kanban): honor severity thresholds in diagnostics

Salvages #26431 by @LeonSGP43. Dashboard plugin_api list_diagnostics
was using exact-match (severity == filter), so '--severity warning'
hid 'error' and 'critical' diagnostics. Adds severity_at_or_above()
helper to kanban_diagnostics and uses it in the dashboard endpoint
(CLI already used SEVERITY_ORDER comparison correctly).
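
The threshold comparison can be sketched as follows. The contents of ``SEVERITY_ORDER`` are an assumption (only warning, error and critical are named in this log; ``info`` is added for illustration), and the real helper's signature may differ.

```python
# Assumed ordering; the actual SEVERITY_ORDER lives in kanban_diagnostics.
SEVERITY_ORDER = {"info": 0, "warning": 1, "error": 2, "critical": 3}


def severity_at_or_above(severity: str, threshold: str) -> bool:
    """True when `severity` ranks at or above `threshold`."""
    return SEVERITY_ORDER[severity] >= SEVERITY_ORDER[threshold]


diagnostics = ["info", "warning", "error", "critical"]
print([s for s in diagnostics if severity_at_or_above(s, "warning")])
```

With exact matching, the same filter would have returned only the ``warning`` entry, which is the bug described above.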

* test: isolate Kanban env pins in hermetic fixture

Salvages the substantive part of #22295 by @steezkelly. Adds the
missing HERMES_KANBAN_HOME, HERMES_KANBAN_RUN_ID, HERMES_KANBAN_CLAIM_LOCK,
HERMES_KANBAN_DISPATCH_IN_GATEWAY entries to _HERMES_BEHAVIORAL_VARS so
ambient developer-shell pins on those vars don't bleed into pytest runs.

The frozenset extraction + standalone regression test from the original
PR were dropped to keep the change minimal — main already maintains the
list inline.

* feat(kanban): add max_in_progress config to cap concurrent running tasks

Salvages #22981 by @SimbaKingjoe. Adds 'kanban.max_in_progress' config
that caps simultaneously running tasks. When the board already has N
running, dispatcher skips spawning so slow workers (local LLMs,
resource-constrained hosts) don't pile up and time out.

Threads through dispatch_once(max_in_progress=) and gateway dispatcher
config parsing with validation (warns on invalid/below-1 values).
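
The cap amounts to a small budget calculation per dispatch tick. The ``spawn_budget`` helper is hypothetical, and ignoring an invalid cap (rather than refusing to dispatch) is an assumption; the commit only says invalid or below-1 values produce a warning.

```python
from typing import Optional


def spawn_budget(
    running_count: int, ready_count: int, max_in_progress: Optional[int] = None
) -> int:
    """How many ready tasks may be spawned this tick under the cap."""
    if max_in_progress is None or max_in_progress < 1:
        # Assumption: an unset or invalid cap means "no cap".
        return ready_count
    return max(0, min(ready_count, max_in_progress - running_count))


print(spawn_budget(running_count=2, ready_count=5, max_in_progress=3))
```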

* fix(packaging): ship bundled skills in wheel

Salvages #23738 by @LeonSGP43. Wheel installs were missing skills/ and
optional-skills/ because pyproject's [tool.setuptools.packages.find]
only includes Python packages — the skills directories don't have
__init__.py so they were silently dropped from the wheel.

Adds setup.py with data_files spec emitting skills/* and optional-skills/*
under hermes_agent-<v>.data/data/, and a get_bundled_skills_dir() helper
in hermes_constants that discovers the wheel-installed location via
sysconfig before falling back to a source-checkout path. tools/skills_sync
uses the helper so 'hermes update' works for pip-installed users.

* fix: 4 small surgical bugs

Salvages #23302 by @Bartok9. Four independent one-area fixes:

1. kanban boards delete alias now hard-deletes (not archives) — the
   alias didn't carry --delete, so getattr(args, 'delete', False)
   returned False. Detect boards_action=='delete' explicitly.
2. Gateway auto-title failures no longer leak as user-visible
   warnings — debug-log only since they're not actionable.
3. Background process completion notification snaps truncation to
   the next newline boundary, prepends a marker when content is
   dropped.
4. _cprint() schedules the run_in_terminal coroutine via
   asyncio.ensure_future so output isn't silently dropped from
   background threads (fixes #23185 Bug A). Skips the
   double-print fallback that would fire for mock paths.
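
The newline snapping from item 3 might be implemented along these lines. This sketch assumes the notification keeps the tail of the output and drops the head; the function name, the marker text, and that direction are all assumptions, not details confirmed by the commit.

```python
def truncate_tail(
    output: str, limit: int, marker: str = "[... earlier output truncated ...]\n"
) -> str:
    """Keep at most `limit` trailing characters, starting on a line boundary."""
    if len(output) <= limit:
        return output
    tail = output[-limit:]
    newline = tail.find("\n")
    if newline != -1 and newline + 1 < len(tail):
        tail = tail[newline + 1:]  # drop the partial first line
    return marker + tail


print(truncate_tail("line1\nline2\nline3\n", limit=9, marker="[cut]\n"))
```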

* perf(prompt): cache kanban worker guidance at session init

Salvages #24402 by @RyanRana. The KANBAN_GUIDANCE block (~835 tokens)
is session-static — the dispatcher decides at spawn time whether the
process is a kanban worker via the kanban_show tool's check_fn (gated
on HERMES_KANBAN_TASK env var). Re-checking 'kanban_show' in
valid_tool_names and re-loading the reference on every system-prompt
rebuild (init + each context compression) is wasted work.

Caches the resolved string on agent._kanban_worker_guidance once in
agent_init and consumes it in system_prompt.build_system_prompt(),
with a getattr fallback for code paths that bypass agent_init.

* feat(kanban): add --sort option to 'hermes kanban list'

Salvages #25745 by @LizerAIDev. Adds --sort {created,created-desc,
priority,priority-desc,status,assignee,title,updated} to 'hermes kanban
list'. Validated against VALID_SORT_ORDERS map; invalid values raise
ValueError. Default behaviour (priority DESC, created ASC) is unchanged
when --sort is omitted.

* docs: add kanban codex lane skill

* feat(kanban): worker visibility endpoints (workers/active, runs/{id}, inspect)

Adds three read-only endpoints to the kanban dashboard plugin so the
SwitchUI workspace (and any other dashboard consumer) can track
workers across tasks without N+1 round-trips through /tasks/{task_id}.

- GET /workers/active
  Single SQL JOIN of task_runs + tasks where ended_at IS NULL,
  worker_pid IS NOT NULL, status='running'. Returns
  {workers: [...], count, checked_at}.

- GET /runs/{run_id}
  Direct lookup of any task_run row by id. Reuses existing
  kanban_db.get_run() helper and _run_dict() serialiser. 404 when
  not found. Mirrors GET /tasks/{task_id} 404 shape.

- GET /runs/{run_id}/inspect
  Live PID stats via psutil.Process.as_dict() — cpu_percent,
  memory_rss_bytes, memory_vms_bytes, num_threads, num_fds, status,
  create_time, cmdline. Short-circuits with alive:false when run
  has ended, has no worker_pid, the pid is gone, or psutil is
  unavailable. AccessDenied surfaces as alive:true with error
  rather than a 500.

11 new tests in tests/plugins/test_kanban_worker_runs.py cover the
empty-board case, running-task case, ended-run filtering,
missing-pid filtering, 404 paths, already-ended inspect, no-pid
inspect, dead-pid inspect, and live-pid inspect (psutil mocked).
All pass.

Companion termination endpoint (POST /runs/{run_id}/terminate) is
intentionally out of scope here — opening a separate issue first
since the RBAC and dispatcher-mediated soft-cancel design needs
maintainer input before code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): map contributor email for attribution check

* test(kanban-dashboard): pin enriched 409 detail and inline error wiring (#26744)

- Existing ``test_patch_drag_drop_move_todo_to_ready`` now asserts the
  enriched 409 detail names the blocking parent (id, quoted title, and
  current status), so the dashboard always has something actionable to
  render.
- New bundle-assertion test ``test_dashboard_surfaces_ready_blocked_error_inline``
  pins the frontend wiring: the ``parseApiErrorMessage`` helper exists,
  the drag/drop banner runs through it, and the drawer maintains a
  visible ``patchErr`` state that's cleared between PATCHes and tasks.

* docs(codex_app_server): document multi-root Kanban writable_roots (#27941)

Update the Codex app-server runtime guide's Kanban section to reflect
the new behaviour:

  * The sandbox override now adds the board DB directory plus every
    Kanban path the dispatcher pinned (HERMES_KANBAN_WORKSPACES_ROOT,
    HERMES_KANBAN_WORKSPACE, legacy HERMES_KANBAN_ROOT) -- deduplicated,
    DB-dir first.
  * The motivation note now includes the cross-mount artifact-write
    scenario (e.g. ``/media/.../kanban-workspaces/...`` on a separate
    drive) and links to issue #27941 so readers can find the original
    bug report.

* fix(gateway): quiet corrupt kanban dispatcher boards

Salvages substantive part of #26490 by @aqilaziz. Detects corrupt board
DBs ("file is not a database" / "database disk image is malformed")
and disables them by fingerprint until they're repaired, instead of
flooding the gateway log with repeated logger.exception tracebacks every
tick.

Cherry-picked the substantive commit (ea5b4ec2a); the tip commit was
an unrelated _is_dir OSError fix for service-path lookup. Dropped a
small test reformat that was bundled in the same commit.

* docs: align kanban readiness docs and smoke tests

Salvages #28199 by @bensargotest-sys. Aligns Kanban docs with current
tool registration: dispatcher-spawned task workers get task tools,
profiles that explicitly enable the kanban toolset get orchestrator
routing tools (kanban_list, kanban_unblock). Corrects failure-limit
text to current default of 2. Hardens the e2e subprocess script to
resolve repo root and use the spawnable default assignee. Updates the
diagnostics severity fixture to assert error below the critical
threshold.

* feat(kanban): surface per-task model_override in show + tool output

Salvages #26897 by @loicnico96. The per-task model_override DB column
already exists on main, but it wasn't exposed in user-facing surfaces.
This adds:
- 'kanban show' prints 'model: <name>' when model_override is set
- kanban_show / kanban_list tool responses include the model_override field

Original branch was stale (PR was authored against an older field name
'model'); applied the substantive surface exposure manually using the
current 'model_override' field name.

* feat(cli): add kanban swarm topology helper

Salvages #26791 by @Niraven. Adds 'hermes kanban swarm' to create a
durable Kanban Swarm v1 graph: a completed root/blackboard card,
parallel worker cards, a verifier gated on all workers, and a
synthesizer gated on the verifier. Stores shared swarm blackboard
updates as structured JSON comments on the root card.

Self-contained: new hermes_cli/kanban_swarm.py module + CLI wiring +
unit tests.

* feat(kanban): add optional board parameter to all MCP tools

Salvages #27598 by @nnnet. Adds optional 'board' parameter to all 9
kanban_* MCP tools via shared _connect helper. Backwards compatible —
omitting board keeps current pinned-board behavior. Useful for
orchestrator profiles that route across multiple boards.

Two-file scope: tools/kanban_tools.py + tests.

* feat(kanban): stamp originating ACP session_id on tasks

Salvages #23208 by @awizemann. Tracks which chat session created a
kanban task so clients can render a per-session board without falling
back to tenant + time-window heuristics.

- Schema: tasks gains nullable session_id TEXT column with index
  (additive migration in _migrate_add_optional_columns).
- ACP: server.py exposes the originating session id via HERMES_SESSION_ID
  with save/restore around the agent loop.
- Tool: kanban_create reads HERMES_SESSION_ID (with explicit override).
- CLI: 'hermes kanban list --session <id>' filter; JSON output exposes
  session_id.

* feat(kanban): wire dispatcher to dispatch review agents from review column

Salvages #23772 by @thewillhuang. Adds 'review' as a valid kanban task
status and extends dispatch_once to monitor the review column as a
second dispatch source (in addition to the existing ready column).

- Adds 'review' to VALID_STATUSES
- Adds claim_review_task() — atomically transitions review → running
- Adds has_spawnable_review() — health telemetry mirror
- Extends dispatch_once with a review column dispatch loop
- Review agents get 'sdlc-review' skill auto-loaded

Resolved 2 conflicts (VALID_STATUSES merge with main's 'scheduled' state,
test file additions). Adapted claim_review_task to main's
ttl_seconds: Optional[int] = None convention (matches claim_task).

* feat(kanban): stale detection for running tasks in dispatcher

Salvages #23790 by @thewillhuang. Adds detect_stale_running() to
the dispatcher cycle. Tasks that have been running for longer than
dispatch_stale_timeout_seconds (default 14400 = 4h) without a
heartbeat in the last hour are auto-reclaimed to ready.

- New config kanban.dispatch_stale_timeout_seconds (default 14400, 0 disables)
- New 'stale' field on DispatchResult
- detect_stale_running() in kanban_db.py with heartbeat freshness check
- Records outcome='stale' on run close + 'stale' event; ticks failure counter
- Wires config through gateway embedded dispatcher
- Updates _cmd_dispatch verbose/JSON output and daemon logging

Resolved test-file end-of-file conflict by appending both halves.
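
The staleness rule described above reduces to a predicate like the one below. ``is_stale`` and its parameter names are hypothetical; the 14400-second default, the one-hour heartbeat window, and "0 disables" come from the commit message.

```python
from typing import Optional


def is_stale(
    started_at: int,
    last_heartbeat_at: Optional[int],
    now: int,
    stale_timeout_seconds: int = 14400,
    heartbeat_window_seconds: int = 3600,
) -> bool:
    """True when a running task exceeded the timeout with no fresh heartbeat."""
    if stale_timeout_seconds <= 0:
        return False  # 0 disables stale detection
    if now - started_at <= stale_timeout_seconds:
        return False  # still within its runtime allowance
    if last_heartbeat_at is not None and now - last_heartbeat_at < heartbeat_window_seconds:
        return False  # recent heartbeat: the worker is alive
    return True


print(is_stale(started_at=0, last_heartbeat_at=None, now=14401))
```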

* feat(kanban): filter tasks by workflow fields and runs by status/outcome

Salvages #26745 by @nehaaprasaad. Exposes filtering for the existing
workflow_template_id and current_step_key columns:

- list_tasks() accepts workflow_template_id and current_step_key kwargs
- 'hermes kanban list' adds matching CLI flags
- dashboard plugin_api also exposes the filters

Resolved a small conflict in list_tasks signature alongside main's
session_id and order_by additions; combined all three into the single
filter list.

* feat(kanban): add respawn guard to block repeat worker storms

Salvages #27484 by @fardoche6. Adds a respawn guard that skips worker
spawn for tasks where:
- a recent run already succeeded (recent_success — within guard window)
- the previous run hit a quota/auth error (blocker_auth, also auto-blocks)
- a recent task comment includes a GitHub PR URL (active_pr)

The guard prevents repeat worker storms on the same bug/task. Includes
the contributor's review-findings fixup (regex hardening, observability,
auth coverage).

Resolved a small DispatchResult conflict alongside main's 'stale' field;
kept both. Authorship preserved via rebase merge.

* feat(kanban): show dashboard cron jobs across profiles

Salvages #27568 by @SerenityTn. Dashboard cron page now lists cron
jobs from all profiles, with profile-aware filter UI and storage
routing. Includes test coverage for cross-profile listing, mutation,
deletion, and validation.

Also fixes orphan conflict markers in config.py left by an earlier
salvage merge (kanban.dispatch_stale_timeout_seconds was double-nested
in HEAD/PR markers from #28452 salvage of #23790).

* fix(kanban): remove orphan conflict markers from config.py (#28458)

PR #28452 (salvage of #23790, stale detection) merged with leftover
git conflict markers in hermes_cli/config.py around the
`dispatch_stale_timeout_seconds` config block, breaking config import
and any code path that loads it. Cleans up the markers and keeps both
config blocks (worker log rotation/orchestrator + stale detection).

Resolves a self-introduced regression.

* fix(kanban): remove orphan conflict markers from kanban.py (#28459)

PR #28454 (salvage of #26745, workflow filter) merged with leftover
git conflict markers in hermes_cli/kanban.py at three sites:
- _task_to_dict() (session_id alongside workflow_template_id/current_step_key)
- p_list parser (--sort alongside --workflow-template-id/--step-key)
- _cmd_list (order_by alongside the new filter kwargs)

Cleans up the markers and keeps both halves at each site.

Resolves a self-introduced regression.

* feat(kanban): configure worktree paths and branches

Salvages #26496 by @aqilaziz. Adds branch_name column + CLI flag so
tasks with workspace_kind='worktree' can pin a target branch on
create. Schema migration added to _migrate_add_optional_columns.

- Task.branch_name field + DB column + migration
- create_task accepts branch_name kwarg
- hermes kanban create --branch <name> flag
- kanban show output includes 'Branch: <name>' when set

Cherry-picked the substantive commit (a7558cf27); the PR's tip was
an unrelated service-path-dirs commit. Resolved 2 INSERT-column-list
and show-output conflicts alongside main's session_id and
max_runtime_seconds additions; kept all three.

* feat(skills): add skill bundles — alias /<name> loads multiple skills (#28373)

Skill bundles are tiny YAML files in ~/.hermes/skill-bundles/ that
group several skills under one slash command. Invoking /<bundle-name>
from any surface (CLI, TUI, dashboard, any gateway platform) loads
every referenced skill into a single combined user message.

Use cases:
- /backend-dev → loads github-code-review + test-driven-development
  + github-pr-workflow as one bundle.
- /research → loads several research skills together.
- Team task profiles shared via dotfiles.

Behavior:
- Bundles take precedence over individual skills when slugs collide.
- Missing skills are skipped with a note, not fatal.
- No system-prompt mutation — bundles generate a fresh user message
  at invocation time, the same way /<skill> does. Prompt cache stays
  intact.
- Works in CLI dispatch, gateway dispatch, autocomplete (CLI + TUI),
  /help display.

Schema (~/.hermes/skill-bundles/<slug>.yaml):
    name: backend-dev
    description: Backend feature work.
    skills:
      - github-code-review
      - test-driven-developme…
Lillard01 pushed a commit to Lillard01/hermes-agent that referenced this pull request May 21, 2026
… ops

Three install.ps1 improvements pulled from the thin-installer work on
bb/gui (PR NousResearch#27822) that benefit the canonical CLI install flow on main:

1. Strip UTF-8 BOM from scripts/install.ps1.

   The canonical 'irm <raw URL> | iex' install flow has been broken
   since commit 4279da4 re-introduced a UTF-8 BOM that PR NousResearch#27224
   had explicitly stripped. PowerShell 5.1's 'irm' returns the
   response body as a string with the BOM surviving as a leading
   \ufeff character; 'iex' then evaluates that string and the parser
   chokes on the invisible character before param(), surfacing as a
   cascade of 'The assignment expression is not valid' errors at
   every param default value.

   File body is verified pure ASCII (no character above byte 127),
   so PS 5.1 with no BOM falls back to Windows-1252 decoding which
   is identical to ASCII for our content. Both install paths work:
     - 'irm ... | iex' (canonical one-liner)
     - 'powershell -File install.ps1' (programmatic / desktop bootstrap)
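
The BOM problem can be reproduced outside PowerShell. The Python sketch below is only an illustration of the byte-level facts in this item; the helper names `strip_utf8_bom` and `is_pure_ascii` are hypothetical and are not part of install.ps1 or its tooling.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def is_pure_ascii(data: bytes) -> bool:
    """True when no byte is above 127, so ASCII and Windows-1252 decode identically."""
    return all(byte < 128 for byte in data)


# A script body that starts with a BOM decodes to a string whose first
# character is the invisible U+FEFF, which sits in front of param().
with_bom = codecs.BOM_UTF8 + b"param()"
assert with_bom.decode("utf-8").startswith("\ufeff")

# After stripping, the body is plain ASCII and decodes the same either way.
clean = strip_utf8_bom(with_bom)
assert is_pure_ascii(clean)
assert clean.decode("cp1252") == clean.decode("ascii")
```

A check like this could run in CI to catch a re-introduced BOM, assuming the pipeline has a Python step available.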

2. New -Commit and -Tag string params for reproducible pinning.

   Higher-precedence variants of -Branch. When set, the repository
   stage clones $Branch (fast partial fetch) and then 'git checkout's
   the exact ref. Precedence: Commit > Tag > Branch. Honoured by all
   three code paths:
     - Update path (existing valid checkout): fetch + checkout
       --detach <commit|tag> instead of checkout + pull.
     - Fresh clone: clone --branch $Branch, then post-clone
       'git checkout --detach' to the requested ref.
     - ZIP fallback: pick archive URL for the most-specific ref
       (commit -> archive/<sha>.zip, tag -> archive/refs/tags/
       <tag>.zip, else archive/refs/heads/<branch>.zip).

   Used by the Hermes desktop's first-launch bootstrap to pin the
   .exe to the exact commit it was built against, so the cloned
   Hermes Agent tree always matches what the .exe was tested with.
   Also enables release-bundle pinning (e.g. Microsoft Store builds
   pinning to a release tag) and CI reproducibility.
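
The precedence rule and the ZIP-fallback mapping can be written down compactly. The Python sketch below mirrors the Commit > Tag > Branch order and the archive paths quoted above; the function names `resolve_ref` and `archive_path` are hypothetical, and the repository base URL is deliberately left out because it is not given here.

```python
from typing import Optional, Tuple


def resolve_ref(commit: Optional[str], tag: Optional[str], branch: str) -> Tuple[str, str]:
    """Pick the most specific ref: commit beats tag, tag beats branch."""
    if commit:
        return "commit", commit
    if tag:
        return "tag", tag
    return "branch", branch


def archive_path(commit: Optional[str], tag: Optional[str], branch: str) -> str:
    """Relative archive path for the ZIP fallback, following the mapping above."""
    kind, ref = resolve_ref(commit, tag, branch)
    if kind == "commit":
        return f"archive/{ref}.zip"
    if kind == "tag":
        return f"archive/refs/tags/{ref}.zip"
    return f"archive/refs/heads/{ref}.zip"
```

For example, `archive_path(None, "v1.0", "main")` yields `archive/refs/tags/v1.0.zip`, while passing a commit as well would select the commit archive instead.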

3. EAP=Continue wrap around the new pin-step git invocations.

   'git fetch origin <commit>' writes the routine 'From <url>' info
   line to stderr. Under the script's global $ErrorActionPreference
   = 'Stop' that stderr line is wrapped as an ErrorRecord and
   terminates the script even though fetch+checkout actually succeed.
   Same EAP=Stop + native-stderr footgun we hit during the install.ps1
   hardening pass in Install-Uv, Test-Python, _Run-NpmInstall.

   Wrap both the update-path fetch/checkout block AND the post-clone
   pin block in $ErrorActionPreference = 'Continue' (restored in
   finally). Real failures still caught by $LASTEXITCODE checks.
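
The underlying principle is that a native command should be judged by its exit code, not by whether it wrote to stderr. The Python sketch below is an analogy rather than a model of PowerShell's ErrorRecord behavior: `run_native` and `succeeded` are hypothetical helpers, and the child process is a stand-in for git that writes an informational line to stderr and exits 0.

```python
import subprocess
import sys
from typing import List, Tuple


def run_native(args: List[str]) -> Tuple[int, str]:
    """Run a command and capture stderr as plain text instead of treating it as an error."""
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode, result.stderr


def succeeded(args: List[str]) -> bool:
    """Success is decided by the exit code alone, like the $LASTEXITCODE checks."""
    code, _stderr = run_native(args)
    return code == 0


# Stand-in for a command that logs to stderr yet succeeds.
chatty = [sys.executable, "-c", "import sys; sys.stderr.write('From <url>\\n')"]
# Stand-in for a command that genuinely fails.
failing = [sys.executable, "-c", "import sys; sys.exit(3)"]

assert succeeded(chatty)
assert not succeeded(failing)
```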
Mucky010 pushed a commit to Mucky010/hermes-agent that referenced this pull request May 24, 2026
… ops

gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
Three issues flagged by the Copilot review on this PR:

1. Double JSON emit on stage failure (Copilot #1, #2). When -Stage <name>
   ran a worker that threw, Invoke-Stage's finally emitted a JSON result
   frame AND the entry-point catch emitted a second error frame --
   producing two concatenated JSON objects on stdout and breaking the
   one-line-per-invocation contract that drivers parse against. Same
   issue applied to -Json mode on a full install (every stage's finally
   plus a final error frame missing duration_ms/skipped).

   Fix: Invoke-Stage's finally now sets $script:_StageEmittedErrorFrame
   when it emits a failure frame; the entry-point catch checks the flag
   and skips its own emit, still exit 1.
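
The flag-based fix can be sketched as follows. This Python version is only an illustration of the control flow: `StageRunner`, `run_entry_point`, and the `stage`, `ok`, and `error` frame fields are hypothetical, while `duration_ms` and `skipped` are the fields named above and `emitted_error_frame` plays the role of $script:_StageEmittedErrorFrame.

```python
import json
import time
from typing import Callable


class StageRunner:
    def __init__(self, emit: Callable[[str], None]) -> None:
        self.emit = emit
        self.emitted_error_frame = False  # set when finally emits a failure frame

    def invoke_stage(self, name: str, worker: Callable[[], None]) -> None:
        started = time.monotonic()
        error = None
        try:
            worker()
        except Exception as exc:
            error = str(exc)
            raise
        finally:
            # Exactly one result frame per stage, on success or failure.
            frame = {
                "stage": name,
                "ok": error is None,
                "skipped": False,
                "duration_ms": int((time.monotonic() - started) * 1000),
            }
            if error is not None:
                frame["error"] = error
                self.emitted_error_frame = True
            self.emit(json.dumps(frame))


def run_entry_point(runner: StageRunner, name: str, worker: Callable[[], None]) -> int:
    """Return the process exit code; never emit a second frame for the same failure."""
    try:
        runner.invoke_stage(name, worker)
        return 0
    except Exception as exc:
        if not runner.emitted_error_frame:
            runner.emit(json.dumps({"ok": False, "error": str(exc)}))
        return 1
```

With this shape, a driver that reads one JSON line per invocation sees a single frame whether the worker succeeds or throws, and the exit code still signals the failure.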

2. $prevEAP uninitialized on early try-block throw (Copilot #3). In
   Install-Uv, Test-Python, Test-Node's winget fallback,
   _Run-NpmInstall, and the playwright block, '$prevEAP =
   $ErrorActionPreference' lived as the first statement INSIDE the
   try. If anything between 'try {' and that line threw (Write-Info on
   an unusual host, the npx-finding loop, etc.), the catch's
   'if ($prevEAP) { ... }' restore was a no-op and EAP could remain
   relaxed.

   Fix: hoist '$prevEAP = $ErrorActionPreference' to the line
   immediately before 'try {' in all five sites. Catch's restore is
   now always meaningful regardless of where in the try the throw
   originated.
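
The hoisting pattern is language-independent: capture the old value before entering the protected block so that the restore always has something meaningful to write back. The Python sketch below is an analogy only; `Preferences` and `run_relaxed` are hypothetical and merely stand in for $ErrorActionPreference and the five install.ps1 sites.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class Preferences:
    """Stand-in for a global preference such as $ErrorActionPreference."""

    def __init__(self) -> None:
        self.error_action = "Stop"


def run_relaxed(prefs: Preferences, work: Callable[[], T]) -> T:
    # Captured BEFORE the try, so the restore below is valid no matter
    # which statement inside the try raises.
    previous = prefs.error_action
    try:
        prefs.error_action = "Continue"
        return work()
    finally:
        prefs.error_action = previous
```

If `previous` were assigned inside the `try` instead, an exception raised before that assignment would leave the restore step with nothing to restore, which is the situation the fix above rules out.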

No change to Invoke-Stage's success path or to the four lint-clean EAP
sites (Test-Node was the only winget-related catch). All 19 metadata
smoke tests still pass.
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
… ops
