docker: opt in to dashboard --insecure via env var, never derive from bind host #34188

Merged: benbarclay merged 1 commit into main from dashboard-insecure-opt-in on May 28, 2026

@benbarclay (Collaborator)

Summary

The s6 dashboard run script turned --insecure on whenever HERMES_DASHBOARD_HOST was anything other than 127.0.0.1 / localhost. The script's comment justifying this ("the dashboard refuses otherwise") predates the OAuth auth gate: back then, start_server SystemExit'd on any non-loopback bind, and the run script's --insecure was the only way to make in-container deployments work at all.

The gate has since been replaced by should_require_auth(host, allow_public), which engages the OAuth flow when a DashboardAuthProvider is registered (the bundled dashboard_auth/nous provider auto-registers when HERMES_DASHBOARD_OAUTH_CLIENT_ID is set) and fails closed with a specific operator-facing error when none is. The host-derived --insecure ran upstream of all that and silently disabled the gate on every container-deployed dashboard.
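
The decision described above can be sketched roughly as follows. This is a hypothetical reconstruction for illustration, not the project's code: only the name `should_require_auth(host, allow_public)` comes from this PR, while the loopback set and the function body are assumptions.

```python
# Hypothetical sketch: only the function name and its parameters come from the PR.
# The loopback set is an assumption; the real code may also cover e.g. "::1".
LOOPBACK_HOSTS = {"127.0.0.1", "localhost"}


def should_require_auth(host: str, allow_public: bool) -> bool:
    """Return True when the auth gate should engage for this bind host."""
    if host in LOOPBACK_HOSTS:
        # Loopback bind: the dashboard is only reachable locally, no gate.
        return False
    # Non-loopback bind: the gate engages unless --insecure (allow_public) was passed.
    return not allow_public
```

Under these assumptions, a host-derived `--insecure` means `allow_public` is always true on a 0.0.0.0 bind, so the second branch can never return `True` in a container.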

Live evidence (before fix)

On a test agent in the wildcard-subdomain rollout (nous-account-service PR #221), the portal correctly injected HERMES_DASHBOARD_OAUTH_CLIENT_ID on first boot and the bundled nous provider registered, yet /api/status reported the gate as off:

$ curl https://<agentId>.agents.staging-nousresearch.com/api/status
{
  ...
  "auth_required": false,
  "auth_providers": ["nous"]
}

Provider registered. Gate off. The combination is only reachable via allow_public=True, i.e. --insecure reached start_server. The dashboard SPA was served to anyone on the public internet, including /sessions, with no /login redirect.

Fix

Derive --insecure from an explicit opt-in env var, HERMES_DASHBOARD_INSECURE (truthy values matching the rest of the s6 boolean envs: 1, true, TRUE, True, yes, YES, Yes). Operators on trusted LANs behind a reverse proxy without the OAuth contract opt in explicitly; portal-managed agent deployments leave it unset and let the gate engage.
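
The real change lives in the s6 shell run script, which is not shown on this page. As a rough illustration only, the opt-in logic could look like the Python below. The truthy spellings come from the PR text; the `dashboard_args` helper, the `--host` flag name, and the `127.0.0.1` default are assumptions.

```python
import os

# Truthy spellings listed in this PR for HERMES_DASHBOARD_INSECURE.
TRUTHY = {"1", "true", "TRUE", "True", "yes", "YES", "Yes"}


def dashboard_args(env=None):
    """Build a dashboard argument list; --insecure is added only on explicit opt-in.

    Hypothetical helper: the "--host" flag and the default host are assumptions.
    """
    env = os.environ if env is None else env
    host = env.get("HERMES_DASHBOARD_HOST", "127.0.0.1")
    args = ["--host", host]
    # The bind host is deliberately NOT consulted when deciding on --insecure.
    if env.get("HERMES_DASHBOARD_INSECURE", "") in TRUTHY:
        args.append("--insecure")
    return args
```

The point of the sketch is the last branch: the flag depends on the opt-in variable alone, so a 0.0.0.0 bind with the variable unset leaves the gate in place.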

Behaviour table after the fix:

| Bind host | HERMES_DASHBOARD_OAUTH_CLIENT_ID | HERMES_DASHBOARD_INSECURE | Outcome |
| --- | --- | --- | --- |
| 127.0.0.1 | irrelevant | irrelevant | Loopback bind, no auth gate (unchanged) |
| 0.0.0.0 | set | unset | OAuth gate engages, login required (new correct behaviour) |
| 0.0.0.0 | unset | unset | start_server SystemExits with skip-reason from nous plugin (fail-closed) |
| 0.0.0.0 | irrelevant | 1 | --insecure passed, gate disabled, loud warning (explicit opt-in) |
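
The four rows above can be read as a small decision function. The sketch below is illustrative only: `dashboard_outcome` and its return strings are hypothetical names, and the order of the checks is inferred from the table rather than taken from the code.

```python
def dashboard_outcome(host: str, client_id_set: bool, insecure: bool) -> str:
    """Map the three inputs from the behaviour table to an outcome label.

    Hypothetical helper; the labels are shorthand for the table's Outcome column.
    """
    if host in ("127.0.0.1", "localhost"):
        return "loopback, no auth gate"
    if insecure:
        return "gate disabled (explicit opt-in)"
    if client_id_set:
        return "OAuth gate engaged"
    # Non-loopback bind, no provider registered, no opt-out: refuse to start.
    return "fail closed (SystemExit)"
```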

Compose-file impact

docker-compose.windows.yml already passes --insecure on the command: array directly (line 38), so it doesn't depend on the s6 auto-injection. No compose-file change required.

Tests

  • tests/test_docker_home_override_scripts.py — extends the existing static-text guard with test_dashboard_run_does_not_derive_insecure_from_bind_host, asserting the legacy host-derived case-statement is gone and the new env-var opt-in is present (locks against accidental revert).
  • tests/docker/test_dashboard.py — adds two Docker-in-Docker tests exercising the actual /api/status round-trip:
    • test_dashboard_oauth_gate_engages_on_non_loopback_bind: 0.0.0.0 bind + HERMES_DASHBOARD_OAUTH_CLIENT_ID → auth_required: true, "nous" in providers.
    • test_dashboard_insecure_env_var_opts_out_of_gate: 0.0.0.0 bind + HERMES_DASHBOARD_INSECURE=1 → auth_required: false.

Both new Docker tests follow the existing _poll-style pattern in test_dashboard.py and probe via venv Python's urllib.request (the image doesn't ship curl).
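
For readers unfamiliar with the pattern, a poll-style probe built on `urllib.request` might look like the sketch below. It is not the `_poll` helper from `test_dashboard.py`; the function name, timeouts, and error handling are assumptions.

```python
import json
import time
import urllib.request


def poll_status(url: str, timeout: float = 30.0, interval: float = 0.5) -> dict:
    """Fetch `url` repeatedly until it returns parseable JSON or `timeout` elapses.

    Hypothetical helper, not the repository's own _poll implementation.
    """
    deadline = time.monotonic() + timeout
    last_error = None
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                return json.load(resp)
        except (OSError, ValueError) as exc:
            # OSError covers URLError/HTTPError and socket timeouts;
            # ValueError covers a body that is not valid JSON yet.
            last_error = exc
            time.sleep(interval)
    raise TimeoutError(f"no usable response from {url}: {last_error}")
```

A test would then assert on the returned dictionary, for example that `auth_required` is true and that `"nous"` appears in `auth_providers`.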

Docs

  • website/docs/user-guide/docker.md + the matching zh-Hans i18n file — adds HERMES_DASHBOARD_INSECURE to the env-var table, replaces the previously stale prose ("the entrypoint no longer auto-enables insecure mode" — which was flat-out wrong until this PR) with an accurate description of the gate's trigger conditions and the explicit opt-out.

Local verification

  • shellcheck clean (--severity=error, matches CI's docker-lint.yml).
  • tests/test_docker_home_override_scripts.py — both tests pass.
  • tests/docker/test_dashboard.py collects all 8 tests without errors (Docker harness not run locally — CI's docker-publish smoke test exercises it).

Validation after merge

  1. Build & publish a new nousresearch/hermes-agent:latest.
  2. On the next portal-side agent create, the new image is picked up automatically.
  3. Hit https://<agentId>.agents.staging-nousresearch.com/api/status — expect "auth_required": true and "auth_providers": ["nous"].
  4. Hit https://<agentId>.agents.staging-nousresearch.com/sessions in a browser — expect redirect to the portal's OAuth /authorize endpoint.
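
Step 3 can be expressed as a small check on the decoded /api/status payload. This is a sketch under stated assumptions: `check_status_payload` is a hypothetical helper, and only the two fields named in this PR are inspected.

```python
def check_status_payload(payload: dict) -> list:
    """Return a list of problems found in an /api/status payload (empty if OK).

    Hypothetical helper; it checks only the two fields named in this PR.
    """
    problems = []
    if payload.get("auth_required") is not True:
        problems.append("auth_required is not true: the gate is off")
    if "nous" not in payload.get("auth_providers", []):
        problems.append('"nous" missing from auth_providers')
    return problems
```

Applied to the pre-fix response shown under "Live evidence", this returns a single problem about `auth_required`, matching the "provider registered, gate off" state.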

Lane

Solo-landable Docker/s6 territory (only docker/s6-rc.d/dashboard/run, two test files, and two doc files touched — no run_agent.py / cli.py / gateway/ / model-related code). Routine teammate review per branch protection.

cc @teknium1 for the approving review (Docker/s6 lane, but maintainer review required by branch protection).


Handover source: filed by Ben after the nous-account-service portal-side fix (commit a3347645 on the agent_domains branch / PR #221) confirmed all four dashboard env vars are now injected on first boot; the only remaining blocker for end-to-end OAuth on the wildcard subdomain was this s6 script.

… bind host

The s6 dashboard run script flipped `--insecure` on whenever
`HERMES_DASHBOARD_HOST` was anything other than 127.0.0.1 / localhost.
That comment ("the dashboard refuses otherwise") predates the OAuth
auth gate: back when it was written, `start_server` would SystemExit
on any non-loopback bind, so the run script's `--insecure` was the
only way to make in-container deployments work at all.

The gate has since been replaced by `should_require_auth(host,
allow_public)`, which engages the OAuth flow when a
`DashboardAuthProvider` is registered (the bundled `dashboard_auth/nous`
provider auto-registers on `HERMES_DASHBOARD_OAUTH_CLIENT_ID`) and
fails closed with a specific operator-facing error when none is. The
host-derived `--insecure` ran upstream of all that and silently
disabled the gate on every container-deployed dashboard.

Most visible under the portal's wildcard-subdomain rollout: every Fly
machine binds 0.0.0.0 so the edge can reach Flycast, every machine
boots with the correct `HERMES_DASHBOARD_OAUTH_CLIENT_ID`, the nous
provider registers — and `/api/status` still returns
`{"auth_required": false, "auth_providers": ["nous"]}` because the
run script disabled the gate before `start_server` ever saw the
request. The dashboard SPA was served to anyone, no `/login` redirect,
no OAuth challenge.

Fix: derive `--insecure` from an explicit opt-in env var,
`HERMES_DASHBOARD_INSECURE` (truthy values matching the rest of the
s6 boolean envs: 1, true, TRUE, True, yes, YES, Yes). Operators on
trusted LANs behind a reverse proxy without the OAuth contract
(the existing `docker-compose.windows.yml` use case) opt in
explicitly; portal-managed agent deployments leave it unset and let
the gate engage.

`docker-compose.windows.yml` already passes `--insecure` on the
`command:` array directly (line 38), so it doesn't depend on the s6
auto-injection. No compose-file change required.

Tests:
* `tests/test_docker_home_override_scripts.py` — extends the existing
  static-text guard with a regression assertion that the legacy
  host-derived case-statement is gone and the new env-var opt-in is
  present (locks against accidental revert).
* `tests/docker/test_dashboard.py` — adds two Docker-in-Docker tests
  exercising the actual `/api/status` round-trip:
  - 0.0.0.0 bind + `HERMES_DASHBOARD_OAUTH_CLIENT_ID` → gate engaged
  - 0.0.0.0 bind + `HERMES_DASHBOARD_INSECURE=1` → gate disabled

Docs:
* `website/docs/user-guide/docker.md` + zh-Hans i18n — adds the new
  env var to the table, replaces the stale prose ("the entrypoint
  no longer auto-enables insecure mode" — which until this PR was
  flat-out wrong) with an accurate description of the gate's
  trigger conditions and the explicit opt-out.

shellcheck clean. Python static-text test passes locally. Behavioural
test will run against any future image build (CI's Docker harness).

benbarclay requested a review from teknium1 on May 28, 2026 23:46

@github-actions (Contributor)

🔎 Lint report: dashboard-insecure-opt-in vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9577 on HEAD, 9577 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 5045 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.
