docker: opt in to dashboard --insecure via env var, never derive from bind host#34188
Merged
Conversation
… bind host
The s6 dashboard run script flipped `--insecure` on whenever
`HERMES_DASHBOARD_HOST` was anything other than 127.0.0.1 / localhost.
That comment ("the dashboard refuses otherwise") predates the OAuth
auth gate: back when it was written, `start_server` would SystemExit
on any non-loopback bind, so the run script's `--insecure` was the
only way to make in-container deployments work at all.
The gate has since been replaced by `should_require_auth(host,
allow_public)`, which engages the OAuth flow when a
`DashboardAuthProvider` is registered (the bundled `dashboard_auth/nous`
provider auto-registers on `HERMES_DASHBOARD_OAUTH_CLIENT_ID`) and
fails closed with a specific operator-facing error when none is. The
host-derived `--insecure` ran upstream of all that and silently
disabled the gate on every container-deployed dashboard.
Most visible under the portal's wildcard-subdomain rollout: every Fly
machine binds 0.0.0.0 so the edge can reach Flycast, every machine
boots with the correct `HERMES_DASHBOARD_OAUTH_CLIENT_ID`, the nous
provider registers — and `/api/status` still returns
`{"auth_required": false, "auth_providers": ["nous"]}` because the
run script disabled the gate before `start_server` ever saw the
request. The dashboard SPA was served to anyone, no `/login` redirect,
no OAuth challenge.
Fix: derive `--insecure` from an explicit opt-in env var,
`HERMES_DASHBOARD_INSECURE` (truthy values matching the rest of the
s6 boolean envs: 1, true, TRUE, True, yes, YES, Yes). Operators on
trusted LANs behind a reverse proxy without the OAuth contract
(the existing `docker-compose.windows.yml` use case) opt in
explicitly; portal-managed agent deployments leave it unset and let
the gate engage.
`docker-compose.windows.yml` already passes `--insecure` on the
`command:` array directly (line 38), so it doesn't depend on the s6
auto-injection. No compose-file change required.
Tests:
* `tests/test_docker_home_override_scripts.py` — extends the existing
static-text guard with a regression assertion that the legacy
host-derived case-statement is gone and the new env-var opt-in is
present (locks against accidental revert).
* `tests/docker/test_dashboard.py` — adds two Docker-in-Docker tests
exercising the actual `/api/status` round-trip:
- 0.0.0.0 bind + `HERMES_DASHBOARD_OAUTH_CLIENT_ID` → gate engaged
- 0.0.0.0 bind + `HERMES_DASHBOARD_INSECURE=1` → gate disabled
Docs:
* `website/docs/user-guide/docker.md` + zh-Hans i18n — adds the new
env var to the table, replaces the stale prose ("the entrypoint
no longer auto-enables insecure mode" — which until this PR was
flat-out wrong) with an accurate description of the gate's
trigger conditions and the explicit opt-out.
shellcheck clean. Python static-text test passes locally. Behavioural
test will run against any future image build (CI's Docker harness).
Contributor
🔎 Lint report:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The s6 dashboard run script flipped
--insecureon wheneverHERMES_DASHBOARD_HOSTwas anything other than127.0.0.1/localhost. The comment ("the dashboard refuses otherwise") predates the OAuth auth gate — back then,start_serverSystemExit'd on any non-loopback bind, and the run script's--insecurewas the only way to make in-container deployments work at all.The gate has since been replaced by
should_require_auth(host, allow_public), which engages the OAuth flow when aDashboardAuthProvideris registered (the bundleddashboard_auth/nousprovider auto-registers whenHERMES_DASHBOARD_OAUTH_CLIENT_IDis set) and fails closed with a specific operator-facing error when none is. The host-derived--insecureran upstream of all that and silently disabled the gate on every container-deployed dashboard.Live evidence (before fix)
Test agent on the wildcard-subdomain rollout (
nous-account-servicePR #221),HERMES_DASHBOARD_OAUTH_CLIENT_IDcorrectly injected by the portal on first boot, bundled nous provider registers — and:Provider registered. Gate off. The combination is only reachable via
allow_public=True, i.e.--insecurereachedstart_server. The dashboard SPA was served to anyone on the public internet, including/sessions, with no/loginredirect.Fix
Derive
--insecurefrom an explicit opt-in env var,HERMES_DASHBOARD_INSECURE(truthy values matching the rest of the s6 boolean envs:1,true,TRUE,True,yes,YES,Yes). Operators on trusted LANs behind a reverse proxy without the OAuth contract opt in explicitly; portal-managed agent deployments leave it unset and let the gate engage.Behaviour table after the fix:
HERMES_DASHBOARD_OAUTH_CLIENT_IDHERMES_DASHBOARD_INSECURE127.0.0.10.0.0.00.0.0.0start_serverSystemExits with skip-reason from nous plugin (fail-closed)0.0.0.01--insecurepassed, gate disabled, loud warning (explicit opt-in)Compose-file impact
docker-compose.windows.ymlalready passes--insecureon thecommand:array directly (line 38), so it doesn't depend on the s6 auto-injection. No compose-file change required.Tests
tests/test_docker_home_override_scripts.py— extends the existing static-text guard withtest_dashboard_run_does_not_derive_insecure_from_bind_host, asserting the legacy host-derived case-statement is gone and the new env-var opt-in is present (locks against accidental revert).tests/docker/test_dashboard.py— adds two Docker-in-Docker tests exercising the actual/api/statusround-trip:test_dashboard_oauth_gate_engages_on_non_loopback_bind—0.0.0.0bind +HERMES_DASHBOARD_OAUTH_CLIENT_ID→auth_required: true,"nous"in providers.test_dashboard_insecure_env_var_opts_out_of_gate—0.0.0.0bind +HERMES_DASHBOARD_INSECURE=1→auth_required: false.Both new Docker tests follow the existing
_poll-style pattern intest_dashboard.pyand probe via venv Python'surllib.request(the image doesn't shipcurl).Docs
website/docs/user-guide/docker.md+ the matching zh-Hans i18n file — addsHERMES_DASHBOARD_INSECUREto the env-var table, replaces the previously stale prose ("the entrypoint no longer auto-enables insecure mode" — which was flat-out wrong until this PR) with an accurate description of the gate's trigger conditions and the explicit opt-out.Local verification
--severity=error, matches CI'sdocker-lint.yml).tests/test_docker_home_override_scripts.py— both tests pass.tests/docker/test_dashboard.pycollects all 8 tests without errors (Docker harness not run locally — CI's docker-publish smoke test exercises it).Validation after merge
nousresearch/hermes-agent:latest.https://<agentId>.agents.staging-nousresearch.com/api/status— expect"auth_required": trueand"auth_providers": ["nous"].https://<agentId>.agents.staging-nousresearch.com/sessionsin a browser — expect redirect to the portal's OAuth/authorizeendpoint.Lane
Solo-landable Docker/s6 territory (only
docker/s6-rc.d/dashboard/run, two test files, and two doc files touched — norun_agent.py/cli.py/gateway// model-related code). Routine teammate review per branch protection.cc @teknium1 for the approving review (Docker/s6 lane, but maintainer review required by branch protection).
Handover source: filed by Ben after the
nous-account-serviceportal-side fix (commita3347645on theagent_domainsbranch / PR #221) confirmed all four dashboard env vars are now injected on first boot; the only remaining blocker for end-to-end OAuth on the wildcard subdomain was this s6 script.