Managed mode: gateway hardcodes restrictive permissions, breaks dashboard for group members #9383

@lvnilesh

Summary

When running hermes via the NixOS module (managed/native mode), three issues prevent hermes dashboard from working for users in the hermes group:

1. gateway.pid not written in managed mode

The managed gateway (started via systemd) writes gateway_state.json but never writes gateway.pid. The dashboard's get_running_pid() reads only gateway.pid, so in managed mode it always returns None and the dashboard permanently shows "Gateway: not running" even though the gateway is active and connected.

gateway_state.json correctly shows "gateway_state": "running" with the PID, but the dashboard ignores it when gateway.pid is missing.

Suggested fix: In web_server.py's get_status(), fall back to read_runtime_status() when get_running_pid() returns None:

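# Inside get_status() in web_server.py; needs `import os` at module level.
gateway_pid = get_running_pid()
if gateway_pid is None:
    # gateway.pid is absent in managed mode, so fall back to gateway_state.json.
    rt = read_runtime_status()
    if rt and rt.get("gateway_state") == "running":
        rpid = rt.get("pid")
        # Trust the recorded PID only if the process still exists (Linux /proc).
        if rpid and os.path.exists(f"/proc/{rpid}"):
            gateway_pid = rpid
gateway_running = gateway_pid is not None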

2. cron/jobs.py hardcodes chmod 0o700 on cron directory

_secure_dir() in cron/jobs.py (line ~76) runs os.chmod(path, 0o700) on every startup via ensure_dirs(). The NixOS activation script sets cron/ to 2770 (setgid, with read/write/execute for owner and group), but the gateway immediately resets it to 700 (owner only).

As a result, no member of the hermes group other than the hermes user can read jobs.json through the dashboard. The dashboard's /api/cron/jobs endpoint fails with a PermissionError and returns 500 Internal Server Error.

Suggested fix: In managed mode (HERMES_MANAGED=true), use 0o770 or 0o2770 instead of 0o700, since the NixOS module intentionally sets group-accessible permissions.

3. Gateway rewrites .env with 0600

The gateway rewrites $HERMES_HOME/.env with 0600 permissions during startup. The NixOS activation script writes it as 0640 (group-readable), but the gateway overrides this.

Suggested fix: as with the cron directory, keep the 0640 set by the NixOS module when running in managed mode instead of resetting the file to 0600.
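One way this could look, as a sketch under stated assumptions: write_env_file() is a hypothetical helper, and the write-to-temp-then-rename approach is an illustration, not a description of how the gateway actually rewrites .env. Only the two modes (0640 in managed mode, 0600 otherwise) come from this report.

```python
import os
import stat
import tempfile


def write_env_file(path: str, content: str, managed: bool) -> None:
    """Rewrite an env file via a temp file and rename (hypothetical helper).

    managed=True  -> 0o640, matching the NixOS activation script
    managed=False -> 0o600, the behaviour described in this issue
    """
    mode = 0o640 if managed else 0o600
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".env.")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(content)
        # mkstemp creates the file as 0o600, so set the target mode explicitly.
        os.chmod(tmp_path, mode)
        # Rename over the old file; atomic when both are on one filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        env_path = os.path.join(d, ".env")
        write_env_file(env_path, "EXAMPLE_KEY=value\n", managed=True)
        print(oct(stat.S_IMODE(os.stat(env_path).st_mode)))
```

With a replace-by-rename like this, the new file's group comes from the writing process, or from the parent directory if that directory is setgid, so the 0640 mode only helps if the resulting group is hermes.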

Environment

  • hermes-agent v0.9.0 (commit 78fa758)
  • NixOS module in native (non-container) mode
  • User cloudgenius is in the hermes group

Workaround

Run hermes dashboard as the hermes user directly:

# Wrapper script
(pkgs.writeShellScriptBin "hermes-dashboard" ''
  exec sudo -u hermes env HERMES_HOME=/var/lib/hermes/.hermes hermes dashboard "$@"
'')

# Passwordless sudo for the hermes command
security.sudo.extraConfig = ''
  cloudgenius ALL=(hermes) NOPASSWD: /run/current-system/sw/bin/hermes
'';

For the gateway status detection, we patch web_server.py in a PYTHONPATH overlay to add the gateway_state.json fallback.

🤖 Generated with Claude Code

Labels: P2, area/nix, comp/cron, comp/gateway, type/bug
