Bug: os.chmod(path.parent, 0o700) bricks host when path.parent resolves to / #25821

@dfrolov

Summary

Five call sites call os.chmod(path.parent, 0o700) on a derived path without checking that path.parent is a sane directory. If anything makes the resolution land at / (e.g. HERMES_HOME=/, an env-var concatenation bug, or a path whose .parent.parent == .parent), the call strips traversal permission from the root inode for everyone except root and bricks the entire host: every non-root user (systemd-resolve, systemd-network, syslog, nobody, …) fails any path lookup with EACCES, taking out DNS, networking, journald, rsyslog and every Docker container that drops privileges.

We hit this in production today. Root cause took 5+ hours to isolate because root keeps working (CAP_DAC_OVERRIDE) and the symptom is a cascade: systemd-resolved watchdogs out → restarts fail with 200/CHDIR → systemd-networkd and timesyncd follow → SSH+ICMP keep working → graceful reboot hangs → recovery via Hetzner rescue. Fix: chmod 755 / on the mounted FS.
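For anyone triaging a similar cascade, the mode bits of / can be checked directly. This is a minimal diagnostic sketch, not part of the codebase; the helper name others_can_traverse is hypothetical.

```python
import os
import stat


def others_can_traverse(mode: int) -> bool:
    """Return True if the 'other' execute (search) bit is set in a mode."""
    return bool(mode & stat.S_IXOTH)


if __name__ == "__main__":
    # S_IMODE strips the file-type bits and leaves only the permission bits.
    root_mode = stat.S_IMODE(os.stat("/").st_mode)
    print(f"/ has mode {oct(root_mode)}")
    if not others_can_traverse(root_mode):
        print("non-root users cannot traverse /; expect EACCES on path lookups")
```

A healthy root directory is normally 0o755; a value of 0o700 matches the failure described here.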

We could not isolate the exact triggering call — no auditd was running — but the pattern is the same across all five sites, and the catastrophic failure mode (chmod("/", 0o700)) is reachable from any of them under the right env.

Affected call sites (HEAD 26933c2)

All share this shape:

path.parent.mkdir(parents=True, exist_ok=True)
try:
    os.chmod(path.parent, 0o700)
except OSError:
    pass

When the process runs as root, os.chmod("/", 0o700) raises no exception; it succeeds, so the except OSError branch never fires and nothing is logged.
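The effect of this shape can be reproduced harmlessly by letting a temporary directory stand in for /. The sketch below assumes a POSIX system; secure_parent and demo are hypothetical names wrapping the shape quoted above.

```python
import os
import stat
import tempfile
from pathlib import Path


def secure_parent(path: Path) -> None:
    """The shared call-site shape quoted above, unchanged."""
    path.parent.mkdir(parents=True, exist_ok=True)
    try:
        os.chmod(path.parent, 0o700)
    except OSError:
        pass


def demo() -> tuple[int, int]:
    """Return the stand-in directory's mode before and after the call."""
    with tempfile.TemporaryDirectory() as tmp:
        fake_root = Path(tmp)
        os.chmod(fake_root, 0o755)  # typical mode of a real /
        before = stat.S_IMODE(fake_root.stat().st_mode)
        # The file sits directly under fake_root, so path.parent is fake_root.
        secure_parent(fake_root / "auth.json")
        after = stat.S_IMODE(fake_root.stat().st_mode)
        return before, after
```

The call completes silently and the stand-in directory loses its group and other permission bits, which is what happens to the real / when the derived path sits at the top level.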

Trigger

Any of:

  • HERMES_HOME=/ (or other env vars consumed by _qwen_cli_auth_path / _credentials_path / _nous_shared_auth_dir set to /)
  • A bug elsewhere that resolves a token storage path to a top-level filename (e.g. Path("/auth.json"))

The path filter _safe_filename in tools/mcp_oauth.py sanitises the filename but not the directory, so a malformed HERMES_HOME is not caught.
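Both triggers reduce to the same pathlib behaviour, which can be checked without touching the filesystem. In this sketch the filename auth.json and the hermes_home variable are stand-ins for illustration only.

```python
from pathlib import PurePosixPath

root = PurePosixPath("/")

# A token file placed directly under / has / as its parent.
assert PurePosixPath("/auth.json").parent == root

# The root is its own parent, so walking up never leaves it.
assert root.parent == root
assert root.parent.parent == root.parent

# A home of "/" joined with a bare filename lands at the top level too.
hermes_home = PurePosixPath("/")  # stand-in for HERMES_HOME=/
assert (hermes_home / "auth.json").parent == root
```

Sanitising only the filename component cannot catch any of these cases, because the problem is in the directory part of the path.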

Proposed fix

A single helper, used at every site:

import os
from pathlib import Path


def _secure_dir_safe(d: Path) -> None:
    """chmod 0o700 a dir if and only if it's safe to do so."""
    d = d.resolve()
    if d == Path("/") or d == Path(d.anchor) or len(d.parts) < 2:
        # Refuse to chmod the filesystem root.
        return
    # Also refuse common system roots if anyone got here.
    if d in (Path("/etc"), Path("/var"), Path("/usr"), Path("/home"), Path("/root"), Path("/opt"), Path("/tmp")):
        return
    try:
        os.chmod(d, 0o700)
    except OSError:
        # Best effort, as at the existing call sites.
        pass

Plus a startup assert / loud logger.error if get_hermes_home() returns Path("/") or any of the above — strict-mode would be better than silent corruption.
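One possible shape for that startup check is sketched below. The names check_home_is_safe and _UNSAFE_HOMES are hypothetical, and the deny-list simply mirrors the helper above rather than being exhaustive.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

# Same deny-list as the proposed helper; illustrative, not exhaustive.
_UNSAFE_HOMES = frozenset(
    Path(p) for p in ("/", "/etc", "/var", "/usr", "/home", "/root", "/opt", "/tmp")
)


def check_home_is_safe(home: Path) -> Path:
    """Return the resolved home, or raise if it is a filesystem or system root."""
    resolved = home.resolve()
    if resolved == Path(resolved.anchor) or resolved in _UNSAFE_HOMES:
        logger.error("Refusing to use %s as home directory", resolved)
        raise ValueError(f"unsafe home directory: {resolved}")
    return resolved
```

At startup this could be called as check_home_is_safe(get_hermes_home()), assuming get_hermes_home() returns a Path. Raising instead of returning silently makes a bad HERMES_HOME fail fast, before any chmod runs.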

Related

Happy to send a PR if there's interest in this exact helper signature.

    Labels

      • P0: Critical (data loss, security, crash loop)
      • area/auth: Authentication, OAuth, credential pools
      • comp/agent: Core agent loop, run_agent.py, prompt builder
      • comp/cli: CLI entry point, hermes_cli/, setup wizard
      • tool/mcp: MCP client and OAuth
      • type/security: Security vulnerability or hardening
