[Bug]: Docker terminal sandboxes fail to start for s6 /init images #34628

@LouisGameDev

Bug Description

Docker-backed terminal sandboxes fail to start for images that already use s6 /init as PID 1. As a result, terminal-tool commands fail with a generic "container ... is not running" error.

In the reproduced case, Hermes starts a fresh hermes-agent:latest sandbox with both Docker --init and a noexec tmpfs on /run. That breaks s6 startup in two stages:

  1. Docker --init conflicts with an image that already has /init as its entrypoint.
  2. After skipping Docker --init, s6 still fails because it needs to exec /run/s6/basedir/bin/init, but /run is mounted noexec.

Steps to Reproduce

  1. Use the Docker terminal backend with hermes-agent:latest (or another image whose entrypoint is /init).
  2. Trigger any terminal-tool command that provisions a fresh sandbox.
  3. Observe that the container exits immediately and the terminal tool reports that the container is not running.

A minimal local smoke repro after sandbox creation was:

from tools.environments.docker import DockerEnvironment

env = DockerEnvironment(
    image='hermes-agent:latest',
    cwd='/root',
    timeout=60,
    cpu=1,
    memory=1024,
    disk=0,
    persistent_filesystem=False,
    task_id='smoke-hermes-agent-latest',
    volumes=[],
    forward_env=[],
    env={},
    host_cwd=None,
    auto_mount_cwd=False,
    run_as_host_user=False,
    extra_args=[],
    persist_across_processes=False,
)
print(env.execute('echo hermes-smoke-ok'))

Expected Behavior

Fresh Docker sandboxes should stay up, initialize correctly, and execute terminal commands normally.

Actual Behavior

Before the local hotfix, the container died during startup and terminal commands failed with a daemon-level "container is not running" error.

After removing Docker --init, the failure became explicit in container logs:

/package/admin/s6-overlay-3.2.3.0/libexec/stage0: 83: exec: /run/s6/basedir/bin/init: Permission denied

The container exited with code 126.
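Exit code 126 is the conventional shell status for a command that was found but could not be executed, which fits an exec from a noexec mount. As a diagnostic aid, here is a minimal sketch that checks whether a mount point carries noexec, given the text of /proc/mounts from inside a container. The helper names and the sample line are hypothetical and are not part of Hermes.

```python
def mount_options(proc_mounts_text: str, mount_point: str) -> set[str]:
    """Return the option set for mount_point, or an empty set if it is not mounted."""
    options: set[str] = set()
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            # A later entry for the same mount point shadows an earlier one.
            options = set(fields[3].split(","))
    return options


def is_noexec(proc_mounts_text: str, mount_point: str) -> bool:
    """True if mount_point is mounted with the noexec option."""
    return "noexec" in mount_options(proc_mounts_text, mount_point)


# Illustrative line, shaped like what a noexec tmpfs on /run would produce.
sample = "tmpfs /run tmpfs rw,nosuid,nodev,noexec,relatime,size=65536k 0 0\n"
print(is_noexec(sample, "/run"))
```

Feeding it the output of `cat /proc/mounts` from a container started with the same --tmpfs options would show whether /run is the mount blocking s6.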

Affected Component

  • Tools (terminal, Docker sandbox environment)

Messaging Platform

  • N/A (CLI only)

Operating System

Windows host with Docker backend

Python Version

Reproduced locally with Python 3.11.14

Hermes Version

Reproduced on the repository's current main branch lineage (main at 75cd420b3ba1b83185020c6d4506d7cc53b12e2b when tested).

Root Cause Analysis

The Docker environment logic currently assumes it can always:

  • add Docker --init
  • mount /run as --tmpfs /run:rw,noexec,nosuid,size=64m

That is not valid for images that already use s6 /init as PID 1. Those images need:

  • no extra Docker --init
  • an executable /run, because s6 stage0 later executes from /run/s6/...

The relevant local fix was in tools/environments/docker.py.
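The patch itself is not reproduced in this report. As a rough sketch of the two conditions above, the argument construction could branch on whether the image supplies its own init. The function name and its boolean parameter are assumptions for illustration and do not reflect the actual code in docker.py.

```python
def docker_init_and_run_args(image_has_own_init: bool) -> list[str]:
    """Build the --init and /run tmpfs arguments for `docker run`.

    Hypothetical helper: images that ship their own /init (for example
    s6-overlay) get no Docker --init and an executable /run.
    """
    args: list[str] = []
    if not image_has_own_init:
        # Let Docker's init reap zombies only when the image has no init.
        args.append("--init")
    exec_flag = "exec" if image_has_own_init else "noexec"
    args += ["--tmpfs", f"/run:rw,{exec_flag},nosuid,size=64m"]
    return args


print(docker_init_and_run_args(image_has_own_init=True))
print(docker_init_and_run_args(image_has_own_init=False))
```

Keeping nosuid and the size limit in both branches means the s6 case relaxes only the one option that blocks startup.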

Proposed Fix

Detect images whose entrypoint is /init and, for those images:

  • skip Docker --init
  • mount /run with exec instead of noexec

I validated that locally with:

  • a focused regression test in tests/tools/test_docker_environment.py
  • a fresh hermes-agent:latest smoke test that successfully returned hermes-smoke-ok
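For the detection step, one possible approach (not necessarily what the local hotfix does) is to read the image's configured entrypoint from `docker image inspect`, which prints a JSON array with a Config.Entrypoint field. The function names below are hypothetical, and treating "first entrypoint element equals /init" as the s6 signal is an assumption.

```python
import json
import subprocess


def entrypoint_is_init(inspect_json: str) -> bool:
    """True if the first inspected image has /init as its entrypoint command."""
    records = json.loads(inspect_json)
    if not records:
        return False
    # Entrypoint may be null when the image defines only CMD.
    entrypoint = (records[0].get("Config") or {}).get("Entrypoint") or []
    return len(entrypoint) > 0 and entrypoint[0] == "/init"


def image_uses_init_entrypoint(image: str) -> bool:
    """Run `docker image inspect` and check the entrypoint (requires Docker)."""
    result = subprocess.run(
        ["docker", "image", "inspect", image],
        capture_output=True,
        text=True,
        check=True,
    )
    return entrypoint_is_init(result.stdout)


# Parsing can be exercised without Docker by passing inspect-shaped JSON.
sample = '[{"Config": {"Entrypoint": ["/init"]}}]'
print(entrypoint_is_init(sample))
```

Splitting the JSON parsing from the subprocess call keeps the check unit-testable without a Docker daemon. This only sees the image's default entrypoint, so an entrypoint overridden at run time would need separate handling.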

Additional Context

I searched for likely duplicates before filing and did not find one.

If useful, I can turn the validated local hotfix into a PR.

Labels

  • P2 (Medium: degraded but workaround exists)
  • area/docker (Docker image, Compose, packaging)
  • backend/docker (Docker container execution)
  • tool/terminal (Terminal execution and process management)
  • type/bug (Something isn't working)
