Bug Description
When multiple Hermes containers run from the same image and share a single /opt/data bind mount (e.g., a gateway container, a dashboard container, and a custom-profile container), the 02-reconcile-profiles cont-init script in every container registers s6 service slots for all profiles, not just the one the container actually runs. As a result, s6-log instances in the "wrong" containers crash-loop with "fatal: unable to lock" errors.
Environment
- Hermes Agent: v0.14.0 (Docker image nousresearch/hermes-agent:latest)
- Docker Compose multi-container setup with shared /opt/data bind mount
- 3 containers: gateway (default profile), gateway (custom profile), dashboard
Steps to Reproduce
- Set up a docker-compose.yml with two or more Hermes containers sharing the same /opt/data volume:
  - hermes-agent running gateway run (default profile)
  - hermes-custom running hermes -p custom gateway run
  - dashboard running dashboard --host 0.0.0.0 --insecure
- Start the stack: docker compose up -d
- Check logs on the dashboard or agent container:
  docker logs hermes-dashboard 2>&1 | grep s6-log
Expected Behavior
Only the container actually running a given profile's gateway should register and run the corresponding s6 service slot (and its log sub-service). Other containers should not attempt to run s6-log against the same log directory.
Actual Behavior
Every container's 02-reconcile-profiles walks all profiles under $HERMES_HOME/profiles/ and creates s6 service directories for all of them under /run/service/gateway-/. The gateway service itself gets a down marker file (so it doesn't actually start in the wrong container), but the log sub-service (/run/service/gateway-/log/) has no down file and always starts.
Each log sub-service runs:
exec s6-setuidgid hermes s6-log 1 n10 s1000000 T "$log_dir"
s6-log tries to exclusively lock $HERMES_HOME/logs/gateways/<name>/lock. The first container to grab the lock wins; all others get:
s6-log: fatal: unable to lock /opt/data/logs/gateways/default/lock: Resource busy
s6-log: fatal: unable to lock /opt/data/logs/gateways/community/lock: Resource busy
Since s6-supervise restarts the log service on every crash, these errors repeat indefinitely in every container that doesn't own the gateway.
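The contention described above can be reproduced outside s6 with any advisory exclusive lock. The sketch below is only an illustration under that assumption: it uses flock via Python's fcntl module on a temporary lock file, it is not taken from s6-log's source, and try_exclusive_lock and demo are hypothetical names. (Linux/Unix only.)

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path):
    """Open `path` and attempt a non-blocking exclusive flock on it.

    Returns the open file object on success (keep it open to hold the lock),
    or None if another open file description already holds the lock.
    """
    handle = open(path, "a")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def demo():
    """Simulate two loggers competing for one lock file in a shared directory."""
    with tempfile.TemporaryDirectory() as log_dir:
        lock_path = os.path.join(log_dir, "lock")
        first = try_exclusive_lock(lock_path)   # first logger wins
        second = try_exclusive_lock(lock_path)  # second logger is refused
        results = (first is not None, second is not None)
        if first is not None:
            first.close()
        return results
```

Calling demo() shows the first attempt succeeding and the second being refused, which mirrors the "first container wins" behavior reported here.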
Affected Component
Configuration (config.yaml, .env, hermes setup)
Messaging Platform (if gateway-related)
No response
Debug Report
Report https://paste.rs/OKDSr
agent.log https://paste.rs/5Kj7F
gateway.log https://paste.rs/RTWDF
Operating System
Ubuntu
Python Version
3.13.5
Hermes Version
15
Additional Logs / Traceback (optional)
Root Cause Analysis (optional)
In hermes_cli/container_boot.py, reconcile_profile_gateways() unconditionally registers a gateway-default slot for the root profile AND walks all named profiles under $HERMES_HOME/profiles/. There is no mechanism to filter which profiles a given container should manage. The function has no awareness of which profile the container was started with (via hermes -p ).
The _register_service() helper creates the down marker on the gateway service directory, but the log/ sub-directory (which gets its own s6 service with its own run script) is never given a down marker.
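To confirm the missing marker on a running container, something like the following could be run inside it. This is a diagnostic sketch, not part of Hermes: find_unguarded_loggers is a hypothetical helper, and it assumes the slot layout described in this report (gateway-* directories under /run/service, each with an optional log/ sub-directory).

```python
from pathlib import Path


def find_unguarded_loggers(service_root):
    """Return slot names whose main service is marked down but whose log/ is not."""
    unguarded = []
    for slot in sorted(Path(service_root).glob("gateway-*")):
        log_dir = slot / "log"
        if not log_dir.is_dir():
            continue  # slot has no log sub-service
        # Main service is held down, but its logger would still be started.
        if (slot / "down").exists() and not (log_dir / "down").exists():
            unguarded.append(slot.name)
    return unguarded
```

For example, print(find_unguarded_loggers("/run/service")) would list the slots whose logger starts even though the gateway itself is down.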
Proposed Fix (optional)
Option A: Scope reconciliation to the active profile
Add an environment variable (e.g., HERMES_PROFILE=default) that 02-reconcile-profiles reads. Only register the s6 slot for the profile matching $HERMES_PROFILE. The dashboard container, which runs no gateway at all, could set HERMES_PROFILE=_none or a new env var like HERMES_SKIP_RECONCILE=1 to skip reconciliation entirely.
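A minimal sketch of the filtering Option A describes, written as a standalone function rather than a patch to reconcile_profile_gateways(). HERMES_PROFILE and HERMES_SKIP_RECONCILE are the variables proposed in this issue and do not exist in Hermes today; select_profiles is a hypothetical name, and falling back to "register everything" when HERMES_PROFILE is unset is an assumption made here for backward compatibility.

```python
import os


def select_profiles(all_profiles, environ=None):
    """Pick which profiles this container should register s6 slots for."""
    env = os.environ if environ is None else environ

    # Proposed opt-out: skip reconciliation entirely (e.g. dashboard container).
    if env.get("HERMES_SKIP_RECONCILE") == "1":
        return []

    wanted = env.get("HERMES_PROFILE")
    if wanted is None:
        # Assumption: variable unset means keep today's behavior.
        return list(all_profiles)

    # A value such as "_none" matches no profile and yields an empty list.
    return [name for name in all_profiles if name == wanted]
```

With profiles ["default", "custom"], setting HERMES_PROFILE=custom would select only "custom", while HERMES_PROFILE=_none or HERMES_SKIP_RECONCILE=1 would select nothing.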
Option B: Add down markers to log sub-services
After _register_service() creates the service directory with a down marker on the main service, also create a down marker in the log/ sub-directory. This prevents the log sub-service from starting when the gateway itself is intentionally down. This is a smaller change but doesn't prevent the unnecessary service slot creation.
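A rough sketch of what Option B could look like. mark_slot_down is a hypothetical helper, not the existing _register_service() code, and it assumes the service directory (and its log/ sub-directory, if any) has already been created.

```python
from pathlib import Path


def mark_slot_down(service_dir):
    """Create `down` markers for a service slot and its log/ sub-service.

    Returns the list of marker paths that now exist.
    """
    slot = Path(service_dir)
    created = []
    for target in (slot, slot / "log"):
        if target.is_dir():
            marker = target / "down"
            marker.touch(exist_ok=True)  # idempotent: safe to call on every boot
            created.append(marker)
    return created
```

Whichever code path later brings the gateway up in the owning container would then also need to start the logger explicitly, since it would no longer start on its own.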
Option C: Both A and B
Option A prevents unnecessary slot creation; Option B is a safety net for any case where a gateway slot is created but shouldn't auto-start its logger.
Workaround
The errors are noisy but harmless: gateways and the dashboard continue to work correctly, and the s6-log crash loops consume negligible resources. There is no production impact beyond log spam.
Are you willing to submit a PR for this?