[Bug]: s6-log lock collision in multi-container setup with shared /opt/data volume #34480

### Bug Description

When running multiple Hermes containers from the same image against a single shared /opt/data bind mount (e.g., a gateway container, a dashboard container, and a custom-profile container), the 02-reconcile-profiles cont-init script in every container registers s6 service slots for all profiles, not just the one the container actually runs. As a result, the s6-log instances in the "wrong" containers crash-loop with `fatal: unable to lock` errors.

Environment

- Hermes Agent: v0.14.0 (Docker image nousresearch/hermes-agent:latest)
- Docker Compose multi-container setup with shared /opt/data bind mount
- 3 containers: gateway (default profile), gateway (custom profile), dashboard

### Steps to Reproduce

1. Set up a docker-compose.yml with two or more Hermes containers sharing the same /opt/data volume:
   - hermes-agent running `gateway run` (default profile)
   - hermes-custom running `hermes -p custom gateway run`
   - dashboard running `dashboard --host 0.0.0.0 --insecure`
2. Start the stack: `docker compose up -d`
3. Check the logs of the dashboard or agent container:

       docker logs hermes-dashboard 2>&1 | grep s6-log

### Expected Behavior

Only the container actually running a given profile's gateway should register and run the corresponding s6 service slot (and its log sub-service). Other containers should not attempt to run s6-log against the same log directory.

### Actual Behavior

Every container's 02-reconcile-profiles walks all profiles under `$HERMES_HOME/profiles/` and creates an s6 service directory for each of them under `/run/service/gateway-<name>/`. The gateway service itself gets a down marker file (so it does not start in the wrong container), but the log sub-service (`/run/service/gateway-<name>/log/`) has no down file and always starts.

Each log sub-service runs:

    exec s6-setuidgid hermes s6-log 1 n10 s1000000 T "$log_dir"

s6-log tries to take an exclusive lock on `$HERMES_HOME/logs/gateways/<name>/lock`. The first container to grab the lock wins; all others get:

    s6-log: fatal: unable to lock /opt/data/logs/gateways/default/lock: Resource busy
    s6-log: fatal: unable to lock /opt/data/logs/gateways/community/lock: Resource busy

Because s6-supervise restarts the log service after every crash, these errors repeat indefinitely in every container that doesn't own the gateway.
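To illustrate the collision, here is a minimal Python sketch of two loggers competing for one lock file. It is not how s6-log is implemented: the locking primitive s6-log uses is not stated in this report, so `flock(2)` stands in for it, and `try_lock` and `simulate_two_loggers` are hypothetical names made up for the example.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Open `path` and try to take an exclusive, non-blocking lock.

    Returns the open file descriptor on success, or None if another
    open file description already holds the lock.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Someone else owns the lock; this is the "unable to lock" case.
        os.close(fd)
        return None
    return fd


def simulate_two_loggers():
    """Two 'loggers' race for the same lock file; only the first wins."""
    with tempfile.TemporaryDirectory() as log_dir:
        lock_path = os.path.join(log_dir, "lock")
        first = try_lock(lock_path)   # stands in for the owning container
        second = try_lock(lock_path)  # stands in for any other container
        result = (first is not None, second is not None)
        if first is not None:
            os.close(first)
        return result
```

In the real setup the two attempts come from separate containers that see the same file through the shared bind mount, and the loser is restarted by s6-supervise and fails again.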


### Affected Component

Configuration (config.yaml, .env, hermes setup)

### Messaging Platform (if gateway-related)

_No response_

### Debug Report

```shell
Report       https://paste.rs/OKDSr
  agent.log    https://paste.rs/5Kj7F
  gateway.log  https://paste.rs/RTWDF
```

### Operating System

Ubuntu 

### Python Version

3.13.5

### Hermes Version

15

### Additional Logs / Traceback (optional)

```shell

```

### Root Cause Analysis (optional)

In hermes_cli/container_boot.py, `reconcile_profile_gateways()` unconditionally registers a gateway-default slot for the root profile and also walks all named profiles under `$HERMES_HOME/profiles/`. There is no mechanism to filter which profiles a given container should manage, and the function has no awareness of which profile the container was started with (via `hermes -p <name>`).

The `_register_service()` helper creates the down marker in the gateway service directory, but the log/ sub-directory (which is its own s6 service with its own run script) is never given a down marker.
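A quick way to confirm this state inside a running container is to look for slots whose service is marked down while its logger is not. The helper below is a diagnostic sketch only: `loggers_missing_down` is a hypothetical name, and the layout it assumes (`gateway-*` directories, each with a `log/` sub-directory and optional `down` files) is taken from the description above.

```python
from pathlib import Path


def loggers_missing_down(scan_dir):
    """List gateway-* slots that are marked down but whose log/ is not.

    `scan_dir` is the directory holding the service slots
    (/run/service in the setup described in this issue).
    """
    missing = []
    for slot in sorted(Path(scan_dir).glob("gateway-*")):
        log_dir = slot / "log"
        service_is_down = (slot / "down").exists()
        logger_is_down = (log_dir / "down").exists()
        if service_is_down and log_dir.is_dir() and not logger_is_down:
            missing.append(slot.name)
    return missing
```

Calling `loggers_missing_down("/run/service")` in an affected container should return every slot whose logger will start and fight for the lock.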

### Proposed Fix (optional)


Option A: Scope reconciliation to the active profile

Add an environment variable (e.g., `HERMES_PROFILE=default`) that 02-reconcile-profiles reads, and register only the s6 slot for the profile matching `$HERMES_PROFILE`. The dashboard container, which runs no gateway at all, could set `HERMES_PROFILE=_none`, or a new env var such as `HERMES_SKIP_RECONCILE=1`, to skip reconciliation entirely.
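A rough sketch of the filtering Option A describes might look like the following. Neither `HERMES_PROFILE` nor `HERMES_SKIP_RECONCILE` exists today; both are the proposals from this issue. `profiles_to_register` is a hypothetical helper, not a function in container_boot.py, and falling back to the current behaviour when the variable is unset is an assumption made for backward compatibility.

```python
def profiles_to_register(all_profiles, env):
    """Pick which profiles should get an s6 slot in this container.

    all_profiles: every profile name found on the shared volume,
                  including "default" for the root profile.
    env:          a mapping of environment variables (e.g. os.environ).
    """
    # Proposed opt-out for containers that run no gateway at all.
    if env.get("HERMES_SKIP_RECONCILE") == "1":
        return []

    active = env.get("HERMES_PROFILE")
    if active is None:
        # Assumed fallback: keep today's behaviour when nothing is set.
        return list(all_profiles)

    # A value matching no profile (e.g. "_none") yields an empty list.
    return [name for name in all_profiles if name == active]
```

For example, `profiles_to_register(["default", "custom"], {"HERMES_PROFILE": "custom"})` would select only `custom`, so the container running `hermes -p custom gateway run` never creates a logger for `default`.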

Option B: Add down markers to log sub-services

After `_register_service()` creates the service directory with a down marker on the main service, also create a down marker in the log/ sub-directory. This prevents the log sub-service from starting when the gateway itself is intentionally down. This is a smaller change, but it doesn't prevent the unnecessary service slot creation.
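As a minimal sketch of Option B, the snippet below marks both the service and its logger as down. It does not reproduce `_register_service()`, whose body is not shown in this issue; `register_slot_down` is a hypothetical stand-in that only demonstrates where the two marker files would go.

```python
from pathlib import Path


def register_slot_down(service_dir):
    """Create a service slot whose gateway AND logger both start down.

    s6-supervise checks for a `down` file in the directory it
    supervises, so the log/ sub-service needs its own marker.
    """
    service_dir = Path(service_dir)
    log_dir = service_dir / "log"
    log_dir.mkdir(parents=True, exist_ok=True)

    (service_dir / "down").touch()  # existing behaviour per this issue
    (log_dir / "down").touch()      # proposed addition
    return service_dir
```

One consequence to keep in mind: with a down marker on the logger, whatever brings the gateway up in the owning container would presumably also need to bring its log sub-service up.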

Option C: Both A and B

Option A prevents unnecessary slot creation; Option B is a safety net for any case where a gateway slot is created but shouldn't auto-start its logger.

Workaround

None is needed. The errors are noisy but do not affect functionality: the gateways and the dashboard continue to work correctly, and the s6-log crash loops consume negligible resources. There is no production impact beyond log spam.

### Are you willing to submit a PR for this?

- [ ] I'd like to fix this myself and submit a PR
