fix(docker): respect HERMES_HOME env var in entrypoint by malaiwah · Pull Request #8115 · NousResearch/hermes-agent

malaiwah · 2026-04-12T02:55:55Z

Summary

The Docker entrypoint hardcodes HERMES_HOME="/opt/data", ignoring the environment variable set by the Dockerfile (ENV HERMES_HOME=/opt/data) or overridden at container runtime (-e HERMES_HOME=/custom/path).

This one-line fix changes it to HERMES_HOME="${HERMES_HOME:-/opt/data}", so the entrypoint respects the env var when it is set and falls back to /opt/data when it is unset or empty. Existing deployments that do not override HERMES_HOME are unaffected.

Use case: Running multiple hermes-agent instances on the same host with different data directories (e.g. a crash-test instance alongside production). Without this fix, the entrypoint creates template files and syncs skills into /opt/data regardless of where the actual data volume is mounted.

Changes

docker/entrypoint.sh line 12: HERMES_HOME="/opt/data" → HERMES_HOME="${HERMES_HOME:-/opt/data}"
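For readers less used to POSIX parameter expansion: ${HERMES_HOME:-/opt/data} substitutes the default when HERMES_HOME is unset or set to the empty string. The Python sketch below mirrors that rule. resolve_hermes_home is a hypothetical helper for illustration, not a function in hermes-agent.

```python
import os
from typing import Mapping

DEFAULT_HERMES_HOME = "/opt/data"


def resolve_hermes_home(env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper mirroring the shell expansion ${HERMES_HOME:-/opt/data}.

    The ":-" form falls back to the default when the variable is unset OR empty,
    so an empty string is treated the same as a missing variable.
    """
    value = env.get("HERMES_HOME")
    return value if value else DEFAULT_HERMES_HOME
```

The colon matters: the plain ${HERMES_HOME-/opt/data} form would keep an explicitly empty value, which would then bootstrap into the wrong directory.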

Test plan

- Existing deployment with no HERMES_HOME override: entrypoint uses /opt/data (unchanged behavior)
- podman run -e HERMES_HOME=/opt/data-test ...: entrypoint bootstraps into /opt/data-test
- Skills sync, .env template, and config.yaml template all land in the correct directory

- Add DEFAULT_ALLOWED_TOOLSETS including 'mcp' to enable MCP tools for subagents
- Make BLOCKED_TOOLSET_NAMES configurable (was hardcoded)
- Subagents now inherit MCP access from parent when available
- Fixes subagent limitation where only terminal+process were available
- Allows subagents to use SearXNG and Crawl4AI MCP servers

- Reorder imports to top of file (E402)
- Add noqa comment for registry import (circular import requirement)
- Re-add missing constants: MAX_DEPTH, MAX_CONCURRENT_CHILDREN, DEFAULT_MAX_ITERATIONS
- All ruff checks now passing

- Remove 'memory' from BLOCKED_TOOLSET_NAMES in delegate_tool.py
- Add 'memory' to DEFAULT_ALLOWED_TOOLSETS for subagent access
- Add DEFAULT_SUBAGENT_MEMORY_MODE = 'read_only' configuration
- Modify memory_tool() to accept is_subagent and subagent_memory_mode params
- Enforce memory write blocking in read_only mode for subagents
- Support three modes: 'read_only', 'full', 'none'
- Add comprehensive tests for subagent memory access
- Maintains backward compatibility with existing memory tests

Benefits:
✅ Subagents can now query Honcho observations dialectically
✅ Subagents can read MEMORY.md and USER.md for context
✅ Subagents blocked from writing (prevents memory pollution)
✅ Parent agent remains sole writer of memory
✅ Enables orchestrator/coordinator pattern with long-lived subagents
✅ Configurable per-subagent or global default

- docker.py: remove --pids-limit (unavailable without cgroup delegation), add _cgroup_limits_available() probe for --cpus/--memory
- delegate_tool.py: add "browser" to DEFAULT_ALLOWED_TOOLSETS
- Dockerfile: build from source, add podman-remote shim, wait-for-honcho
- docker/wait-for-honcho.sh: poll Honcho API before starting gateway

…tation)
- delegate_tool.py: Set skip_memory=False and pass subagent_memory_mode to child agents
- delegate_tool.py: Add _is_subagent=True flag for memory tool access control
- run_agent.py: Pass is_subagent and subagent_memory_mode to memory_tool calls
- memory_tool.py: Enforce read-only mode for subagents (blocks add/replace/remove)
- memory_tool.py: Support three modes: 'read_only' (default), 'full', 'none'

Benefits:
✅ Subagents can read MEMORY.md and USER.md for context
✅ Subagents can query Honcho observations dialectically
✅ Subagents blocked from memory writes (prevents pollution)
✅ Parent agent remains sole writer of memory
✅ Configurable per-subagent via delegation config
✅ Enables orchestrator/coordinator pattern with long-lived subagents

Tests:
✅ test_memory_subagent_readonly.py (4/4 passed)
✅ test_mcp_subagent_access.py (4/4 passed)
✅ Existing memory tests (33/33 passed)
✅ Existing delegate tests (5/5 passed)
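A minimal sketch of the three-mode check described above, assuming the write actions are exactly add/replace/remove and that any other action counts as a read. The function name subagent_memory_allowed and the action name "read" used in testing are hypothetical; the real enforcement lives in memory_tool.py and may be structured differently.

```python
WRITE_ACTIONS = frozenset({"add", "replace", "remove"})
SUBAGENT_MEMORY_MODES = frozenset({"read_only", "full", "none"})


def subagent_memory_allowed(action: str, is_subagent: bool, mode: str = "read_only") -> bool:
    """Return True if a memory action may proceed (hypothetical helper)."""
    if not is_subagent:
        # The parent agent is never restricted: it stays the sole writer.
        return True
    if mode not in SUBAGENT_MEMORY_MODES:
        raise ValueError(f"unknown subagent_memory_mode: {mode!r}")
    if mode == "none":
        # No memory access at all for this subagent.
        return False
    if mode == "read_only":
        # Reads pass; add/replace/remove are blocked to avoid memory pollution.
        return action not in WRITE_ACTIONS
    # mode == "full": the subagent may read and write.
    return True
```

Making read_only the default means a subagent spawned without explicit configuration can never write, which matches the "parent remains sole writer" goal.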

- Added _cleanup_orphaned_containers() function to gateway/run.py
- Automatically removes exited/dead/created hermes-* containers on startup
- Prevents container accumulation from crashes, OOM kills, or manual stops
- Logs cleanup activity with INFO level for visibility
- Added comprehensive documentation in docs/CONTAINER_CLEANUP.md
- Includes manual cleanup commands and CLI design proposal

Benefits:
✅ No more manual container cleanup needed
✅ Recovers gracefully from crashes
✅ Reduces disk space usage from stale containers
✅ Improves system hygiene automatically
✅ Safe: only removes non-running containers

Manual cleanup (if needed before deploying):
podman ps -a --filter 'name=^hermes-' --filter 'status=exited' -q | xargs -r podman rm -f
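The selection rule behind the startup cleanup can be sketched as a pure filter over (name, status) pairs, for example as parsed from podman ps -a output. This is an illustrative sketch, not the code in gateway/run.py: orphaned_containers is a hypothetical name, and the status set is taken from the commit message above.

```python
import re
from typing import Iterable, List, Tuple

# Statuses the commit message lists as safe to remove; "running" is never touched.
CLEANUP_STATUSES = frozenset({"exited", "dead", "created"})
HERMES_NAME = re.compile(r"^hermes-")


def orphaned_containers(containers: Iterable[Tuple[str, str]]) -> List[str]:
    """Return names of non-running hermes-* containers that cleanup would remove."""
    return [
        name
        for name, status in containers
        if HERMES_NAME.match(name) and status in CLEANUP_STATUSES
    ]
```

Keeping the filter separate from the actual removal call makes the "only non-running containers" safety property easy to unit-test without a container runtime.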

Remove GitHub Actions workflows (deploy-site, docker-publish, tests, nix, supply-chain-audit, docs-site-checks) — these are for upstream's GitHub CI. Add .gitea/workflows/build-push.yml: builds the container image and pushes to Gitea's container registry on every push to main.

- run_agent.py: Add _shared_memory_store parameter to AIAgent.__init__()
- run_agent.py: Use shared memory store when provided (subagents)
- run_agent.py: Add _is_subagent flag for access control
- delegate_tool.py: Pass parent's _memory_store to subagents
- Subagents now share parent's memory store (read-only enforced)
- Memory writes still blocked for subagents via memory_tool.py enforcement

Benefits:
✅ Subagents can READ from MEMORY.md and USER.md
✅ Subagents can query Honcho observations
✅ Subagents blocked from memory writes (add/replace/remove)
✅ Parent remains sole writer of memory
✅ Shared store prevents duplicate memory initialization
✅ Enables orchestrator pattern with long-lived subagents

Technical:
- Subagents share parent's MemoryStore instance
- Read-only enforcement in memory_tool.py still active
- No duplicate memory loading for subagents
- Memory stays consistent across parent + subagents

delegate_tool.py passes subagent_memory_mode and _is_subagent to AIAgent() but AIAgent.__init__ did not declare subagent_memory_mode, causing an unexpected keyword argument crash on every delegate_task call. Added the parameter to the signature and stored both self._is_subagent and self.subagent_memory_mode as instance attributes (previously only accessed via getattr with defaults, never stored).

…ra_body config
- terminal_tool.py: add docker_forward_env to the container_config dict passed to _create_environment. It was read from config but never propagated, so docker exec calls were built with no -e flags and credentials were never forwarded into sandbox containers
- run_agent.py: add subagent_memory_mode param to AIAgent.__init__. delegate_tool was passing it but AIAgent didn't accept it, crashing every delegate_task call; also store self._is_subagent and self._config_extra_body; add model.extra_body config support so extra_body fields (e.g. enable_thinking: false) are merged into every API call
- tools/environments/docker.py: log exact docker exec command at WARNING level (secrets masked) for debugging env forwarding
- Dockerfile: add logging to podman-remote shim; it appends the full command to /opt/data/logs/shim.log for introspection

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
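One way to merge model.extra_body from config.yaml into every request is sketched below. It assumes per-call extra_body keys should win over config defaults, which the commit message does not specify; merge_extra_body is a hypothetical helper. The OpenAI Python SDK's create() methods do accept an extra_body keyword, which is the channel such provider-specific fields travel through.

```python
from typing import Any, Dict, Mapping, Optional


def merge_extra_body(
    request_kwargs: Mapping[str, Any],
    config_extra_body: Optional[Mapping[str, Any]],
) -> Dict[str, Any]:
    """Return a copy of request_kwargs with config extra_body fields merged in.

    Assumption: keys already present in the request's own extra_body take
    precedence over the config-level defaults.
    """
    merged: Dict[str, Any] = dict(config_extra_body or {})
    merged.update(request_kwargs.get("extra_body") or {})
    out = dict(request_kwargs)
    if merged:
        out["extra_body"] = merged
    return out
```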

…g to debug
- gateway/run.py: move _cleanup_orphaned_containers() before main() so it is defined before it is called; call it before asyncio.run() so cleanup happens at startup, not after the gateway exits
- gateway/run.py: replace hardcoded "podman" with find_docker() so the function respects the configured docker/shim executable
- docker.py: downgrade exec command log from WARNING to DEBUG (too noisy for normal operation; shim.log already captures all podman-remote calls)
- tools/: delete leftover delegate_tool.py.patch artifact

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Allows config.yaml to specify which Docker/Podman network sandbox containers join via `terminal.docker_network`. When set, passes `--network <name>` to docker run so containers can resolve hostnames of other services on that network (e.g. hermes-litellm on hermes-net). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

When pricing/cost estimation calls fetch_endpoint_model_metadata without an api_key (e.g. from insights._get_pricing), the function made unauthenticated requests to the /models endpoint causing repeated 401 errors every 5 minutes (cache TTL). Now falls back to LITELLM_KEY env var so requests to the proxy are authenticated. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
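The fallback amounts to "explicit key first, then LITELLM_KEY". A sketch, assuming the proxy accepts a standard Bearer token in the Authorization header (the LiteLLM proxy does); models_request_headers is a hypothetical name, not the actual fetch_endpoint_model_metadata code.

```python
import os
from typing import Dict, Mapping, Optional


def models_request_headers(
    api_key: Optional[str], env: Mapping[str, str] = os.environ
) -> Dict[str, str]:
    """Build headers for a GET /models call, falling back to LITELLM_KEY."""
    key = api_key or env.get("LITELLM_KEY")
    # With no key at all, send no Authorization header (the previous behavior).
    return {"Authorization": f"Bearer {key}"} if key else {}
```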

docker_network was added to DockerEnvironment (docker.py) and terminal_tool.py, but the config.yaml → TERMINAL_DOCKER_NETWORK env var bridge was missing in both code paths:
- cli.py: used by `hermes chat`, `hermes model`, etc.
- gateway/run.py: used by `hermes gateway run`

Also add TERMINAL_DOCKER_NETWORK reading to _get_env_config() in terminal_tool.py, and add docker_network to the container_config dict that's built before calling _create_environment().

Without this, `docker_network: "hermes-net"` in config.yaml had no effect: sandbox containers were always created on the default podman network and could not resolve hermes-net hostnames like hermes-litellm.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
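The missing bridge is small; a sketch of both halves follows. It assumes an already-set TERMINAL_DOCKER_NETWORK env var should win over config.yaml (the commit does not say), and bridge_docker_network and network_run_args are hypothetical names. The --network flag itself is a standard docker run / podman run option.

```python
from typing import Any, List, Mapping, MutableMapping


def bridge_docker_network(config: Mapping[str, Any], env: MutableMapping[str, str]) -> None:
    """Copy terminal.docker_network from config.yaml into TERMINAL_DOCKER_NETWORK.

    Assumption: an explicitly set environment variable is left untouched.
    """
    network = (config.get("terminal") or {}).get("docker_network")
    if network and "TERMINAL_DOCKER_NETWORK" not in env:
        env["TERMINAL_DOCKER_NETWORK"] = str(network)


def network_run_args(env: Mapping[str, str]) -> List[str]:
    """Return the extra run arguments for the configured network, if any."""
    network = env.get("TERMINAL_DOCKER_NETWORK", "").strip()
    return ["--network", network] if network else []
```

In the real code paths env would presumably be os.environ, populated once in cli.py and once in gateway/run.py, which is exactly the duplication the commit had to cover.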

…ge config and invalid workdir bypass
- PR NousResearch#4350: Load config.yaml terminal block as fallback before hardcoded defaults
  - Fixes docker_image in config.yaml not being loaded
  - Adds cfg.get() fallbacks for all terminal config options
- PR NousResearch#4673: Don't clobber already-resolved absolute TERMINAL_CWD
  - Fixes invalid workdir bypassing terminal.cwd config
  - Skips config override when env var already has absolute path

- Install gnupg in system dependencies
- Enables GPG-signed emails from angelos-hermes@mailbox.org
- Supports git commit signing with GPG keys
- Rename Dockerfile to Containerfile (Podman convention)

Signed-off-by: Angelos <angelos-hermes@mailbox.org>

Previous run NousResearch#46 failed due to transient network issue. This is a no-op commit to re-trigger the build pipeline. Related: feat: Add gnupg package for GPG email signing

Previous commit had double-escaped backslashes (\\) which broke the Dockerfile syntax. This fixes the RUN instructions to use proper single backslashes for line continuation. Fixes build failure in run NousResearch#47.

- GPG support moved to hermes-sandbox-image (where it belongs)
- Sandbox image is the correct location for agent tooling
- Reduces production container attack surface
- Follows separation of concerns:
  * hermes-agent: Production service container
  * hermes-sandbox: Ephemeral agent execution environment

Related: angelos/hermes-sandbox-image@4294f7f

Two merge fallout fixes:
1. delegate_tool.py uses tool_error() 6 times but never imported it from tools.registry, so every error path crashed with NameError.
2. test_batch_capped_at_3 expected the old silent-truncation behavior; updated to test the new error-on-excess behavior.

…d lifetime

Two changes to how sandbox containers are spawned:

1. Add --init to docker run. This runs a minimal init process as PID 1 (tini under Docker; Podman defaults to catatonit), which automatically reaps zombie child processes. Previously PID 1 was sleep(1), which doesn't call wait(): every background process that exited became a zombie, and the process tool reported them as "running" because zombie PIDs still exist in the process table. Fixes NousResearch#6908 (upstream).

2. Replace 'sleep 2h' with 'sleep infinity'. The fixed 2-hour lifetime was arbitrary and sometimes too short for long agent sessions. The idle reaper (terminal.lifetime_seconds, default 300s, configurable via config.yaml) already handles cleanup based on last activity, so there's no reason for the container itself to have a fixed death timer. With sleep infinity, the container lives until the idle reaper kills it or the task ends.

Both changes are one line each. No config changes are needed: the existing terminal.lifetime_seconds config controls the idle timeout.
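Put together, the spawn command looks roughly like the argv built below. This is a sketch under assumptions: the -d flag, the --name scheme, and the image name used in testing are illustrative, and sandbox_run_argv is hypothetical. --init and sleep infinity are the two changes from this commit; sleep infinity relies on a sleep implementation that accepts "infinity", such as GNU coreutils, being present in the sandbox image.

```python
from typing import List, Optional


def sandbox_run_argv(
    image: str,
    name: str,
    network: Optional[str] = None,
    runtime: str = "docker",
) -> List[str]:
    """Build a docker/podman run command for a long-lived sandbox container."""
    # --init: a runtime-provided init becomes PID 1 and reaps zombie children.
    argv = [runtime, "run", "-d", "--init", "--name", name]
    if network:
        argv += ["--network", network]
    # No fixed lifetime: the idle reaper (terminal.lifetime_seconds) stops it.
    argv += [image, "sleep", "infinity"]
    return argv
```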

The upstream merge (PR NousResearch#14) auto-resolved gateway/run.py's 1598-line diff and silently dropped the entire self-nudge system (~55 lines in gateway/run.py, ~37 in run_agent.py, ~5 in model_tools.py). This broke notify_on_complete for background processes: the mechanism that fires a hidden turn when a background process exits was gone.

Cherry-picked b8737bc ("feat(gateway): add one-shot self nudge tool") on top of the merged state, resolving 3 conflicts in gateway/run.py (media_message_callback + self_nudge_callback coexistence) and 1 in tests/test_model_tools.py (kept both clarify + self_nudge tests).

The self-nudge system provides:
- _arm_self_nudge / _cancel_self_nudge / _fire_self_nudge in gateway
- self_nudge_callback on AIAgent for tool-initiated delayed turns
- self_nudge tool exposed on gateway platforms (not CLI)
- _pending_hidden_turns for injecting hidden messages on next turn
- Wired to notify_on_complete in terminal_tool.py via process_registry.pending_watchers

gateway/run.py uses uuid.uuid4() at lines 1139 (restart-resume) and 1263 (self-nudge) but uuid was never imported at module level — upstream uses local 'import uuid as _uuid' inside functions, and our fork's additions used bare uuid without adding the import. Self-nudge fired but crashed with NameError: name 'uuid' is not defined.

The self-nudge path called _handle_message_with_agent directly without sending a typing indicator first. The user saw the agent silently process and respond with no 'typing...' feedback in Telegram.

_run_self_nudge_entry called _handle_message_with_agent but discarded its return value. When streaming doesn't deliver (typical for self-nudge turns since there's no user message to stream-edit), the agent's response vanished — the model did the work but the user never saw the result. Now captures the return and explicitly sends via adapter.send if streaming didn't already deliver it.

Adds logger.info when a turn is routed to the cheap model, showing the route label and first 80 chars of the user message. Added to both gateway and CLI paths. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds an optional 'model' parameter to delegate_task so the agent can route subagents to a smaller/faster model for simple tasks (e.g. summarization, formatting, lookups) while keeping the primary model for complex reasoning.

Works at both levels:
- Top-level 'model' param for single-task delegation
- Per-task 'model' field in batch tasks array

The per-call model overrides delegation.model from config, which in turn overrides inheriting the parent's model. Per-task model takes precedence over top-level model in batch mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds tiered model selection for subagent delegation:
- Agent can pass model='small'/'medium'/'large' or a direct model name
- Tiers configured via delegation.model_tiers in config.yaml
- New list_models tool returns available models with tier assignments

Use cases:
- Delegate file exploration to a small/fast model
- Escalate to a large model when stuck on complex reasoning
- Spin up a peer review subagent on a stronger model
- Mixed batch: simple tasks on small, complex on default

Precedence: per-task model > top-level model param > delegation.model config > inherit parent model. Tier names resolve to configured model names; unknown tiers fall back to default.

Not yet upstreamed (local fork only).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
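The precedence chain can be sketched as a single resolver. Assumptions: only small/medium/large are tier names (anything else is treated as a direct model name), and "default" for an unconfigured tier means delegation.model if set, else the parent's model. resolve_subagent_model is a hypothetical name.

```python
from typing import Mapping, Optional

TIER_NAMES = frozenset({"small", "medium", "large"})


def resolve_subagent_model(
    task_model: Optional[str],
    call_model: Optional[str],
    config_model: Optional[str],
    parent_model: str,
    tiers: Mapping[str, str],
) -> str:
    """Per-task model > top-level param > delegation.model > parent model."""
    default = config_model or parent_model
    requested = task_model or call_model
    if not requested:
        return default
    if requested in TIER_NAMES:
        # Configured tiers map to real model names; unconfigured ones fall back.
        return tiers.get(requested) or default
    # Anything else is treated as a direct model name.
    return requested
```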

list_models was only handled in the sequential dispatch path. The concurrent path (used when multiple tools are called in one turn) fell through to the default handler, causing the tool call to fail. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

list_models was only in the 'delegation' toolset but composite toolsets (hermes-cli, hermes-telegram, etc.) list tools directly without including the delegation toolset. Added list_models alongside delegate_task in all composite toolsets. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…oint
is_local_endpoint only matched IPs and localhost, missing Docker/Podman DNS names like hermes-litellm. This caused stale stream timeouts (180s) to fire on local LLM proxies instead of being auto-disabled.

Two fixes:
1. model.local_endpoints config: list of hostnames to treat as local
2. DNS resolution fallback: resolve hostname to IP, check if private

Not yet upstreamed (local fork only).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Hostnames without dots (e.g. hermes-litellm, ollama) are always on the local network — Docker/Podman DNS, mDNS, or /etc/hosts. No need to configure them explicitly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
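Combining the two commits above, the locality check could look like the sketch below: configured names and localhost, literal private or loopback IPs, dotless single-label hostnames, and finally a DNS lookup whose result must be private. is_local_endpoint_sketch and its resolve parameter are hypothetical; socket.gethostbyname only returns IPv4, so a real version might prefer socket.getaddrinfo.

```python
import ipaddress
import socket
from typing import Callable, Iterable
from urllib.parse import urlparse


def _is_private_ip(text: str) -> bool:
    """True if text is a literal IP address in a private or loopback range."""
    try:
        ip = ipaddress.ip_address(text)
    except ValueError:
        return False
    return ip.is_private or ip.is_loopback


def is_local_endpoint_sketch(
    base_url: str,
    local_endpoints: Iterable[str] = (),
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> bool:
    host = (urlparse(base_url).hostname or "").lower()
    if not host:
        return False
    # 1. Explicit allow-list (model.local_endpoints) and localhost.
    if host == "localhost" or host in {h.lower() for h in local_endpoints}:
        return True
    # 2. Literal IP addresses.
    if _is_private_ip(host):
        return True
    # 3. Single-label names (no dot): container DNS, mDNS or /etc/hosts.
    if "." not in host:
        return True
    # 4. DNS fallback: resolve and check whether the address is private.
    try:
        return _is_private_ip(resolve(host))
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
```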

Adds max_context_tokens guard to smart routing: if conversation context exceeds the threshold, stay on the primary model instead of routing to the cheap model. The cheap model is meant to be fast; sending it a large context defeats the purpose.

Changes:
- choose_cheap_model_route accepts context_tokens parameter
- Gateway estimates context from cached agent's session_prompt_tokens or from history length (4 chars ≈ 1 token)
- CLI estimates from conversation_history
- Log line now includes context token count

Config: smart_model_routing.max_context_tokens (default: 0 = disabled)

Not yet upstreamed (local fork only).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
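The guard and the rough 4-characters-per-token estimate can be sketched as follows. The helper names are hypothetical, messages are assumed to be OpenAI-style dicts with a string content field, and non-string content (e.g. multimodal parts) is simply ignored here.

```python
from typing import Iterable, Mapping


def estimate_context_tokens(history: Iterable[Mapping[str, object]]) -> int:
    """Rough estimate: 4 characters of message content ≈ 1 token."""
    chars = 0
    for message in history:
        content = message.get("content")
        if isinstance(content, str):
            chars += len(content)
    return chars // 4


def allow_cheap_route(context_tokens: int, max_context_tokens: int = 0) -> bool:
    """Stay on the primary model when context exceeds the threshold."""
    if max_context_tokens <= 0:
        return True  # 0 (the default) disables the guard
    return context_tokens <= max_context_tokens
```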

When trim_context is enabled and context exceeds max_context_tokens, trim conversation history from the head (keep most recent messages) and still route to the cheap model, instead of falling back to the primary model entirely.

For a simple "thanks!" in a 48K-token session, only the last ~32K tokens of history are sent to the cheap model. The model can still respond appropriately with recent context.

Config:
  smart_model_routing:
    max_context_tokens: 32000
    trim_context: true  # default: false (safe: skip the route entirely)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
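Trimming from the head means walking the history backwards and keeping messages until the token budget is spent. A minimal sketch, reusing the same 4-characters-per-token assumption; trim_history_from_head is a hypothetical name.

```python
from typing import List, Mapping, Sequence


def _estimate_tokens(message: Mapping[str, object]) -> int:
    content = message.get("content")
    return len(content) // 4 if isinstance(content, str) else 0


def trim_history_from_head(
    history: Sequence[Mapping[str, object]], max_tokens: int
) -> List[Mapping[str, object]]:
    """Keep the most recent messages whose combined estimate fits max_tokens."""
    kept: List[Mapping[str, object]] = []
    total = 0
    for message in reversed(history):
        cost = _estimate_tokens(message)
        if total + cost > max_tokens:
            break  # everything older than this message is dropped
        kept.append(message)
        total += cost
    kept.reverse()  # restore chronological order
    return kept
```

A production version would likely also need to avoid separating an assistant tool call from its tool-result message, since chat-completions style APIs typically reject orphaned tool results.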

Reassigning the `history` parameter inside the trim block made Python treat it as a local variable, causing UnboundLocalError on earlier reads. Use _trimmed_history + _effective_history instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
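Python decides at compile time that a name is local to a function if it is assigned anywhere in that function's body; reading it before the assignment then raises UnboundLocalError. A plain parameter is already bound on entry, so this failure typically appears when the reassignment happens inside a nested function that reads a variable from its enclosing scope. The toy reproduction below (all names hypothetical) shows the failure and the separate-names pattern that avoids it.

```python
from typing import Callable, List, Tuple


def make_handlers(
    history: List[str],
) -> Tuple[Callable[[], List[str]], Callable[[], List[str]]]:
    def run_broken() -> List[str]:
        # 'history' is assigned below, so it is local to run_broken and this
        # read raises UnboundLocalError instead of using the outer value.
        if len(history) > 2:
            history = history[-2:]
        return history

    def run_fixed() -> List[str]:
        # Bind new names instead of rebinding the outer 'history'.
        trimmed = history[-2:] if len(history) > 2 else None
        effective = trimmed if trimmed is not None else history
        return effective

    return run_broken, run_fixed
```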

The entrypoint hardcoded HERMES_HOME="/opt/data", ignoring the environment variable set by the Dockerfile or container runtime. This made it impossible to run multiple instances with different data directories (e.g. a crash-test instance alongside production). Change to ${HERMES_HOME:-/opt/data} so the entrypoint respects the env var when set, falling back to /opt/data as before. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- find_docker() now checks HERMES_DOCKER_BINARY env var first, then docker on PATH, then podman on PATH, then macOS known locations
- Entrypoint respects HERMES_HOME env var (was hardcoded to /opt/data)
- Entrypoint uses groupmod -o to tolerate non-unique GIDs (fixes macOS GID 20 conflict with Debian's dialout group)
- Entrypoint makes chown best-effort so rootless Podman continues instead of failing with 'Operation not permitted'
- 5 new tests covering env var override, podman fallback, precedence

Based on work by alanjds (PR #3996) and malaiwah (PR #8115). Closes #4084.
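The lookup order above can be sketched with shutil.which and injectable lookups so the precedence is testable without either runtime installed. find_container_binary is a hypothetical name; fallback_paths stands in for the macOS known locations, whose actual list is not given here, and whether the real find_docker() validates the override path is not stated, so this sketch returns it as-is.

```python
import os
import shutil
from typing import Callable, Mapping, Optional, Sequence


def find_container_binary(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
    fallback_paths: Sequence[str] = (),
    exists: Callable[[str], bool] = os.path.isfile,
) -> Optional[str]:
    """Hypothetical mirror of the find_docker() lookup order described above."""
    # 1. Explicit override wins (e.g. a podman-remote shim).
    override = env.get("HERMES_DOCKER_BINARY")
    if override:
        return override
    # 2-3. docker on PATH, then podman on PATH.
    for name in ("docker", "podman"):
        found = which(name)
        if found:
            return found
    # 4. Platform-specific known install locations (macOS in the real code).
    for path in fallback_paths:
        if exists(path):
            return path
    return None
```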


teknium1 · 2026-04-27T13:38:59Z

Thanks for the contribution! The specific fix proposed here — HERMES_HOME="${HERMES_HOME:-/opt/data}" in docker/entrypoint.sh — was already merged into main as part of PR #10066 (commit 8548893d1, "feat: entry-level Podman support — find_docker() + rootless entrypoint", April 14 2026). The commit message even calls it out explicitly: "Entrypoint respects HERMES_HOME env var (was hardcoded to /opt/data)".

Closing as implemented on main. This is an automated hermes-sweeper review.

File: docker/entrypoint.sh, line 5
Landed in: commit 8548893d1 via PR #10066


Hermes Agent (angelos) and others added 30 commits April 8, 2026 03:05

fix: Resolve ruff linting errors

61ed851


fix: configure insecure registry for HTTP Gitea in CI

361aae0

fix: skip docker login, host Podman has auth + insecure registry

a45c215

fix: add --load to docker build for buildx compatibility

55f8d35

fix: disable buildx, use classic builder for Podman compat

b7e3075

fix: add podman-remote login before push

2f4f75d

Use podman-remote for builds, drop DOCKER_BUILDKIT=0 workaround

f4d0ec0

ci: use built-in GITEA_TOKEN for registry push (no stored secrets)

2ae474d

ci: revert to REGISTRY_USER/REGISTRY_TOKEN secrets

6408bf3

ci: remove podman-remote install step (pre-baked in runner image)

29e1ac3

ci: re-trigger build for gnupg package

1205230


fix: correct backslash escaping in Containerfile

122f014


feat(agent): add in-session user progress messages

5240f4c

fix(agent): scope in-session user updates

1b15311

angelos and others added 18 commits April 10, 2026 03:08

fix(gateway): send typing indicator during self-nudge hidden turns

64a3b07


chore: confirm ffmpeg for TTS voice bubbles (trigger gateway rebuild)

c8f15ba

Log smart model routing decisions

fa0ecb8


teknium1 mentioned this pull request Apr 15, 2026

feat: entry-level Podman support — find_docker() + rootless entrypoint #10066

Merged

alt-glitch mentioned this pull request Apr 24, 2026

fix(docker): add profiles directory and honor HERMES_HOME env override in entrypoint #8862

Closed


teknium1 closed this Apr 27, 2026

malaiwah wants to merge 129 commits into NousResearch:main from malaiwah:fix/entrypoint-hermes-home-env
