
Persistent terminal sandboxes (docker/daytona/modal) are torn down every turn, contradicting terminal.lifetime_seconds #6369

@malaiwah


Summary

HermesAgentLoop._cleanup_task_resources calls cleanup_vm(task_id) at the end of every run_conversation invocation (i.e. every user turn). This unconditionally stops and removes the active terminal sandbox container, even when the backend is configured to be persistent (container_persistent: true, the default for docker, daytona, and modal). As a result:

  • A new container is spawned for the next turn (different hostname, fresh /workspace).
  • Anything written to /workspace between turns is lost (files, scratch notes, build artifacts).
  • Agent CLI auth state inside the sandbox (e.g. ~/.opencode/, ~/.codex/, ~/.config/gh, ~/.gnupg) is wiped — every turn the sub-agent has to re-authenticate.
  • The documented terminal.lifetime_seconds idle reaper (_cleanup_inactive_envs) never gets a chance to act on persistent envs because they are pre-emptively destroyed.

Within a single turn the container is correctly reused (multiple tool calls share it via _active_environments[task_id]), so the bug is only visible across turn boundaries.

Reproduction

config.yaml:

terminal:
  backend: docker
  lifetime_seconds: 600
  docker_image: <any>

In an interactive CLI session:

> run: hostname > /tmp/h && cat /tmp/h
38c9242c5c02
> run: cat /tmp/h
cat: /tmp/h: No such file or directory
> run: hostname
df7e901e6473

Expected: the same hostname on every turn, and /tmp/h persists until 600 s of inactivity.
Actual: the container is destroyed at the end of turn 1, so later turns run in a fresh container.

Root cause / git archaeology

  • faecbddd (2025-11-02, "fix terminal interactivity") introduced terminal.lifetime_seconds and _cleanup_inactive_envs — the idle reaper for persistent envs.
  • fbd3a2fd (2025-11-04, "prevent leakage of morph instances between tasks") added an unconditional cleanup_vm(effective_task_id) at the end of run_conversation to fix a Morph backend leak. This was correct for Morph (non-persistent) but blanket-applied to all backends.
  • 70dd3a16 (2026-02-20, "Cleanup time!") refactored the inline calls into _cleanup_task_resources, preserving the unconditional behavior.

Code and docs have disagreed for ~5 months.

Fix

Skip cleanup_vm in _cleanup_task_resources when the active env reports persistent_filesystem=True. The idle reaper still tears it down on terminal.lifetime_seconds expiry. Non-persistent backends (Morph) remain torn down per turn — original leak-prevention intent preserved.
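A minimal sketch of the proposed fix, under stated assumptions: _cleanup_task_resources, cleanup_vm, _cleanup_inactive_envs, persistent_filesystem, and terminal.lifetime_seconds are named in this issue, but the signatures below are simplified module-level functions rather than the real HermesAgentLoop methods, and the FakeEnv class with its last_used timestamp is hypothetical. It shows the intended split of responsibility: end-of-turn cleanup skips persistent sandboxes, and the idle reaper removes them once they have been idle longer than lifetime_seconds.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeEnv:
    """Hypothetical sandbox stand-in; only persistent_filesystem is named in the issue."""
    persistent_filesystem: bool
    # Hypothetical last-activity timestamp used by the idle reaper below.
    last_used: float = field(default_factory=time.monotonic)


_active_environments: dict = {}


def cleanup_vm(task_id: str) -> None:
    """Simplified: forget the sandbox (the real call stops and removes it)."""
    _active_environments.pop(task_id, None)


def _cleanup_task_resources(task_id: str) -> None:
    """End-of-turn cleanup with the proposed persistence check."""
    env = _active_environments.get(task_id)
    if env is not None and env.persistent_filesystem:
        # Persistent backend (docker/daytona/modal): keep the sandbox alive;
        # the idle reaper decides when it goes away.
        return
    # Non-persistent backend (e.g. Morph): tear down every turn, as before.
    cleanup_vm(task_id)


def _cleanup_inactive_envs(lifetime_seconds: float, now: Optional[float] = None) -> None:
    """Idle reaper: remove sandboxes idle for longer than lifetime_seconds."""
    now = time.monotonic() if now is None else now
    for task_id, env in list(_active_environments.items()):
        if now - env.last_used > lifetime_seconds:
            cleanup_vm(task_id)


# Two tasks, both last used at t=0 (explicit timestamps keep the demo deterministic).
_active_environments["docker-task"] = FakeEnv(persistent_filesystem=True, last_used=0.0)
_active_environments["morph-task"] = FakeEnv(persistent_filesystem=False, last_used=0.0)

# End of turn: only the non-persistent sandbox is torn down.
_cleanup_task_resources("docker-task")
_cleanup_task_resources("morph-task")
after_turn = sorted(_active_environments)        # ['docker-task']

# terminal.lifetime_seconds = 600: still alive at t=300, reaped at t=601.
_cleanup_inactive_envs(lifetime_seconds=600, now=300.0)
before_expiry = sorted(_active_environments)     # ['docker-task']
_cleanup_inactive_envs(lifetime_seconds=600, now=601.0)
after_expiry = sorted(_active_environments)      # []
```

Keying the decision on the environment's own persistent_filesystem flag, rather than on a backend name, keeps the Morph leak fix intact without special-casing each backend. In a real implementation the activity timestamp would also need refreshing on every tool call so that an actively used sandbox is never reaped; that detail is assumed here, not taken from the codebase.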

PR: https://github.com/NousResearch/hermes-agent/pull/
