Gateway terminal hangs enumerating external macOS volume while local shell succeeds #22111

@justyn-clark

Bug Description

The Hermes gateway terminal/runtime repeatedly failed to enumerate a large directory on an external macOS volume, although another local coding agent (Pi) enumerated the same directory in milliseconds with the same commands. Hermes could read metadata (stat, df, mount info), but find, ls -1f, and Python os.scandir all timed out with no stdout/stderr. As a result, the agent misdiagnosed the problem as a slow external SSD/folder instead of recognizing a Hermes runtime/tool-capture/session issue sooner.

Target path in observed session:

/Volumes/SSD/1. Sample Library - NEW

Environment observed:

  • macOS/Darwin host
  • Hermes gateway via Telegram DM
  • External volume: /Volumes/SSD, Journaled HFS+, USB, writable, SMART verified
  • Directory metadata was readable: drwxr-xr-x justyn:staff
  • diskutil verifyVolume was accidentally started during debugging, then stopped; after confirming no verification/fsck jobs were running, failures persisted

Steps to Reproduce

From a Hermes gateway/Telegram session with terminal tools enabled, run the following in bash (TIMEFORMAT is bash-specific; zsh uses TIMEFMT instead):

ps aux | grep -Ei '[d]iskutil|[f]sck|[h]fs|[v]erifyVolume' || true

TIMEFORMAT='find_elapsed=%3R'; time /usr/bin/find '/Volumes/SSD/1. Sample Library - NEW' -maxdepth 1 -mindepth 1 -print | wc -l

TIMEFORMAT='ls1f_elapsed=%3R'; time /bin/ls -1f '/Volumes/SSD/1. Sample Library - NEW' | wc -l

python3 - <<'PY'
from pathlib import Path
import os, time
p=Path('/Volumes/SSD/1. Sample Library - NEW')
start=time.time(); n=0
with os.scandir(p) as it:
    for e in it:
        n+=1
print(f'os_scandir_count={n} elapsed={time.time()-start:.4f}s')
PY

Observed in Hermes:

  • no diskutil / fsck / verifyVolume jobs running
  • find timed out after 60-90s with no stdout/stderr
  • ls -1f timed out after 60-90s with no stdout/stderr
  • Python os.scandir timed out after 60-90s with no stdout/stderr
  • post-timeout process checks showed no leftover commands
  • metadata commands such as stat continued to work

Control result from another local coding agent (Pi) on the same machine/path:

  • find -maxdepth 1 completed in ~0.004s
  • ls -1f completed in ~0.004s
  • recursive metadata scan completed in under 1s

Expected Behavior

Hermes should be able to enumerate the directory as quickly as the local shell/other agent, or at minimum should classify the failure as a Hermes terminal/runtime/session/capture problem once multiple simple enumeration methods time out while metadata works.

The agent should not keep spinning for minutes or blame the external drive/folder without stronger evidence.
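The classification rule described above could be sketched roughly as follows. This is only an illustration of the heuristic, not Hermes code: the `Probe` record, the `classify` function, the verdict strings, and the threshold of two enumeration timeouts are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """One bounded terminal probe (hypothetical record, not a Hermes type)."""
    name: str        # e.g. "stat", "find", "ls -1f", "os.scandir"
    kind: str        # "metadata" or "enumeration"
    timed_out: bool


def classify(probes, min_enum_timeouts=2):
    """Return a coarse verdict from a batch of probe outcomes.

    The threshold of 2 is an assumption made for this sketch.
    """
    meta = [p for p in probes if p.kind == "metadata"]
    enum = [p for p in probes if p.kind == "enumeration"]

    enum_timeouts = sum(1 for p in enum if p.timed_out)
    meta_ok = bool(meta) and all(not p.timed_out for p in meta)

    if enum_timeouts == 0:
        return "ok"
    if enum_timeouts < min_enum_timeouts:
        return "inconclusive"
    if meta_ok:
        # Metadata works but several simple enumerations hang: stop retrying
        # and suspect the runtime/session/capture path before the drive.
        return "suspect-runtime"
    # Everything stalls: not enough evidence to blame either side.
    return "stall-unclassified"


observed = [
    Probe("stat", "metadata", False),
    Probe("find", "enumeration", True),
    Probe("ls -1f", "enumeration", True),
    Probe("os.scandir", "enumeration", True),
]
print(classify(observed))
```

With the outcomes reported in this issue (stat succeeds; find, ls -1f, and os.scandir time out), the sketch returns "suspect-runtime", which is the point at which the agent should stop issuing equivalent probes and ask for a control run outside Hermes.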

Actual Behavior

Hermes repeatedly retried equivalent enumeration methods, spent many minutes, started an unnecessary diskutil verifyVolume, and produced low-confidence/incorrect conclusions about external volume I/O.

Requested Fixes

  1. Add a regression test or diagnostic for terminal commands that hang only under the Hermes gateway/runtime path while succeeding in normal local execution.
  2. Improve terminal tool timeout handling so it captures partial output and distinguishes:
    • child command timeout
    • shell/session timeout
    • command-capture deadlock
    • filesystem-level stall
  3. Add a runtime/session refresh path that can be invoked when repeated terminal commands time out while equivalent commands work outside Hermes.
  4. Consider a built-in diagnostic command for macOS removable volume access from gateway-launched Hermes processes, including TCC/removable-volume permission checks.
  5. Update agent guidance/default skills to avoid repeated equivalent filesystem probes after multiple bounded timeouts.
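For item 2, a minimal sketch of a bounded runner that keeps partial output on timeout is shown below. It uses only the Python standard library and makes no claim about how the Hermes terminal tool is implemented; `run_bounded`, `BoundedResult`, and the `grace_s` parameter are hypothetical names. One assumption worth stating: if the child cannot be reaped shortly after SIGKILL, that hints at a stall below the process level (for example uninterruptible I/O) rather than a capture problem, but it is a hint, not proof.

```python
import os
import signal
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class BoundedResult:
    timed_out: bool             # the command exceeded timeout_s
    reaped: bool                # the child exited (or was killed) and was collected
    returncode: Optional[int]
    stdout: str                 # may be partial when timed_out is True
    stderr: str


def run_bounded(argv, timeout_s, grace_s=5.0):
    """Run argv with a deadline; on timeout, kill its process group and keep partial output."""
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        start_new_session=True,  # own process group, so children are killed too
    )
    try:
        out, err = proc.communicate(timeout=timeout_s)
        return BoundedResult(False, True, proc.returncode, out, err)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the child exited between the timeout and the kill
        try:
            # Retrying communicate() after a timeout does not lose output already read.
            out, err = proc.communicate(timeout=grace_s)
            return BoundedResult(True, True, proc.returncode, out, err)
        except subprocess.TimeoutExpired:
            # Child did not die after SIGKILL; this sketch gives up on its output.
            return BoundedResult(True, False, None, "", "")
```

A caller could run `run_bounded(["/usr/bin/find", path, "-maxdepth", "1", "-mindepth", "1", "-print"], timeout_s=10)` and report `timed_out`, `reaped`, and whatever reached `stdout` separately, instead of a single opaque "timed out" result.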

Impact

This blocks Hermes from reliably helping with large external audio/video/sample-library workflows, where external SSDs are common and directory enumeration is a basic operation.

Labels Suggested

  • bug
  • terminal
  • gateway
  • macos
  • reliability

Labels

  • P2: Medium (degraded but workaround exists)
  • comp/gateway: Gateway runner, session dispatch, delivery
  • tool/terminal: Terminal execution and process management
  • type/bug: Something isn't working
