fix(gateway): raise macOS launchd fd limit to 4096 in generated plist #26536

Open
aisaacsmitchell wants to merge 1 commit into NousResearch:main from aisaacsmitchell:fix/launchd-fd-limits

@aisaacsmitchell

What

Add HardResourceLimits and SoftResourceLimits (NumberOfFiles: 4096) to generate_launchd_plist().
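
As a rough illustration of what the two keys look like in a serialized plist, here is a minimal sketch using the standard-library plistlib. It is not the implementation of generate_launchd_plist(); the helper name build_resource_limits and the label com.example.gateway are assumptions made up for this example, and the real plist contains many more keys.

```python
import plistlib

FD_LIMIT = 4096  # value proposed in this PR


def build_resource_limits(fd_limit: int = FD_LIMIT) -> dict:
    """Return the two launchd keys as a plain dict (hypothetical helper)."""
    return {
        "HardResourceLimits": {"NumberOfFiles": fd_limit},
        "SoftResourceLimits": {"NumberOfFiles": fd_limit},
    }


# Hypothetical label, used only so the example plist is non-trivial.
plist_dict = {"Label": "com.example.gateway", **build_resource_limits()}

# plistlib.dumps() emits XML by default and writes ints as <integer> elements.
plist_xml = plistlib.dumps(plist_dict).decode("utf-8")
print(plist_xml)
```

Each limit shows up as a `<key>NumberOfFiles</key>` entry followed by `<integer>4096</integer>` inside its own `<dict>`.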

Why

The default macOS per-process open-file limit is 256. The gateway spawns cron-job subprocesses concurrently; under load, file descriptors accumulate across parallel runs and exhaust the limit, causing every subsequent Popen() call to fail with EAGAIN / BlockingIOError. This crashes the gateway and all cron jobs silently until the service is manually restarted.

Both keys are required:

  • SoftResourceLimits is the effective runtime cap.
  • HardResourceLimits is the ceiling the process is permitted to raise itself to via setrlimit(). Setting only Soft leaves Hard at the OS default (256), so any code that calls setrlimit() to raise the soft limit hits the hard wall and fails.
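
The soft/hard relationship described above can be inspected from inside a process with Python's resource module. The sketch below is illustrative only and is not code from this PR; the function name raise_soft_nofile is an assumption. It reads both limits and raises the soft limit toward a target without ever exceeding the hard limit.

```python
import resource


def raise_soft_nofile(target: int = 4096) -> tuple[int, int]:
    """Raise the soft open-file limit toward `target`, capped at the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # Nothing to do if the soft limit is already unlimited.
    if soft == resource.RLIM_INFINITY:
        return soft, hard

    # An unprivileged process cannot push the soft limit past the hard limit.
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)

    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))

    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = raise_soft_nofile()
print(f"soft={soft} hard={hard}")
```

If the hard limit is lower than the target, the soft limit stops at the hard limit, which is the situation the second bullet describes.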

4096 matches the value recommended in the macOS developer documentation for long-running daemons and is consistent with what nginx, postgres, and other launchd-managed services use as a baseline.

How to test

from hermes_cli.gateway import generate_launchd_plist
plist = generate_launchd_plist()
assert '<key>HardResourceLimits</key>' in plist
assert '<key>SoftResourceLimits</key>' in plist
assert '<integer>4096</integer>' in plist

Or: pytest tests/hermes_cli/test_update_gateway_restart.py::TestLaunchdPlistReplace -v

Platforms tested

macOS (Sonoma 15.x, Apple Silicon). The change is plist-only and has no effect on the Linux (systemd) or Windows paths.

The default macOS per-process open-file limit is 256. The gateway spawns
cron-job subprocesses concurrently; under load, file descriptors accumulate
across parallel runs and exhaust the limit, causing every subsequent
Popen() call to fail with EAGAIN / BlockingIOError. This crashes the gateway
and all cron jobs silently until the service is manually restarted.

Add HardResourceLimits and SoftResourceLimits (NumberOfFiles: 4096) to
generate_launchd_plist() so every hermes gateway install/start/restart
produces a plist with an adequate fd ceiling.

Both keys are required: SoftResourceLimits is the effective runtime cap;
HardResourceLimits is the ceiling the process is permitted to raise itself
to. Setting only Soft leaves Hard at the OS default (256), so any call to
setrlimit() that tries to raise the soft limit hits the hard wall and fails.

4096 matches the value recommended in the macOS developer documentation for
long-running daemons and is consistent with what other gateway daemons
(nginx, postgres launchd plists) use as a baseline.

Adds two tests to TestLaunchdPlistReplace verifying the keys are present
and appear exactly once each.
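
A presence-and-uniqueness check of this kind might look roughly like the sketch below. It is not the test code added by this commit: the helper key_count is hypothetical, and a stand-in plist built with plistlib is used in place of the real generate_launchd_plist() output.

```python
import plistlib


def key_count(plist_xml: str, key: str) -> int:
    """Count how many times a <key> element with this name appears."""
    return plist_xml.count(f"<key>{key}</key>")


# Stand-in plist; the label is hypothetical and the real output has more keys.
sample_plist = plistlib.dumps(
    {
        "Label": "com.example.gateway",
        "HardResourceLimits": {"NumberOfFiles": 4096},
        "SoftResourceLimits": {"NumberOfFiles": 4096},
    }
).decode("utf-8")

for key in ("HardResourceLimits", "SoftResourceLimits"):
    # Exactly one occurrence guards against the key being emitted twice.
    assert key_count(sample_plist, key) == 1, key
```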

@alt-glitch added labels on May 15, 2026: type/bug (Something isn't working), comp/gateway (Gateway runner, session dispatch, delivery), comp/cron (Cron scheduler and job management), P2 (Medium: degraded but workaround exists)