FD leak: response_store.db opens multiple SQLite connections without closing, hits ulimit after ~2 days #37369

@datin-antasena

Description

The gateway process leaks file descriptors through response_store.db. After running for approximately 2 days with a Telegram gateway (multiple topics/chats), the process hits the default ulimit of 1024 open files and starts failing with OSError: [Errno 24] Too many open files.

Evidence

FD leak in response_store.db

After just 20 minutes of uptime (post-restart), response_store.db already has 15 FDs open: 7 on the main database file, 7 on the WAL file, and 1 on the shared-memory file, which points to 7 separate SQLite connections to the same file:

lrwx 18 -> /home/ubuntu/.hermes/response_store.db
lrwx 19 -> /home/ubuntu/.hermes/response_store.db-wal
lrwx 20 -> /home/ubuntu/.hermes/response_store.db-shm
lrwx 21 -> /home/ubuntu/.hermes/response_store.db
lrwx 22 -> /home/ubuntu/.hermes/response_store.db-wal
lrwx 23 -> /home/ubuntu/.hermes/response_store.db
lrwx 24 -> /home/ubuntu/.hermes/response_store.db-wal
lrwx 26 -> /home/ubuntu/.hermes/response_store.db
lrwx 27 -> /home/ubuntu/.hermes/response_store.db-wal
lrwx 31 -> /home/ubuntu/.hermes/response_store.db
lrwx 32 -> /home/ubuntu/.hermes/response_store.db-wal
lrwx 34 -> /home/ubuntu/.hermes/response_store.db
lrwx 36 -> /home/ubuntu/.hermes/response_store.db-wal
lrwx 37 -> /home/ubuntu/.hermes/response_store.db
lrwx 38 -> /home/ubuntu/.hermes/response_store.db-wal

Each agent session appears to open a new SQLite connection to response_store.db without closing previous ones.
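One common way this pattern arises in Python is relying on the sqlite3 connection's own context manager, which commits or rolls back the transaction but does not close the connection. Whether Hermes does this is not confirmed; the snippet below is only a minimal sketch of the pitfall and of one way around it, using a throwaway database file and a hypothetical `responses` table.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

# Throwaway stand-in for response_store.db; path and schema are hypothetical.
db_path = os.path.join(tempfile.mkdtemp(), "response_store.db")

# Pattern 1: the connection's context manager only commits or rolls back.
with sqlite3.connect(db_path) as conn:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS responses (id TEXT PRIMARY KEY, body TEXT)"
    )
# The connection is still usable here, so its FDs are still held.
still_open = conn.execute("SELECT 1").fetchone() == (1,)
conn.close()  # an explicit close() is what actually releases the FDs

# Pattern 2: closing() guarantees close() when the block exits.
with closing(sqlite3.connect(db_path)) as conn2:
    with conn2:  # transaction scope: commit on success, rollback on error
        conn2.execute(
            "INSERT OR REPLACE INTO responses VALUES (?, ?)", ("r1", "hello")
        )

try:
    conn2.execute("SELECT 1")
    closed_after_block = False
except sqlite3.ProgrammingError:
    # Raised because the connection has been closed.
    closed_after_block = True

print(still_open, closed_after_block)  # prints: True True
```

If each agent session creates its connection with the first pattern and never calls close(), every session would leave a main-file FD and a WAL FD behind, which would match the listing above.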

Error cascade

The FD exhaustion cascades across multiple subsystems:

2026-06-02 19:22:04 ERROR [Telegram] Error handling message: [Errno 24] Too many open files: sessions/.sessions_iefgia2d.tmp
2026-06-02 14:37:56 ERROR kanban dispatcher: tick failed on board default - [Errno 24] Too many open files: kanban.db.init.lock
2026-06-02 19:22:13 ERROR [Telegram] Error handling message: [Errno 24] Too many open files: gateway/slash_access.py

Timeline

  • Gateway uptime before crash: ~45 hours (163,200 seconds)
  • Total session files: 454
  • Default ulimit: 1024
  • Active Telegram topics: ~8 (General, Peksos, DataOps, Blogging, Riset, Sidejob, Log, Birokrasi)

After restart

PID: 1732839 (restarted at 19:26)
Total FDs: 28 (healthy)
response_store.db FDs: 15 (already leaking after 20 min)
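For anyone wanting to reproduce these counts, the sketch below groups a process's open FDs by target path. It assumes a Linux /proc filesystem; the function names are illustrative and not part of Hermes. The sample data is the set of targets from the listing in the Evidence section.

```python
import os
from collections import Counter


def read_fd_targets(pid="self"):
    """Return the symlink target of every open FD of a process (Linux /proc only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            # The FD was closed between listdir() and readlink(); skip it.
            continue
    return targets


def count_by_target(targets, needle="response_store.db"):
    """Count open FDs per target path, keeping only paths that contain `needle`."""
    return Counter(t for t in targets if needle in t)


# Targets copied from the listing in the Evidence section (FD numbers omitted).
base = "/home/ubuntu/.hermes/response_store.db"
listing = [base] * 7 + [base + "-wal"] * 7 + [base + "-shm"]
counts = count_by_target(listing)
print(counts[base], counts[base + "-wal"], counts[base + "-shm"])  # prints: 7 7 1
```

Against the live gateway, `count_by_target(read_fd_targets(1732839))` would give the same breakdown for the restarted process, provided the caller has permission to read that process's /proc entry.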

Workaround

Added LimitNOFILE=65536 to the systemd service file. This delays the crash but does not fix the leak.
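To confirm that the raised limit is in effect inside the process, and to get a warning before the limit is reached again, a check along these lines could be run periodically. This is a sketch only: the 80% threshold and the function names are assumptions, and the FD count relies on Linux's /proc.

```python
import os
import resource


def fd_usage():
    """Return (open_fds, soft_limit) for the current process (Linux /proc only)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # listdir() briefly opens one FD of its own, so the count may be off by one.
    open_fds = len(os.listdir("/proc/self/fd"))
    return open_fds, soft


def should_warn(open_fds, soft_limit, threshold=0.8):
    """True when open FDs reach `threshold` of the soft limit (threshold is an assumption)."""
    if soft_limit == resource.RLIM_INFINITY:
        return False
    return open_fds >= threshold * soft_limit


if __name__ == "__main__":
    open_fds, soft = fd_usage()
    print(f"open FDs: {open_fds}, soft limit: {soft}, warn: {should_warn(open_fds, soft)}")
```

With the default limit of 1024, `should_warn(900, 1024)` is true; with LimitNOFILE=65536 the same count is far below the threshold, which is why the workaround only postpones the failure.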

Expected behavior

SQLite connections should be closed when an agent session ends. Each database file should have at most 2-3 FDs open (a single connection's main-file handle plus its WAL and SHM handles), not the 15 observed after 20 minutes.
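One design that would meet this expectation is a single long-lived connection per process, shared by all sessions and closed on shutdown. The class below is a hypothetical sketch, not the actual Hermes response store: the class name, schema, and methods are all assumptions made for illustration.

```python
import sqlite3
import threading


class ResponseStore:
    """Hypothetical wrapper: one shared connection per process, closed explicitly."""

    def __init__(self, path):
        self._lock = threading.Lock()
        # check_same_thread=False lets several threads share the connection;
        # the lock serializes access to it.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS responses (id TEXT PRIMARY KEY, body TEXT)"
        )
        self._conn.commit()

    def put(self, response_id, body):
        # `with self._conn` commits on success and rolls back on error.
        with self._lock, self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO responses VALUES (?, ?)", (response_id, body)
            )

    def get(self, response_id):
        with self._lock:
            row = self._conn.execute(
                "SELECT body FROM responses WHERE id = ?", (response_id,)
            ).fetchone()
        return row[0] if row else None

    def close(self):
        # Call once at shutdown to release the database FDs.
        with self._lock:
            self._conn.close()


store = ResponseStore(":memory:")  # in-memory database for the demo
store.put("r1", "hello")
print(store.get("r1"))  # prints: hello
store.close()
```

Sessions would then receive the shared store instead of opening their own connection, so the FD count for the database stays constant regardless of how many sessions have run.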

Environment

  • Hermes Agent v0.15.1 (2026.5.29), commit c10ccaa
  • Python 3.11.15
  • Ubuntu (Linux 6.8.0-101-generic)
  • Platform: Telegram gateway with multiple topics
  • OpenAI SDK: 2.24.0

Metadata

Assignees

No one assigned

    Labels

    P1 (High: major feature broken, no workaround)
    comp/gateway (Gateway runner, session dispatch, delivery)
    platform/telegram (Telegram bot adapter)
    type/bug (Something isn't working)
