Skip to content

fix: close ephemeral agents to prevent fd leaks#12998

Closed
bloodcarter wants to merge 1 commit into
NousResearch:mainfrom
bloodcarter:fix/close-ephemeral-agents-emfile
Closed

fix: close ephemeral agents to prevent fd leaks#12998
bloodcarter wants to merge 1 commit into
NousResearch:mainfrom
bloodcarter:fix/close-ephemeral-agents-emfile

Conversation

@bloodcarter

Copy link
Copy Markdown
Contributor

Summary

  • close cron job agents in cron.scheduler.run_job() so OpenAI/httpx, browser, process, and child-agent resources are released deterministically
  • close ephemeral gateway agents used for memory flush, session hygiene compression, /background, /btw, and manual /compress
  • add regression tests asserting close() is called on success and failure paths

Why

Long-lived gateway/cron processes were accumulating file descriptors until they hit EMFILE / Too many open files. Restarting the gateway reset the symptom, but these ephemeral AIAgent instances were not being closed.

Testing

  • python -m pytest tests/cron/test_scheduler.py::TestRunJobSessionPersistence::test_run_job_passes_session_db_and_cron_platform tests/cron/test_scheduler.py::TestRunJobSessionPersistence::test_run_job_empty_response_returns_empty_not_placeholder tests/cron/test_scheduler.py::TestRunJobSessionPersistence::test_run_job_closes_agent_on_failure tests/gateway/test_compress_focus.py tests/gateway/test_flush_memory_stale_guard.py -q
  • independent review subagent verdict: passed

Notes

  • Running the broader selected suite also surfaced unrelated pre-existing failures in tests/cron/test_scheduler.py around silent delivery / tick behavior; this PR does not modify those paths.

@alt-glitch alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/cron Cron scheduler and job management comp/gateway Gateway runner, session dispatch, delivery labels Apr 22, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Related to #8331 (close leaked cron and gateway resources) — may supersede or overlap.

@bloodcarter

Copy link
Copy Markdown
Contributor Author

Superseded by #13979, which includes this PR's agent.close() fix in cron/scheduler.py::run_job and extends it with cleanup_stale_async_clients() to reap auxiliary httpx clients whose worker-thread loop died with the ThreadPoolExecutor. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/cron Cron scheduler and job management comp/gateway Gateway runner, session dispatch, delivery P1 High — major feature broken, no workaround type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants