perf(terminal): adaptive subprocess poll — cut ~195ms off every tool call, 1+ second per turn by teknium1 · Pull Request #29006 · NousResearch/hermes-agent

teknium1 · 2026-05-20T00:55:54Z

Summary

This is the perf win that lands DURING a chat session, not just at startup.

_wait_for_process() in tools/environments/base.py was sleeping for a fixed 200ms between polls of the subprocess exit status. For commands that complete in <50ms (echo, pwd, date, cat short files, write_file/read_file with small content), the agent was stuck waiting for the next 200ms tick to notice the process had exited. That fixed floor was the dominant component of per-tool latency for typical short commands.

Replace with adaptive backoff: start at 5ms, multiply by 1.5 each iteration up to 200ms. Fast commands (the common case) return in ~6ms. Long-running commands (builds, tests, sleeps) reach the 200ms steady-state poll rate after about 10 iterations (roughly 570ms of cumulative sleep, since 5ms × 1.5^10 is the first interval above 200ms) and pay identical CPU after that.
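As a minimal sketch of the loop shape described above (not the actual `_wait_for_process()` code: `wait_with_adaptive_poll` and the constant names are hypothetical, and the real function also handles interrupts, activity callbacks, and output draining):

```python
import subprocess
import sys
import time

POLL_START = 0.005  # first sleep: 5ms
POLL_FACTOR = 1.5   # growth applied after every sleep
POLL_CAP = 0.2      # steady-state sleep: 200ms (the old fixed interval)


def wait_with_adaptive_poll(proc, timeout=None):
    """Poll `proc` until it exits, backing off between polls.

    Returns the exit code, or None if `timeout` seconds elapse first.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    interval = POLL_START
    while True:
        code = proc.poll()  # non-blocking exit-status check
        if code is not None:
            return code
        # The timeout is checked on every iteration, not once per 200ms tick.
        if deadline is not None and time.monotonic() >= deadline:
            return None
        time.sleep(interval)
        interval = min(interval * POLL_FACTOR, POLL_CAP)


if __name__ == "__main__":
    proc = subprocess.Popen([sys.executable, "-c", "print('first')"])
    start = time.perf_counter()
    code = wait_with_adaptive_poll(proc)
    print(f"exit={code} after {time.perf_counter() - start:.3f}s")
```

Because the sleep interval starts small, a command that exits within a few milliseconds is noticed on one of the first polls, while a long-running command quickly settles at the same 200ms cadence as before.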

How I found it

Instrumented a real interactive tmux session with multiple tool calls and inspected the gaps between log events. Saw tool terminal completed events firing 217ms apart for commands that should take ~30ms. The 217ms matched the 200ms poll + drain idle interval almost exactly.

Validation

Deterministic microbench of `echo first` through the actual poll loop (20 runs each):

| | BEFORE | AFTER |
|---|---|---|
| median wall | 200ms | 5ms |
| min wall | 200ms | 5ms |
| max wall | 200ms | 7ms |
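The 200ms versus 5ms result follows from when each loop next looks at the process. A minimal model, assuming the loop polls once immediately and then sleeps between polls (the function names and the 3ms command duration are illustrative assumptions, not taken from `tools/environments/base.py`):

```python
def fixed_schedule(step=0.2):
    """Old behavior: sleep a constant 200ms between polls."""
    while True:
        yield step


def adaptive_schedule(start=0.005, factor=1.5, cap=0.2):
    """New behavior: 5ms, 7.5ms, 11.25ms, ... capped at 200ms."""
    interval = start
    while True:
        yield interval
        interval = min(interval * factor, cap)


def detection_time(duration, schedule):
    """Time of the first poll at or after the process exits.

    `duration` is how long the process runs, in seconds. Polls happen at
    t=0 and then after each sleep taken from `schedule`.
    """
    elapsed = 0.0
    for interval in schedule:
        if elapsed >= duration:
            return elapsed
        elapsed += interval
    raise ValueError("schedule ended before the process exited")


# A hypothetical command that exits after 3ms:
print(detection_time(0.003, fixed_schedule()))     # 0.2
print(detection_time(0.003, adaptive_schedule()))  # 0.005
```

In this model any command that is still running at the first poll costs a full 200ms under the fixed schedule, while the adaptive schedule notices the exit at the first backoff point after it.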

End-to-end chat -q with 3 sequential terminal tool calls (4 runs each):

| | BEFORE | AFTER | delta |
|---|---|---|---|
| median wall | 5.73s | 4.64s | -1096ms |
| min wall | 5.61s | 4.60s | -1014ms |

Live tmux session verifying user-visible behavior: a 'write file, read it back' turn now shows each tool as 0.1s in the spinner (was 0.9s before — see the example from a real session: ✍️ write 0.9s, 📖 read 0.9s became ✍️ write 0.1s, 📖 read 0.1s).

Per-turn impact for typical workflows

Hermes chat sessions commonly do 4-8 terminal/file calls per turn. This saves:

4 calls × 195ms = ~780ms per turn
8 calls × 195ms = ~1.5s per turn

Stacked across a multi-turn session, this is the biggest user-visible 'feels faster' win in the perf series.

Why it's safe

Interrupt and timeout checks fire on every iteration (no longer rate-limited to 5/sec)
Activity callback fires on the same 'due' schedule (touch_activity_if_due is unchanged)
DEBUG_INTERRUPT heartbeat schedule is unchanged (30s)
Steady-state poll rate for long-running commands matches the old 200ms behavior after about 10 iterations, roughly 570ms after process start
The drain thread's select(timeout=0.1) and idle_after_exit >= 3 logic is unchanged — output capture semantics are preserved
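The ramp to the steady-state rate can be worked out from the stated parameters alone (5ms start, 1.5× growth, 200ms cap). A small sketch that counts the sleeps shorter than the cap and sums them; the helper name `ramp_to_cap` is hypothetical:

```python
def ramp_to_cap(start=0.005, factor=1.5, cap=0.2):
    """Return (number of sleeps below the cap, their total duration in seconds)."""
    steps = 0
    total = 0.0
    interval = start
    while interval < cap:
        total += interval
        interval *= factor
        steps += 1
    return steps, total


steps, total = ramp_to_cap()
print(f"{steps} sleeps, {total * 1000:.0f}ms")  # 10 sleeps, 567ms
```

After those ten short sleeps every subsequent sleep is the capped 200ms, so from that point a long-running command is polled at the same rate as under the fixed interval.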

Tests

tests/tools/ (full module) — 5246 passed, 22 skipped
2 failures (test_delegate.py::test_depth_limit, ::test_constants) — confirmed pre-existing xdist test-pollution flakes, pass in isolation. Listed in the documented flake set in hermes-agent-perf-work skill.
Live tmux session with multi-turn conversation + tool calls — zero errors in agent.log

Cluster of perf wins shipped today

perf(cli) defer openai._base_client — -28% / -19MB on every cold start (perf(cli): defer openai._base_client import to cut 240ms / 17MB off every CLI cold start #28864)
perf(agent-loop) -47% function calls per turn (perf(agent-loop): cut 47% of per-conversation function calls via 3 targeted hot-path optimizations #28866)
perf(compression) defer feasibility check — -170-290ms per invocation (perf(compression): defer feasibility check to first compression attempt — cut 170-290ms off every chat invocation #28957)
perf(terminal) adaptive subprocess poll — -195ms PER TOOL CALL, -1s+ per turn (this PR)

This one is the one that actually moves the user-perceived 'tool just ran' speed needle.

`_wait_for_process()` was sleeping for a fixed 200ms between polls of the subprocess exit status. For commands that complete in <50ms (echo, pwd, date, cat short files, write_file with small content, read_file with small content), the agent was stuck waiting for the next 200ms tick to notice the process had exited. That floor was the dominant component of per-tool latency for typical short commands.

Replace with adaptive backoff: start at 5ms, multiply by 1.5 each iteration up to 200ms. Fast commands (the common case) return in ~6ms; long-running commands (builds, tests, sleeps) reach the 200ms steady-state poll rate after about 10 iterations (roughly 570ms of cumulative sleep) and pay identical CPU after that.

Tool-call wall time (deterministic microbench of `echo first`):
before: median 200ms, min 200ms, max 200ms
after: median 5ms, min 5ms, max 7ms
saved: ~195ms per terminal tool call

End-to-end chat -q with 3 sequential terminal tool calls (`echo first`, `echo second`, `echo third`):
before: median 5.73s, min 5.61s
after: median 4.64s, min 4.60s
saved: ~1100ms wall per turn

Live tmux session: a typical 'write file, read it back' turn now displays each tool as 0.1s in the spinner (was 0.9s before).

The agent observes the subprocess exit ~200ms faster per call. For chat workflows that do 4-8 terminal/file calls per turn this saves 800ms-1.5s of pure wall-clock waiting.

Why it's safe:
- Interrupt and timeout checks still fire on every iteration (no longer rate-limited to 5/sec)
- Activity callback fires on the same 'due' schedule (`touch_activity_if_due`)
- DEBUG_INTERRUPT heartbeat is unchanged (30s)
- Steady-state poll rate for long-running commands matches the old 200ms after about 10 iterations, roughly 570ms after startup

Tests:
- tests/tools/: 5246 passed, 22 skipped, 2 pre-existing xdist flakes (test_delegate.py::test_depth_limit, test_constants; both pass in isolation)
- Live tmux: 2-turn conversation + multiple tool calls, no errors

github-actions · 2026-05-20T00:56:37Z

🔎 Lint report: `perf/adaptive-subprocess-poll` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8963 on HEAD, 8963 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 4725 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.


alt-glitch added the labels type/perf (Performance improvement or optimization), tool/terminal (Terminal execution and process management), and P2 (Medium: degraded but workaround exists) on May 20, 2026

teknium1 merged commit 6bd4311 into main May 20, 2026
20 of 21 checks passed

teknium1 deleted the perf/adaptive-subprocess-poll branch May 20, 2026 03:02

BrewTestBot mentioned this pull request May 28, 2026

hermes-agent 2026.5.28 Homebrew/homebrew-core#285115

Merged

