perf(terminal): adaptive subprocess poll — cut ~195ms off every tool call, 1+ second per turn#29006
Merged
Merged
Conversation
`_wait_for_process()` was sleeping for a fixed 200ms between polls of the subprocess exit status. For commands that complete in <50ms (echo, pwd, date, cat short files, write_file with small content, read_file with small content), the agent was stuck waiting for the next 200ms tick to notice the process had exited. That floor was the dominant component of per-tool latency for typical short commands. Replace with adaptive backoff: start at 5ms, multiply by 1.5 each iteration up to 200ms. Fast commands (the common case) return in ~6ms; long-running commands (builds, tests, sleeps) reach the 200ms steady-state poll rate within ~12 iterations (~150ms total) and pay identical CPU after that. Tool-call wall time (deterministic microbench of `echo first`): before: median 200ms min 200ms max 200ms after: median 5ms min 5ms max 7ms saved: ~195ms per terminal tool call End-to-end chat -q with 3 sequential terminal tool calls (`echo first`, `echo second`, `echo third`): before: median 5.73s, min 5.61s after: median 4.64s, min 4.60s saved: ~1100ms wall per turn Live tmux session: a typical 'write file, read it back' turn now displays each tool as 0.1s in the spinner (was 0.9s before). The agent observes the subprocess exit ~200ms faster per call. For chat workflows that do 4-8 terminal/file calls per turn this saves 800ms-1.5s of pure wall-clock waiting. Why it's safe: - Interrupt and timeout checks still fire on every iteration (no longer rate-limited to 5/sec) - Activity callback fires on the same 'due' schedule (`touch_activity_if_due`) - DEBUG_INTERRUPT heartbeat is unchanged (30s) - Steady-state poll rate for long-running commands matches the old 200ms within ~150ms of startup Tests: - tests/tools/ — 5246 passed, 22 skipped, 2 pre-existing xdist flakes (test_delegate.py::test_depth_limit, test_constants — pass in isolation) - Live tmux: 2-turn conversation + multiple tool calls, no errors
Contributor
🔎 Lint report:
|
Lillard01
pushed a commit
to Lillard01/hermes-agent
that referenced
this pull request
May 21, 2026
…all (NousResearch#29006) `_wait_for_process()` was sleeping for a fixed 200ms between polls of the subprocess exit status. For commands that complete in <50ms (echo, pwd, date, cat short files, write_file with small content, read_file with small content), the agent was stuck waiting for the next 200ms tick to notice the process had exited. That floor was the dominant component of per-tool latency for typical short commands. Replace with adaptive backoff: start at 5ms, multiply by 1.5 each iteration up to 200ms. Fast commands (the common case) return in ~6ms; long-running commands (builds, tests, sleeps) reach the 200ms steady-state poll rate within ~12 iterations (~150ms total) and pay identical CPU after that. Tool-call wall time (deterministic microbench of `echo first`): before: median 200ms min 200ms max 200ms after: median 5ms min 5ms max 7ms saved: ~195ms per terminal tool call End-to-end chat -q with 3 sequential terminal tool calls (`echo first`, `echo second`, `echo third`): before: median 5.73s, min 5.61s after: median 4.64s, min 4.60s saved: ~1100ms wall per turn Live tmux session: a typical 'write file, read it back' turn now displays each tool as 0.1s in the spinner (was 0.9s before). The agent observes the subprocess exit ~200ms faster per call. For chat workflows that do 4-8 terminal/file calls per turn this saves 800ms-1.5s of pure wall-clock waiting. Why it's safe: - Interrupt and timeout checks still fire on every iteration (no longer rate-limited to 5/sec) - Activity callback fires on the same 'due' schedule (`touch_activity_if_due`) - DEBUG_INTERRUPT heartbeat is unchanged (30s) - Steady-state poll rate for long-running commands matches the old 200ms within ~150ms of startup Tests: - tests/tools/ — 5246 passed, 22 skipped, 2 pre-existing xdist flakes (test_delegate.py::test_depth_limit, test_constants — pass in isolation) - Live tmux: 2-turn conversation + multiple tool calls, no errors
clckmedia
pushed a commit
to clckmedia/hermes-agent
that referenced
this pull request
May 21, 2026
…all (NousResearch#29006) `_wait_for_process()` was sleeping for a fixed 200ms between polls of the subprocess exit status. For commands that complete in <50ms (echo, pwd, date, cat short files, write_file with small content, read_file with small content), the agent was stuck waiting for the next 200ms tick to notice the process had exited. That floor was the dominant component of per-tool latency for typical short commands. Replace with adaptive backoff: start at 5ms, multiply by 1.5 each iteration up to 200ms. Fast commands (the common case) return in ~6ms; long-running commands (builds, tests, sleeps) reach the 200ms steady-state poll rate within ~12 iterations (~150ms total) and pay identical CPU after that. Tool-call wall time (deterministic microbench of `echo first`): before: median 200ms min 200ms max 200ms after: median 5ms min 5ms max 7ms saved: ~195ms per terminal tool call End-to-end chat -q with 3 sequential terminal tool calls (`echo first`, `echo second`, `echo third`): before: median 5.73s, min 5.61s after: median 4.64s, min 4.60s saved: ~1100ms wall per turn Live tmux session: a typical 'write file, read it back' turn now displays each tool as 0.1s in the spinner (was 0.9s before). The agent observes the subprocess exit ~200ms faster per call. For chat workflows that do 4-8 terminal/file calls per turn this saves 800ms-1.5s of pure wall-clock waiting. Why it's safe: - Interrupt and timeout checks still fire on every iteration (no longer rate-limited to 5/sec) - Activity callback fires on the same 'due' schedule (`touch_activity_if_due`) - DEBUG_INTERRUPT heartbeat is unchanged (30s) - Steady-state poll rate for long-running commands matches the old 200ms within ~150ms of startup Tests: - tests/tools/ — 5246 passed, 22 skipped, 2 pre-existing xdist flakes (test_delegate.py::test_depth_limit, test_constants — pass in isolation) - Live tmux: 2-turn conversation + multiple tool calls, no errors
Gpapas
pushed a commit
to Gpapas/hermes-agent
that referenced
this pull request
May 23, 2026
…all (NousResearch#29006) `_wait_for_process()` was sleeping for a fixed 200ms between polls of the subprocess exit status. For commands that complete in <50ms (echo, pwd, date, cat short files, write_file with small content, read_file with small content), the agent was stuck waiting for the next 200ms tick to notice the process had exited. That floor was the dominant component of per-tool latency for typical short commands. Replace with adaptive backoff: start at 5ms, multiply by 1.5 each iteration up to 200ms. Fast commands (the common case) return in ~6ms; long-running commands (builds, tests, sleeps) reach the 200ms steady-state poll rate within ~12 iterations (~150ms total) and pay identical CPU after that. Tool-call wall time (deterministic microbench of `echo first`): before: median 200ms min 200ms max 200ms after: median 5ms min 5ms max 7ms saved: ~195ms per terminal tool call End-to-end chat -q with 3 sequential terminal tool calls (`echo first`, `echo second`, `echo third`): before: median 5.73s, min 5.61s after: median 4.64s, min 4.60s saved: ~1100ms wall per turn Live tmux session: a typical 'write file, read it back' turn now displays each tool as 0.1s in the spinner (was 0.9s before). The agent observes the subprocess exit ~200ms faster per call. For chat workflows that do 4-8 terminal/file calls per turn this saves 800ms-1.5s of pure wall-clock waiting. Why it's safe: - Interrupt and timeout checks still fire on every iteration (no longer rate-limited to 5/sec) - Activity callback fires on the same 'due' schedule (`touch_activity_if_due`) - DEBUG_INTERRUPT heartbeat is unchanged (30s) - Steady-state poll rate for long-running commands matches the old 200ms within ~150ms of startup Tests: - tests/tools/ — 5246 passed, 22 skipped, 2 pre-existing xdist flakes (test_delegate.py::test_depth_limit, test_constants — pass in isolation) - Live tmux: 2-turn conversation + multiple tool calls, no errors
Mucky010
pushed a commit
to Mucky010/hermes-agent
that referenced
this pull request
May 24, 2026
…all (NousResearch#29006) `_wait_for_process()` was sleeping for a fixed 200ms between polls of the subprocess exit status. For commands that complete in <50ms (echo, pwd, date, cat short files, write_file with small content, read_file with small content), the agent was stuck waiting for the next 200ms tick to notice the process had exited. That floor was the dominant component of per-tool latency for typical short commands. Replace with adaptive backoff: start at 5ms, multiply by 1.5 each iteration up to 200ms. Fast commands (the common case) return in ~6ms; long-running commands (builds, tests, sleeps) reach the 200ms steady-state poll rate within ~12 iterations (~150ms total) and pay identical CPU after that. Tool-call wall time (deterministic microbench of `echo first`): before: median 200ms min 200ms max 200ms after: median 5ms min 5ms max 7ms saved: ~195ms per terminal tool call End-to-end chat -q with 3 sequential terminal tool calls (`echo first`, `echo second`, `echo third`): before: median 5.73s, min 5.61s after: median 4.64s, min 4.60s saved: ~1100ms wall per turn Live tmux session: a typical 'write file, read it back' turn now displays each tool as 0.1s in the spinner (was 0.9s before). The agent observes the subprocess exit ~200ms faster per call. For chat workflows that do 4-8 terminal/file calls per turn this saves 800ms-1.5s of pure wall-clock waiting. Why it's safe: - Interrupt and timeout checks still fire on every iteration (no longer rate-limited to 5/sec) - Activity callback fires on the same 'due' schedule (`touch_activity_if_due`) - DEBUG_INTERRUPT heartbeat is unchanged (30s) - Steady-state poll rate for long-running commands matches the old 200ms within ~150ms of startup Tests: - tests/tools/ — 5246 passed, 22 skipped, 2 pre-existing xdist flakes (test_delegate.py::test_depth_limit, test_constants — pass in isolation) - Live tmux: 2-turn conversation + multiple tool calls, no errors
1 task
Bryce-huang
pushed a commit
to wbkunlun/hermes-agent
that referenced
this pull request
May 29, 2026
…all (NousResearch#29006) `_wait_for_process()` was sleeping for a fixed 200ms between polls of the subprocess exit status. For commands that complete in <50ms (echo, pwd, date, cat short files, write_file with small content, read_file with small content), the agent was stuck waiting for the next 200ms tick to notice the process had exited. That floor was the dominant component of per-tool latency for typical short commands. Replace with adaptive backoff: start at 5ms, multiply by 1.5 each iteration up to 200ms. Fast commands (the common case) return in ~6ms; long-running commands (builds, tests, sleeps) reach the 200ms steady-state poll rate within ~12 iterations (~150ms total) and pay identical CPU after that. Tool-call wall time (deterministic microbench of `echo first`): before: median 200ms min 200ms max 200ms after: median 5ms min 5ms max 7ms saved: ~195ms per terminal tool call End-to-end chat -q with 3 sequential terminal tool calls (`echo first`, `echo second`, `echo third`): before: median 5.73s, min 5.61s after: median 4.64s, min 4.60s saved: ~1100ms wall per turn Live tmux session: a typical 'write file, read it back' turn now displays each tool as 0.1s in the spinner (was 0.9s before). The agent observes the subprocess exit ~200ms faster per call. For chat workflows that do 4-8 terminal/file calls per turn this saves 800ms-1.5s of pure wall-clock waiting. Why it's safe: - Interrupt and timeout checks still fire on every iteration (no longer rate-limited to 5/sec) - Activity callback fires on the same 'due' schedule (`touch_activity_if_due`) - DEBUG_INTERRUPT heartbeat is unchanged (30s) - Steady-state poll rate for long-running commands matches the old 200ms within ~150ms of startup Tests: - tests/tools/ — 5246 passed, 22 skipped, 2 pre-existing xdist flakes (test_delegate.py::test_depth_limit, test_constants — pass in isolation) - Live tmux: 2-turn conversation + multiple tool calls, no errors #AI commit#
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
…all (NousResearch#29006) `_wait_for_process()` was sleeping for a fixed 200ms between polls of the subprocess exit status. For commands that complete in <50ms (echo, pwd, date, cat short files, write_file with small content, read_file with small content), the agent was stuck waiting for the next 200ms tick to notice the process had exited. That floor was the dominant component of per-tool latency for typical short commands. Replace with adaptive backoff: start at 5ms, multiply by 1.5 each iteration up to 200ms. Fast commands (the common case) return in ~6ms; long-running commands (builds, tests, sleeps) reach the 200ms steady-state poll rate within ~12 iterations (~150ms total) and pay identical CPU after that. Tool-call wall time (deterministic microbench of `echo first`): before: median 200ms min 200ms max 200ms after: median 5ms min 5ms max 7ms saved: ~195ms per terminal tool call End-to-end chat -q with 3 sequential terminal tool calls (`echo first`, `echo second`, `echo third`): before: median 5.73s, min 5.61s after: median 4.64s, min 4.60s saved: ~1100ms wall per turn Live tmux session: a typical 'write file, read it back' turn now displays each tool as 0.1s in the spinner (was 0.9s before). The agent observes the subprocess exit ~200ms faster per call. For chat workflows that do 4-8 terminal/file calls per turn this saves 800ms-1.5s of pure wall-clock waiting. Why it's safe: - Interrupt and timeout checks still fire on every iteration (no longer rate-limited to 5/sec) - Activity callback fires on the same 'due' schedule (`touch_activity_if_due`) - DEBUG_INTERRUPT heartbeat is unchanged (30s) - Steady-state poll rate for long-running commands matches the old 200ms within ~150ms of startup Tests: - tests/tools/ — 5246 passed, 22 skipped, 2 pre-existing xdist flakes (test_delegate.py::test_depth_limit, test_constants — pass in isolation) - Live tmux: 2-turn conversation + multiple tool calls, no errors
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This is the perf win that lands DURING a chat session, not just at startup.
_wait_for_process()intools/environments/base.pywas sleeping for a fixed 200ms between polls of the subprocess exit status. For commands that complete in <50ms (echo, pwd, date, cat short files, write_file/read_file with small content), the agent was stuck waiting for the next 200ms tick to notice the process had exited. That fixed floor was the dominant component of per-tool latency for typical short commands.Replace with adaptive backoff: start at 5ms, multiply by 1.5 each iteration up to 200ms. Fast commands (the common case) return in ~6ms. Long-running commands (builds, tests, sleeps) reach the 200ms steady-state poll rate within ~12 iterations (~150ms total) and pay identical CPU after that.
How I found it
Instrumented a real interactive tmux session with multiple tool calls and inspected the gaps between log events. Saw
tool terminal completedevents firing 217ms apart for commands that should take ~30ms. The 217ms matched the 200ms poll + drain idle interval almost exactly.Validation
Deterministic microbench of
echo firstthrough the actual poll loop (20 runs each):End-to-end
chat -qwith 3 sequential terminal tool calls (4 runs each):Live tmux session verifying user-visible behavior: a 'write file, read it back' turn now shows each tool as
0.1sin the spinner (was0.9sbefore — see the example from a real session:✍️ write 0.9s,📖 read 0.9sbecame✍️ write 0.1s,📖 read 0.1s).Per-turn impact for typical workflows
Hermes chat sessions commonly do 4-8 terminal/file calls per turn. This saves:
Stacked across a multi-turn session, this is the biggest user-visible 'feels faster' win in the perf series.
Why it's safe
touch_activity_if_dueis unchanged)select(timeout=0.1)andidle_after_exit >= 3logic is unchanged — output capture semantics are preservedTests
tests/tools/(full module) — 5246 passed, 22 skippedtest_delegate.py::test_depth_limit,::test_constants) — confirmed pre-existing xdist test-pollution flakes, pass in isolation. Listed in the documented flake set inhermes-agent-perf-workskill.Cluster of perf wins shipped today
perf(cli)defer openai._base_client — -28% / -19MB on every cold start (perf(cli): defer openai._base_client import to cut 240ms / 17MB off every CLI cold start #28864)perf(agent-loop)-47% function calls per turn (perf(agent-loop): cut 47% of per-conversation function calls via 3 targeted hot-path optimizations #28866)perf(compression)defer feasibility check — -170-290ms per invocation (perf(compression): defer feasibility check to first compression attempt — cut 170-290ms off every chat invocation #28957)perf(terminal)adaptive subprocess poll — -195ms PER TOOL CALL, -1s+ per turn (this PR)This one is the one that actually moves the user-perceived 'tool just ran' speed needle.