Skip to content

fix: word-wrap spinner, interruptable agent join, and delegate_task interrupt#10940

Merged
teknium1 merged 2 commits into
mainfrom
hermes/hermes-6f9c5b0f
Apr 16, 2026
Merged

fix: word-wrap spinner, interruptable agent join, and delegate_task interrupt#10940
teknium1 merged 2 commits into
mainfrom
hermes/hermes-6f9c5b0f

Conversation

@teknium1

@teknium1 teknium1 commented Apr 16, 2026

Copy link
Copy Markdown
Contributor

Summary

Three fixes across cli.py and tools/delegate_tool.py (94 lines added, 17 removed).

1. Spinner clips long tool commands (cli.py)

The prompt_toolkit spinner widget showed running tool commands in a Window(height=1, wrap_lines=False). Text beyond terminal width was silently clipped.

Fix: wrap_lines=True on the Window, with _spinner_widget_height() computing dynamic height from text length / terminal width. Long commands wrap naturally across multiple lines.

2. CLI hangs after interrupt (cli.py)

agent_thread.join() blocked indefinitely after an interrupt. If the agent thread took time to clean up, the process_loop thread froze.

Fix: After an interrupt, poll with join(timeout=0.2) checking _should_exit each iteration so double Ctrl+C breaks out immediately. Normal completion uses a generous join(timeout=30).

3. Root cause: delegate_task() blocks forever on stuck children (tools/delegate_tool.py)

This was the actual 5-hour hang. Found via interrupt_debug.log — the last event before the freeze was "STOP RUNNING BROADER TASK SUITES" with 3 children. Zero agent.log entries for the next ~4 hours.

delegate_task() used as_completed(futures) with no interrupt check. When the parent agent received an interrupt, it propagated to children via child.interrupt(), but then waited indefinitely for all futures to complete. If even one child thread got stuck (hung API call, stalled connection), as_completed() never yielded and the parent froze inside the ThreadPoolExecutor.

Fix: Replace as_completed() with wait(pending, timeout=0.5, return_when=FIRST_COMPLETED) in a loop that checks parent_agent._interrupt_requested each iteration. When interrupted, collects completed results and reports remaining tasks as interrupted. Parent returns immediately.

Test plan

  • 548 tests pass (5 pre-existing failures unrelated)
  • E2E verified subprocess interrupt works at tool level
  • Delegate mock tests pass including batch mode

…er (#10300)

detect_provider_for_model() silently remapped models to OpenRouter when
the direct provider's credentials weren't found via env vars. Three bugs:

1. Credential check only looked at env vars from PROVIDER_REGISTRY,
   missing credential pool entries, auth store, and OAuth tokens
2. When env var check failed, silently returned ('openrouter', slug)
   instead of the direct provider the model actually belongs to
3. Users with valid credentials via non-env-var mechanisms (pool,
   OAuth, Claude Code tokens) got silently rerouted

Fix:
- Expand credential check to also query credential pool and auth store
- Always return the direct provider match regardless of credential
  status -- let client init handle missing creds with a clear error
  rather than silently routing through the wrong provider

Same philosophy as the provider-required fix: don't guess, don't
silently reroute, error clearly when something is missing.

Closes #10300
@github-actions

Copy link
Copy Markdown
Contributor

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: Install hook files modified

These files can execute code during package installation or interpreter startup.

Files:

hermes_cli/setup.py

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

…nterrupt

Three fixes:

1. Spinner widget clips long tool commands — prompt_toolkit Window had
   height=1 and wrap_lines=False. Now uses wrap_lines=True with dynamic
   height from text length / terminal width. Long commands wrap naturally.

2. agent_thread.join() blocked forever after interrupt — if the agent
   thread took time to clean up, the process_loop thread froze. Now polls
   with 0.2s timeout on the interrupt path, checking _should_exit so
   double Ctrl+C breaks out immediately.

3. Root cause of 5-hour CLI hang: delegate_task() used as_completed()
   with no interrupt check. When subagent children got stuck, the parent
   blocked forever inside the ThreadPoolExecutor. Now polls with
   wait(timeout=0.5) and checks parent_agent._interrupt_requested each
   iteration. Stuck children are reported as interrupted, and the parent
   returns immediately.
@teknium1 teknium1 force-pushed the hermes/hermes-6f9c5b0f branch from c8bc895 to 6f619a2 Compare April 16, 2026 10:05
@github-actions

Copy link
Copy Markdown
Contributor

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: Install hook files modified

These files can execute code during package installation or interpreter startup.

Files:

hermes_cli/setup.py

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

@teknium1 teknium1 changed the title fix: word-wrap spinner display and make agent thread join interruptable fix: word-wrap spinner, interruptable agent join, and delegate_task interrupt Apr 16, 2026
@teknium1 teknium1 merged commit e66b373 into main Apr 16, 2026
6 of 7 checks passed
@teknium1 teknium1 deleted the hermes/hermes-6f9c5b0f branch April 16, 2026 10:50
shuv1337 added a commit to shuv1337/hermes-agent that referenced this pull request Apr 16, 2026
Upstream's QQ platform hint (added in the salvaged PR NousResearch#10935/NousResearch#10940 block
that landed during this merge) re-introduced the 'MEDIA:/absolute/path/to/file'
literal that our prior commit 03f8a5e fixed for WeCom. The
ANTHROPIC_OAUTH_BLOCKED_LITERALS guard test flagged it.

Same fix pattern: 'absolute' -> 'full' to break tokenization.
ulasbilgen pushed a commit to ulasbilgen/hermes-adhd-agent that referenced this pull request May 1, 2026
…nterrupt (NousResearch#10940)

* fix: stop /model from silently rerouting direct providers to OpenRouter (NousResearch#10300)

detect_provider_for_model() silently remapped models to OpenRouter when
the direct provider's credentials weren't found via env vars. Three bugs:

1. Credential check only looked at env vars from PROVIDER_REGISTRY,
   missing credential pool entries, auth store, and OAuth tokens
2. When env var check failed, silently returned ('openrouter', slug)
   instead of the direct provider the model actually belongs to
3. Users with valid credentials via non-env-var mechanisms (pool,
   OAuth, Claude Code tokens) got silently rerouted

Fix:
- Expand credential check to also query credential pool and auth store
- Always return the direct provider match regardless of credential
  status -- let client init handle missing creds with a clear error
  rather than silently routing through the wrong provider

Same philosophy as the provider-required fix: don't guess, don't
silently reroute, error clearly when something is missing.

Closes NousResearch#10300

* fix: word-wrap spinner, interruptable agent join, and delegate_task interrupt

Three fixes:

1. Spinner widget clips long tool commands — prompt_toolkit Window had
   height=1 and wrap_lines=False. Now uses wrap_lines=True with dynamic
   height from text length / terminal width. Long commands wrap naturally.

2. agent_thread.join() blocked forever after interrupt — if the agent
   thread took time to clean up, the process_loop thread froze. Now polls
   with 0.2s timeout on the interrupt path, checking _should_exit so
   double Ctrl+C breaks out immediately.

3. Root cause of 5-hour CLI hang: delegate_task() used as_completed()
   with no interrupt check. When subagent children got stuck, the parent
   blocked forever inside the ThreadPoolExecutor. Now polls with
   wait(timeout=0.5) and checks parent_agent._interrupt_requested each
   iteration. Stuck children are reported as interrupted, and the parent
   returns immediately.
aj-nt pushed a commit to aj-nt/hermes-agent that referenced this pull request May 1, 2026
…nterrupt (NousResearch#10940)

* fix: stop /model from silently rerouting direct providers to OpenRouter (NousResearch#10300)

detect_provider_for_model() silently remapped models to OpenRouter when
the direct provider's credentials weren't found via env vars. Three bugs:

1. Credential check only looked at env vars from PROVIDER_REGISTRY,
   missing credential pool entries, auth store, and OAuth tokens
2. When env var check failed, silently returned ('openrouter', slug)
   instead of the direct provider the model actually belongs to
3. Users with valid credentials via non-env-var mechanisms (pool,
   OAuth, Claude Code tokens) got silently rerouted

Fix:
- Expand credential check to also query credential pool and auth store
- Always return the direct provider match regardless of credential
  status -- let client init handle missing creds with a clear error
  rather than silently routing through the wrong provider

Same philosophy as the provider-required fix: don't guess, don't
silently reroute, error clearly when something is missing.

Closes NousResearch#10300

* fix: word-wrap spinner, interruptable agent join, and delegate_task interrupt

Three fixes:

1. Spinner widget clips long tool commands — prompt_toolkit Window had
   height=1 and wrap_lines=False. Now uses wrap_lines=True with dynamic
   height from text length / terminal width. Long commands wrap naturally.

2. agent_thread.join() blocked forever after interrupt — if the agent
   thread took time to clean up, the process_loop thread froze. Now polls
   with 0.2s timeout on the interrupt path, checking _should_exit so
   double Ctrl+C breaks out immediately.

3. Root cause of 5-hour CLI hang: delegate_task() used as_completed()
   with no interrupt check. When subagent children got stuck, the parent
   blocked forever inside the ThreadPoolExecutor. Now polls with
   wait(timeout=0.5) and checks parent_agent._interrupt_requested each
   iteration. Stuck children are reported as interrupted, and the parent
   returns immediately.
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
…nterrupt (NousResearch#10940)

* fix: stop /model from silently rerouting direct providers to OpenRouter (NousResearch#10300)

detect_provider_for_model() silently remapped models to OpenRouter when
the direct provider's credentials weren't found via env vars. Three bugs:

1. Credential check only looked at env vars from PROVIDER_REGISTRY,
   missing credential pool entries, auth store, and OAuth tokens
2. When env var check failed, silently returned ('openrouter', slug)
   instead of the direct provider the model actually belongs to
3. Users with valid credentials via non-env-var mechanisms (pool,
   OAuth, Claude Code tokens) got silently rerouted

Fix:
- Expand credential check to also query credential pool and auth store
- Always return the direct provider match regardless of credential
  status -- let client init handle missing creds with a clear error
  rather than silently routing through the wrong provider

Same philosophy as the provider-required fix: don't guess, don't
silently reroute, error clearly when something is missing.

Closes NousResearch#10300

* fix: word-wrap spinner, interruptable agent join, and delegate_task interrupt

Three fixes:

1. Spinner widget clips long tool commands — prompt_toolkit Window had
   height=1 and wrap_lines=False. Now uses wrap_lines=True with dynamic
   height from text length / terminal width. Long commands wrap naturally.

2. agent_thread.join() blocked forever after interrupt — if the agent
   thread took time to clean up, the process_loop thread froze. Now polls
   with 0.2s timeout on the interrupt path, checking _should_exit so
   double Ctrl+C breaks out immediately.

3. Root cause of 5-hour CLI hang: delegate_task() used as_completed()
   with no interrupt check. When subagent children got stuck, the parent
   blocked forever inside the ThreadPoolExecutor. Now polls with
   wait(timeout=0.5) and checks parent_agent._interrupt_requested each
   iteration. Stuck children are reported as interrupted, and the parent
   returns immediately.
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…nterrupt (NousResearch#10940)

* fix: stop /model from silently rerouting direct providers to OpenRouter (NousResearch#10300)

detect_provider_for_model() silently remapped models to OpenRouter when
the direct provider's credentials weren't found via env vars. Three bugs:

1. Credential check only looked at env vars from PROVIDER_REGISTRY,
   missing credential pool entries, auth store, and OAuth tokens
2. When env var check failed, silently returned ('openrouter', slug)
   instead of the direct provider the model actually belongs to
3. Users with valid credentials via non-env-var mechanisms (pool,
   OAuth, Claude Code tokens) got silently rerouted

Fix:
- Expand credential check to also query credential pool and auth store
- Always return the direct provider match regardless of credential
  status -- let client init handle missing creds with a clear error
  rather than silently routing through the wrong provider

Same philosophy as the provider-required fix: don't guess, don't
silently reroute, error clearly when something is missing.

Closes NousResearch#10300

* fix: word-wrap spinner, interruptable agent join, and delegate_task interrupt

Three fixes:

1. Spinner widget clips long tool commands — prompt_toolkit Window had
   height=1 and wrap_lines=False. Now uses wrap_lines=True with dynamic
   height from text length / terminal width. Long commands wrap naturally.

2. agent_thread.join() blocked forever after interrupt — if the agent
   thread took time to clean up, the process_loop thread froze. Now polls
   with 0.2s timeout on the interrupt path, checking _should_exit so
   double Ctrl+C breaks out immediately.

3. Root cause of 5-hour CLI hang: delegate_task() used as_completed()
   with no interrupt check. When subagent children got stuck, the parent
   blocked forever inside the ThreadPoolExecutor. Now polls with
   wait(timeout=0.5) and checks parent_agent._interrupt_requested each
   iteration. Stuck children are reported as interrupted, and the parent
   returns immediately.
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
…nterrupt (NousResearch#10940)

* fix: stop /model from silently rerouting direct providers to OpenRouter (NousResearch#10300)

detect_provider_for_model() silently remapped models to OpenRouter when
the direct provider's credentials weren't found via env vars. Three bugs:

1. Credential check only looked at env vars from PROVIDER_REGISTRY,
   missing credential pool entries, auth store, and OAuth tokens
2. When env var check failed, silently returned ('openrouter', slug)
   instead of the direct provider the model actually belongs to
3. Users with valid credentials via non-env-var mechanisms (pool,
   OAuth, Claude Code tokens) got silently rerouted

Fix:
- Expand credential check to also query credential pool and auth store
- Always return the direct provider match regardless of credential
  status -- let client init handle missing creds with a clear error
  rather than silently routing through the wrong provider

Same philosophy as the provider-required fix: don't guess, don't
silently reroute, error clearly when something is missing.

Closes NousResearch#10300

* fix: word-wrap spinner, interruptable agent join, and delegate_task interrupt

Three fixes:

1. Spinner widget clips long tool commands — prompt_toolkit Window had
   height=1 and wrap_lines=False. Now uses wrap_lines=True with dynamic
   height from text length / terminal width. Long commands wrap naturally.

2. agent_thread.join() blocked forever after interrupt — if the agent
   thread took time to clean up, the process_loop thread froze. Now polls
   with 0.2s timeout on the interrupt path, checking _should_exit so
   double Ctrl+C breaks out immediately.

3. Root cause of 5-hour CLI hang: delegate_task() used as_completed()
   with no interrupt check. When subagent children got stuck, the parent
   blocked forever inside the ThreadPoolExecutor. Now polls with
   wait(timeout=0.5) and checks parent_agent._interrupt_requested each
   iteration. Stuck children are reported as interrupted, and the parent
   returns immediately.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant