Skip to content

feat(agent): allow pre_api_request hook to inject request-scoped context#9914

Open
devteapot wants to merge 1 commit into
NousResearch:mainfrom
devteapot:feat/request-scoped-context-hook
Open

feat(agent): allow pre_api_request hook to inject request-scoped context#9914
devteapot wants to merge 1 commit into
NousResearch:mainfrom
devteapot:feat/request-scoped-context-hook

Conversation

@devteapot

@devteapot devteapot commented Apr 14, 2026

Copy link
Copy Markdown

What & Why

The pre_api_request hook currently exists as a fire-and-forget observer: in run_agent.py, its call site is wrapped in try / except: pass, so any return values are discarded. This means plugins cannot refresh the model's context between retries inside the tool loop — they have to fall back to pre_llm_call, which fires only once per turn.

That gap forces per-request context (live browser snapshots, dynamic app/state trees, progress hints, request-scoped guardrails) to go stale the moment the agent enters a multi-step tool loop. A snapshot attached at the start of the turn can easily be minutes and many tool calls out of date by the time the second or third provider request goes out.

This PR completes the hook by collecting its return values the same way pre_llm_call already does and appending the resulting context to the current turn's user message on the outbound request copy only. The injection is rebuilt from base_api_messages on every retry, so each attempt sees a fresh snapshot and nothing leaks into session persistence or trajectory logs.

The hook callback now also receives user_message, conversation_history, and is_first_turn so plugins can make injection decisions without reaching into private agent state.

Changes

  • run_agent.py
    • Extract _collect_ephemeral_hook_context() and _append_ephemeral_context_to_user_message() helpers, and route the existing pre_llm_call injection through them.
    • Split the API-messages build into a turn-scoped base_api_messages and a per-attempt api_messages = copy.deepcopy(base_api_messages) inside the retry loop.
    • Invoke pre_api_request per attempt with the new kwargs, collect its return value, and append the context to the current turn's user message on that attempt's copy.
    • Re-apply Anthropic cache-control, sanitization, and request-size metrics per attempt so downstream code sees consistent state after injection.
  • hermes_cli/plugins.py — docstring: note that pre_api_request (alongside pre_llm_call) may return a {"context": "..."} dict.
  • tests/run_agent/test_run_agent.py
    • New test_pre_api_request_can_inject_request_scoped_context covering two requests in a single turn (first request triggers a tool call, second request gets a distinct injected string). Asserts both strings land in the right request and do not bleed across.
    • Extend the existing test_pre_api_request_and_post_api_request_fire_per_request with assertions on the new user_message, conversation_history, and is_first_turn kwargs.
  • Docswebsite/docs/user-guide/features/hooks.md, website/docs/user-guide/features/plugins.md, website/docs/guides/build-a-hermes-plugin.md:
    • Add pre_api_request / post_api_request to the quick-reference tables.
    • Document the full pre_api_request callback signature, parameters, fire semantics, return contract, injection site, and a "live state snapshot" example.
    • Update the "all hooks are fire-and-forget" caveat to call out both pre_llm_call and pre_api_request as context-injecting.

How to test

uv run pytest tests/run_agent/ -o addopts=""

Targeted:

uv run pytest tests/run_agent/test_run_agent.py::TestRunConversation \
  -k "pre_api_request or pre_llm_call" -o addopts=""

Manual smoke test — register a plugin with:

def inject_live_state(api_call_count, conversation_history, **kwargs):
    return {"context": f"Snapshot for request {api_call_count} "
                       f"(history len={len(conversation_history)})"}

def register(ctx):
    ctx.register_hook("pre_api_request", inject_live_state)

Then run any multi-step task and confirm each outbound request's user message ends with a snapshot whose api_call_count matches that attempt.

Platforms tested

  • macOS 15 / Python 3.11.15 — uv run pytest tests/run_agent/670 passed, 6 skipped.

No changes to file I/O, process management, terminal handling, or shell commands, so Windows/Linux behavior should be unaffected. Happy to run on another platform if requested.

Backwards compatibility

  • pre_api_request callbacks that return None (or any non-dict/non-string value) behave exactly as before — nothing is injected.
  • Existing pre_llm_call context-injection path is unchanged; the helpers it was inlining are now extracted but the semantics are identical.
  • run_agent.py's retry loop still rebuilds api_kwargs via _build_api_kwargs and still runs _preflight_codex_api_kwargs when api_mode == "codex_responses".

Related

  • Follow-up to the pre_api_request / post_api_request hook introduction — the hook name and kwargs are already in place; this PR makes the pre_ half actually useful for context injection.
  • Unblocks the slop-hermes plugin, which needs a fresh SLOP state snapshot on every retry.

The pre_api_request hook was introduced as a fire-and-forget observer:
its return values were caught and discarded, so plugins could not refresh
the model's context between retries inside the tool loop. pre_llm_call
fires only once per turn, which forces per-request state (live browser
snapshots, dynamic app trees, progress hints) to go stale.

This completes the hook by collecting its return values the same way
pre_llm_call already does and appending the resulting context to the
current turn's user message on the outbound request copy only. The
injection is rebuilt from base_api_messages on every retry, so each
attempt sees a fresh snapshot and nothing leaks into session persistence.

The hook now also receives user_message, conversation_history, and
is_first_turn so plugins can make injection decisions without reaching
into private agent state.

- Extract _collect_ephemeral_hook_context and
  _append_ephemeral_context_to_user_message helpers (shared with the
  existing pre_llm_call path).
- Split the api_messages build into a turn-scoped base and a per-attempt
  copy; re-apply Anthropic cache control and sanitization per attempt.
- Update the pre_api_request docstring, hooks.md, plugins.md, and the
  build-a-hermes-plugin guide to document the new contract.
- Add test_pre_api_request_can_inject_request_scoped_context covering
  two requests in a single turn with distinct injected contexts, and
  extend the existing per-request hook test to assert the new kwargs.

Tested on macOS 15 with pytest tests/run_agent/ (670 passed, 6 skipped).
@devteapot devteapot force-pushed the feat/request-scoped-context-hook branch from a87e138 to 5e87b50 Compare April 14, 2026 23:25
@alt-glitch alt-glitch added type/feature New feature or request P3 Low — cosmetic, nice to have comp/plugins Plugin system and bundled plugins comp/agent Core agent loop, run_agent.py, prompt builder labels Apr 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder comp/plugins Plugin system and bundled plugins P3 Low — cosmetic, nice to have type/feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants