feat(agent): allow pre_api_request hook to inject request-scoped context#9914
Open
devteapot wants to merge 1 commit into
Open
feat(agent): allow pre_api_request hook to inject request-scoped context#9914devteapot wants to merge 1 commit into
devteapot wants to merge 1 commit into
Conversation
The pre_api_request hook was introduced as a fire-and-forget observer: its return values were caught and discarded, so plugins could not refresh the model's context between retries inside the tool loop. pre_llm_call fires only once per turn, which forces per-request state (live browser snapshots, dynamic app trees, progress hints) to go stale. This completes the hook by collecting its return values the same way pre_llm_call already does and appending the resulting context to the current turn's user message on the outbound request copy only. The injection is rebuilt from base_api_messages on every retry, so each attempt sees a fresh snapshot and nothing leaks into session persistence. The hook now also receives user_message, conversation_history, and is_first_turn so plugins can make injection decisions without reaching into private agent state. - Extract _collect_ephemeral_hook_context and _append_ephemeral_context_to_user_message helpers (shared with the existing pre_llm_call path). - Split the api_messages build into a turn-scoped base and a per-attempt copy; re-apply Anthropic cache control and sanitization per attempt. - Update the pre_api_request docstring, hooks.md, plugins.md, and the build-a-hermes-plugin guide to document the new contract. - Add test_pre_api_request_can_inject_request_scoped_context covering two requests in a single turn with distinct injected contexts, and extend the existing per-request hook test to assert the new kwargs. Tested on macOS 15 with pytest tests/run_agent/ (670 passed, 6 skipped).
a87e138 to
5e87b50
Compare
19 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What & Why
The
pre_api_requesthook currently exists as a fire-and-forget observer: inrun_agent.py, its call site is wrapped intry / except: pass, so any return values are discarded. This means plugins cannot refresh the model's context between retries inside the tool loop — they have to fall back topre_llm_call, which fires only once per turn.That gap forces per-request context (live browser snapshots, dynamic app/state trees, progress hints, request-scoped guardrails) to go stale the moment the agent enters a multi-step tool loop. A snapshot attached at the start of the turn can easily be minutes and many tool calls out of date by the time the second or third provider request goes out.
This PR completes the hook by collecting its return values the same way
pre_llm_callalready does and appending the resulting context to the current turn's user message on the outbound request copy only. The injection is rebuilt frombase_api_messageson every retry, so each attempt sees a fresh snapshot and nothing leaks into session persistence or trajectory logs.The hook callback now also receives
user_message,conversation_history, andis_first_turnso plugins can make injection decisions without reaching into private agent state.Changes
run_agent.py_collect_ephemeral_hook_context()and_append_ephemeral_context_to_user_message()helpers, and route the existingpre_llm_callinjection through them.base_api_messagesand a per-attemptapi_messages = copy.deepcopy(base_api_messages)inside the retry loop.pre_api_requestper attempt with the new kwargs, collect its return value, and append the context to the current turn's user message on that attempt's copy.hermes_cli/plugins.py— docstring: note thatpre_api_request(alongsidepre_llm_call) may return a{"context": "..."}dict.tests/run_agent/test_run_agent.pytest_pre_api_request_can_inject_request_scoped_contextcovering two requests in a single turn (first request triggers a tool call, second request gets a distinct injected string). Asserts both strings land in the right request and do not bleed across.test_pre_api_request_and_post_api_request_fire_per_requestwith assertions on the newuser_message,conversation_history, andis_first_turnkwargs.website/docs/user-guide/features/hooks.md,website/docs/user-guide/features/plugins.md,website/docs/guides/build-a-hermes-plugin.md:pre_api_request/post_api_requestto the quick-reference tables.pre_api_requestcallback signature, parameters, fire semantics, return contract, injection site, and a "live state snapshot" example.pre_llm_callandpre_api_requestas context-injecting.How to test
uv run pytest tests/run_agent/ -o addopts=""Targeted:
Manual smoke test — register a plugin with:
Then run any multi-step task and confirm each outbound request's user message ends with a snapshot whose
api_call_countmatches that attempt.Platforms tested
uv run pytest tests/run_agent/→ 670 passed, 6 skipped.No changes to file I/O, process management, terminal handling, or shell commands, so Windows/Linux behavior should be unaffected. Happy to run on another platform if requested.
Backwards compatibility
pre_api_requestcallbacks that returnNone(or any non-dict/non-string value) behave exactly as before — nothing is injected.pre_llm_callcontext-injection path is unchanged; the helpers it was inlining are now extracted but the semantics are identical.run_agent.py's retry loop still rebuildsapi_kwargsvia_build_api_kwargsand still runs_preflight_codex_api_kwargswhenapi_mode == "codex_responses".Related
pre_api_request/post_api_requesthook introduction — the hook name and kwargs are already in place; this PR makes thepre_half actually useful for context injection.slop-hermesplugin, which needs a fresh SLOP state snapshot on every retry.