feat(middleware): add adaptive middleware to hermes-agent, consumed by NeMo-Relay by bbednarski9 · Pull Request #29724 · NousResearch/hermes-agent

bbednarski9 · 2026-05-21T08:07:04Z

What does this PR do?

This PR adds the middleware half of the Hermes Agent execution-intercept work, stacked on top of the now-merged observability contract from #38232.

#38232 established low-overhead telemetry hooks for watching what Hermes does. This PR adds opt-in middleware intercepts for changing or owning execution:

llm_request and tool_request middleware can inspect and rewrite payloads before execution.
llm_execution and tool_execution middleware can wrap or replace the actual downstream LLM/tool call.
NeMo-Relay is now an in-repo consumer of the execution middleware contract, so the middleware layer is validated against a real adaptive execution use case rather than only test plugins.
The no-listener path is explicitly gated with has_middleware(kind) and returns by reference, so default uninstrumented Hermes runs do not pay request-copy or execution-chain overhead.

The intent is to keep the two layers distinct:

Observability hooks from perf(observability): observer telemetry hooks + NeMo-Relay plugin, gated tool emit (salvage #38190/#29722) #38232 record sessions, turns, API calls, tool calls, approvals, subagents, and errors.
Middleware intercepts from this PR provide the control plane for guardrails, request rewrites, and NeMo-Relay-managed execution.

Related Issue

Builds on the merged observability PR: perf(observability): observer telemetry hooks + NeMo-Relay plugin, gated tool emit (salvage #38190/#29722) #38232
Updates and replaces the previous middleware/adaptive-execution shape from this PR with the restacked delta on top of main.

Type of Change

Changes Made

Middleware Registry And Contract

Added middleware registration and lookup alongside plugin hooks in hermes_cli/plugins.py.
Added has_middleware(kind) so hot-path call sites can cheaply skip middleware work when no consumer is registered.
Added middleware entry points in hermes_cli/middleware.py:
- apply_llm_request_middleware
- apply_tool_request_middleware
- run_llm_execution_middleware
- run_tool_execution_middleware
Kept backward-compatible aliases for the old API request naming where needed.

Request Middleware

llm_request middleware can rewrite provider request payloads before the API call.
tool_request middleware can rewrite tool arguments before tool execution.
No-listener request middleware returns by reference without deepcopy.
The changed flag is based on bool(trace) instead of structural deep-equality comparisons across copied payloads.

Execution Middleware

llm_execution middleware can wrap or replace the downstream LLM call.
tool_execution middleware can wrap or replace the downstream tool call.
The execution chain distinguishes downstream failures from middleware failures with _DownstreamExecutionError, so a failing tool/LLM call is not retried by middleware chaining.
Middleware that raises after a successful next_call() no longer causes the terminal LLM/tool call to execute twice.

Hermes Call Sites

Wired LLM request middleware and LLM execution middleware through agent/conversation_loop.py.
Wired tool request middleware and tool execution middleware through:
- model_tools.py
- agent/tool_executor.py
- agent/agent_runtime_helpers.py
Preserved existing observability hook behavior and hook gating from perf(observability): observer telemetry hooks + NeMo-Relay plugin, gated tool emit (salvage #38190/#29722) #38232.

NeMo-Relay Consumer

Extended plugins/observability/nemo_relay from a telemetry-only observer into a real adaptive execution consumer.
NeMo-Relay can now opt into managed LLM execution through llm_execution.
NeMo-Relay can now opt into managed tool execution through tool_execution.
When managed execution is disabled, the middleware falls through to Hermes with next_call(...).
The plugin still records observer telemetry for sessions, turns, API calls, tools, approvals, subagents, and final ATIF export.

Documentation

Added middleware documentation in docs/middleware/README.md.
Updated the NeMo-Relay plugin README to describe both telemetry hooks and execution middleware.

Tests

Added plugin registry and middleware contract tests.
Added no-listener by-reference fast-path tests.
Added request-change tracing tests.
Added downstream failure tests to verify middleware does not retry failing LLM/tool calls.
Added post-next_call middleware failure tests to verify terminal execution is not repeated.
Added NeMo-Relay plugin tests for the execution middleware paths.
Updated model tool and run-agent tests for the new middleware behavior.

Suggested Review Order

hermes_cli/plugins.py and hermes_cli/middleware.py
- Review the shared middleware contract, registration API, no-listener gates, request middleware result shape, and execution-chain semantics.
Hermes hot-path call sites
- agent/conversation_loop.py
- model_tools.py
- agent/tool_executor.py
- agent/agent_runtime_helpers.py
- Confirm middleware is gated before expensive work and that observability hooks remain separately gated.
NeMo-Relay consumer
- plugins/observability/nemo_relay/__init__.py
- plugins/observability/nemo_relay/README.md
- Confirm managed LLM/tool execution falls through correctly when disabled and captures adaptive execution when enabled.
Tests and docs
- tests/hermes_cli/test_plugins.py
- tests/plugins/test_nemo_relay_plugin.py
- tests/test_model_tools.py
- tests/run_agent/test_run_agent.py
- docs/middleware/README.md

How to Test

Targeted test run used locally:

uv run pytest tests/hermes_cli/test_plugins.py tests/test_model_tools.py tests/run_agent/test_run_agent.py tests/plugins/test_nemo_relay_plugin.py

Result:

491 passed in 90.41s

Useful manual checks:

# Confirm middleware registry / contract tests.
uv run pytest tests/hermes_cli/test_plugins.py

# Confirm model tool middleware behavior.
uv run pytest tests/test_model_tools.py

# Confirm run-agent integration behavior.
uv run pytest tests/run_agent/test_run_agent.py

# Confirm NeMo-Relay observer + execution middleware behavior.
uv run pytest tests/plugins/test_nemo_relay_plugin.py

Checklist

Code

Middleware registration is implemented through the plugin manager.
Middleware presence checks avoid hot-path work when no consumer is registered.
Request middleware returns by reference on the no-listener path.
Execution middleware returns directly to Hermes on the no-listener path.
Request changed state avoids structural deep-equality comparisons.
Downstream execution failures are not retried by the middleware chain.
Middleware failures after next_call() do not re-run the terminal LLM/tool call.
NeMo-Relay is wired as an in-repo execution middleware consumer.
Backward-compatible aliases are preserved where needed.

Documentation & Housekeeping

Middleware documentation is included.
NeMo-Relay documentation is updated for telemetry plus managed execution.
PR is restacked on top of the merged observability work from perf(observability): observer telemetry hooks + NeMo-Relay plugin, gated tool emit (salvage #38190/#29722) #38232.
Targeted tests pass locally.

Screenshots / Logs

No UI screenshots.

Hot-path middleware overhead was checked after the no-listener gates were added:

Payload	Objects	Old modeled `apply_llm`	Current no-listener `apply_llm`	No-listener `run_llm`	No-listener `apply_tool`	No-listener `run_tool`
100 KB	100	0.553 ms	0.000458 ms	0.000250 ms	0.000417 ms	0.000250 ms
1 MB	1024	5.999 ms	0.000458 ms	0.000250 ms	0.000458 ms	0.000291 ms
5 MB	5120	31.002 ms	0.000458 ms	0.000291 ms	0.000458 ms	0.000271 ms
10 MB	10240	64.149 ms	0.000562 ms	0.000291 ms	0.000458 ms	0.000292 ms

Interpretation:

Default no-listener Hermes only pays cheap middleware presence checks.
Request copying and middleware-chain setup are only paid when a middleware consumer is registered.
NeMo-Relay managed execution is an explicit opt-in middleware cost.

alt-glitch · 2026-05-21T08:15:03Z

This is phase 2 of the NeMo-Flow stacked change, depending on the middleware hooks from PR #29722. Supersedes the combined PR #29551. Related: draft #28611, unified telemetry tracking #6642. P3 per plugin-provider-demotion taste rule.

kshitijk4poor · 2026-05-27T19:28:53Z

Blocked on #29722 review — left detailed notes on the observer/middleware split there: #29722 (comment). This PR is unblocked once the observer half of #29722 lands.

kshitijk4poor

Reviewed the middleware half — the llm_execution / tool_execution wiring with NeMo-Relay as a real consumer is exactly what we wanted to see, and it's what closes the "no in-repo consumer" gap from the #29722 round. _DownstreamExecutionError is a good touch too — it correctly keeps a failing tool/LLM call from being retried by the chain.

A few things to sort before this lands, all on the hot path or the shared contract.

Critical

1. Middleware isn't gated on a registered consumer — re-introduces the per-call cost we just removed.

apply_llm_request_middleware runs deepcopy(request) twice on every API call before it ever checks whether any middleware is registered (hermes_cli/middleware.py:67-68), and it's called unconditionally at agent/conversation_loop.py:1231. That's the same per-call cost we gated out of the observability PR — and the pre_api_request hook ~25 lines below it (conversation_loop.py:1256) is already gated with has_hook(), with a comment about not wanting to walk every tool result + base64 image on every call. There's no has_middleware() yet, so all four apply_* / run_* entry points pay this unconditionally. Same on the tool path: apply_tool_request_middleware at model_tools.py:976 and agent/tool_executor.py:197 deepcopies tool args on every tool call.

Fix: add has_middleware(kind) mirroring has_hook(), gate the apply_* / run_* entry points, and return a by-reference no-op result when nothing's registered (same shape as the hook gating in the observability PR).

2. changed = current_request != original_request (hermes_cli/middleware.py:88 and :125) is a full structural deep-equality compare of two deepcopies of the same source — guaranteed False whenever no middleware rewrote the payload. Use changed = bool(trace).

Warning

3. Double-execution if a middleware raises after calling next_call.

_run_execution_chain (hermes_cli/middleware.py:195-236) handles a downstream failure correctly via _DownstreamExecutionError — verified, and test_execution_middleware_does_not_retry_downstream_failure covers it. But if a middleware calls next_call() successfully and then raises in its own post-processing, that exception falls into the generic except Exception at :227, which retries from call_at(index + 1) and re-runs the terminal call — so the tool or LLM call fires twice.

The bundled NeMo-Relay consumer is well-behaved so it won't hit this, but since this is the public contract for arbitrary third-party middleware it's worth closing now. A flag tracking whether next_call already fired in the frame (don't re-run the terminal if so) handles it, plus a test for the post-next_call-raise case alongside the existing downstream one.

Structural (not a code issue, but affects merge)

This PR currently carries the full observability diff layered under the middleware — it touches the same hot-path files as #29722 (conversation_loop.py, model_tools.py, agent_runtime_helpers.py). Once the observability half lands, this will conflict. Recommend rebasing down to just the middleware delta on top of the landed observability — it'll also make the actual change much easier to review (right now the real delta is buried under changes we've already reviewed).

Looks good

llm_execution / tool_execution wiring with a real consumer — validates the contract shape against an actual use case.
_DownstreamExecutionError correctly distinguishes downstream failures from middleware failures for the pre-next_call case.
Back-compat aliases (api_request → llm_request) are clean.
All targeted tests pass.

None of this is hard — items 1 and 2 mostly mirror the gating pattern already in the observability PR. Happy to re-review quickly once these are in.

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

bbednarski9 · 2026-06-03T18:57:08Z

Rebased the PR down to a single middleware/adaptive commit on current main, after the observer half landed in perf(observability): observer telemetry hooks + NeMo-Relay plugin, gated tool emit (salvage #38190/#29722) #38232
Added has_middleware(kind) mirroring has_hook().
Gated apply_* and run_* middleware entry points so the no-listener path returns by reference without deepcopy/callback setup.
Changed request middleware changed tracking to bool(trace).
Fixed the post-next_call middleware exception path so it does not re-run the downstream tool/LLM call.
Added regression coverage for no-listener fast path, changed = bool(trace), and post-next_call exceptions.

Responses to individual comments

1. Middleware isn't gated on a registered consumer

Thanks for pointing this out - I had forgotten to propagate these changes into this PR. Super helpful.

Resolved and measured on the restacked branch.

I added has_middleware(kind) mirroring has_hook() and moved the middleware checks ahead of any deepcopy() or execution-chain setup. With no middleware registered, the request middleware helpers now return by reference:

payload is request / original_payload is request
payload is args / original_payload is args
changed=False
trace=[]

The execution middleware helpers also bypass chain setup and call the downstream callback directly when no callbacks are registered.

I also benchmarked the no-consumer path with chat-shaped payloads. I used many message/tool-result dicts rather than one giant string, because deepcopy() reuses immutable string objects and under-represents traversal cost.

Payload	Objects	Old modeled `apply_llm`	Current no-listener `apply_llm`	No-listener `run_llm`	No-listener `apply_tool`	No-listener `run_tool`
100 KB	100	0.553 ms	0.000458 ms	0.000250 ms	0.000417 ms	0.000250 ms
1 MB	1024	5.999 ms	0.000458 ms	0.000250 ms	0.000458 ms	0.000291 ms
5 MB	5120	31.002 ms	0.000458 ms	0.000291 ms	0.000458 ms	0.000271 ms
10 MB	10240	64.149 ms	0.000562 ms	0.000291 ms	0.000458 ms	0.000292 ms

Interpretation: the default install/no-consumer path is effectively just the registry presence check and is flat with payload size. If request middleware is actually registered, that remains an explicit opt-in behavior-changing path and still pays the defensive copy/traversal cost. I added regression coverage for the no-listener by-reference behavior so this does not regress.

2. Deep equality compare

I changed request middleware result tracking from structural deep equality to trace-based tracking:

changed=bool(trace)

That applies to both LLM request middleware and tool request middleware. This avoids walking the full request/args payload a third time after the defensive copies, and it also reflects the actual middleware contract: a request is considered changed when a middleware callback returned a valid replacement payload and produced a trace entry.

I added regression coverage for the edge case where middleware returns an equal/same-shaped payload. That now reports changed=True because middleware did run and supplied a replacement, without relying on expensive structural comparison.

3. Double-execution if a middleware raises after calling `next_call`

I updated _run_execution_chain to track whether next_call() was already invoked in the current middleware frame. If the middleware raises after a successful downstream call, the generic middleware-error handler now returns the already-computed downstream result instead of retrying call_at(index + 1).

I also added regression coverage for this exact case:

def test_execution_middleware_post_next_call_error_does_not_retry(...)

The test verifies the terminal callback is called exactly once and the original downstream result is preserved.

kshitijk4poor · 2026-06-06T09:53:29Z

Reviewed the restacked commit 2e0c908 end-to-end (read the contract + all four hot-path call sites, ran the 4 targeted suites → 491 passed, plus an adversarial pass with a second reviewer + standalone repros). The three items from my earlier notes are genuinely resolved and each has a real regression test — nice. One new High-severity hole that the post-next_call fix opened up, with a small fix.

✅ Confirmed resolved

Consumer gating — has_middleware(kind) mirrors has_hook(); no-listener path returns by reference (payload is request verified, locked by test_middleware_helpers_skip_no_listener_work). Benchmarks match.
changed = bool(trace) — correct and intentional; can't produce a false changed=False. (Minor: .changed currently has no runtime consumer — only asserted in tests. Fine to leave, just flagging.)
Post-next_call no automatic re-run — the terminal can't run twice on the automatic path; covered by test_execution_middleware_post_next_call_error_does_not_retry.
Also nice: invoke_tool → handle_function_call passes skip_tool_request_middleware=True + forwards the trace, so request middleware doesn't double-apply on the agent-loop path.

🔴 High — downstream failure silently masked as `None` (`hermes_cli/middleware.py:244`, `265-266`)

The post-next_call fix opened the mirror-image bug. next_called is set True at line 244 before the recursion at line 246 can fail, but next_result is only assigned after a successful recursion. So when:

middleware calls next_call(),
the downstream LLM/tool call genuinely raises → next_result stays None, next_called is already True,
middleware catches the _DownstreamExecutionError and raises its own error (normal error-translation pattern),
line 258 catches it, next_called is True → line 266 returns next_result = None.

A real provider/tool failure gets reported as a successful empty result. Reproduced standalone:

RESULT: None
terminal_runs: 1   # terminal ran once and failed; chain returned None instead of raising

This path isn't covered — test_execution_middleware_does_not_retry_downstream_failure only exercises a middleware that lets the error propagate; the catch-and-reraise case is the gap.

Minimal fix — track downstream success, not just invocation:

next_succeeded = False
def next_call(next_payload=None):
    nonlocal next_called, next_succeeded, next_result
    next_called = True
    next_result = call_at(index + 1, payload if next_payload is None else next_payload)
    next_succeeded = True
    return next_result
...
except Exception as exc:
    logger.warning(...)
    if next_succeeded:          # was: if next_called
        return next_result
    if next_called:             # downstream failed + middleware re-raised → propagate
        raise
    return call_at(index + 1, payload)

Worth a regression test where the downstream raises and the middleware catches-then-reraises (asserts the error propagates, not None).

🟡 Optional (non-blocking)

next_call isn't single-use — a middleware that deliberately calls next_call() twice runs the terminal twice. Doesn't violate the "no automatic retry" claim (that holds), but if the intended contract is "terminal executes at most once," it needs a single-use guard.
deepcopy of the LLM request (lines 75-76, 90) — unsafe if api_kwargs ever carries non-deepcopyable/stateful objects (clients, callbacks, file handles). Guarded by the caller try/except against hard failures, but a silent client-clone would slip through. Tool path is safe (plain JSON args). A deepcopy-safety comment, or shallow-copy of the known-mutated keys, would harden it.

Doesn't strictly block merge today — the High trigger needs a catch-and-reraise consumer, and the only in-repo consumer (NeMo-Relay) calls its own captured next_call rather than the chain wrapper. But it's a real latent hole in a brand-new control-plane contract and the fix is ~4 lines + a test, so I'd rather get it right now than paper over it later. Everything else looks good.

Track successful next_call completion separately from invocation so execution middleware that catches and translates a downstream provider/tool failure does not accidentally convert that failure into a successful None result. Also avoid wrapping BaseException from downstream execution, and document the execution middleware error semantics. Tests cover: - pre-next_call middleware failures fail open to the remaining chain - post-next_call middleware failures preserve the downstream result - translated downstream failures propagate instead of returning None - downstream BaseException is not wrapped Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

bbednarski9 · 2026-06-06T16:43:57Z

@kshitijk4poor Thanks for the follow-up. I addressed your comments and added some additional test cases and documentation.

What changed:

_run_execution_chain now tracks next_succeeded separately from next_called.
next_succeeded is set only after downstream execution returns successfully.
If middleware calls next_call(), downstream fails, and the middleware catches/translates that failure, Hermes now propagates the translated error instead of returning None.
If middleware calls next_call() successfully and later raises during post-processing, Hermes still preserves the downstream result and does not re-run the provider/tool.
next_call() now wraps ordinary Exception downstream failures, not BaseException, so interrupts/cancellation-style base exceptions are not accidentally converted into normal middleware errors.

I added regression coverage for the edge cases around this contract:

test_execution_middleware_translated_downstream_failure_is_not_masked
test_execution_middleware_pre_next_call_error_fails_open_to_remaining_chain
test_execution_middleware_downstream_base_exception_is_not_wrapped

I also updated docs/middleware/README.md to spell out the execution middleware error semantics for plugin authors:

pre-next_call middleware failures fail open to the remaining chain/base execution;
post-success middleware failures preserve the downstream result and do not re-run execution;
downstream provider/tool failures can propagate or be deliberately translated, but are not converted into a successful None result.

Validation:

99 passed in 3.60s

Targeted command:

.venv/bin/python -m pytest tests/hermes_cli/test_plugins.py tests/plugins/test_nemo_relay_plugin.py -q

…opies Address the two non-blocking follow-ups from review: - next_call is now single-use per middleware frame. A second invocation raises instead of silently re-running the downstream provider/tool, so the terminal call cannot execute twice via the chain. The error surfaces through the existing handler, which preserves the first downstream result. - Request-middleware payload copies go through _safe_copy(), which falls back to a shallow dict copy when deepcopy() fails on a non-deepcopyable member (clients, callbacks, file handles) instead of aborting the pass. Adds regression coverage for both: double next_call() keeps the terminal single-run, and a non-deepcopyable (threading.Lock) request payload still runs middleware via the shallow fallback.

kshitijk4poor

Picked up the two non-blocking follow-ups myself and pushed them onto the branch (c4c5548, fast-forward on top of 5abe456):

next_call is now single-use per middleware frame — a second invocation raises instead of silently re-running the downstream provider/tool, so the terminal call can't execute twice via the chain. The error surfaces through the existing handler, which preserves the first (successful) downstream result.
Request-middleware copies go through _safe_copy() — falls back to a shallow dict copy when deepcopy() fails on a non-deepcopyable member (clients, callbacks, file handles) instead of aborting the request-middleware pass. Tool args stay plain JSON so they're unaffected; this just hardens the LLM-request path.

Added regression coverage for both:

test_execution_middleware_double_next_call_does_not_run_terminal_twice — double next_call() keeps the terminal single-run and returns the first result.
test_request_middleware_tolerates_non_deepcopyable_payload — a threading.Lock in the payload (which deepcopy can't pickle) still runs middleware via the shallow fallback, with the lock shared by reference.

Verified locally — full targeted suite green:

496 passed
(scripts/run_tests.sh tests/hermes_cli/test_plugins.py tests/test_model_tools.py tests/run_agent/test_run_agent.py tests/plugins/test_nemo_relay_plugin.py)

The earlier High (translated-downstream-failure masked as None) is fixed and verified across all 7 execution-chain paths, the new error semantics are documented, and the two optional items are now closed too. Approving. 🚢

bbednarski9 · 2026-06-06T17:41:33Z

Thanks @kshitijk4poor

alt-glitch mentioned this pull request May 21, 2026

feat(observability): add observer-grade telemetry hooks #29722

Closed

19 tasks

ai-ag2026 mentioned this pull request May 21, 2026

feat: add delegate pre-spawn guard #23177

Closed

bbednarski9 force-pushed the bbednarski/nmf-41B-nemoflow-plugin branch from 4f5be2f to d93e059 Compare May 21, 2026 18:57

bbednarski9 changed the title ~~feat(observability): add optional NeMo-Flow telemetry plugin~~ feat(observability): add optional NeMo-Relay telemetry plugin May 22, 2026

bbednarski9 changed the title ~~feat(observability): add optional NeMo-Relay telemetry plugin~~ feat(observability): add optional NeMo-Flow telemetry plugin May 22, 2026

bbednarski9 force-pushed the bbednarski/nmf-41B-nemoflow-plugin branch from 520bd9e to c595a54 Compare May 22, 2026 18:23

bbednarski9 marked this pull request as ready for review May 22, 2026 18:27

This was referenced May 26, 2026

Phoenix LLM spans now show with real prompts and completions NVIDIA/nemoclaw-community#17

Merged

Tidy up Hermes NeMo-Relay integration NVIDIA/nemoclaw-community#18

Merged

bbednarski9 mentioned this pull request May 29, 2026

Bbednarski/nmf 41 hermes middleware nemoflow optional #29551

Closed

19 tasks

bbednarski9 force-pushed the bbednarski/nmf-41B-nemoflow-plugin branch from c595a54 to d9ac183 Compare June 2, 2026 00:06

bbednarski9 requested a review from a team June 2, 2026 00:06

bbednarski9 changed the title ~~feat(observability): add optional NeMo-Flow telemetry plugin~~ feat(middleware): add adaptive middleware to hermes-agent, consumed by NeMo-Relay Jun 2, 2026

bbednarski9 force-pushed the bbednarski/nmf-41B-nemoflow-plugin branch from b24a720 to 0367582 Compare June 2, 2026 21:44

kshitijk4poor requested changes Jun 3, 2026

View reviewed changes

feat(middleware): add adaptive execution intercepts

2e0c908

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

bbednarski9 force-pushed the bbednarski/nmf-41B-nemoflow-plugin branch from ec1285f to 2e0c908 Compare June 3, 2026 18:27

bbednarski9 requested a review from kshitijk4poor June 4, 2026 05:51

kshitijk4poor approved these changes Jun 6, 2026

View reviewed changes

kshitijk4poor enabled auto-merge June 6, 2026 17:39

kshitijk4poor merged commit d4a7bfd into NousResearch:main Jun 6, 2026
30 of 31 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(middleware): add adaptive middleware to hermes-agent, consumed by NeMo-Relay#29724

feat(middleware): add adaptive middleware to hermes-agent, consumed by NeMo-Relay#29724
kshitijk4poor merged 3 commits into
NousResearch:mainfrom
bbednarski9:bbednarski/nmf-41B-nemoflow-plugin

bbednarski9 commented May 21, 2026 •

edited

Loading

Uh oh!

alt-glitch commented May 21, 2026

Uh oh!

kshitijk4poor commented May 27, 2026

Uh oh!

kshitijk4poor left a comment

Uh oh!

bbednarski9 commented Jun 3, 2026

Uh oh!

kshitijk4poor commented Jun 6, 2026

Uh oh!

bbednarski9 commented Jun 6, 2026

Uh oh!

kshitijk4poor left a comment

Uh oh!

bbednarski9 commented Jun 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

bbednarski9 commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Related Issue

Type of Change

Changes Made

Middleware Registry And Contract

Request Middleware

Execution Middleware

Hermes Call Sites

NeMo-Relay Consumer

Documentation

Tests

Suggested Review Order

How to Test

Checklist

Code

Documentation & Housekeeping

Screenshots / Logs

Uh oh!

alt-glitch commented May 21, 2026

Uh oh!

kshitijk4poor commented May 27, 2026

Uh oh!

kshitijk4poor left a comment

Choose a reason for hiding this comment

Critical

Warning

Structural (not a code issue, but affects merge)

Looks good

Uh oh!

bbednarski9 commented Jun 3, 2026

Responses to individual comments

1. Middleware isn't gated on a registered consumer

2. Deep equality compare

3. Double-execution if a middleware raises after calling next_call

Uh oh!

kshitijk4poor commented Jun 6, 2026

✅ Confirmed resolved

🔴 High — downstream failure silently masked as None (hermes_cli/middleware.py:244, 265-266)

🟡 Optional (non-blocking)

Uh oh!

bbednarski9 commented Jun 6, 2026

Uh oh!

kshitijk4poor left a comment

Choose a reason for hiding this comment

Uh oh!

bbednarski9 commented Jun 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

bbednarski9 commented May 21, 2026 •

edited

Loading

3. Double-execution if a middleware raises after calling `next_call`

🔴 High — downstream failure silently masked as `None` (`hermes_cli/middleware.py:244`, `265-266`)