feat(observability): observer-grade telemetry hooks + NeMo-Relay plugin (salvage #29722) #38190

Closed
kshitijk4poor wants to merge 3 commits into NousResearch:main from kshitijk4poor:salvage/nmf-41A-observer-hooks

Conversation

@kshitijk4poor

Collaborator

Salvages #29722 onto current main. Authorship preserved (Bryan Bednarski's commit), with a small follow-up cleanup commit from me.

What this is

Phase 1 of the NeMo telemetry stack, restructured per the #29722 review into an observer-only PR with an immediate in-repo consumer:

  • Backend-neutral observer hooks for plugins — session, turn, API request, tool, approval, and subagent lifecycle events — with stable correlation IDs (session_id, task_id, turn_id, api_request_id, tool_call_id, parent/child subagent ids). Extends VALID_HOOKS with api_request_error and subagent_start.
  • Bundles the optional NeMo-Relay observability plugin (plugins/observability/nemo_relay) as a real consumer of the new hooks, peer to the existing langfuse plugin. Fails open when the optional nemo-relay package isn't installed.
  • The speculative middleware surface from the original PR was removed (deferred to a follow-up paired with an adaptive consumer).
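A minimal sketch of how an observer might use the correlation IDs listed above to group lifecycle events. The `CorrelatingObserver` class, its `on_event` method, and the keyword-argument payload shape are assumptions for illustration only; they are not taken from the PR's plugin API. The hook names and ID field names are the ones mentioned in this description.

```python
from collections import defaultdict


class CorrelatingObserver:
    """Hypothetical observer that groups hook events by (session_id, turn_id)."""

    def __init__(self):
        # Maps (session_id, turn_id) -> ordered list of hook names seen.
        self.by_turn = defaultdict(list)

    def on_event(self, hook_name, **payload):
        # Assumption: correlation IDs arrive as plain keyword arguments.
        key = (payload.get("session_id"), payload.get("turn_id"))
        self.by_turn[key].append(hook_name)


observer = CorrelatingObserver()
observer.on_event("subagent_start", session_id="s1", turn_id="t1")
observer.on_event("post_tool_call", session_id="s1", turn_id="t1", tool_call_id="c1")
observer.on_event("api_request_error", session_id="s1", turn_id="t2", api_request_id="r9")
```

Because the observer only reads IDs and never inspects their contents, it stays compatible with IDs whose string format is opaque.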

Review concerns from #29722 — all addressed

  • Per-call overhead, zero plugins registered: payload construction is now gated behind has_hook() presence checks; request payloads return by reference when no middleware rewrites; the sanitized response payload no longer embeds raw response objects; redundant deepcopies removed. Contributor benchmark: no-listener path dropped from ~20 ms → ~0.0004 ms at 5 MB context.
  • Middleware with no consumer: middleware module/registry removed entirely; PR is observer-only with nemo_relay as the in-repo consumer.
  • Schema-version threading: centralized in invoke_hook (via setdefault) instead of being hand-edited into ~20 call sites.
  • post_tool_call dual-emit: hardcoded set replaced with shared AGENT_RUNTIME_POST_HOOK_TOOL_NAMES frozenset + agent_runtime_owns_post_tool_hook() helper.
  • api_request_id format: docs now state it's opaque — "do not parse its string format."
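The gating and schema-version points above can be sketched roughly as follows. This is a toy registry under stated assumptions, not the PR's implementation: `HookRegistry`, `emit_if_observed`, the `schema_version` key, the `SCHEMA_VERSION` value, and the contents of the frozenset are all hypothetical. Only the names `has_hook`, `invoke_hook`, `AGENT_RUNTIME_POST_HOOK_TOOL_NAMES`, and `agent_runtime_owns_post_tool_hook` come from the description.

```python
SCHEMA_VERSION = 1  # hypothetical value; the real version is not stated here

# Placeholder membership; the real set of tool names is not listed in this PR text.
AGENT_RUNTIME_POST_HOOK_TOOL_NAMES = frozenset({"example_tool"})


def agent_runtime_owns_post_tool_hook(tool_name):
    """True if the agent runtime, not the generic path, emits post_tool_call."""
    return tool_name in AGENT_RUNTIME_POST_HOOK_TOOL_NAMES


class HookRegistry:
    """Toy stand-in for a plugin hook registry."""

    def __init__(self):
        self._hooks = {}

    def register(self, name, callback):
        self._hooks.setdefault(name, []).append(callback)

    def has_hook(self, name):
        # Cheap presence check used to skip payload construction entirely.
        return bool(self._hooks.get(name))

    def invoke_hook(self, name, **payload):
        # Inject the schema version in one place instead of at every call site.
        payload.setdefault("schema_version", SCHEMA_VERSION)
        for callback in self._hooks.get(name, []):
            callback(**payload)


def emit_if_observed(registry, name, build_payload):
    """Build the (possibly expensive) payload only when someone is listening."""
    if not registry.has_hook(name):
        return False
    registry.invoke_hook(name, **build_payload())
    return True
```

With no listener registered, `build_payload` is never called, which is the property the no-listener benchmark above measures.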

Follow-up commit (mine)

  • Restored two unrelated trailing blank lines (test_file_tools_cwd_resolution.py, test_tool_search.py) that the original branch incidentally stripped, to keep the salvage scoped to the feature.

Testing

  • Unit: 179 targeted tests pass (plugins, model_tools, session-boundary hooks, nemo_relay plugin suite, the two reverted test files) + full tests/run_agent/ green.
  • E2E (real imports, isolated HERMES_HOME): verified (1) has_hook() returns False with no plugin → hot path skipped; (2) a registered observer fires with turn_id/api_request_id and auto-injected schema version; (3) nemo_relay register() succeeds and all hooks no-op (don't crash) when the nemo-relay package is absent → _get_runtime() returns None.
  • ruff clean on all changed source files.
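The fail-open behavior checked in the E2E step can be illustrated with a small sketch. The helper names `get_runtime` and `on_tool_event` are hypothetical, and the standard-library `json` module is used purely as a stand-in for an installed optional dependency; the actual nemo-relay import path and runtime API are not shown in this PR text.

```python
import importlib


def get_runtime(module_name):
    """Return the imported module, or None when the optional package is absent."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def on_tool_event(module_name, **payload):
    """Hypothetical hook body: no-op when the runtime is missing (fail open)."""
    runtime = get_runtime(module_name)
    if runtime is None:
        return None  # observability is optional, so never raise into the agent
    # Stand-in for forwarding the event to the real backend.
    return runtime.dumps(payload, sort_keys=True)


# Missing package: the hook silently does nothing.
missing = on_tool_event("definitely_not_installed_pkg_xyz", tool_call_id="c1")
# Present package (json as a stand-in): the event is forwarded.
forwarded = on_tool_event("json", tool_call_id="c1")
```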

Attribution

Cherry-picked from #29722 with Bryan Bednarski's authorship preserved in git log. Requires the AUTHOR_MAP chore PR to merge first so contributor_audit.py passes.

Closes #29722

bbednarski9 and others added 2 commits June 3, 2026 17:44
Adds backend-neutral observer hooks for plugins: session, turn, API
request, tool, approval, and subagent lifecycle events with stable
correlation IDs (session_id, task_id, turn_id, api_request_id,
tool_call_id, parent/child subagent ids). Extends VALID_HOOKS with
api_request_error and subagent_start.

Hot path is zero-cost when no plugin subscribes: has_hook()/presence
checks gate all payload construction, request payloads are returned
by reference when no middleware rewrites, and the sanitized response
payload no longer embeds raw response objects.

Bundles the optional NeMo-Relay observability plugin
(plugins/observability/nemo_relay) as an in-repo consumer of the new
hooks, peer to the existing langfuse plugin. Fails open when the
optional nemo-relay package is not installed.

Authored-by: Bryan Bednarski <bbednarski@nvidia.com>
Salvaged from NousResearch#29722 onto current main.

The salvaged PR incidentally stripped a trailing blank line from two
unrelated test files (test_file_tools_cwd_resolution.py,
test_tool_search.py). Restore them to keep the salvage diff scoped to
the observability feature.
@kshitijk4poor requested a review from a team June 3, 2026 12:18
@alt-glitch added the type/feature, comp/plugins, comp/agent, telemetry, and P3 labels Jun 3, 2026

Labels

  • comp/agent: Core agent loop, run_agent.py, prompt builder
  • comp/plugins: Plugin system and bundled plugins
  • P3: Low; cosmetic, nice to have
  • telemetry: Touches outbound telemetry, usage attribution, or analytics; needs opt-in gating before merge
  • type/feature: New feature or request

3 participants