fix(api): strip ALL leading-underscore internal markers before API send (closes #110) by PowerCreek · Pull Request #111 · TechDevGroup/hermes-agent

PowerCreek · 2026-05-26T23:20:41Z

Critical fix — closes #110. PR #108's synthetic recovery markers leaked to upstream API, Groq's strict validation rejected with HTTP 400 cascading on every retry. Replaces individual _thinking_prefill strip with generic underscore-prefix strip. See #110 body for full context. 28 tests green. v0.17.2 wheel will rebuild after merge.

…nd (closes #110) Strict OpenAI-compat validators (Groq, Mistral, Fireworks) reject any unknown property with HTTP 400. Hermes-side bookkeeping markers (``_thinking_prefill``, ``_empty_recovery_synthetic``, ``_empty_terminal_sentinel``) were leaking through to upstream because only ``_thinking_prefill`` was individually stripped. Reproducible on Groq from v0.16.0 (#69's ``_empty_recovery_synthetic``) but the bug only surfaced widely after v0.17.1 (#108) expanded the recovery path to ``finish_reason=tool_calls`` empty-list — now the marker leak fires on every Groq tool-call response. ## Fix Replace the individual ``api_msg.pop("_thinking_prefill", None)`` in the API-message preparation loop with a generic leading-underscore strip: for _k in [k for k in api_msg if isinstance(k, str) and k.startswith("_")]: api_msg.pop(_k, None) The leading-underscore convention is stable for hermes-internal fields; anything that genuinely needs to ride the API has a documented OpenAI shape (``role`` / ``content`` / ``tool_calls`` / ``tool_call_id`` / ``name`` / ``reasoning_content``). Future internal markers added with ``_`` prefix get auto-stripped without further maintenance. ## Tests - 9 new tests in tests/agent/test_internal_marker_stripping.py: strips each known marker (3); strips multiple in one pass (1); preserves standard OpenAI shape fields (1); handles no-markers and empty-message edge cases (2); non-string keys pass through (1); source-level patch-landed check (1). - 28 total green across affected suites (new + #108 finish-reason recovery + #67 structural terminal). No regression. ## Composition Without this fix, devagentic#295 (tools-strip at boundary) MUST stay in place — lifting it makes the model emit tool_calls finish reasons which trigger #108's recovery which now writes API-incompatible markers. With this fix, the recovery path produces clean OpenAI-shape messages on the wire.

…) (#113) Wraps devagentic's ``executeWorkflowPipeline`` (NousResearch#258) + ``writeWorkflowPipeline`` (NousResearch#261) mutations as MCP tools so workers can compose + execute pipelines directly through their tool surface. R12 of devagentic#210 (flow-aware router umbrella). ## Tools registered | Tool | Wraps | Returns | |---|---|---| | ``run_pipeline(pipeline_id, user_id)`` | ``executeWorkflowPipeline`` | ``PipelineRun {runId, status, nodeOutcomes, ...}`` | | ``propose_pipeline(name, intent, version, nodes, edges, user_id, lineage_ref?, produced_by?)`` | ``writeWorkflowPipeline`` | ``Doc {id, content, tags, source, ts}`` | Composes naturally: ``propose_pipeline → doc_id → run_pipeline``. ## Devagentic-side resolver verification Schemas read directly from devagentic ``api.py`` via gh api: - ``executeWorkflowPipeline(pipelineId: String!, userId: String!) -> PipelineRun`` (api.py:21628) - ``writeWorkflowPipeline(name, intent, version, nodes, edges, userId, lineageRef?, producedBy?) -> Doc`` (api.py:21576) Arg-name camelCase conversion (``pipeline_id`` → ``pipelineId``, ``lineage_ref`` → ``lineageRef``, etc.) matches strawberry's default. Note: #100's body referenced ``ShadowNodeRef`` as the propose_pipeline return — the actual mutation returns ``Doc``. Implementing the real shape. ## Tests - 10 new client-level tests in ``tests/plugins/devagentic-mutations/test_client.py``: * ``run_pipeline``: empty-args fail-soft (1), happy path with arg threading + nested nodeOutcomes (1), non-dict fail-soft (1). * ``propose_pipeline``: empty-args fail-soft (3), invalid-version-type fail-soft (3 parametrized), invalid nodes/edges-type fail-soft (2), happy path with all args threaded (1), optional-args-null-when-omitted (1), non-dict fail-soft (1). - Contract test in ``tests/test_mcp_defensive_registrars.py`` extended: ``run_pipeline`` + ``propose_pipeline`` added to the required-G2/R12 tool set (now 17 named devagentic tools). - 184 total green across affected suites — no regression. ## Composition Unblocks the worker-side pipeline-economy use case in devagentic#210 R12. Workers can now author + execute pipelines without shelling out to ``python api.py``. The recent #99 / #110 fixes (PRs #108 + #111) ensure tool_calls with empty content + populated tool_calls on Groq/strict-validator paths don't cascade through the empty-retry loop, so R12 tools are usable from client-tier dispatches.

…#121) (#122) OpenAI-spec ``tool_call.type`` is required with exactly one valid value (``"function"``). Mistral-large (and likely others) emit tool_calls without the ``type`` field. The OpenAI Python SDK's strict Pydantic validation drops these entries from ``response.choices[0].message.tool_calls`` — hermes then sees an empty list + ``finish_reason=tool_calls`` and falls into PR #108's synthetic-recovery path, where the tool the model wanted to call is lost entirely. ## Fix Add raw-response fallback in ``ChatCompletionsTransport.normalize_response``: - When the SDK-parsed ``msg.tool_calls`` is empty AND ``finish_reason in {"tool_calls", "function_call"}``, inspect the raw response via ``response.model_dump()`` (Pydantic v2), attribute walk, or dict shape. - Iterate the raw tool_calls list, default missing ``type`` to ``"function"`` per spec semantics (the field has only ONE valid value), and synthesize ``ToolCall`` entries. - ``ToolCall``'s existing ``type`` property already returns ``"function"`` unconditionally, so downstream consumers see the recovered call as identical to a normally-parsed one. Strictly additive: only fires when (a) SDK gave us nothing AND (b) the finish_reason signals tools. The SDK-happy-path is untouched. ## Implementation detail Three helpers in ``chat_completions.py``: - ``_DEFAULT_TOOL_CALL_TYPE = "function"`` constant - ``_raw_tool_calls_from_response(response)`` — extracts the raw tool_calls list via model_dump / dict; fail-soft on any errors - ``_normalize_raw_tool_call(tc_raw)`` — builds a ``ToolCall`` from a raw dict entry; returns ``None`` for unrecoverable entries (no function.name); re-serializes dict-arguments back to JSON string per OpenAI contract ## Tests - 17 new tests in tests/agent/test_tool_call_type_default.py: * 6 ``_normalize_raw_tool_call`` tests: with-type, without-type- defaults-function, no-function-block, no-name, not-dict, dict-arguments-reserialized, empty-arguments * 6 ``_raw_tool_calls_from_response`` tests: model_dump path, dict path, no-tool_calls, no-choices, model_dump-raises, non-dict-entries-filtered * 3 end-to-end normalize_response tests: mistral-shape recovery, no-fallback-on-stop-finish_reason, no-double-up-when-sdk-parsed * 2 source-level patch-landed checks - 76 total green across affected suites — no regression on the SDK-happy path or the #67/#99/#108 recovery families. ## Composition Same family of bugs as #99 / #108 / #111 — response handler being over-strict on the OpenAI spec. This is the next-uncovered case (tool_calls present in wire response but stripped by SDK before reaching us). Devagentic#NN (the companion devagentic-side normalization at the OAI-shim boundary) becomes redundant defense after this lands — still harmless to keep.

PowerCreek merged commit 24ca4c3 into main May 26, 2026

PowerCreek deleted the strip-internal-markers-before-api branch May 26, 2026 23:20

PowerCreek mentioned this pull request May 26, 2026

chore: bump pyproject.toml + uv.lock to 0.17.2 #112

Merged

PowerCreek added a commit that referenced this pull request May 26, 2026

chore: bump pyproject.toml + uv.lock to 0.17.2 (#111 release) (#112)

56f0ac1

PowerCreek mentioned this pull request May 26, 2026

feat(mcp): run_pipeline + propose_pipeline MCP tools — R12 (closes #100) #113

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(api): strip ALL leading-underscore internal markers before API send (closes #110)#111

fix(api): strip ALL leading-underscore internal markers before API send (closes #110)#111
PowerCreek merged 1 commit into
mainfrom
strip-internal-markers-before-api

PowerCreek commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

PowerCreek commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant