fix(api): strip ALL leading-underscore internal markers before API send (closes #110)#111
Merged
Merged
Conversation
…nd (closes #110) Strict OpenAI-compat validators (Groq, Mistral, Fireworks) reject any unknown property with HTTP 400. Hermes-side bookkeeping markers (``_thinking_prefill``, ``_empty_recovery_synthetic``, ``_empty_terminal_sentinel``) were leaking through to upstream because only ``_thinking_prefill`` was individually stripped. Reproducible on Groq from v0.16.0 (#69's ``_empty_recovery_synthetic``) but the bug only surfaced widely after v0.17.1 (#108) expanded the recovery path to ``finish_reason=tool_calls`` empty-list — now the marker leak fires on every Groq tool-call response. ## Fix Replace the individual ``api_msg.pop("_thinking_prefill", None)`` in the API-message preparation loop with a generic leading-underscore strip: for _k in [k for k in api_msg if isinstance(k, str) and k.startswith("_")]: api_msg.pop(_k, None) The leading-underscore convention is stable for hermes-internal fields; anything that genuinely needs to ride the API has a documented OpenAI shape (``role`` / ``content`` / ``tool_calls`` / ``tool_call_id`` / ``name`` / ``reasoning_content``). Future internal markers added with ``_`` prefix get auto-stripped without further maintenance. ## Tests - 9 new tests in tests/agent/test_internal_marker_stripping.py: strips each known marker (3); strips multiple in one pass (1); preserves standard OpenAI shape fields (1); handles no-markers and empty-message edge cases (2); non-string keys pass through (1); source-level patch-landed check (1). - 28 total green across affected suites (new + #108 finish-reason recovery + #67 structural terminal). No regression. ## Composition Without this fix, devagentic#295 (tools-strip at boundary) MUST stay in place — lifting it makes the model emit tool_calls finish reasons which trigger #108's recovery which now writes API-incompatible markers. With this fix, the recovery path produces clean OpenAI-shape messages on the wire.
PowerCreek
added a commit
that referenced
this pull request
May 26, 2026
PowerCreek
added a commit
that referenced
this pull request
May 26, 2026
…) (#113) Wraps devagentic's ``executeWorkflowPipeline`` (NousResearch#258) + ``writeWorkflowPipeline`` (NousResearch#261) mutations as MCP tools so workers can compose + execute pipelines directly through their tool surface. R12 of devagentic#210 (flow-aware router umbrella). ## Tools registered | Tool | Wraps | Returns | |---|---|---| | ``run_pipeline(pipeline_id, user_id)`` | ``executeWorkflowPipeline`` | ``PipelineRun {runId, status, nodeOutcomes, ...}`` | | ``propose_pipeline(name, intent, version, nodes, edges, user_id, lineage_ref?, produced_by?)`` | ``writeWorkflowPipeline`` | ``Doc {id, content, tags, source, ts}`` | Composes naturally: ``propose_pipeline → doc_id → run_pipeline``. ## Devagentic-side resolver verification Schemas read directly from devagentic ``api.py`` via gh api: - ``executeWorkflowPipeline(pipelineId: String!, userId: String!) -> PipelineRun`` (api.py:21628) - ``writeWorkflowPipeline(name, intent, version, nodes, edges, userId, lineageRef?, producedBy?) -> Doc`` (api.py:21576) Arg-name camelCase conversion (``pipeline_id`` → ``pipelineId``, ``lineage_ref`` → ``lineageRef``, etc.) matches strawberry's default. Note: #100's body referenced ``ShadowNodeRef`` as the propose_pipeline return — the actual mutation returns ``Doc``. Implementing the real shape. ## Tests - 10 new client-level tests in ``tests/plugins/devagentic-mutations/test_client.py``: * ``run_pipeline``: empty-args fail-soft (1), happy path with arg threading + nested nodeOutcomes (1), non-dict fail-soft (1). * ``propose_pipeline``: empty-args fail-soft (3), invalid-version-type fail-soft (3 parametrized), invalid nodes/edges-type fail-soft (2), happy path with all args threaded (1), optional-args-null-when-omitted (1), non-dict fail-soft (1). - Contract test in ``tests/test_mcp_defensive_registrars.py`` extended: ``run_pipeline`` + ``propose_pipeline`` added to the required-G2/R12 tool set (now 17 named devagentic tools). - 184 total green across affected suites — no regression. ## Composition Unblocks the worker-side pipeline-economy use case in devagentic#210 R12. Workers can now author + execute pipelines without shelling out to ``python api.py``. The recent #99 / #110 fixes (PRs #108 + #111) ensure tool_calls with empty content + populated tool_calls on Groq/strict-validator paths don't cascade through the empty-retry loop, so R12 tools are usable from client-tier dispatches.
This was referenced May 26, 2026
PowerCreek
added a commit
that referenced
this pull request
May 27, 2026
…#121) (#122) OpenAI-spec ``tool_call.type`` is required with exactly one valid value (``"function"``). Mistral-large (and likely others) emit tool_calls without the ``type`` field. The OpenAI Python SDK's strict Pydantic validation drops these entries from ``response.choices[0].message.tool_calls`` — hermes then sees an empty list + ``finish_reason=tool_calls`` and falls into PR #108's synthetic-recovery path, where the tool the model wanted to call is lost entirely. ## Fix Add raw-response fallback in ``ChatCompletionsTransport.normalize_response``: - When the SDK-parsed ``msg.tool_calls`` is empty AND ``finish_reason in {"tool_calls", "function_call"}``, inspect the raw response via ``response.model_dump()`` (Pydantic v2), attribute walk, or dict shape. - Iterate the raw tool_calls list, default missing ``type`` to ``"function"`` per spec semantics (the field has only ONE valid value), and synthesize ``ToolCall`` entries. - ``ToolCall``'s existing ``type`` property already returns ``"function"`` unconditionally, so downstream consumers see the recovered call as identical to a normally-parsed one. Strictly additive: only fires when (a) SDK gave us nothing AND (b) the finish_reason signals tools. The SDK-happy-path is untouched. ## Implementation detail Three helpers in ``chat_completions.py``: - ``_DEFAULT_TOOL_CALL_TYPE = "function"`` constant - ``_raw_tool_calls_from_response(response)`` — extracts the raw tool_calls list via model_dump / dict; fail-soft on any errors - ``_normalize_raw_tool_call(tc_raw)`` — builds a ``ToolCall`` from a raw dict entry; returns ``None`` for unrecoverable entries (no function.name); re-serializes dict-arguments back to JSON string per OpenAI contract ## Tests - 17 new tests in tests/agent/test_tool_call_type_default.py: * 6 ``_normalize_raw_tool_call`` tests: with-type, without-type- defaults-function, no-function-block, no-name, not-dict, dict-arguments-reserialized, empty-arguments * 6 ``_raw_tool_calls_from_response`` tests: model_dump path, dict path, no-tool_calls, no-choices, model_dump-raises, non-dict-entries-filtered * 3 end-to-end normalize_response tests: mistral-shape recovery, no-fallback-on-stop-finish_reason, no-double-up-when-sdk-parsed * 2 source-level patch-landed checks - 76 total green across affected suites — no regression on the SDK-happy path or the #67/#99/#108 recovery families. ## Composition Same family of bugs as #99 / #108 / #111 — response handler being over-strict on the OpenAI spec. This is the next-uncovered case (tool_calls present in wire response but stripped by SDK before reaching us). Devagentic#NN (the companion devagentic-side normalization at the OAI-shim boundary) becomes redundant defense after this lands — still harmless to keep.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Critical fix — closes #110. PR #108's synthetic recovery markers leaked to upstream API, Groq's strict validation rejected with HTTP 400 cascading on every retry. Replaces individual _thinking_prefill strip with generic underscore-prefix strip. See #110 body for full context. 28 tests green. v0.17.2 wheel will rebuild after merge.