Skip to content

fix(api): strip ALL leading-underscore internal markers before API send (closes #110)#111

Merged
PowerCreek merged 1 commit into
mainfrom
strip-internal-markers-before-api
May 26, 2026
Merged

fix(api): strip ALL leading-underscore internal markers before API send (closes #110)#111
PowerCreek merged 1 commit into
mainfrom
strip-internal-markers-before-api

Conversation

@PowerCreek

Copy link
Copy Markdown

Critical fix — closes #110. PR #108's synthetic recovery markers leaked to upstream API, Groq's strict validation rejected with HTTP 400 cascading on every retry. Replaces individual _thinking_prefill strip with generic underscore-prefix strip. See #110 body for full context. 28 tests green. v0.17.2 wheel will rebuild after merge.

…nd (closes #110)

Strict OpenAI-compat validators (Groq, Mistral, Fireworks) reject any
unknown property with HTTP 400. Hermes-side bookkeeping markers
(``_thinking_prefill``, ``_empty_recovery_synthetic``,
``_empty_terminal_sentinel``) were leaking through to upstream because
only ``_thinking_prefill`` was individually stripped.

Reproducible on Groq from v0.16.0 (#69's ``_empty_recovery_synthetic``)
but the bug only surfaced widely after v0.17.1 (#108) expanded the
recovery path to ``finish_reason=tool_calls`` empty-list — now the
marker leak fires on every Groq tool-call response.

## Fix

Replace the individual ``api_msg.pop("_thinking_prefill", None)`` in
the API-message preparation loop with a generic leading-underscore
strip:

  for _k in [k for k in api_msg if isinstance(k, str) and k.startswith("_")]:
      api_msg.pop(_k, None)

The leading-underscore convention is stable for hermes-internal
fields; anything that genuinely needs to ride the API has a
documented OpenAI shape (``role`` / ``content`` / ``tool_calls`` /
``tool_call_id`` / ``name`` / ``reasoning_content``). Future internal
markers added with ``_`` prefix get auto-stripped without further
maintenance.

## Tests

- 9 new tests in tests/agent/test_internal_marker_stripping.py:
  strips each known marker (3); strips multiple in one pass (1);
  preserves standard OpenAI shape fields (1); handles
  no-markers and empty-message edge cases (2); non-string keys
  pass through (1); source-level patch-landed check (1).
- 28 total green across affected suites (new + #108 finish-reason
  recovery + #67 structural terminal). No regression.

## Composition

Without this fix, devagentic#295 (tools-strip at boundary) MUST stay
in place — lifting it makes the model emit tool_calls finish reasons
which trigger #108's recovery which now writes API-incompatible
markers. With this fix, the recovery path produces clean OpenAI-shape
messages on the wire.
@PowerCreek PowerCreek merged commit 24ca4c3 into main May 26, 2026
@PowerCreek PowerCreek deleted the strip-internal-markers-before-api branch May 26, 2026 23:20
PowerCreek added a commit that referenced this pull request May 26, 2026
…) (#113)

Wraps devagentic's ``executeWorkflowPipeline`` (NousResearch#258) +
``writeWorkflowPipeline`` (NousResearch#261) mutations as MCP tools so workers
can compose + execute pipelines directly through their tool
surface. R12 of devagentic#210 (flow-aware router umbrella).

## Tools registered

| Tool | Wraps | Returns |
|---|---|---|
| ``run_pipeline(pipeline_id, user_id)`` | ``executeWorkflowPipeline`` | ``PipelineRun {runId, status, nodeOutcomes, ...}`` |
| ``propose_pipeline(name, intent, version, nodes, edges, user_id, lineage_ref?, produced_by?)`` | ``writeWorkflowPipeline`` | ``Doc {id, content, tags, source, ts}`` |

Composes naturally: ``propose_pipeline → doc_id → run_pipeline``.

## Devagentic-side resolver verification

Schemas read directly from devagentic ``api.py`` via gh api:

- ``executeWorkflowPipeline(pipelineId: String!, userId: String!) -> PipelineRun`` (api.py:21628)
- ``writeWorkflowPipeline(name, intent, version, nodes, edges, userId, lineageRef?, producedBy?) -> Doc`` (api.py:21576)

Arg-name camelCase conversion (``pipeline_id`` → ``pipelineId``,
``lineage_ref`` → ``lineageRef``, etc.) matches strawberry's default.

Note: #100's body referenced ``ShadowNodeRef`` as the propose_pipeline
return — the actual mutation returns ``Doc``. Implementing the real
shape.

## Tests

- 10 new client-level tests in
  ``tests/plugins/devagentic-mutations/test_client.py``:
  * ``run_pipeline``: empty-args fail-soft (1), happy path with arg
    threading + nested nodeOutcomes (1), non-dict fail-soft (1).
  * ``propose_pipeline``: empty-args fail-soft (3),
    invalid-version-type fail-soft (3 parametrized), invalid
    nodes/edges-type fail-soft (2), happy path with all args
    threaded (1), optional-args-null-when-omitted (1), non-dict
    fail-soft (1).
- Contract test in
  ``tests/test_mcp_defensive_registrars.py`` extended:
  ``run_pipeline`` + ``propose_pipeline`` added to the required-G2/R12
  tool set (now 17 named devagentic tools).
- 184 total green across affected suites — no regression.

## Composition

Unblocks the worker-side pipeline-economy use case in
devagentic#210 R12. Workers can now author + execute pipelines
without shelling out to ``python api.py``. The recent #99 / #110
fixes (PRs #108 + #111) ensure tool_calls with empty content +
populated tool_calls on Groq/strict-validator paths don't cascade
through the empty-retry loop, so R12 tools are usable from
client-tier dispatches.
PowerCreek added a commit that referenced this pull request May 27, 2026
…#121) (#122)

OpenAI-spec ``tool_call.type`` is required with exactly one valid
value (``"function"``). Mistral-large (and likely others) emit
tool_calls without the ``type`` field. The OpenAI Python SDK's
strict Pydantic validation drops these entries from
``response.choices[0].message.tool_calls`` — hermes then sees an
empty list + ``finish_reason=tool_calls`` and falls into PR #108's
synthetic-recovery path, where the tool the model wanted to call
is lost entirely.

## Fix

Add raw-response fallback in
``ChatCompletionsTransport.normalize_response``:

- When the SDK-parsed ``msg.tool_calls`` is empty AND
  ``finish_reason in {"tool_calls", "function_call"}``, inspect
  the raw response via ``response.model_dump()`` (Pydantic v2),
  attribute walk, or dict shape.
- Iterate the raw tool_calls list, default missing ``type`` to
  ``"function"`` per spec semantics (the field has only ONE valid
  value), and synthesize ``ToolCall`` entries.
- ``ToolCall``'s existing ``type`` property already returns
  ``"function"`` unconditionally, so downstream consumers see the
  recovered call as identical to a normally-parsed one.

Strictly additive: only fires when (a) SDK gave us nothing AND (b)
the finish_reason signals tools. The SDK-happy-path is untouched.

## Implementation detail

Three helpers in ``chat_completions.py``:

- ``_DEFAULT_TOOL_CALL_TYPE = "function"`` constant
- ``_raw_tool_calls_from_response(response)`` — extracts the raw
  tool_calls list via model_dump / dict; fail-soft on any errors
- ``_normalize_raw_tool_call(tc_raw)`` — builds a ``ToolCall``
  from a raw dict entry; returns ``None`` for unrecoverable
  entries (no function.name); re-serializes dict-arguments back
  to JSON string per OpenAI contract

## Tests

- 17 new tests in tests/agent/test_tool_call_type_default.py:
  * 6 ``_normalize_raw_tool_call`` tests: with-type, without-type-
    defaults-function, no-function-block, no-name, not-dict,
    dict-arguments-reserialized, empty-arguments
  * 6 ``_raw_tool_calls_from_response`` tests: model_dump path,
    dict path, no-tool_calls, no-choices, model_dump-raises,
    non-dict-entries-filtered
  * 3 end-to-end normalize_response tests: mistral-shape recovery,
    no-fallback-on-stop-finish_reason, no-double-up-when-sdk-parsed
  * 2 source-level patch-landed checks
- 76 total green across affected suites — no regression on the
  SDK-happy path or the #67/#99/#108 recovery families.

## Composition

Same family of bugs as #99 / #108 / #111 — response handler being
over-strict on the OpenAI spec. This is the next-uncovered case
(tool_calls present in wire response but stripped by SDK before
reaching us).

Devagentic#NN (the companion devagentic-side normalization at the
OAI-shim boundary) becomes redundant defense after this lands —
still harmless to keep.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

synthetic recovery markers leak to upstream API — Groq rejects with HTTP 400

1 participant