Bug: Empty assistant message from reasoning-only fix leaks into Chat Completions API, causing prefill rejection #2128

@bigph00t

Bug Description

Commit e84d952d ("fix(codex): handle reasoning-only responses and replay path #2070") adds an empty assistant message after reasoning-only responses to satisfy the Responses API missing_following_item requirement:

# run_agent.py, _chat_messages_to_responses_input()
items.append({"role": "assistant", "content": ""})

This empty assistant message persists in the conversation history and is sent on subsequent turns to every provider, including Chat Completions API and Anthropic Messages API providers that reject trailing assistant messages as unsupported prefill.

Reproduction

  1. Use Hermes with provider: litellm (or any OpenAI-compatible custom endpoint) routing to a mix of models
  2. Have a thinking model (e.g. MiniMax M2.5, Kimi K2.5) return a reasoning-only response (only reasoning_content, no visible content)
  3. The empty assistant message gets appended to conversation history
  4. Next user message triggers a request with messages ending in: ...user → assistant("") → user
  5. The provider rejects with: "This model does not support assistant message prefill. The conversation must end with a user message."
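To confirm that a session is affected, the outgoing message list can be scanned for the placeholder. The following is a minimal sketch, not Hermes code: the helper name find_empty_assistant_messages is hypothetical, and it assumes the history is a list of Chat Completions style dicts with role, content, and optional tool_calls keys.

```python
from typing import Any

Message = dict[str, Any]


def find_empty_assistant_messages(messages: list[Message]) -> list[int]:
    """Return indices of assistant messages with no visible content and no tool calls."""
    hits = []
    for index, message in enumerate(messages):
        if message.get("role") != "assistant":
            continue
        content = message.get("content")
        is_blank = (
            content is None
            or content == []
            or (isinstance(content, str) and not content.strip())
        )
        # An assistant message that only carries tool calls is legitimate, so skip it.
        if is_blank and not message.get("tool_calls"):
            hits.append(index)
    return hits


# Shape of the history described in step 4 (contents are made up for illustration).
history = [
    {"role": "user", "content": "Summarize the report."},
    {"role": "assistant", "content": ""},  # placeholder left by the reasoning-only turn
    {"role": "user", "content": "Are you still there?"},
]
print(find_empty_assistant_messages(history))  # -> [1]
```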

Error Message

litellm.BadRequestError: OpenAIException - This model does not support assistant message prefill. 
The conversation must end with a user message.

This also occurs with the Anthropic Messages API when using OAuth tokens (Claude Max subscriptions via CLIProxyAPI), where prefill is explicitly rejected.

Impact

  • Affects any multi-model setup where thinking models coexist with non-thinking models
  • Affects Claude Max subscription users routing through OpenAI-compatible proxies
  • The empty assistant message is invisible to the user but breaks subsequent API calls
  • The error repeats on every subsequent message until the session is reset

Root Cause

The fix in _chat_messages_to_responses_input() correctly handles the Responses API requirement, but nothing confines the empty assistant message to that API. It should be one of the following:

  1. Stripped before sending to non-Responses-API providers (Chat Completions / Messages API)
  2. Tagged as Responses-API-specific so it is not included in Chat Completions API requests
  3. Removed from the conversation history after the Responses API turn completes
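Options 1 and 2 could be combined roughly as follows. This is a sketch under stated assumptions, not the actual Hermes implementation: the _responses_api_placeholder key and both function names are hypothetical, and messages are assumed to be plain dicts. The filter returns a new list, so the stored history (and the Responses API path) is left untouched.

```python
from typing import Any

Message = dict[str, Any]

# Hypothetical marker key; not an existing Hermes field.
PLACEHOLDER_TAG = "_responses_api_placeholder"


def make_placeholder() -> Message:
    """Option 2: build the empty assistant message with an explicit tag."""
    return {"role": "assistant", "content": "", PLACEHOLDER_TAG: True}


def _is_empty_assistant(message: Message) -> bool:
    """True for an assistant message with no visible content and no tool calls."""
    if message.get("role") != "assistant" or message.get("tool_calls"):
        return False
    content = message.get("content")
    return (
        content is None
        or content == []
        or (isinstance(content, str) and not content.strip())
    )


def prepare_for_chat_completions(messages: list[Message]) -> list[Message]:
    """Option 1: drop tagged or empty assistant messages before a non-Responses-API call.

    The internal tag is also removed from every message that is kept,
    so it never reaches the provider.
    """
    cleaned = []
    for message in messages:
        if message.get(PLACEHOLDER_TAG) or _is_empty_assistant(message):
            continue
        cleaned.append({k: v for k, v in message.items() if k != PLACEHOLDER_TAG})
    return cleaned


history = [
    {"role": "user", "content": "Summarize the report."},
    make_placeholder(),
    {"role": "user", "content": "Are you still there?"},
]
outgoing = prepare_for_chat_completions(history)
print([m["role"] for m in outgoing])  # -> ['user', 'user']
print(len(history))  # -> 3
```

Dropping the placeholder leaves two consecutive user messages in the outgoing request; whether a given provider accepts that is worth checking per endpoint.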

Workaround

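LiteLLM custom callback that strips a trailing assistant message when it is empty or shorter than 50 characters:

from litellm.integrations.custom_logger import CustomLogger

class StripPrefillCallback(CustomLogger):
    async def async_pre_call_hook(self, user_api_key_dict, cache, data, call_type):
        messages = data.get("messages", [])
        # Only act when the request would end with an assistant message (prefill).
        if messages and messages[-1].get("role") == "assistant":
            content = messages[-1].get("content", "")
            # Drops empty content, and also any trailing assistant message under 50 characters.
            if not content or len(str(content)) < 50:
                data["messages"] = messages[:-1]
        return data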
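Note that the 50-character threshold above also drops short but non-empty trailing assistant messages. A stricter variant that only removes a truly empty one might look like the sketch below. The function name is hypothetical, and it is written as a plain function on the request dict so it can be exercised without LiteLLM; the hook body could call it on data.

```python
from typing import Any


def strip_trailing_empty_assistant(data: dict[str, Any]) -> dict[str, Any]:
    """Remove the last message only if it is an assistant message with no content."""
    messages = data.get("messages") or []
    if not messages:
        return data
    last = messages[-1]
    # Keep assistant messages that carry tool calls, even when content is empty.
    if last.get("role") != "assistant" or last.get("tool_calls"):
        return data
    content = last.get("content")
    if content is None or content == [] or (isinstance(content, str) and not content.strip()):
        data["messages"] = messages[:-1]
    return data


empty = {"messages": [{"role": "user", "content": "hi"}, {"role": "assistant", "content": ""}]}
short = {"messages": [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "Sure."}]}
print(len(strip_trailing_empty_assistant(empty)["messages"]))  # -> 1
print(len(strip_trailing_empty_assistant(short)["messages"]))  # -> 2
```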

Environment

  • Hermes Agent: commit 4c0c7f4c (post-update including e84d952d)
  • LiteLLM proxy with OpenAI-compatible endpoints
  • Thinking models (MiniMax, Kimi) producing reasoning-only responses
  • Claude Max via CLIProxyAPI (OAuth-based, rejects prefill)
