Bug: Empty assistant message from reasoning-only fix leaks into Chat Completions API, causing prefill rejection

## Bug Description

Commit `e84d952d` ("fix(codex): handle reasoning-only responses and replay path #2070") adds an empty assistant message after reasoning-only responses to satisfy the Responses API `missing_following_item` requirement:

```python
# run_agent.py, _chat_messages_to_responses_input()
items.append({"role": "assistant", "content": ""})
```

This empty assistant message persists in the conversation history and gets sent on subsequent turns to **any** provider — including the Chat Completions API path and Anthropic Messages API providers that reject trailing assistant messages as unsupported prefill.

## Reproduction

1. Use Hermes with `provider: litellm` (or any OpenAI-compatible custom endpoint) routing to a mix of models
2. Have a thinking model (e.g. MiniMax M2.5, Kimi K2.5) return a reasoning-only response (only `reasoning_content`, no visible `content`)
3. The empty assistant message gets appended to conversation history
4. Next user message triggers a request with messages ending in: `...user → assistant("") → user`
5. The provider rejects with: `"This model does not support assistant message prefill. The conversation must end with a user message."`
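
To confirm that a session is in this state, the outgoing payload can be checked for empty assistant turns. This is a minimal sketch that assumes the history is a list of Chat Completions-style dicts. `find_empty_assistant_turns` is a hypothetical helper, not part of Hermes:

```python
from typing import Any


def find_empty_assistant_turns(messages: list[dict[str, Any]]) -> list[int]:
    """Return indices of assistant messages with no visible content and no tool calls."""
    hits = []
    for i, msg in enumerate(messages):
        if msg.get("role") != "assistant":
            continue
        content = msg.get("content")
        # Treat None, "", [] and whitespace-only strings as blank.
        is_blank = not content or (isinstance(content, str) and not content.strip())
        # An assistant turn that only carries tool calls is legitimate, so skip it.
        if is_blank and not msg.get("tool_calls"):
            hits.append(i)
    return hits


# Illustrative history mirroring step 4 above (message text is made up).
history = [
    {"role": "user", "content": "Summarize the log."},
    {"role": "assistant", "content": ""},  # placeholder left by the reasoning-only turn
    {"role": "user", "content": "Are you there?"},
]
print(find_empty_assistant_turns(history))
```

A non-empty result on a request bound for a Chat Completions or Messages API provider suggests the placeholder has leaked into that request.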

## Error Message

```
litellm.BadRequestError: OpenAIException - This model does not support assistant message prefill. 
The conversation must end with a user message.
```

The same rejection occurs with the Anthropic Messages API when using OAuth tokens (Claude Max subscriptions via CLIProxyAPI), a path that explicitly rejects prefill.

## Impact

- Affects any multi-model setup where thinking models coexist with non-thinking models
- Affects Claude Max subscription users routing through OpenAI-compatible proxies
- The empty assistant message is invisible to the user but breaks subsequent API calls
- The error repeats on every subsequent message until the session is reset

## Root Cause

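The fix in `_chat_messages_to_responses_input()` correctly handles the Responses API requirement, but the placeholder it adds stays in the conversation history and is replayed to every provider. To avoid that, the empty assistant message should: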

1. Be stripped before sending to non-Responses-API providers (Chat Completions / Messages API)
2. Be tagged as Responses-API-specific so it does not get included in Chat Completions API requests
3. Be cleaned up from the conversation history after the Responses API turn completes
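
Option 1 could look roughly like the sketch below. It assumes the history is a list of Chat Completions-style dicts and that the filter runs just before a request is built for a non-Responses-API provider. `strip_empty_assistant_messages` is a hypothetical name, not an existing Hermes function:

```python
from typing import Any


def strip_empty_assistant_messages(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy of messages without content-free assistant turns.

    Assistant messages that carry tool calls are kept, since an empty
    content field is legitimate there.
    """
    cleaned = []
    for msg in messages:
        if msg.get("role") == "assistant" and not msg.get("tool_calls"):
            content = msg.get("content")
            if not content or (isinstance(content, str) and not content.strip()):
                continue  # drop the Responses-API placeholder
        cleaned.append(msg)
    return cleaned


# Illustrative history (message text is made up).
history = [
    {"role": "user", "content": "Summarize the log."},
    {"role": "assistant", "content": ""},
    {"role": "user", "content": "Are you there?"},
]
outgoing = strip_empty_assistant_messages(history)
print([m["role"] for m in outgoing])
```

The function returns a new list, so the stored history, which the Responses API path still needs, is left untouched. Dropping the placeholder leaves two consecutive user messages. Whether every provider accepts that is not established here, so a real fix may also need to merge them.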

## Workaround

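LiteLLM custom callback that strips a trailing assistant message when it is empty or shorter than 50 characters:

```python
from litellm.integrations.custom_logger import CustomLogger


class StripPrefillCallback(CustomLogger):
    async def async_pre_call_hook(self, user_api_key_dict, cache, data, call_type):
        messages = data.get("messages", [])
        # Only the final message is checked: providers reject a trailing assistant turn as prefill.
        if messages and messages[-1].get("role") == "assistant":
            content = messages[-1].get("content", "")
            # Drops the empty placeholder, and also any trailing assistant text under 50 characters.
            if not content or len(str(content)) < 50:
                data["messages"] = messages[:-1]
        return data
```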

## Environment

- Hermes Agent: commit `4c0c7f4c` (post-update including `e84d952d`)
- LiteLLM proxy with OpenAI-compatible endpoints
- Thinking models (MiniMax, Kimi) producing reasoning-only responses
- Claude Max via CLIProxyAPI (OAuth-based, rejects prefill)
