[Bug]: HTTP 400 "reasoning_content must be passed back" with deepseek-v4-pro in cron/auxiliary path (thinking mode works in main loop, breaks elsewhere)

### Bug Description

Hermes correctly preserves `reasoning_content` in the main loop (`run_agent.py::_copy_reasoning_content_for_api`). I verified this with a two-turn round-trip test against `deepseek-v4-pro` — it passes.

In production, however, a long-running cron job consistently fails with HTTP 400 `reasoning_content must be passed back to the API` after several auxiliary calls (`title_generation`, `vision_analyze`, `auxiliary auto-detect`). The error happens ~6 minutes after the last `title_generation` call, with no user intervention between.

This looks like the same passthrough is missing in `auxiliary_client` and/or the context compressor path, not in the main loop.

### Steps to Reproduce

1. Set the main model to `deepseek-v4-pro` via custom provider (`https://api.deepseek.com`, `api: openai-completions`). Thinking is enabled by default on v4-pro.
2. Register a cron job that runs a non-trivial multi-step task (with at least one tool call + at least one vision/title/session-search auxiliary trigger).
3. Let the job run for 15–30 minutes.
4. Observe `Non-retryable client error: Error code: 400 ... The reasoning_content in the thinking mode must be passed back to the API.` in the log.

### Expected Behavior

Either:
- `reasoning_content` is preserved along the auxiliary / compression / cron paths the same way it is in the main loop, **or**
- auxiliary calls default to `thinking: disabled` (they don't need CoT for title generation / vision descriptions / session search anyway).

### Actual Behavior

HTTP 400 on a DeepSeek v4-pro call somewhere in the cron → auxiliary → main-loop chain; the session becomes "poisoned" and cannot recover without clearing history.

### Affected Component

CLI (interactive chat)

### Messaging Platform (if gateway-related)

N/A (CLI only)

### Debug Report

```shell
2026-04-24 16:33:46 INFO  cron.scheduler: Running job 'qwen-auto-setup' (d33d5e95…)
2026-04-24 16:33:47 INFO  agent.auxiliary_client: Vision auto-detect: using main provider custom (deepseek-v4-pro)
2026-04-24 16:33:47 INFO  agent.auxiliary_client: Auxiliary auto-detect: using main provider custom (deepseek-v4-pro)
2026-04-24 16:41:22 INFO  agent.auxiliary_client: Vision auto-detect: using main provider custom (deepseek-v4-pro)
2026-04-24 16:41:38 INFO  agent.auxiliary_client: Auxiliary auto-detect: using main provider custom (deepseek-v4-pro)
2026-04-24 16:48:44 INFO  agent.auxiliary_client: Auxiliary title_generation: using auto (deepseek-v4-pro) at https://api.deepseek.com
2026-04-24 16:55:18 ERROR [cron_d33d5e95…] root: Non-retryable client error:
  Error code: 400 - {'error': {'message': 'The `reasoning_content` in the thinking mode must be passed back to the API.',
                               'type': 'invalid_request_error', 'code': 'invalid_request_error'}}


Note the 6-minute gap between the last auxiliary call (`title_generation` at 16:48:44) and the 400 (at 16:55:18) — main-loop tool calls happened in between but are at DEBUG level, not in this excerpt. Full log can be attached on request.

Two-turn round-trip test (model `deepseek-v4-pro`, thinking enabled) with history `user → assistant(+reasoning_content) → user` returns HTTP 200. So `_copy_reasoning_content_for_api` itself works — the issue is elsewhere in the request-assembly chain.

1. **`agent/auxiliary_client.py`** — auxiliary tasks (`title_generation`, `vision_analyze`, `session_search`) assemble their own minimal `messages` payload and may not carry `reasoning_content` even when the selected model requires it. Likely related: #9571 (GLM 5.1 `title_generation` produces empty content because reasoning eats the `max_tokens: 30` budget).
2. **Context compressor** (`agent/context_engine.py`) — when rebuilding assistant messages from summaries, `tool_calls` may survive while the matching `reasoning_content` is lost. Analogous to #11096 for Anthropic extended thinking: "The final block in an assistant message cannot be `thinking`."

- #9571 — `title_generation` breaks on reasoning model (GLM 5.1), same auxiliary path
- #11096 — HTTP 400 on compressed assistant messages (Anthropic extended thinking)
- #13927 — HTTP 400 with OpenRouter when tools are enabled
```

### Operating System

Ubuntu: 24.04.2

### Python Version

3.11.8

### Hermes Version

0.11.0

### Additional Logs / Traceback (optional)

```shell

```

### Root Cause Analysis (optional)

_No response_

### Proposed Fix (optional)

_No response_

### Are you willing to submit a PR for this?

- [ ] I'd like to fix this myself and submit a PR

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: HTTP 400 "reasoning_content must be passed back" with deepseek-v4-pro in cron/auxiliary path (thinking mode works in main loop, breaks elsewhere) #15213

Bug Description

Steps to Reproduce

Expected Behavior

Actual Behavior

Affected Component

Messaging Platform (if gateway-related)

Debug Report

Operating System

Python Version

Hermes Version

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

Proposed Fix (optional)

Are you willing to submit a PR for this?

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Bug]: HTTP 400 "reasoning_content must be passed back" with deepseek-v4-pro in cron/auxiliary path (thinking mode works in main loop, breaks elsewhere) #15213

Description

Bug Description

Steps to Reproduce

Expected Behavior

Actual Behavior

Affected Component

Messaging Platform (if gateway-related)

Debug Report

Operating System

Python Version

Hermes Version

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

Proposed Fix (optional)

Are you willing to submit a PR for this?

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions