Three bugs in v0.9.0: context_length override, thinking-block sessions, config.yaml vs env precedence

Hi team — surfacing three bugs we hit running v0.9.0 (`v2026.4.13`, commit `1af2e18d`) in production. All three are reproducible on a standard install; all three have local workarounds we've shipped, but the root cause lives upstream.

### Environment

- Hermes v0.9.0 / v2026.4.13, commit `1af2e18d`
- Running on Ubuntu 24.04, Python 3.11 venv
- Five Hermes instances (`prime` on Anthropic direct, three Horsemen on Gemini / GPT-5.4 proxy / Grok, one JiMMY on local llama-server)
- Platforms: Telegram + Signal via signal-cli HTTP bridge

---

### Bug 1 — `MINIMUM_CONTEXT_LENGTH` gate ignores the documented `model.context_length` config override

**Commit that introduced it:** `c8aff7463` ("prevent agent from stopping mid-task", 2026-04-11) added the 64K minimum context guard at `run_agent.py:1355-1366`.

**Symptom:** Agents running against models with a native context window below 64K (e.g. Qwen 3.5 35B-A3B at 32K) fail at agent init with `ValueError: context_length below MINIMUM_CONTEXT_LENGTH`. Every `hermes chat` invocation fails in ~1s.

**Why it's a bug:** The error message explicitly documents an escape hatch:
> *"...or set `model.context_length` in config.yaml to override"*

But the guard fires unconditionally — it doesn't check whether the operator set `model.context_length` in `config.yaml`. The escape hatch is cosmetic.

**Minimal repro:**
1. Set `model.context_length: 32000` in `config.yaml`
2. Run `hermes chat -q "hello" -Q`
3. Error: agent init fails despite the override

**Our fix:** In `run_agent.py:1358-1366`, add `and _config_context_length is None` to the gate condition. `_config_context_length` is already a local variable in the same `__init__`.

```python
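from agent.model_metadata import MINIMUM_CONTEXT_LENGTH
_ctx = getattr(self.context_compressor, "context_length", 0)
# Only enforce the minimum when the operator has NOT set
# model.context_length in config.yaml (the documented override).
if _ctx and _ctx < MINIMUM_CONTEXT_LENGTH and _config_context_length is None:
    raise ValueError(...)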
```

**Impact on us:** Every agent run against a sub-64K model broke at init. With the guard change, all Tier 1 evals pass against that model.
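
For reviewers who want to exercise the patched condition in isolation, here is a minimal standalone sketch of the gate. The function name `check_context_length`, its parameters, and the `64_000` value are assumptions for illustration (this report only knows the limit as "64K"); the real check lives inline in `__init__` and imports the constant from `agent.model_metadata`.

```python
from typing import Optional

# Assumed value for illustration only; the real constant is defined upstream.
MINIMUM_CONTEXT_LENGTH = 64_000


def check_context_length(detected: int, config_override: Optional[int]) -> None:
    """Raise only when the window is too small AND no config override is set."""
    if detected and detected < MINIMUM_CONTEXT_LENGTH and config_override is None:
        raise ValueError(
            f"context_length {detected} below MINIMUM_CONTEXT_LENGTH "
            f"({MINIMUM_CONTEXT_LENGTH}); set model.context_length in config.yaml to override"
        )


# A 32K model with an explicit override passes; without one it is rejected.
check_context_length(32_000, config_override=32_000)
try:
    check_context_length(32_000, config_override=None)
except ValueError as exc:
    print(f"rejected as expected: {exc}")
```

The design choice is that any explicit `model.context_length` disables the guard, which matches what the error message promises.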

---

### Bug 2 — Trailing `thinking` block in assistant turn poisons the session → every subsequent API call returns HTTP 400

**Symptom:** Once per session, typically after context compression or session truncation, an assistant turn is left with a `thinking` (or `redacted_thinking`) block as its final content block — no text or tool_use block after it. Anthropic's API then rejects every subsequent request with:

```
HTTP 400 - messages.N: The final block in an assistant message cannot be `thinking`.
```

The session is effectively poisoned until manually purged (move the `.jsonl` out of `sessions/`).

**Relevant call site:** `agent/anthropic_adapter.py`, `convert_messages_to_anthropic()`. The function already has extensive thinking-block signature management (strip/downgrade/merge), but doesn't enforce that the final block in an assistant message is non-thinking.

**Reproduction:** Hard to trigger deterministically; it seems to correlate with thinking-mode interrupts and context compression. We have two captured request dumps where `messages[5]` and `messages[9]` each end on a lone `thinking` block that Anthropic rejects. An anonymized minimal repro skeleton (one of the dumps with all user text, system prompt, tools, and auth stripped) is included in the appendix at the end of this issue. Full raw dumps are available privately on request.

**How to detect in the wild:** Any afflicted session `.jsonl` will have an assistant turn whose final content block is `thinking` or `redacted_thinking`. Grep each session file: if the last assistant `content[-1].type` is one of those, the session is poisoned.
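
The detection rule above can be sketched as a small scanner. This is a rough sketch, not a tested tool: it assumes each line of a session `.jsonl` is one JSON message object with `role` and `content` keys, which this report does not confirm, and the helper names `ends_on_thinking` and `poisoned_sessions` are hypothetical. It flags a file if any assistant turn ends on a thinking block.

```python
import json
from pathlib import Path
from typing import Iterator

THINKING_TYPES = {"thinking", "redacted_thinking"}


def ends_on_thinking(message: dict) -> bool:
    """True if an assistant message's final content block is a thinking block."""
    content = message.get("content")
    if message.get("role") != "assistant" or not isinstance(content, list) or not content:
        return False
    last = content[-1]
    return isinstance(last, dict) and last.get("type") in THINKING_TYPES


def poisoned_sessions(sessions_dir: Path) -> Iterator[Path]:
    """Yield session files with at least one assistant turn ending on thinking."""
    for path in sorted(sessions_dir.glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip malformed lines rather than abort the scan
                if isinstance(record, dict) and ends_on_thinking(record):
                    yield path
                    break  # one hit is enough to flag the file
```

Usage would look like `for p in poisoned_sessions(Path("sessions")): print(p)`, with the directory path adjusted to the install.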

**Our fix:** Added a guard at the end of `convert_messages_to_anthropic()`, after the existing thinking-block processing loop. If any assistant message's final block is still `thinking`/`redacted_thinking`, append a minimal text block:

```python
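# _THINKING_TYPES covers "thinking" and "redacted_thinking".
for m in result:
    # Skip non-assistant turns and turns without a non-empty block list.
    if m.get("role") != "assistant" or not isinstance(m.get("content"), list) or not m["content"]:
        continue
    last_block = m["content"][-1]
    if isinstance(last_block, dict) and last_block.get("type") in _THINKING_TYPES:
        # Append a sentinel so the turn no longer ends on a thinking block.
        m["content"].append({"type": "text", "text": "(continuing)"})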

Validated by replay: our captured 400-causing request now converts to a valid payload with `messages[9]` ending in the sentinel text block. Reasoning content above is preserved.
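
For a regression test, the same guard can be pulled out into a standalone function and run against the message shape from our dump. This is a sketch under assumptions: the function name `pad_trailing_thinking` is hypothetical, and the sample blocks carry placeholder values rather than real reasoning content or signatures.

```python
from typing import Any

_THINKING_TYPES = {"thinking", "redacted_thinking"}


def pad_trailing_thinking(messages: list[dict[str, Any]]) -> int:
    """Append a sentinel text block to assistant turns that end on thinking.

    Mutates the messages in place and returns how many turns were patched.
    """
    patched = 0
    for m in messages:
        content = m.get("content")
        if m.get("role") != "assistant" or not isinstance(content, list) or not content:
            continue
        last = content[-1]
        if isinstance(last, dict) and last.get("type") in _THINKING_TYPES:
            content.append({"type": "text", "text": "(continuing)"})
            patched += 1
    return patched


# Placeholder payload mirroring the rejected shape: a lone thinking block.
sample = [
    {"role": "user", "content": "placeholder"},
    {"role": "assistant", "content": [
        {"type": "thinking", "thinking": "placeholder", "signature": "placeholder"},
    ]},
]
print(pad_trailing_thinking(sample), sample[1]["content"][-1]["type"])
```

The guard is idempotent: a second pass finds the sentinel text block at the end and patches nothing.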

**Root cause guess:** Probably in `agent/context_engine.py` or the compressor — upstream mutation snips the trailing text/tool_use while leaving the thinking block in place. A proper fix would prevent creating the state, not just paper over it; our guard is a safety net.

**Impact on us:** Session corruption causes user-visible "Error code: 400" fallback messages (Prime emits them to the chat as a last-ditch reply). Every operator restart cost us a full conversation.

---

### Bug 3 — `config.yaml` `signal.enabled: false` is silently ignored when `SIGNAL_*` env vars are set

**Relevant code:** `gateway/config.py:820-830`:

```python
signal_url = os.getenv("SIGNAL_HTTP_URL")
signal_account = os.getenv("SIGNAL_ACCOUNT")
if signal_url and signal_account:
    if Platform.SIGNAL not in config.platforms:
        config.platforms[Platform.SIGNAL] = PlatformConfig()
    config.platforms[Platform.SIGNAL].enabled = True  # ← unconditional
    ...
```

**Symptom:** An `enabled: false` flag set in both the top-level `signal:` block and the `platforms.signal:` block of `config.yaml` is silently overwritten when both `SIGNAL_HTTP_URL` and `SIGNAL_ACCOUNT` env vars are present. The `enabled = True` assignment on line 827 clobbers the YAML flag with no log line or warning.

**Impact on us:** We wanted to disable Prime's Signal integration (to give ownership of the Signal reply path to a separate service). We set `enabled: false` in YAML and restarted — but Prime silently re-enabled Signal from the env vars. Diagnosing this took ~30 minutes of confusion. The fix was to comment out the env vars in `.env`; the YAML flag was never honored.

**Suggested fix:** Env vars should supply defaults *when YAML is absent*, not override explicit YAML values. Either:

- Check `config.platforms[Platform.SIGNAL].enabled` before overwriting, OR
- Keep the YAML read as authoritative, and only use env vars to fill in missing fields (url/account) when the platform is already enabled in YAML

Same pattern likely exists for other platforms (mattermost, matrix, etc.) — any operator who uses `config.yaml` to disable a platform while env vars happen to be set will hit this silently.
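
One way the first option could look, as a minimal sketch rather than a patch against the real code. The `PlatformConfig` below is a simplified stand-in, not the actual class in `gateway/config.py`; the tri-state `enabled` field, the `extra` dict, its `http_url`/`account` keys, and the function name `apply_signal_env` are all assumptions made for illustration.

```python
import logging
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional

log = logging.getLogger(__name__)


@dataclass
class PlatformConfig:
    # None means "not set in YAML"; True/False is an explicit operator choice.
    enabled: Optional[bool] = None
    extra: dict = field(default_factory=dict)


def apply_signal_env(
    yaml_cfg: Optional[PlatformConfig],
    env: Mapping[str, str] = os.environ,
) -> Optional[PlatformConfig]:
    """Let env vars fill in defaults without overriding explicit YAML values."""
    url = env.get("SIGNAL_HTTP_URL")
    account = env.get("SIGNAL_ACCOUNT")
    if not (url and account):
        return yaml_cfg  # nothing to apply

    cfg = yaml_cfg if yaml_cfg is not None else PlatformConfig()
    if cfg.enabled is False:
        # Honor the explicit YAML flag, but tell the operator about the conflict.
        log.info("SIGNAL_* env vars set, but signal is disabled in config.yaml")
        return cfg

    if cfg.enabled is None:
        cfg.enabled = True  # env vars act as the default when YAML is silent
    cfg.extra.setdefault("http_url", url)
    cfg.extra.setdefault("account", account)
    return cfg
```

The key design choice is distinguishing "unset" from "false": without a tri-state (or an equivalent record of which keys the YAML actually contained), the loader cannot tell an explicit `enabled: false` apart from a default.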

---

## Meta

All three bugs have local patches applied to our shared-codebase install. I'll open three separate PRs (one per bug) for atomic review unless you'd prefer them bundled — just say the word. Full raw request dumps for Bug 2 available privately on request.

Thanks for the great work on v0.9. It's been a meaningful improvement over v0.8 across the board; these are the few rough edges we hit.

---

## Appendix — Bug 2 minimal repro skeleton (anonymized)

<details>
<summary>Click to expand — 2.8 KB JSON, shape-only, no user data</summary>

```json
{
  "__note__": "Anonymized minimal repro skeleton for Hermes v0.9.0 Bug 2 (trailing thinking block \u2192 HTTP 400). Derived from a real production request dump; all user content, system prompt, tool definitions, phone numbers, and auth headers stripped. Shows only the message-block *shape* that Anthropic rejects.",
  "hermes_version": "v0.9.0 (v2026.4.13)",
  "hermes_commit": "1af2e18d",
  "session_id": "<redacted>",
  "anthropic_error": {
    "status_code": 400,
    "type": "invalid_request_error",
    "message": "messages.9: The final block in an assistant message cannot be `thinking`.",
    "request_id": "req_011Ca7V8wAqYKB785YXoudTE"
  },
  "request_skeleton": {
    "model": "claude-sonnet-4-6",
    "max_tokens": 64000,
    "thinking_enabled": true,
    "system_blocks": 2,
    "tools_count": 41,
    "messages": [
      {
        "role": "user",
        "content_type": "string",
        "content_length_chars": 150
      },
      {
        "role": "assistant",
        "content_blocks": [
          {
            "type": "text",
            "text_length_chars": 67
          }
        ]
      },
      {
        "role": "user",
        "content_type": "string",
        "content_length_chars": 156
      },
      {
        "role": "assistant",
        "content_blocks": [
          {
            "type": "text",
            "text_length_chars": 49
          }
        ]
      },
      {
        "role": "user",
        "content_type": "string",
        "content_length_chars": 158
      },
      {
        "role": "assistant",
        "content_blocks": [
          {
            "type": "text",
            "text_length_chars": 78
          },
          {
            "type": "text",
            "text_length_chars": 74
          }
        ]
      },
      {
        "role": "user",
        "content_type": "string",
        "content_length_chars": 157
      },
      {
        "role": "assistant",
        "content_blocks": [
          {
            "type": "text",
            "text_length_chars": 93
          }
        ]
      },
      {
        "role": "user",
        "content_type": "string",
        "content_length_chars": 15
      },
      {
        "role": "assistant",
        "content_blocks": [
          {
            "type": "thinking",
            "signature_present": true,
            "thinking_length_chars": 38
          }
        ]
      }
    ]
  },
  "diagnosis": "messages[9] is an assistant turn whose content is a single `thinking` block with no trailing `text` or `tool_use` block. Anthropic's API rejects the whole request. Every subsequent call in the session fails identically until the .jsonl is manually removed from sessions/. Detect in the wild: grep each session .jsonl for any assistant turn whose final content block is `thinking` or `redacted_thinking`."
}
```

</details>


