Hi team — surfacing three bugs we hit running v0.9.0 (v2026.4.13, commit 1af2e18d) in production. All three are reproducible on a standard install; all three have local workarounds we've shipped, but the root cause lives upstream.
Environment
- Hermes v0.9.0 / v2026.4.13, commit 1af2e18d
- Running on Ubuntu 24.04, Python 3.11 venv
- Five Hermes instances (prime on Anthropic direct, three Horsemen on Gemini / GPT-5.4 proxy / Grok, one JiMMY on local llama-server)
- Platforms: Telegram + Signal via signal-cli HTTP bridge
Bug 1 — MINIMUM_CONTEXT_LENGTH gate ignores the documented model.context_length config override
Commit that introduced it: c8aff7463 ("prevent agent from stopping mid-task", 2026-04-11) added the 64K minimum context guard at run_agent.py:1355-1366.
Symptom: Agents running against models with a native context window below 64K (e.g. Qwen 3.5 35B-A3B at 32K) fail at agent init with ValueError: context_length below MINIMUM_CONTEXT_LENGTH. Every hermes chat invocation fails in ~1s.
Why it's a bug: The error message explicitly documents an escape hatch:
"...or set model.context_length in config.yaml to override"
But the guard fires unconditionally — it doesn't check whether the operator set model.context_length in config.yaml. The escape hatch is cosmetic.
Minimal repro:
- Set model.context_length: 32000 in config.yaml
- Run hermes chat -q "hello" -Q
- Result: agent init fails with the ValueError despite the override
Our fix: in run_agent.py:1358-1366, add `and _config_context_length is None` to the gate condition. _config_context_length is already a local in the same __init__.
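```python
from agent.model_metadata import MINIMUM_CONTEXT_LENGTH

_ctx = getattr(self.context_compressor, "context_length", 0)
# Enforce the 64K floor only when the operator has NOT set model.context_length,
# so the escape hatch documented in the error message actually works.
if _ctx and _ctx < MINIMUM_CONTEXT_LENGTH and _config_context_length is None:
    raise ValueError(...)
```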
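For a unit test, the patched condition can be isolated as a pure predicate. This is a minimal sketch, not Hermes code: context_gate_rejects is a hypothetical helper name, and 64_000 is an assumed stand-in for whatever value agent.model_metadata actually assigns to MINIMUM_CONTEXT_LENGTH.

```python
from typing import Optional

# Assumed value; the real constant lives in agent.model_metadata.
MINIMUM_CONTEXT_LENGTH = 64_000


def context_gate_rejects(ctx: int, config_context_length: Optional[int],
                         minimum: int = MINIMUM_CONTEXT_LENGTH) -> bool:
    """Return True if agent init should raise for this context length.

    Mirrors the patched condition: a context length below the floor is
    rejected only when the operator has NOT set model.context_length.
    """
    if config_context_length is not None:
        # Explicit override in config.yaml: honor the documented escape hatch.
        return False
    # ctx == 0 (attribute missing) is let through, as in the original guard.
    return bool(ctx) and ctx < minimum


if __name__ == "__main__":
    print(context_gate_rejects(32_000, None))    # True: 32K model, no override
    print(context_gate_rejects(32_000, 32_000))  # False: override set in config.yaml
    print(context_gate_rejects(200_000, None))   # False: large-context model
```

Keeping the predicate separate from __init__ lets the regression test cover both the "no override" and "override set" paths without constructing a full agent.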
Impact on us: Every agent run against a sub-64K model broke at init. With the guard change, all Tier 1 evals pass against that model.
Bug 2 — Trailing thinking block in assistant turn poisons the session → every subsequent API call returns HTTP 400
Symptom: Once per session, typically after context compression or session truncation, an assistant turn is left with a thinking (or redacted_thinking) block as its final content block — no text or tool_use block after it. Anthropic's API then rejects every subsequent request with:
HTTP 400 - messages.N: The final block in an assistant message cannot be `thinking`.
The session is effectively poisoned until manually purged (move the .jsonl out of sessions/).
Relevant call site: agent/anthropic_adapter.py, convert_messages_to_anthropic(). The function already has extensive thinking-block signature management (strip/downgrade/merge), but doesn't enforce that the final block in an assistant message is non-thinking.
Reproduction: Hard to trigger deterministically — seems to correlate with thinking-mode interrupts and context compression. We have two captured request dumps where messages[5] and messages[9] each end on a lone thinking block that Anthropic rejects. An anonymized minimal repro skeleton (one of the dumps with all user text, system prompt, tools, and auth stripped) is included inline below (full raw dumps available privately on request).
How to detect in the wild: Any afflicted session .jsonl will contain an assistant turn whose final content block is thinking or redacted_thinking. Scan each session file: if any assistant turn's content[-1].type is one of those, the session is poisoned.
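A small scanner can automate that check across a sessions/ directory. This is a sketch under one loud assumption: that each line of a session .jsonl is a single message object with role and content fields, where content is either a string or a list of typed blocks. We have not confirmed that against Hermes' actual on-disk format, and poisoned_turns and scan are our own hypothetical names, not Hermes functions.

```python
import json
import sys
from pathlib import Path

# Block types that must never be the final block of an assistant turn.
THINKING_TYPES = {"thinking", "redacted_thinking"}


def poisoned_turns(path: Path) -> list[int]:
    """Return 1-based line numbers of assistant turns that end on a thinking block.

    ASSUMPTION: each non-empty line is one JSON message with "role" and
    "content" keys, and list-valued content holds dict blocks with a "type".
    Adjust the field access if Hermes' session format differs.
    """
    hits = []
    with path.open(encoding="utf-8") as fh:
        for lineno, raw in enumerate(fh, start=1):
            raw = raw.strip()
            if not raw:
                continue
            try:
                msg = json.loads(raw)
            except json.JSONDecodeError:
                continue  # skip malformed lines instead of aborting the scan
            if not isinstance(msg, dict) or msg.get("role") != "assistant":
                continue
            content = msg.get("content")
            if isinstance(content, list) and content:
                last = content[-1]
                if isinstance(last, dict) and last.get("type") in THINKING_TYPES:
                    hits.append(lineno)
    return hits


def scan(sessions_dir: str = "sessions") -> list[Path]:
    """Print and return every session file containing at least one poisoned turn."""
    poisoned = []
    for path in sorted(Path(sessions_dir).glob("*.jsonl")):
        turns = poisoned_turns(path)
        if turns:
            poisoned.append(path)
            print(f"{path}: poisoned (assistant turns on lines {turns})")
    return poisoned


if __name__ == "__main__":
    scan(*sys.argv[1:2])
```

Files it reports are candidates for moving out of sessions/ (the manual purge described above); the scanner itself only reads.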
Our fix: Added a guard at the end of convert_messages_to_anthropic(), after the existing thinking-block processing loop. If any assistant message's final block is still thinking/redacted_thinking, append a minimal text block:
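```python
# Safety net: Anthropic rejects any assistant turn whose final block is thinking.
# _THINKING_TYPES is the adapter's existing set of thinking block types
# (presumably {"thinking", "redacted_thinking"}).
for m in result:
    if m.get("role") != "assistant" or not isinstance(m.get("content"), list) or not m["content"]:
        continue
    last_block = m["content"][-1]
    if isinstance(last_block, dict) and last_block.get("type") in _THINKING_TYPES:
        # Close the turn with a minimal text block; the reasoning above is kept.
        m["content"].append({"type": "text", "text": "(continuing)"})
```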
Validated by replay: our captured 400-causing request now converts to a valid payload with messages[9] ending in the sentinel text block. Reasoning content above is preserved.
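The replay check can be reproduced without any Anthropic traffic by rebuilding the appendix skeleton's shape and running a standalone copy of the guard over it. A sketch only: close_trailing_thinking and offending_indices are hypothetical helper names, the text and signature values are dummies, and unlike the in-place loop in convert_messages_to_anthropic() this copy works on a deep copy so the before/after comparison stays visible.

```python
import copy

# Assumed to match the adapter's _THINKING_TYPES.
THINKING_TYPES = {"thinking", "redacted_thinking"}
SENTINEL = {"type": "text", "text": "(continuing)"}


def close_trailing_thinking(messages: list[dict]) -> list[dict]:
    """Return a copy where no assistant turn ends on a thinking block."""
    out = copy.deepcopy(messages)
    for m in out:
        content = m.get("content")
        if m.get("role") != "assistant" or not isinstance(content, list) or not content:
            continue
        last = content[-1]
        if isinstance(last, dict) and last.get("type") in THINKING_TYPES:
            content.append(dict(SENTINEL))
    return out


def offending_indices(messages: list[dict]) -> list[int]:
    """Indices Anthropic would reject with 'final block ... cannot be thinking'."""
    return [
        i for i, m in enumerate(messages)
        if m.get("role") == "assistant"
        and isinstance(m.get("content"), list) and m["content"]
        and isinstance(m["content"][-1], dict)
        and m["content"][-1].get("type") in THINKING_TYPES
    ]


# Appendix shape: messages[0..8] are well formed, messages[9] is an assistant
# turn holding a single signed thinking block.
skeleton = []
for _ in range(4):
    skeleton.append({"role": "user", "content": "..."})
    skeleton.append({"role": "assistant", "content": [{"type": "text", "text": "..."}]})
skeleton.append({"role": "user", "content": "..."})
skeleton.append({"role": "assistant",
                 "content": [{"type": "thinking", "thinking": "...", "signature": "..."}]})

print(offending_indices(skeleton))                           # [9]
print(offending_indices(close_trailing_thinking(skeleton)))  # []
```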
Root cause guess: Probably in agent/context_engine.py or the compressor: some upstream mutation snips the trailing text/tool_use block while leaving the thinking block in place. A proper fix would prevent this state from arising in the first place rather than papering over it; our guard is only a safety net.
Impact on us: Session corruption causes user-visible "Error code: 400" fallback messages (Prime emits them to the chat as a last-ditch reply). Every operator restart cost us a full conversation.
Bug 3 — config.yaml signal.enabled: false is silently ignored when SIGNAL_* env vars are set
Relevant code: gateway/config.py:820-830:
```python
signal_url = os.getenv("SIGNAL_HTTP_URL")
signal_account = os.getenv("SIGNAL_ACCOUNT")
if signal_url and signal_account:
    if Platform.SIGNAL not in config.platforms:
        config.platforms[Platform.SIGNAL] = PlatformConfig()
    config.platforms[Platform.SIGNAL].enabled = True  # ← unconditional
    ...
```
Symptom: Setting signal.enabled: false in both the top-level signal: block and the platforms.signal: block of config.yaml is silently overwritten when both SIGNAL_HTTP_URL and SIGNAL_ACCOUNT env vars are present. The enabled = True assignment on line 827 clobbers the YAML flag with no log line or warning.
Impact on us: We wanted to disable Prime's Signal integration (to give ownership of the Signal reply path to a separate service). We set enabled: false in YAML and restarted — but Prime silently re-enabled Signal from the env vars. Diagnosing this took ~30 minutes of confusion. The fix was to comment out the env vars in .env; the YAML flag was never honored.
Suggested fix: Env vars should supply defaults when YAML is absent, not override explicit YAML values. Either:
- Check config.platforms[Platform.SIGNAL].enabled before overwriting, OR
- Treat the YAML value as authoritative, and use env vars only to fill in missing fields (url/account) when the platform is already enabled in YAML
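A minimal sketch of the suggested behavior, combining both options: env vars create and enable Signal only when YAML has no Signal entry, an explicit YAML disable wins (with a warning instead of silence), and an enabled YAML entry only has its missing fields filled. Platform and PlatformConfig here are stand-ins for the real gateway.config types, and the extra dict with http_url/account keys is hypothetical, since the excerpt does not show how url and account are stored. One caveat: treating "entry present with enabled false" as an explicit disable assumes the loader can tell that apart from an omitted key; an Optional[bool] field would make the distinction explicit.

```python
import logging
import os
from dataclasses import dataclass, field
from enum import Enum
from typing import Mapping, Optional

logger = logging.getLogger("gateway.config")


class Platform(Enum):  # stand-in for the real gateway Platform enum
    SIGNAL = "signal"


@dataclass
class PlatformConfig:  # stand-in; the "extra" keys used below are hypothetical
    enabled: bool = False
    extra: dict = field(default_factory=dict)


def apply_signal_env(platforms: dict, env: Optional[Mapping[str, str]] = None) -> None:
    """Overlay SIGNAL_* env vars onto YAML-derived config without clobbering it."""
    env = os.environ if env is None else env
    url = env.get("SIGNAL_HTTP_URL")
    account = env.get("SIGNAL_ACCOUNT")
    if not (url and account):
        return

    cfg = platforms.get(Platform.SIGNAL)
    if cfg is None:
        # No Signal entry from YAML: env vars act as defaults and enable it.
        platforms[Platform.SIGNAL] = PlatformConfig(
            enabled=True, extra={"http_url": url, "account": account})
        return

    if not cfg.enabled:
        # YAML explicitly disabled Signal: honor it, and say so out loud.
        logger.warning("SIGNAL_HTTP_URL/SIGNAL_ACCOUNT are set but signal.enabled "
                       "is false in config.yaml; leaving Signal disabled")
        return

    # Enabled in YAML: env vars only fill fields the YAML left unset.
    cfg.extra.setdefault("http_url", url)
    cfg.extra.setdefault("account", account)


if __name__ == "__main__":
    platforms = {Platform.SIGNAL: PlatformConfig(enabled=False)}
    apply_signal_env(platforms, {"SIGNAL_HTTP_URL": "http://127.0.0.1:8080",
                                 "SIGNAL_ACCOUNT": "+10000000000"})
    print(platforms[Platform.SIGNAL].enabled)  # False, plus a logged warning
```

The warning matters as much as the precedence change: it turns the silent 30-minute diagnosis into a single log line.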
Same pattern likely exists for other platforms (mattermost, matrix, etc.) — any operator who uses config.yaml to disable a platform while env vars happen to be set will hit this silently.
Meta
All three bugs have local patches applied to our shared-codebase install. I'll open three separate PRs (one per bug) for atomic review unless you'd prefer them bundled — just say the word. Full raw request dumps for Bug 2 available privately on request.
Thanks for the great work on v0.9. It's been a meaningful improvement over v0.8 across the board; these are the few rough edges we hit.
Appendix — Bug 2 minimal repro skeleton (anonymized)
(2.8 KB JSON, shape-only, no user data)
{
"__note__": "Anonymized minimal repro skeleton for Hermes v0.9.0 Bug 2 (trailing thinking block \u2192 HTTP 400). Derived from a real production request dump; all user content, system prompt, tool definitions, phone numbers, and auth headers stripped. Shows only the message-block *shape* that Anthropic rejects.",
"hermes_version": "v0.9.0 (v2026.4.13)",
"hermes_commit": "1af2e18d",
"session_id": "<redacted>",
"anthropic_error": {
"status_code": 400,
"type": "invalid_request_error",
"message": "messages.9: The final block in an assistant message cannot be `thinking`.",
"request_id": "req_011Ca7V8wAqYKB785YXoudTE"
},
"request_skeleton": {
"model": "claude-sonnet-4-6",
"max_tokens": 64000,
"thinking_enabled": true,
"system_blocks": 2,
"tools_count": 41,
"messages": [
{
"role": "user",
"content_type": "string",
"content_length_chars": 150
},
{
"role": "assistant",
"content_blocks": [
{
"type": "text",
"text_length_chars": 67
}
]
},
{
"role": "user",
"content_type": "string",
"content_length_chars": 156
},
{
"role": "assistant",
"content_blocks": [
{
"type": "text",
"text_length_chars": 49
}
]
},
{
"role": "user",
"content_type": "string",
"content_length_chars": 158
},
{
"role": "assistant",
"content_blocks": [
{
"type": "text",
"text_length_chars": 78
},
{
"type": "text",
"text_length_chars": 74
}
]
},
{
"role": "user",
"content_type": "string",
"content_length_chars": 157
},
{
"role": "assistant",
"content_blocks": [
{
"type": "text",
"text_length_chars": 93
}
]
},
{
"role": "user",
"content_type": "string",
"content_length_chars": 15
},
{
"role": "assistant",
"content_blocks": [
{
"type": "thinking",
"signature_present": true,
"thinking_length_chars": 38
}
]
}
]
},
"diagnosis": "messages[9] is an assistant turn whose content is a single `thinking` block with no trailing `text` or `tool_use` block. Anthropic's API rejects the whole request. Every subsequent call in the session fails identically until the .jsonl is manually removed from sessions/. Detect in the wild: grep each session .jsonl for any assistant turn whose final content block is `thinking` or `redacted_thinking`."
}