fix(context): handle JSON decode errors in context compression by 0xharryriddle · Pull Request #22248 · NousResearch/hermes-agent

0xharryriddle · 2026-05-09T03:37:43Z

What does this PR do?

Fixes context compression failure when the auxiliary LLM provider returns non-JSON responses (HTML, plain text, malformed JSON). The compression handler now explicitly catches json.JSONDecodeError, logs detailed diagnostics, and implements appropriate fallback behavior.

Why this approach:

Prevents loss of conversational context due to compression failures
Provides immediate fallback to main model when summary model returns invalid JSON
Reduces cooldown period for JSON decode errors on main model (30s vs 60s)
Maintains all existing error handling for other failure modes
Adds detailed logging to aid debugging of provider issues

Fixes context compression JSON decode errors: Expecting value: line X column Y (char Z)

Related Issue

Fixes #22244

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

agent/context_compressor.py: Added explicit isinstance(e, json.JSONDecodeError) check in _generate_summary exception handler, enhanced error logging with provider/model/base URL/position, implemented immediate fallback to main model, reduced cooldown to 30s for JSON decode errors on main model.

How to Test

Run the context compressor test suite:
```
scripts/run_tests.sh tests/agent/test_context_compressor.py -v
```
All 69 tests should pass.
(Optional) Manually verify by configuring an auxiliary model known to return non-JSON responses and trigger context compression; check logs for enhanced error messages.

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix/feature (no unrelated commits)
I've run scripts/run_tests.sh tests/agent/test_context_compressor.py and all tests pass
I've added tests for my changes (required for bug fixes, strongly encouraged for features) — existing tests cover the fix

Documentation & Housekeeping

I've updated relevant documentation (CHANGELOG.md) with the fix entry
I've updated AGENTS.md if I changed architecture or workflows — N/A (internal implementation detail)
I've considered cross-platform impact — change is pure Python, platform-independent
I've updated tool descriptions/schemas if I changed tool behavior — N/A
I've updated cli-config.yaml.example if I added/changed config keys — N/A

- Add explicit json.JSONDecodeError handling in context_compressor.py's _generate_summary method - Log detailed diagnostics (provider, model, base URL, error position) when auxiliary LLM returns non-JSON response - Fallback to main model immediately if separate summary model is configured - Reduce cooldown to 30s for JSON decode errors on main model - Preserve existing fallback logic for model not found, timeouts, and transient errors - Add CHANGELOG.md with fix entry

kshitijk4poor

Thanks for tackling this — the recursion bookkeeping is correct and the success path resets state cleanly. A few structural concerns before this lands.

Critical

No tests for the new branch. tests/agent/test_context_compressor.py has zero references to JSONDecodeError. The checklist marks "existing tests cover the fix", but none of them exercise the new path — neither the fallback-to-main, the recursion's _summary_model_fallen_back short-circuit, nor the 30-second cooldown branch. Please add at least one unit test that mocks call_llm to raise json.JSONDecodeError and asserts the fallback + cooldown behavior. This is the most important gap given that the trigger condition (provider returns Content-Type: application/json with a non-JSON body) is hard to reach in practice.

Warnings

Heavy duplication of the existing fallback-to-main pattern. The same if self.summary_model and self.summary_model != self.model and not _summary_model_fallen_back: ... summary_model = ""; _summary_failure_cooldown_until = 0.0; return self._generate_summary(...) block now appears three times in _generate_summary (the new JSONDecodeError branch, the existing 404/timeout branch, and the existing unknown-error best-effort branch). The cleaner landing is one of:
- Extract a small _fallback_to_main_model(e, reason) helper and let the three branches share it, or
- Detect JSON decode failures via the existing unknown-error branch (it would already catch this today) and just let JSON-decode errors set a shorter cooldown via a parameter.
  As written, the third copy of the pattern is maintenance debt without a clear functional separation.
Unrelated whitespace regression — please revert. The PR also changes two lines that have nothing to do with the fix:
```
-    self.summary_model = ""  # empty = use main model
+    self.summary_model = "" # empty = use main model
```
on what are now lines 990 and 1022. Two spaces before # is the PEP-8 / E261 form. Please drop these two hunks from the PR.
CHANGELOG.md is scope creep. The repo doesn't currently track a project-level CHANGELOG.md (releases are documented via RELEASE_v*.md). Adding a brand-new CHANGELOG.md inside a bug-fix PR introduces a convention that hasn't been agreed on. Please either drop the file or propose it in a separate documentation PR. The fix's release notes belong in the next RELEASE_v*.md, not a new top-level file.
self.provider may render as None in the error log. Line 916 logs self.provider directly, while self.base_url on line 919 uses or "default". self.provider defaults to None for some configurations and will print as Provider: None. Mirror the or "auto" / or "default" guard for consistency.

Suggestions

Reachability is real but narrow. call_llm doesn't parse JSON itself, so a raw json.JSONDecodeError only escapes when the underlying OpenAI SDK runs response.json() on a body that claims Content-Type: application/json but isn't valid JSON (misconfigured proxy 502 page, Cloudflare HTML challenge with the wrong content-type, malformed SSE frame). Common providers that obey the content-type contract won't hit this. If you want to also catch the more common case where a provider returns HTML with the correct content-type, consider widening the trigger to isinstance(e, json.JSONDecodeError) or "expecting value" in str(e).lower() — the OpenAI SDK can wrap the raw decode error inside APIResponseValidationError whose __cause__ is the JSONDecodeError but whose str(e) still carries the substring.
Truncate _last_aux_model_failure_error consistently. The existing branches truncate to 220 chars (lines 985-987, 1017-1019); the new branch sets f"Non-JSON response: {e}" without a cap. For very large bad bodies this can swell error logs and surface noisy errors via /usage. Apply the same 220-char cap.
Cooldown delta is small. 30s vs 60s won't change much in practice — the actual win, when reachable, is the immediate fallback to the main model, not the cooldown reduction. Worth keeping for symmetry but not the headline benefit.

Looks Good

Recursion does not self-deadlock: self._summary_failure_cooldown_until = 0.0 is set before the recursive call, so the cooldown gate at the top of _generate_summary allows the retry through.
The _summary_model_fallen_back = True sentinel correctly prevents an infinite recursion if the main model also returns invalid JSON — control falls through to the 30s cooldown branch on the second hit.
Success path resets both _summary_failure_cooldown_until and _summary_model_fallen_back so subsequent compactions start clean.
The 69 existing tests in tests/agent/test_context_compressor.py still pass against this PR.

Verdict: Request changes. The fix targets a real (if uncommon) failure mode, but it needs a unit test, the duplicated fallback should be consolidated, and the unrelated whitespace + new CHANGELOG.md should be dropped.

kshitijk4poor

Inline comments for the review above.

kshitijk4poor · 2026-05-09T08:46:37Z

Thanks for digging into this! Closing in favor of #22416, which builds on your work with a few structural changes:

Extracted _fallback_to_main_for_compression so the new JSON-decode branch shares one helper with the existing 404/timeout and unknown-error branches (instead of being a third copy of the same body).
Widened detection to also catch APIResponseValidationError-style wrapping via an "expecting value" substring match — the SDK doesn't always raise a raw JSONDecodeError.
Added unit tests for all three paths (raw decode, wrapped exception, 30s cooldown on main).
Dropped the new top-level CHANGELOG.md (the repo doesn't track one) and reverted the unrelated whitespace changes around the # empty = use main model comments.

Credit on the salvage commit. Closing #22244 via #22416.

@0xharryriddle

…22248 (#22416) When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with `Content-Type: application/json` — e.g. an HTML 502 page from a misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw `json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead. This change folds JSON-decode detection into the existing fast-path fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring match for "expecting value", retry once on the main model, and use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers). The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) are consolidated into a single `_fallback_to_main_for_compression` helper that handles the shared bookkeeping (record aux-model failure for `/usage`-style callers, clear summary_model, clear cooldown). Also adds three unit tests covering: raw `JSONDecodeError` retries on main, substring-match for wrapped exceptions, and the 30s cooldown when already on main. Salvage of #22248 by @0xharryriddle. Closes #22244. Co-authored-by: Harry Riddle <ntconguit@gmail.com>

@0xharryriddle

…ousResearch#22248 (NousResearch#22416) When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with `Content-Type: application/json` — e.g. an HTML 502 page from a misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw `json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead. This change folds JSON-decode detection into the existing fast-path fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring match for "expecting value", retry once on the main model, and use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers). The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) are consolidated into a single `_fallback_to_main_for_compression` helper that handles the shared bookkeeping (record aux-model failure for `/usage`-style callers, clear summary_model, clear cooldown). Also adds three unit tests covering: raw `JSONDecodeError` retries on main, substring-match for wrapped exceptions, and the 30s cooldown when already on main. Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244. Co-authored-by: Harry Riddle <ntconguit@gmail.com>

@0xharryriddle

…ousResearch#22248 (NousResearch#22416) When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with `Content-Type: application/json` — e.g. an HTML 502 page from a misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw `json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead. This change folds JSON-decode detection into the existing fast-path fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring match for "expecting value", retry once on the main model, and use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers). The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) are consolidated into a single `_fallback_to_main_for_compression` helper that handles the shared bookkeeping (record aux-model failure for `/usage`-style callers, clear summary_model, clear cooldown). Also adds three unit tests covering: raw `JSONDecodeError` retries on main, substring-match for wrapped exceptions, and the 30s cooldown when already on main. Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244. Co-authored-by: Harry Riddle <ntconguit@gmail.com>

@0xharryriddle

…ousResearch#22248 (NousResearch#22416) When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with `Content-Type: application/json` — e.g. an HTML 502 page from a misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw `json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead. This change folds JSON-decode detection into the existing fast-path fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring match for "expecting value", retry once on the main model, and use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers). The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) are consolidated into a single `_fallback_to_main_for_compression` helper that handles the shared bookkeeping (record aux-model failure for `/usage`-style callers, clear summary_model, clear cooldown). Also adds three unit tests covering: raw `JSONDecodeError` retries on main, substring-match for wrapped exceptions, and the 30s cooldown when already on main. Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244. Co-authored-by: Harry Riddle <ntconguit@gmail.com>

@0xharryriddle

…ousResearch#22248 (NousResearch#22416) When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with `Content-Type: application/json` — e.g. an HTML 502 page from a misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw `json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead. This change folds JSON-decode detection into the existing fast-path fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring match for "expecting value", retry once on the main model, and use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers). The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) are consolidated into a single `_fallback_to_main_for_compression` helper that handles the shared bookkeeping (record aux-model failure for `/usage`-style callers, clear summary_model, clear cooldown). Also adds three unit tests covering: raw `JSONDecodeError` retries on main, substring-match for wrapped exceptions, and the 30s cooldown when already on main. Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244. Co-authored-by: Harry Riddle <ntconguit@gmail.com>

@0xharryriddle

…ousResearch#22248 (NousResearch#22416) When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with `Content-Type: application/json` — e.g. an HTML 502 page from a misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw `json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead. This change folds JSON-decode detection into the existing fast-path fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring match for "expecting value", retry once on the main model, and use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers). The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) are consolidated into a single `_fallback_to_main_for_compression` helper that handles the shared bookkeeping (record aux-model failure for `/usage`-style callers, clear summary_model, clear cooldown). Also adds three unit tests covering: raw `JSONDecodeError` retries on main, substring-match for wrapped exceptions, and the 30s cooldown when already on main. Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244. Co-authored-by: Harry Riddle <ntconguit@gmail.com>

@0xharryriddle

…ousResearch#22248 (NousResearch#22416) When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with `Content-Type: application/json` — e.g. an HTML 502 page from a misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw `json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead. This change folds JSON-decode detection into the existing fast-path fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring match for "expecting value", retry once on the main model, and use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers). The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) are consolidated into a single `_fallback_to_main_for_compression` helper that handles the shared bookkeeping (record aux-model failure for `/usage`-style callers, clear summary_model, clear cooldown). Also adds three unit tests covering: raw `JSONDecodeError` retries on main, substring-match for wrapped exceptions, and the 30s cooldown when already on main. Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244. Co-authored-by: Harry Riddle <ntconguit@gmail.com>

@0xharryriddle

…ousResearch#22248 (NousResearch#22416) When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with `Content-Type: application/json` — e.g. an HTML 502 page from a misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw `json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead. This change folds JSON-decode detection into the existing fast-path fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring match for "expecting value", retry once on the main model, and use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers). The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) are consolidated into a single `_fallback_to_main_for_compression` helper that handles the shared bookkeeping (record aux-model failure for `/usage`-style callers, clear summary_model, clear cooldown). Also adds three unit tests covering: raw `JSONDecodeError` retries on main, substring-match for wrapped exceptions, and the 30s cooldown when already on main. Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244. Co-authored-by: Harry Riddle <ntconguit@gmail.com>

alt-glitch added type/bug Something isn't working comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists labels May 9, 2026

kshitijk4poor requested changes May 9, 2026

View reviewed changes

kshitijk4poor reviewed May 9, 2026

View reviewed changes

kshitijk4poor mentioned this pull request May 9, 2026

fix(context): handle JSON decode errors in compression — salvage of #22248 #22416

Merged

kshitijk4poor closed this May 9, 2026

bot-ted mentioned this pull request May 9, 2026

chore: sync with upstream main (2026-05-09) bot-ted/hermes-agent#25

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(context): handle JSON decode errors in context compression#22248

fix(context): handle JSON decode errors in context compression#22248
0xharryriddle wants to merge 1 commit into
NousResearch:mainfrom
0xharryriddle:fix/context-compression-failure

0xharryriddle commented May 9, 2026

Uh oh!

kshitijk4poor left a comment

Uh oh!

kshitijk4poor left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kshitijk4poor commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

0xharryriddle commented May 9, 2026

What does this PR do?

Related Issue

Type of Change

Changes Made

How to Test

Checklist

Code

Documentation & Housekeeping

Uh oh!

kshitijk4poor left a comment

Choose a reason for hiding this comment

Critical

Warnings

Suggestions

Looks Good

Uh oh!

kshitijk4poor left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kshitijk4poor commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants