fix(context): handle JSON decode errors in compression — salvage of #22248 by kshitijk4poor · Pull Request #22416 · NousResearch/hermes-agent

kshitijk4poor · 2026-05-09T08:46:26Z

Salvage of #22248 — fixes #22244

Closes #22244. Salvages PR #22248 by @0xharryriddle with the structural fixes from review.

Problem

When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with Content-Type: application/json — e.g. an HTML 502 page from a misconfigured gateway — the OpenAI SDK's response.json() raises a raw json.JSONDecodeError (or wraps it in APIResponseValidationError whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead.

Reported in #22244, with the canonical signature Failed to generate context summary: Expecting value: line X column Y (char Z).

Approach

Fold JSON-decode detection into the existing fast-path fallback chain (the same path that handles 404/timeout):

Detect by isinstance(e, JSONDecodeError) or substring match for "expecting value" — covers both raw decode errors and the SDK-wrapped APIResponseValidationError form.
Retry once on the main model when a separate summary_model is configured.
Use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers); 60s for everything else.

The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) collapse into one _fallback_to_main_for_compression(e, reason) helper that owns the shared bookkeeping (record aux-model failure for /usage-style callers, clear summary_model, clear cooldown).

Changes vs. the original PR

	Original #22248	This salvage
Detection	`isinstance(e, JSONDecodeError)` only	`isinstance` + `"expecting value"` substring (catches SDK-wrapped form)
Code shape	Third copy of the fallback-to-main body	Extracted `_fallback_to_main_for_compression` helper used by all three branches
Tests	None	3 new tests (raw `JSONDecodeError`, wrapped-exception substring, 30s cooldown on main)
`provider=None` rendering	Logged as `Provider: None`	`self.provider or "auto"` (mirrors `base_url or "default"`)
`_last_aux_model_failure_error` cap	Uncapped `"Non-JSON response: {e}"`	Shared 220-char cap via the helper
Whitespace regressions	Two `# empty` comments lost a space (PEP-8 E261)	Reverted
`CHANGELOG.md` (new top-level file)	Added	Dropped — repo doesn't track a project-level changelog

How to test

scripts/run_tests.sh tests/agent/test_context_compressor.py

72 passed (69 original + 3 new) on this branch.

@0xharryriddle

…ousResearch#22248 When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with `Content-Type: application/json` — e.g. an HTML 502 page from a misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw `json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead. This change folds JSON-decode detection into the existing fast-path fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring match for "expecting value", retry once on the main model, and use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers). The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) are consolidated into a single `_fallback_to_main_for_compression` helper that handles the shared bookkeeping (record aux-model failure for `/usage`-style callers, clear summary_model, clear cooldown). Also adds three unit tests covering: raw `JSONDecodeError` retries on main, substring-match for wrapped exceptions, and the 30s cooldown when already on main. Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244. Co-authored-by: Harry Riddle <ntconguit@gmail.com>

@0xharryriddle

…ousResearch#22248 (NousResearch#22416) When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with `Content-Type: application/json` — e.g. an HTML 502 page from a misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw `json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead. This change folds JSON-decode detection into the existing fast-path fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring match for "expecting value", retry once on the main model, and use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers). The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) are consolidated into a single `_fallback_to_main_for_compression` helper that handles the shared bookkeeping (record aux-model failure for `/usage`-style callers, clear summary_model, clear cooldown). Also adds three unit tests covering: raw `JSONDecodeError` retries on main, substring-match for wrapped exceptions, and the 30s cooldown when already on main. Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244. Co-authored-by: Harry Riddle <ntconguit@gmail.com>

@0xharryriddle

…ousResearch#22248 (NousResearch#22416) When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with `Content-Type: application/json` — e.g. an HTML 502 page from a misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw `json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead. This change folds JSON-decode detection into the existing fast-path fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring match for "expecting value", retry once on the main model, and use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers). The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) are consolidated into a single `_fallback_to_main_for_compression` helper that handles the shared bookkeeping (record aux-model failure for `/usage`-style callers, clear summary_model, clear cooldown). Also adds three unit tests covering: raw `JSONDecodeError` retries on main, substring-match for wrapped exceptions, and the 30s cooldown when already on main. Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244. Co-authored-by: Harry Riddle <ntconguit@gmail.com>

@0xharryriddle

…ousResearch#22248 (NousResearch#22416) When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with `Content-Type: application/json` — e.g. an HTML 502 page from a misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw `json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead. This change folds JSON-decode detection into the existing fast-path fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring match for "expecting value", retry once on the main model, and use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers). The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) are consolidated into a single `_fallback_to_main_for_compression` helper that handles the shared bookkeeping (record aux-model failure for `/usage`-style callers, clear summary_model, clear cooldown). Also adds three unit tests covering: raw `JSONDecodeError` retries on main, substring-match for wrapped exceptions, and the 30s cooldown when already on main. Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244. Co-authored-by: Harry Riddle <ntconguit@gmail.com>

@0xharryriddle

…ousResearch#22248 (NousResearch#22416) When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with `Content-Type: application/json` — e.g. an HTML 502 page from a misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw `json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead. This change folds JSON-decode detection into the existing fast-path fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring match for "expecting value", retry once on the main model, and use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers). The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) are consolidated into a single `_fallback_to_main_for_compression` helper that handles the shared bookkeeping (record aux-model failure for `/usage`-style callers, clear summary_model, clear cooldown). Also adds three unit tests covering: raw `JSONDecodeError` retries on main, substring-match for wrapped exceptions, and the 30s cooldown when already on main. Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244. Co-authored-by: Harry Riddle <ntconguit@gmail.com>

@0xharryriddle

…ousResearch#22248 (NousResearch#22416) When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with `Content-Type: application/json` — e.g. an HTML 502 page from a misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw `json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead. This change folds JSON-decode detection into the existing fast-path fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring match for "expecting value", retry once on the main model, and use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers). The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) are consolidated into a single `_fallback_to_main_for_compression` helper that handles the shared bookkeeping (record aux-model failure for `/usage`-style callers, clear summary_model, clear cooldown). Also adds three unit tests covering: raw `JSONDecodeError` retries on main, substring-match for wrapped exceptions, and the 30s cooldown when already on main. Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244. Co-authored-by: Harry Riddle <ntconguit@gmail.com>

@0xharryriddle

…ousResearch#22248 (NousResearch#22416) When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with `Content-Type: application/json` — e.g. an HTML 502 page from a misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw `json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead. This change folds JSON-decode detection into the existing fast-path fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring match for "expecting value", retry once on the main model, and use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers). The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) are consolidated into a single `_fallback_to_main_for_compression` helper that handles the shared bookkeeping (record aux-model failure for `/usage`-style callers, clear summary_model, clear cooldown). Also adds three unit tests covering: raw `JSONDecodeError` retries on main, substring-match for wrapped exceptions, and the 30s cooldown when already on main. Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244. Co-authored-by: Harry Riddle <ntconguit@gmail.com>

@0xharryriddle

…ousResearch#22248 (NousResearch#22416) When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with `Content-Type: application/json` — e.g. an HTML 502 page from a misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw `json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead. This change folds JSON-decode detection into the existing fast-path fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring match for "expecting value", retry once on the main model, and use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers). The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) are consolidated into a single `_fallback_to_main_for_compression` helper that handles the shared bookkeeping (record aux-model failure for `/usage`-style callers, clear summary_model, clear cooldown). Also adds three unit tests covering: raw `JSONDecodeError` retries on main, substring-match for wrapped exceptions, and the 30s cooldown when already on main. Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244. Co-authored-by: Harry Riddle <ntconguit@gmail.com>

kshitijk4poor mentioned this pull request May 9, 2026

fix(context): handle JSON decode errors in context compression #22248

Closed

12 tasks

kshitijk4poor merged commit c7e8add into NousResearch:main May 9, 2026
9 of 11 checks passed

alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder labels May 9, 2026

bot-ted mentioned this pull request May 9, 2026

chore: sync with upstream main (2026-05-09) bot-ted/hermes-agent#25

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(context): handle JSON decode errors in compression — salvage of #22248#22416

fix(context): handle JSON decode errors in compression — salvage of #22248#22416
kshitijk4poor merged 1 commit into
NousResearch:mainfrom
kshitijk4poor:salvage/pr-22248-json-decode

kshitijk4poor commented May 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

kshitijk4poor commented May 9, 2026

Salvage of #22248 — fixes #22244

Problem

Approach

Changes vs. the original PR

How to test

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants