Skip to content

fix(context): handle JSON decode errors in compression — salvage of #22248#22416

Merged
kshitijk4poor merged 1 commit into
NousResearch:mainfrom
kshitijk4poor:salvage/pr-22248-json-decode
May 9, 2026
Merged

fix(context): handle JSON decode errors in compression — salvage of #22248#22416
kshitijk4poor merged 1 commit into
NousResearch:mainfrom
kshitijk4poor:salvage/pr-22248-json-decode

Conversation

@kshitijk4poor

Copy link
Copy Markdown
Collaborator

Salvage of #22248fixes #22244

Closes #22244. Salvages PR #22248 by @0xharryriddle with the structural fixes from review.

Problem

When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with Content-Type: application/json — e.g. an HTML 502 page from a misconfigured gateway — the OpenAI SDK's response.json() raises a raw json.JSONDecodeError (or wraps it in APIResponseValidationError whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead.

Reported in #22244, with the canonical signature Failed to generate context summary: Expecting value: line X column Y (char Z).

Approach

Fold JSON-decode detection into the existing fast-path fallback chain (the same path that handles 404/timeout):

  • Detect by isinstance(e, JSONDecodeError) or substring match for "expecting value" — covers both raw decode errors and the SDK-wrapped APIResponseValidationError form.
  • Retry once on the main model when a separate summary_model is configured.
  • Use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers); 60s for everything else.

The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) collapse into one _fallback_to_main_for_compression(e, reason) helper that owns the shared bookkeeping (record aux-model failure for /usage-style callers, clear summary_model, clear cooldown).

Changes vs. the original PR

Original #22248 This salvage
Detection isinstance(e, JSONDecodeError) only isinstance + "expecting value" substring (catches SDK-wrapped form)
Code shape Third copy of the fallback-to-main body Extracted _fallback_to_main_for_compression helper used by all three branches
Tests None 3 new tests (raw JSONDecodeError, wrapped-exception substring, 30s cooldown on main)
provider=None rendering Logged as Provider: None self.provider or "auto" (mirrors base_url or "default")
_last_aux_model_failure_error cap Uncapped "Non-JSON response: {e}" Shared 220-char cap via the helper
Whitespace regressions Two # empty comments lost a space (PEP-8 E261) Reverted
CHANGELOG.md (new top-level file) Added Dropped — repo doesn't track a project-level changelog

How to test

scripts/run_tests.sh tests/agent/test_context_compressor.py

72 passed (69 original + 3 new) on this branch.

…ousResearch#22248

When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON
body with `Content-Type: application/json` — e.g. an HTML 502 page from a
misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw
`json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose
message contains "expecting value"). Previously this fell through to the
unknown-error branch and entered a 60s cooldown without retrying on the
main model, dropping the middle conversation turns instead.

This change folds JSON-decode detection into the existing fast-path
fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring
match for "expecting value", retry once on the main model, and use a
shorter 30s cooldown when already on main (the body shape tends to flip
back to valid quickly when the upstream proxy recovers).

The three duplicated fallback bodies (model-not-found, unknown-error,
JSON-decode) are consolidated into a single `_fallback_to_main_for_compression`
helper that handles the shared bookkeeping (record aux-model failure for
`/usage`-style callers, clear summary_model, clear cooldown).

Also adds three unit tests covering: raw `JSONDecodeError` retries on main,
substring-match for wrapped exceptions, and the 30s cooldown when already
on main.

Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244.

Co-authored-by: Harry Riddle <ntconguit@gmail.com>
@kshitijk4poor kshitijk4poor merged commit c7e8add into NousResearch:main May 9, 2026
9 of 11 checks passed
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder labels May 9, 2026
JZKK720 pushed a commit to JZKK720/hermes-agent that referenced this pull request May 11, 2026
…ousResearch#22248 (NousResearch#22416)

When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON
body with `Content-Type: application/json` — e.g. an HTML 502 page from a
misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw
`json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose
message contains "expecting value"). Previously this fell through to the
unknown-error branch and entered a 60s cooldown without retrying on the
main model, dropping the middle conversation turns instead.

This change folds JSON-decode detection into the existing fast-path
fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring
match for "expecting value", retry once on the main model, and use a
shorter 30s cooldown when already on main (the body shape tends to flip
back to valid quickly when the upstream proxy recovers).

The three duplicated fallback bodies (model-not-found, unknown-error,
JSON-decode) are consolidated into a single `_fallback_to_main_for_compression`
helper that handles the shared bookkeeping (record aux-model failure for
`/usage`-style callers, clear summary_model, clear cooldown).

Also adds three unit tests covering: raw `JSONDecodeError` retries on main,
substring-match for wrapped exceptions, and the 30s cooldown when already
on main.

Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244.

Co-authored-by: Harry Riddle <ntconguit@gmail.com>
rmulligan pushed a commit to rmulligan/hermes-agent that referenced this pull request May 11, 2026
…ousResearch#22248 (NousResearch#22416)

When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON
body with `Content-Type: application/json` — e.g. an HTML 502 page from a
misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw
`json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose
message contains "expecting value"). Previously this fell through to the
unknown-error branch and entered a 60s cooldown without retrying on the
main model, dropping the middle conversation turns instead.

This change folds JSON-decode detection into the existing fast-path
fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring
match for "expecting value", retry once on the main model, and use a
shorter 30s cooldown when already on main (the body shape tends to flip
back to valid quickly when the upstream proxy recovers).

The three duplicated fallback bodies (model-not-found, unknown-error,
JSON-decode) are consolidated into a single `_fallback_to_main_for_compression`
helper that handles the shared bookkeeping (record aux-model failure for
`/usage`-style callers, clear summary_model, clear cooldown).

Also adds three unit tests covering: raw `JSONDecodeError` retries on main,
substring-match for wrapped exceptions, and the 30s cooldown when already
on main.

Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244.

Co-authored-by: Harry Riddle <ntconguit@gmail.com>
JinyuID pushed a commit to JinyuID/hermes-agent that referenced this pull request May 11, 2026
…ousResearch#22248 (NousResearch#22416)

When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON
body with `Content-Type: application/json` — e.g. an HTML 502 page from a
misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw
`json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose
message contains "expecting value"). Previously this fell through to the
unknown-error branch and entered a 60s cooldown without retrying on the
main model, dropping the middle conversation turns instead.

This change folds JSON-decode detection into the existing fast-path
fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring
match for "expecting value", retry once on the main model, and use a
shorter 30s cooldown when already on main (the body shape tends to flip
back to valid quickly when the upstream proxy recovers).

The three duplicated fallback bodies (model-not-found, unknown-error,
JSON-decode) are consolidated into a single `_fallback_to_main_for_compression`
helper that handles the shared bookkeeping (record aux-model failure for
`/usage`-style callers, clear summary_model, clear cooldown).

Also adds three unit tests covering: raw `JSONDecodeError` retries on main,
substring-match for wrapped exceptions, and the 30s cooldown when already
on main.

Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244.

Co-authored-by: Harry Riddle <ntconguit@gmail.com>
jsboige pushed a commit to jsboige/hermes-agent that referenced this pull request May 14, 2026
…ousResearch#22248 (NousResearch#22416)

When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON
body with `Content-Type: application/json` — e.g. an HTML 502 page from a
misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw
`json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose
message contains "expecting value"). Previously this fell through to the
unknown-error branch and entered a 60s cooldown without retrying on the
main model, dropping the middle conversation turns instead.

This change folds JSON-decode detection into the existing fast-path
fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring
match for "expecting value", retry once on the main model, and use a
shorter 30s cooldown when already on main (the body shape tends to flip
back to valid quickly when the upstream proxy recovers).

The three duplicated fallback bodies (model-not-found, unknown-error,
JSON-decode) are consolidated into a single `_fallback_to_main_for_compression`
helper that handles the shared bookkeeping (record aux-model failure for
`/usage`-style callers, clear summary_model, clear cooldown).

Also adds three unit tests covering: raw `JSONDecodeError` retries on main,
substring-match for wrapped exceptions, and the 30s cooldown when already
on main.

Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244.

Co-authored-by: Harry Riddle <ntconguit@gmail.com>
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request May 25, 2026
…ousResearch#22248 (NousResearch#22416)

When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON
body with `Content-Type: application/json` — e.g. an HTML 502 page from a
misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw
`json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose
message contains "expecting value"). Previously this fell through to the
unknown-error branch and entered a 60s cooldown without retrying on the
main model, dropping the middle conversation turns instead.

This change folds JSON-decode detection into the existing fast-path
fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring
match for "expecting value", retry once on the main model, and use a
shorter 30s cooldown when already on main (the body shape tends to flip
back to valid quickly when the upstream proxy recovers).

The three duplicated fallback bodies (model-not-found, unknown-error,
JSON-decode) are consolidated into a single `_fallback_to_main_for_compression`
helper that handles the shared bookkeeping (record aux-model failure for
`/usage`-style callers, clear summary_model, clear cooldown).

Also adds three unit tests covering: raw `JSONDecodeError` retries on main,
substring-match for wrapped exceptions, and the 30s cooldown when already
on main.

Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244.

Co-authored-by: Harry Riddle <ntconguit@gmail.com>
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…ousResearch#22248 (NousResearch#22416)

When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON
body with `Content-Type: application/json` — e.g. an HTML 502 page from a
misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw
`json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose
message contains "expecting value"). Previously this fell through to the
unknown-error branch and entered a 60s cooldown without retrying on the
main model, dropping the middle conversation turns instead.

This change folds JSON-decode detection into the existing fast-path
fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring
match for "expecting value", retry once on the main model, and use a
shorter 30s cooldown when already on main (the body shape tends to flip
back to valid quickly when the upstream proxy recovers).

The three duplicated fallback bodies (model-not-found, unknown-error,
JSON-decode) are consolidated into a single `_fallback_to_main_for_compression`
helper that handles the shared bookkeeping (record aux-model failure for
`/usage`-style callers, clear summary_model, clear cooldown).

Also adds three unit tests covering: raw `JSONDecodeError` retries on main,
substring-match for wrapped exceptions, and the 30s cooldown when already
on main.

Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244.

Co-authored-by: Harry Riddle <ntconguit@gmail.com>
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
…ousResearch#22248 (NousResearch#22416)

When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON
body with `Content-Type: application/json` — e.g. an HTML 502 page from a
misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw
`json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose
message contains "expecting value"). Previously this fell through to the
unknown-error branch and entered a 60s cooldown without retrying on the
main model, dropping the middle conversation turns instead.

This change folds JSON-decode detection into the existing fast-path
fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring
match for "expecting value", retry once on the main model, and use a
shorter 30s cooldown when already on main (the body shape tends to flip
back to valid quickly when the upstream proxy recovers).

The three duplicated fallback bodies (model-not-found, unknown-error,
JSON-decode) are consolidated into a single `_fallback_to_main_for_compression`
helper that handles the shared bookkeeping (record aux-model failure for
`/usage`-style callers, clear summary_model, clear cooldown).

Also adds three unit tests covering: raw `JSONDecodeError` retries on main,
substring-match for wrapped exceptions, and the 30s cooldown when already
on main.

Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244.

Co-authored-by: Harry Riddle <ntconguit@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Context Compression Failure

2 participants