Skip to content

fix(context): handle JSON decode errors in context compression#22248

Closed
0xharryriddle wants to merge 1 commit into
NousResearch:mainfrom
0xharryriddle:fix/context-compression-failure
Closed

fix(context): handle JSON decode errors in context compression#22248
0xharryriddle wants to merge 1 commit into
NousResearch:mainfrom
0xharryriddle:fix/context-compression-failure

Conversation

@0xharryriddle

Copy link
Copy Markdown
Contributor

What does this PR do?

Fixes context compression failure when the auxiliary LLM provider returns non-JSON responses (HTML, plain text, malformed JSON). The compression handler now explicitly catches json.JSONDecodeError, logs detailed diagnostics, and implements appropriate fallback behavior.

Why this approach:

  • Prevents loss of conversational context due to compression failures
  • Provides immediate fallback to main model when summary model returns invalid JSON
  • Reduces cooldown period for JSON decode errors on main model (30s vs 60s)
  • Maintains all existing error handling for other failure modes
  • Adds detailed logging to aid debugging of provider issues

Fixes context compression JSON decode errors: Expecting value: line X column Y (char Z)

Related Issue

Fixes #22244

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

  • agent/context_compressor.py: Added explicit isinstance(e, json.JSONDecodeError) check in _generate_summary exception handler, enhanced error logging with provider/model/base URL/position, implemented immediate fallback to main model, reduced cooldown to 30s for JSON decode errors on main model.

How to Test

  1. Run the context compressor test suite:

    scripts/run_tests.sh tests/agent/test_context_compressor.py -v

    All 69 tests should pass.

  2. (Optional) Manually verify by configuring an auxiliary model known to return non-JSON responses and trigger context compression; check logs for enhanced error messages.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run scripts/run_tests.sh tests/agent/test_context_compressor.py and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features) — existing tests cover the fix

Documentation & Housekeeping

  • I've updated relevant documentation (CHANGELOG.md) with the fix entry
  • I've updated AGENTS.md if I changed architecture or workflows — N/A (internal implementation detail)
  • I've considered cross-platform impact — change is pure Python, platform-independent
  • I've updated tool descriptions/schemas if I changed tool behavior — N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — N/A

- Add explicit json.JSONDecodeError handling in context_compressor.py's _generate_summary method
- Log detailed diagnostics (provider, model, base URL, error position) when auxiliary LLM returns non-JSON response
- Fallback to main model immediately if separate summary model is configured
- Reduce cooldown to 30s for JSON decode errors on main model
- Preserve existing fallback logic for model not found, timeouts, and transient errors
- Add CHANGELOG.md with fix entry
@alt-glitch alt-glitch added type/bug Something isn't working comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists labels May 9, 2026

@kshitijk4poor kshitijk4poor left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this — the recursion bookkeeping is correct and the success path resets state cleanly. A few structural concerns before this lands.

Critical

  • No tests for the new branch. tests/agent/test_context_compressor.py has zero references to JSONDecodeError. The checklist marks "existing tests cover the fix", but none of them exercise the new path — neither the fallback-to-main, the recursion's _summary_model_fallen_back short-circuit, nor the 30-second cooldown branch. Please add at least one unit test that mocks call_llm to raise json.JSONDecodeError and asserts the fallback + cooldown behavior. This is the most important gap given that the trigger condition (provider returns Content-Type: application/json with a non-JSON body) is hard to reach in practice.

Warnings

  • Heavy duplication of the existing fallback-to-main pattern. The same if self.summary_model and self.summary_model != self.model and not _summary_model_fallen_back: ... summary_model = ""; _summary_failure_cooldown_until = 0.0; return self._generate_summary(...) block now appears three times in _generate_summary (the new JSONDecodeError branch, the existing 404/timeout branch, and the existing unknown-error best-effort branch). The cleaner landing is one of:

    • Extract a small _fallback_to_main_model(e, reason) helper and let the three branches share it, or
    • Detect JSON decode failures via the existing unknown-error branch (it would already catch this today) and just let JSON-decode errors set a shorter cooldown via a parameter.
      As written, the third copy of the pattern is maintenance debt without a clear functional separation.
  • Unrelated whitespace regression — please revert. The PR also changes two lines that have nothing to do with the fix:

    -    self.summary_model = ""  # empty = use main model
    +    self.summary_model = "" # empty = use main model
    

    on what are now lines 990 and 1022. Two spaces before # is the PEP-8 / E261 form. Please drop these two hunks from the PR.

  • CHANGELOG.md is scope creep. The repo doesn't currently track a project-level CHANGELOG.md (releases are documented via RELEASE_v*.md). Adding a brand-new CHANGELOG.md inside a bug-fix PR introduces a convention that hasn't been agreed on. Please either drop the file or propose it in a separate documentation PR. The fix's release notes belong in the next RELEASE_v*.md, not a new top-level file.

  • self.provider may render as None in the error log. Line 916 logs self.provider directly, while self.base_url on line 919 uses or "default". self.provider defaults to None for some configurations and will print as Provider: None. Mirror the or "auto" / or "default" guard for consistency.

Suggestions

  • Reachability is real but narrow. call_llm doesn't parse JSON itself, so a raw json.JSONDecodeError only escapes when the underlying OpenAI SDK runs response.json() on a body that claims Content-Type: application/json but isn't valid JSON (misconfigured proxy 502 page, Cloudflare HTML challenge with the wrong content-type, malformed SSE frame). Common providers that obey the content-type contract won't hit this. If you want to also catch the more common case where a provider returns HTML with the correct content-type, consider widening the trigger to isinstance(e, json.JSONDecodeError) or "expecting value" in str(e).lower() — the OpenAI SDK can wrap the raw decode error inside APIResponseValidationError whose __cause__ is the JSONDecodeError but whose str(e) still carries the substring.

  • Truncate _last_aux_model_failure_error consistently. The existing branches truncate to 220 chars (lines 985-987, 1017-1019); the new branch sets f"Non-JSON response: {e}" without a cap. For very large bad bodies this can swell error logs and surface noisy errors via /usage. Apply the same 220-char cap.

  • Cooldown delta is small. 30s vs 60s won't change much in practice — the actual win, when reachable, is the immediate fallback to the main model, not the cooldown reduction. Worth keeping for symmetry but not the headline benefit.

Looks Good

  • Recursion does not self-deadlock: self._summary_failure_cooldown_until = 0.0 is set before the recursive call, so the cooldown gate at the top of _generate_summary allows the retry through.
  • The _summary_model_fallen_back = True sentinel correctly prevents an infinite recursion if the main model also returns invalid JSON — control falls through to the 30s cooldown branch on the second hit.
  • Success path resets both _summary_failure_cooldown_until and _summary_model_fallen_back so subsequent compactions start clean.
  • The 69 existing tests in tests/agent/test_context_compressor.py still pass against this PR.

Verdict: Request changes. The fix targets a real (if uncommon) failure mode, but it needs a unit test, the duplicated fallback should be consolidated, and the unrelated whitespace + new CHANGELOG.md should be dropped.

@kshitijk4poor kshitijk4poor left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline comments for the review above.

Comment thread agent/context_compressor.py
Comment thread agent/context_compressor.py
Comment thread agent/context_compressor.py
Comment thread agent/context_compressor.py
Comment thread agent/context_compressor.py
Comment thread agent/context_compressor.py
Comment thread agent/context_compressor.py
Comment thread CHANGELOG.md
@kshitijk4poor

Copy link
Copy Markdown
Collaborator

Thanks for digging into this! Closing in favor of #22416, which builds on your work with a few structural changes:

  • Extracted _fallback_to_main_for_compression so the new JSON-decode branch shares one helper with the existing 404/timeout and unknown-error branches (instead of being a third copy of the same body).
  • Widened detection to also catch APIResponseValidationError-style wrapping via an "expecting value" substring match — the SDK doesn't always raise a raw JSONDecodeError.
  • Added unit tests for all three paths (raw decode, wrapped exception, 30s cooldown on main).
  • Dropped the new top-level CHANGELOG.md (the repo doesn't track one) and reverted the unrelated whitespace changes around the # empty = use main model comments.

Credit on the salvage commit. Closing #22244 via #22416.

kshitijk4poor added a commit that referenced this pull request May 9, 2026
…22248 (#22416)

When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON
body with `Content-Type: application/json` — e.g. an HTML 502 page from a
misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw
`json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose
message contains "expecting value"). Previously this fell through to the
unknown-error branch and entered a 60s cooldown without retrying on the
main model, dropping the middle conversation turns instead.

This change folds JSON-decode detection into the existing fast-path
fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring
match for "expecting value", retry once on the main model, and use a
shorter 30s cooldown when already on main (the body shape tends to flip
back to valid quickly when the upstream proxy recovers).

The three duplicated fallback bodies (model-not-found, unknown-error,
JSON-decode) are consolidated into a single `_fallback_to_main_for_compression`
helper that handles the shared bookkeeping (record aux-model failure for
`/usage`-style callers, clear summary_model, clear cooldown).

Also adds three unit tests covering: raw `JSONDecodeError` retries on main,
substring-match for wrapped exceptions, and the 30s cooldown when already
on main.

Salvage of #22248 by @0xharryriddle. Closes #22244.

Co-authored-by: Harry Riddle <ntconguit@gmail.com>
JZKK720 pushed a commit to JZKK720/hermes-agent that referenced this pull request May 11, 2026
…ousResearch#22248 (NousResearch#22416)

When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON
body with `Content-Type: application/json` — e.g. an HTML 502 page from a
misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw
`json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose
message contains "expecting value"). Previously this fell through to the
unknown-error branch and entered a 60s cooldown without retrying on the
main model, dropping the middle conversation turns instead.

This change folds JSON-decode detection into the existing fast-path
fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring
match for "expecting value", retry once on the main model, and use a
shorter 30s cooldown when already on main (the body shape tends to flip
back to valid quickly when the upstream proxy recovers).

The three duplicated fallback bodies (model-not-found, unknown-error,
JSON-decode) are consolidated into a single `_fallback_to_main_for_compression`
helper that handles the shared bookkeeping (record aux-model failure for
`/usage`-style callers, clear summary_model, clear cooldown).

Also adds three unit tests covering: raw `JSONDecodeError` retries on main,
substring-match for wrapped exceptions, and the 30s cooldown when already
on main.

Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244.

Co-authored-by: Harry Riddle <ntconguit@gmail.com>
rmulligan pushed a commit to rmulligan/hermes-agent that referenced this pull request May 11, 2026
…ousResearch#22248 (NousResearch#22416)

When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON
body with `Content-Type: application/json` — e.g. an HTML 502 page from a
misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw
`json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose
message contains "expecting value"). Previously this fell through to the
unknown-error branch and entered a 60s cooldown without retrying on the
main model, dropping the middle conversation turns instead.

This change folds JSON-decode detection into the existing fast-path
fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring
match for "expecting value", retry once on the main model, and use a
shorter 30s cooldown when already on main (the body shape tends to flip
back to valid quickly when the upstream proxy recovers).

The three duplicated fallback bodies (model-not-found, unknown-error,
JSON-decode) are consolidated into a single `_fallback_to_main_for_compression`
helper that handles the shared bookkeeping (record aux-model failure for
`/usage`-style callers, clear summary_model, clear cooldown).

Also adds three unit tests covering: raw `JSONDecodeError` retries on main,
substring-match for wrapped exceptions, and the 30s cooldown when already
on main.

Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244.

Co-authored-by: Harry Riddle <ntconguit@gmail.com>
JinyuID pushed a commit to JinyuID/hermes-agent that referenced this pull request May 11, 2026
…ousResearch#22248 (NousResearch#22416)

When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON
body with `Content-Type: application/json` — e.g. an HTML 502 page from a
misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw
`json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose
message contains "expecting value"). Previously this fell through to the
unknown-error branch and entered a 60s cooldown without retrying on the
main model, dropping the middle conversation turns instead.

This change folds JSON-decode detection into the existing fast-path
fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring
match for "expecting value", retry once on the main model, and use a
shorter 30s cooldown when already on main (the body shape tends to flip
back to valid quickly when the upstream proxy recovers).

The three duplicated fallback bodies (model-not-found, unknown-error,
JSON-decode) are consolidated into a single `_fallback_to_main_for_compression`
helper that handles the shared bookkeeping (record aux-model failure for
`/usage`-style callers, clear summary_model, clear cooldown).

Also adds three unit tests covering: raw `JSONDecodeError` retries on main,
substring-match for wrapped exceptions, and the 30s cooldown when already
on main.

Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244.

Co-authored-by: Harry Riddle <ntconguit@gmail.com>
jsboige pushed a commit to jsboige/hermes-agent that referenced this pull request May 14, 2026
…ousResearch#22248 (NousResearch#22416)

When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON
body with `Content-Type: application/json` — e.g. an HTML 502 page from a
misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw
`json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose
message contains "expecting value"). Previously this fell through to the
unknown-error branch and entered a 60s cooldown without retrying on the
main model, dropping the middle conversation turns instead.

This change folds JSON-decode detection into the existing fast-path
fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring
match for "expecting value", retry once on the main model, and use a
shorter 30s cooldown when already on main (the body shape tends to flip
back to valid quickly when the upstream proxy recovers).

The three duplicated fallback bodies (model-not-found, unknown-error,
JSON-decode) are consolidated into a single `_fallback_to_main_for_compression`
helper that handles the shared bookkeeping (record aux-model failure for
`/usage`-style callers, clear summary_model, clear cooldown).

Also adds three unit tests covering: raw `JSONDecodeError` retries on main,
substring-match for wrapped exceptions, and the 30s cooldown when already
on main.

Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244.

Co-authored-by: Harry Riddle <ntconguit@gmail.com>
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request May 25, 2026
…ousResearch#22248 (NousResearch#22416)

When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON
body with `Content-Type: application/json` — e.g. an HTML 502 page from a
misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw
`json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose
message contains "expecting value"). Previously this fell through to the
unknown-error branch and entered a 60s cooldown without retrying on the
main model, dropping the middle conversation turns instead.

This change folds JSON-decode detection into the existing fast-path
fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring
match for "expecting value", retry once on the main model, and use a
shorter 30s cooldown when already on main (the body shape tends to flip
back to valid quickly when the upstream proxy recovers).

The three duplicated fallback bodies (model-not-found, unknown-error,
JSON-decode) are consolidated into a single `_fallback_to_main_for_compression`
helper that handles the shared bookkeeping (record aux-model failure for
`/usage`-style callers, clear summary_model, clear cooldown).

Also adds three unit tests covering: raw `JSONDecodeError` retries on main,
substring-match for wrapped exceptions, and the 30s cooldown when already
on main.

Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244.

Co-authored-by: Harry Riddle <ntconguit@gmail.com>
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…ousResearch#22248 (NousResearch#22416)

When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON
body with `Content-Type: application/json` — e.g. an HTML 502 page from a
misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw
`json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose
message contains "expecting value"). Previously this fell through to the
unknown-error branch and entered a 60s cooldown without retrying on the
main model, dropping the middle conversation turns instead.

This change folds JSON-decode detection into the existing fast-path
fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring
match for "expecting value", retry once on the main model, and use a
shorter 30s cooldown when already on main (the body shape tends to flip
back to valid quickly when the upstream proxy recovers).

The three duplicated fallback bodies (model-not-found, unknown-error,
JSON-decode) are consolidated into a single `_fallback_to_main_for_compression`
helper that handles the shared bookkeeping (record aux-model failure for
`/usage`-style callers, clear summary_model, clear cooldown).

Also adds three unit tests covering: raw `JSONDecodeError` retries on main,
substring-match for wrapped exceptions, and the 30s cooldown when already
on main.

Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244.

Co-authored-by: Harry Riddle <ntconguit@gmail.com>
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
…ousResearch#22248 (NousResearch#22416)

When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON
body with `Content-Type: application/json` — e.g. an HTML 502 page from a
misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw
`json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose
message contains "expecting value"). Previously this fell through to the
unknown-error branch and entered a 60s cooldown without retrying on the
main model, dropping the middle conversation turns instead.

This change folds JSON-decode detection into the existing fast-path
fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring
match for "expecting value", retry once on the main model, and use a
shorter 30s cooldown when already on main (the body shape tends to flip
back to valid quickly when the upstream proxy recovers).

The three duplicated fallback bodies (model-not-found, unknown-error,
JSON-decode) are consolidated into a single `_fallback_to_main_for_compression`
helper that handles the shared bookkeeping (record aux-model failure for
`/usage`-style callers, clear summary_model, clear cooldown).

Also adds three unit tests covering: raw `JSONDecodeError` retries on main,
substring-match for wrapped exceptions, and the 30s cooldown when already
on main.

Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244.

Co-authored-by: Harry Riddle <ntconguit@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Context Compression Failure

3 participants