fix(context): handle JSON decode errors in context compression#22248
fix(context): handle JSON decode errors in context compression#222480xharryriddle wants to merge 1 commit into
Conversation
- Add explicit json.JSONDecodeError handling in context_compressor.py's _generate_summary method - Log detailed diagnostics (provider, model, base URL, error position) when auxiliary LLM returns non-JSON response - Fallback to main model immediately if separate summary model is configured - Reduce cooldown to 30s for JSON decode errors on main model - Preserve existing fallback logic for model not found, timeouts, and transient errors - Add CHANGELOG.md with fix entry
kshitijk4poor
left a comment
There was a problem hiding this comment.
Thanks for tackling this — the recursion bookkeeping is correct and the success path resets state cleanly. A few structural concerns before this lands.
Critical
- No tests for the new branch.
tests/agent/test_context_compressor.pyhas zero references toJSONDecodeError. The checklist marks "existing tests cover the fix", but none of them exercise the new path — neither the fallback-to-main, the recursion's_summary_model_fallen_backshort-circuit, nor the 30-second cooldown branch. Please add at least one unit test that mockscall_llmto raisejson.JSONDecodeErrorand asserts the fallback + cooldown behavior. This is the most important gap given that the trigger condition (provider returnsContent-Type: application/jsonwith a non-JSON body) is hard to reach in practice.
Warnings
-
Heavy duplication of the existing fallback-to-main pattern. The same
if self.summary_model and self.summary_model != self.model and not _summary_model_fallen_back: ... summary_model = ""; _summary_failure_cooldown_until = 0.0; return self._generate_summary(...)block now appears three times in_generate_summary(the new JSONDecodeError branch, the existing 404/timeout branch, and the existing unknown-error best-effort branch). The cleaner landing is one of:- Extract a small
_fallback_to_main_model(e, reason)helper and let the three branches share it, or - Detect JSON decode failures via the existing unknown-error branch (it would already catch this today) and just let JSON-decode errors set a shorter cooldown via a parameter.
As written, the third copy of the pattern is maintenance debt without a clear functional separation.
- Extract a small
-
Unrelated whitespace regression — please revert. The PR also changes two lines that have nothing to do with the fix:
- self.summary_model = "" # empty = use main model + self.summary_model = "" # empty = use main modelon what are now lines 990 and 1022. Two spaces before
#is the PEP-8 / E261 form. Please drop these two hunks from the PR. -
CHANGELOG.mdis scope creep. The repo doesn't currently track a project-levelCHANGELOG.md(releases are documented viaRELEASE_v*.md). Adding a brand-newCHANGELOG.mdinside a bug-fix PR introduces a convention that hasn't been agreed on. Please either drop the file or propose it in a separate documentation PR. The fix's release notes belong in the nextRELEASE_v*.md, not a new top-level file. -
self.providermay render asNonein the error log. Line 916 logsself.providerdirectly, whileself.base_urlon line 919 usesor "default".self.providerdefaults toNonefor some configurations and will print asProvider: None. Mirror theor "auto"/or "default"guard for consistency.
Suggestions
-
Reachability is real but narrow.
call_llmdoesn't parse JSON itself, so a rawjson.JSONDecodeErroronly escapes when the underlying OpenAI SDK runsresponse.json()on a body that claimsContent-Type: application/jsonbut isn't valid JSON (misconfigured proxy 502 page, Cloudflare HTML challenge with the wrong content-type, malformed SSE frame). Common providers that obey the content-type contract won't hit this. If you want to also catch the more common case where a provider returns HTML with the correct content-type, consider widening the trigger toisinstance(e, json.JSONDecodeError) or "expecting value" in str(e).lower()— the OpenAI SDK can wrap the raw decode error insideAPIResponseValidationErrorwhose__cause__is theJSONDecodeErrorbut whosestr(e)still carries the substring. -
Truncate
_last_aux_model_failure_errorconsistently. The existing branches truncate to 220 chars (lines 985-987, 1017-1019); the new branch setsf"Non-JSON response: {e}"without a cap. For very large bad bodies this can swell error logs and surface noisy errors via/usage. Apply the same 220-char cap. -
Cooldown delta is small. 30s vs 60s won't change much in practice — the actual win, when reachable, is the immediate fallback to the main model, not the cooldown reduction. Worth keeping for symmetry but not the headline benefit.
Looks Good
- Recursion does not self-deadlock:
self._summary_failure_cooldown_until = 0.0is set before the recursive call, so the cooldown gate at the top of_generate_summaryallows the retry through. - The
_summary_model_fallen_back = Truesentinel correctly prevents an infinite recursion if the main model also returns invalid JSON — control falls through to the 30s cooldown branch on the second hit. - Success path resets both
_summary_failure_cooldown_untiland_summary_model_fallen_backso subsequent compactions start clean. - The 69 existing tests in
tests/agent/test_context_compressor.pystill pass against this PR.
Verdict: Request changes. The fix targets a real (if uncommon) failure mode, but it needs a unit test, the duplicated fallback should be consolidated, and the unrelated whitespace + new CHANGELOG.md should be dropped.
kshitijk4poor
left a comment
There was a problem hiding this comment.
Inline comments for the review above.
|
Thanks for digging into this! Closing in favor of #22416, which builds on your work with a few structural changes:
|
…22248 (#22416) When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with `Content-Type: application/json` — e.g. an HTML 502 page from a misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw `json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead. This change folds JSON-decode detection into the existing fast-path fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring match for "expecting value", retry once on the main model, and use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers). The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) are consolidated into a single `_fallback_to_main_for_compression` helper that handles the shared bookkeeping (record aux-model failure for `/usage`-style callers, clear summary_model, clear cooldown). Also adds three unit tests covering: raw `JSONDecodeError` retries on main, substring-match for wrapped exceptions, and the 30s cooldown when already on main. Salvage of #22248 by @0xharryriddle. Closes #22244. Co-authored-by: Harry Riddle <ntconguit@gmail.com>
…ousResearch#22248 (NousResearch#22416) When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with `Content-Type: application/json` — e.g. an HTML 502 page from a misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw `json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead. This change folds JSON-decode detection into the existing fast-path fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring match for "expecting value", retry once on the main model, and use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers). The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) are consolidated into a single `_fallback_to_main_for_compression` helper that handles the shared bookkeeping (record aux-model failure for `/usage`-style callers, clear summary_model, clear cooldown). Also adds three unit tests covering: raw `JSONDecodeError` retries on main, substring-match for wrapped exceptions, and the 30s cooldown when already on main. Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244. Co-authored-by: Harry Riddle <ntconguit@gmail.com>
…ousResearch#22248 (NousResearch#22416) When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with `Content-Type: application/json` — e.g. an HTML 502 page from a misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw `json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead. This change folds JSON-decode detection into the existing fast-path fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring match for "expecting value", retry once on the main model, and use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers). The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) are consolidated into a single `_fallback_to_main_for_compression` helper that handles the shared bookkeeping (record aux-model failure for `/usage`-style callers, clear summary_model, clear cooldown). Also adds three unit tests covering: raw `JSONDecodeError` retries on main, substring-match for wrapped exceptions, and the 30s cooldown when already on main. Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244. Co-authored-by: Harry Riddle <ntconguit@gmail.com>
…ousResearch#22248 (NousResearch#22416) When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with `Content-Type: application/json` — e.g. an HTML 502 page from a misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw `json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead. This change folds JSON-decode detection into the existing fast-path fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring match for "expecting value", retry once on the main model, and use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers). The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) are consolidated into a single `_fallback_to_main_for_compression` helper that handles the shared bookkeeping (record aux-model failure for `/usage`-style callers, clear summary_model, clear cooldown). Also adds three unit tests covering: raw `JSONDecodeError` retries on main, substring-match for wrapped exceptions, and the 30s cooldown when already on main. Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244. Co-authored-by: Harry Riddle <ntconguit@gmail.com>
…ousResearch#22248 (NousResearch#22416) When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with `Content-Type: application/json` — e.g. an HTML 502 page from a misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw `json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead. This change folds JSON-decode detection into the existing fast-path fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring match for "expecting value", retry once on the main model, and use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers). The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) are consolidated into a single `_fallback_to_main_for_compression` helper that handles the shared bookkeeping (record aux-model failure for `/usage`-style callers, clear summary_model, clear cooldown). Also adds three unit tests covering: raw `JSONDecodeError` retries on main, substring-match for wrapped exceptions, and the 30s cooldown when already on main. Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244. Co-authored-by: Harry Riddle <ntconguit@gmail.com>
…ousResearch#22248 (NousResearch#22416) When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with `Content-Type: application/json` — e.g. an HTML 502 page from a misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw `json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead. This change folds JSON-decode detection into the existing fast-path fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring match for "expecting value", retry once on the main model, and use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers). The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) are consolidated into a single `_fallback_to_main_for_compression` helper that handles the shared bookkeeping (record aux-model failure for `/usage`-style callers, clear summary_model, clear cooldown). Also adds three unit tests covering: raw `JSONDecodeError` retries on main, substring-match for wrapped exceptions, and the 30s cooldown when already on main. Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244. Co-authored-by: Harry Riddle <ntconguit@gmail.com>
…ousResearch#22248 (NousResearch#22416) When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with `Content-Type: application/json` — e.g. an HTML 502 page from a misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw `json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead. This change folds JSON-decode detection into the existing fast-path fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring match for "expecting value", retry once on the main model, and use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers). The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) are consolidated into a single `_fallback_to_main_for_compression` helper that handles the shared bookkeeping (record aux-model failure for `/usage`-style callers, clear summary_model, clear cooldown). Also adds three unit tests covering: raw `JSONDecodeError` retries on main, substring-match for wrapped exceptions, and the 30s cooldown when already on main. Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244. Co-authored-by: Harry Riddle <ntconguit@gmail.com>
…ousResearch#22248 (NousResearch#22416) When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with `Content-Type: application/json` — e.g. an HTML 502 page from a misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw `json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead. This change folds JSON-decode detection into the existing fast-path fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring match for "expecting value", retry once on the main model, and use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers). The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) are consolidated into a single `_fallback_to_main_for_compression` helper that handles the shared bookkeeping (record aux-model failure for `/usage`-style callers, clear summary_model, clear cooldown). Also adds three unit tests covering: raw `JSONDecodeError` retries on main, substring-match for wrapped exceptions, and the 30s cooldown when already on main. Salvage of NousResearch#22248 by @0xharryriddle. Closes NousResearch#22244. Co-authored-by: Harry Riddle <ntconguit@gmail.com>
What does this PR do?
Fixes context compression failure when the auxiliary LLM provider returns non-JSON responses (HTML, plain text, malformed JSON). The compression handler now explicitly catches
json.JSONDecodeError, logs detailed diagnostics, and implements appropriate fallback behavior.Why this approach:
Fixes context compression JSON decode errors:
Expecting value: line X column Y (char Z)Related Issue
Fixes #22244
Type of Change
Changes Made
isinstance(e, json.JSONDecodeError)check in_generate_summaryexception handler, enhanced error logging with provider/model/base URL/position, implemented immediate fallback to main model, reduced cooldown to 30s for JSON decode errors on main model.How to Test
Run the context compressor test suite:
All 69 tests should pass.
(Optional) Manually verify by configuring an auxiliary model known to return non-JSON responses and trigger context compression; check logs for enhanced error messages.
Checklist
Code
fix(scope):,feat(scope):, etc.)scripts/run_tests.sh tests/agent/test_context_compressor.pyand all tests passDocumentation & Housekeeping
AGENTS.mdif I changed architecture or workflows — N/A (internal implementation detail)cli-config.yaml.exampleif I added/changed config keys — N/A