fix(computer-use): cap AX `elements` array to prevent context blowup (#22865) #30145
Merged
…22865)

`computer_use(action='capture', mode='ax')` returned the full AX element list verbatim in the JSON response. Dense Electron / Obsidian / JetBrains UIs publish 500+ AX nodes (one reproduction in #22865 returned 597 elements against Obsidian), so a single capture could consume enough context to trigger compression failures or render the session unusable.

The human-readable `_format_elements` summary is already capped at 40 lines, so the truncation gap was invisible to anyone reading the summary output.

Add a `max_elements` argument to the tool schema, default 100, that trims the AX `elements` array. When the cap fires, the response surfaces `total_elements` and `truncated_elements` and appends a "raise max_elements or pass app= to narrow" hint to the summary so the model knows the JSON view is partial and can re-issue with a tighter scope.

Validation is centralized in `_coerce_max_elements`: missing / non-integer / sub-1 inputs fall back to the default cap, so the protection can never be silently disabled by a malformed tool-call argument.

The cap only affects AX-mode JSON; `mode='som'` and `mode='vision'` keep returning a screenshot + image-aware summary unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
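The trimming behaviour described in this commit message can be sketched roughly as follows. This is an illustration only, not the code from the PR: `trim_elements` and its return shape are hypothetical, and treating `truncated_elements` as the count of omitted entries is an assumption, since the message does not define the field.

```python
from typing import Any

DEFAULT_MAX_ELEMENTS = 100  # default cap stated in the commit message


def trim_elements(
    elements: list[dict[str, Any]],
    max_elements: int = DEFAULT_MAX_ELEMENTS,
) -> dict[str, Any]:
    """Hypothetical helper: cap the AX elements array and flag truncation."""
    visible = elements[:max_elements]
    payload: dict[str, Any] = {"elements": visible}
    summary = f"{len(visible)} elements"

    if len(elements) > len(visible):
        payload["total_elements"] = len(elements)
        # Assumption: truncated_elements counts the entries that were dropped.
        payload["truncated_elements"] = len(elements) - len(visible)
        summary += (
            f" (response truncated to {len(visible)} of {len(elements)} elements; "
            "raise max_elements or pass app= to narrow)"
        )

    payload["summary"] = summary
    return payload
```

With a 597-element capture like the one reported in #22865, this sketch would return 100 elements plus the two bookkeeping fields; a capture under the cap would carry neither field.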
Four findings from Copilot's review on PR #22891, all in the AX elements-array cap added by 22fa1ed:

1. The truncation note ("response truncated to N of M elements") was appended unconditionally, including in the som/vision multimodal path, whose response carries a screenshot rather than an `elements` array. The note described a payload field that wasn't present. Moved the note into the AX-text branch where the array actually appears.
2. `_format_elements(cap.elements)` ran on the full untrimmed list with its own `max_lines=40` cap, so a caller passing `max_elements=10` would see summary lines referencing `#11..#40` even though the JSON `elements` array only held #1..#10. Format on `visible_elements` instead so the summary indices always exist in the response.
3. `_coerce_max_elements` enforced a lower bound but no upper bound, so `max_elements=10_000_000` silently disabled the safeguard and reintroduced the original context blow-up. Added a hard cap (`_MAX_ALLOWED_MAX_ELEMENTS = 1000`) that clamps oversized values.
4. The schema string said "Default 100" but the property carried no `default` field, and claimed `max_elements` had no effect on som/vision while the image-missing fallback path can still return an elements array. Added `"default": 100`, `"maximum": 1000`, and clarified the fallback-path wording.

Each finding gets a regression test:

- test_capture_ax_clamps_oversized_max_elements_to_hard_cap
- test_capture_ax_summary_indices_match_returned_elements
- test_capture_multimodal_summary_omits_truncation_note
- test_schema_max_elements_documents_default_and_upper_bound

Verified with `pytest tests/tools/test_computer_use.py` (53 passed, including the 5 new cases). Confirmed each new test fails on the pre-fix code path before applying the production change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
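A minimal sketch of the validation rule after finding 3, assuming the two constants named above. The function body is illustrative, not the PR's code. Rejecting strings and booleans outright is an assumption made here; the real `_coerce_max_elements` may parse or handle them differently.

```python
_DEFAULT_MAX_ELEMENTS = 100
_MAX_ALLOWED_MAX_ELEMENTS = 1000


def coerce_max_elements(value: object) -> int:
    """Illustrative validator: fall back on bad input, clamp oversized input."""
    # Assumption: only genuine ints are accepted. bool is a subclass of int
    # in Python, so it is excluded explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return _DEFAULT_MAX_ELEMENTS
    # Zero and negative values fall back to the default cap.
    if value < 1:
        return _DEFAULT_MAX_ELEMENTS
    # Oversized values are clamped to the hard cap rather than rejected.
    return min(value, _MAX_ALLOWED_MAX_ELEMENTS)
```

Under this sketch, `coerce_max_elements(10_000_000)` yields the hard cap of 1000 instead of disabling the safeguard, while `None`, `0`, and `-5` all yield the default of 100.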
The cherry-pick of #22891 (max_elements cap) reshuffled _capture_response so summary was assigned inside both the multimodal and AX branches, but #30126's aux-vision routing call (_route_capture_through_aux_vision) fires BEFORE either branch and references the not-yet-bound name. Compute summary once up-front, keep the AX-branch rebuild for the truncation note.
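The failure mode in this commit message is a general Python scoping issue and can be reproduced in isolation. The sketch below uses hypothetical function names and placeholder strings; it only mirrors the shape of the problem (a name bound inside both branches but read before either runs), not the actual `_capture_response` code.

```python
def build_response_buggy(multimodal: bool, route_aux: bool) -> str:
    """Reads `summary` before either branch has bound it."""
    if route_aux:
        # `summary` is a local name (it is assigned below), but nothing has
        # been assigned yet, so this line raises UnboundLocalError.
        return f"aux:{summary}"
    if multimodal:
        summary = "screenshot summary"
    else:
        summary = "ax summary"
    return summary


def build_response_fixed(
    multimodal: bool, route_aux: bool, truncated: bool = False
) -> str:
    """Binds `summary` once up-front; the AX path may still rebuild it."""
    summary = "base summary"
    if route_aux:
        return f"aux:{summary}"
    if not multimodal and truncated:
        # The AX path rebuilds the summary to carry the truncation note.
        summary = summary + " (response truncated)"
    return summary
```

Calling `build_response_buggy(False, True)` raises `UnboundLocalError`, whereas the fixed variant returns a value on every path.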
AhmetArif0 added a commit to AhmetArif0/hermes-agent that referenced this pull request on May 22, 2026:
`_route_capture_through_aux_vision` returned `cap.elements` verbatim, so dense SOM captures (600+ AX nodes on Electron/Slack) routed via auxiliary.vision still produced oversized tool results that could exhaust session context. This is the same NousResearch#22865 shape that PR NousResearch#30145 fixed for the AX-only path.

Fix: pass `visible_elements` (already capped by `max_elements` in `_capture_response`) to `_route_capture_through_aux_vision` and use it in the returned JSON. Add `total_elements` and `truncated_elements` fields for parity with the AX path so the model knows the response is partial.

3 regression tests added: default cap (600→100), explicit override (300→50), no `truncated_elements` field when under cap.
Salvages #22891 onto current main. Fixes #22865: `computer_use(action='capture', mode='ax')` against dense AX trees (Electron apps like Obsidian) no longer dumps the full `elements` array into the tool result.

The bug

Reporter's session captured an Obsidian window and got back ~597 AX elements. `_format_elements()` truncated the human-readable summary but `_capture_response()` still returned the full `elements` array in the JSON payload, blowing up context or tripping compression failures.

The fix

- `tools/computer_use/schema.py`: new optional `max_elements` integer parameter (minimum: 1, maximum: 1000, default: 100); schema documents both default and upper bound.
- `tools/computer_use/tool.py`: `_DEFAULT_MAX_ELEMENTS = 100`, `_MAX_ALLOWED_MAX_ELEMENTS = 1000`, new `_coerce_max_elements()` validator that falls back to default for malformed input (negative, zero, non-int) so a caller can't accidentally re-introduce unbounded behavior. `_capture_response()` slices `cap.elements[:max_elements]`, surfaces `total_elements` and `truncated_elements` fields, and appends a "(response truncated to N of M elements; raise max_elements or pass app= to narrow)" note to the human summary so the model knows the JSON view is partial.
- `tests/tools/test_computer_use.py`: 9 regression tests: schema exposure (2), default cap behavior on 600-element tree (1), explicit override (1), below-cap backwards-compat (1), invalid input fallback (4).

Validation

- … #22891 (the original + a Copilot-review follow-up).
- … #22891 onto a main that already includes #30126 (aux-vision routing) produced an `UnboundLocalError`: #22891 reshuffled `_capture_response()` so `summary` was bound inside both branches, but #30126's `_route_capture_through_aux_vision(cap, summary)` call fires BEFORE either branch and referenced the not-yet-bound name. Fixed by building `summary` once up-front; the AX path still rebuilds it after appending the truncation note.

Coverage
Credit @briandevans (PR #22891). Author already in AUTHOR_MAP.
Closes #22891.