fix(mcp): avoid inline screenshot blobs in tool results #4743
3 issues found across 3 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="browser_use/mcp/server.py">
<violation number="1" location="browser_use/mcp/server.py:974">
P2: Screenshot files are persisted with `delete=False` but never cleaned up, causing unbounded temp-file growth and retention of sensitive captures.</violation>
</file>
<file name="tests/ci/test_mcp_screenshot_results.py">
<violation number="1" location="tests/ci/test_mcp_screenshot_results.py:51">
P2: This negative assertion checks for plaintext bytes instead of the actual base64 screenshot payload, so it can miss an inline screenshot regression.</violation>
<violation number="2" location="tests/ci/test_mcp_screenshot_results.py:72">
P2: This negative assertion checks for plaintext bytes instead of the actual base64 screenshot payload, so it can miss an inline screenshot regression.</violation>
</file>
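A few lines show why the plaintext check is too weak. An inlined screenshot appears in the tool result as base64 text, so a search for the raw bytes never fires. `FAKE_PNG`, `inline_result`, `path_result`, and `contains_inline_screenshot` are hypothetical names, not the ones used in `tests/ci/test_mcp_screenshot_results.py`.

```python
import base64
import json

# Hypothetical stand-in for the screenshot bytes a test fixture might produce
FAKE_PNG = b'fake-png-bytes'


def inline_result() -> str:
    # Simulates the regression: the screenshot is inlined as base64 in the JSON payload
    return json.dumps({'screenshot': base64.b64encode(FAKE_PNG).decode('ascii')})


def path_result() -> str:
    # Simulates the fix: only a local file path is returned
    return json.dumps({'screenshot_path': '/tmp/screenshot.png'})


def contains_inline_screenshot(result_text: str, raw: bytes) -> bool:
    # Look for the base64 encoding, which is what an inlined screenshot actually
    # looks like in the payload, instead of the raw bytes
    return base64.b64encode(raw).decode('ascii') in result_text


# The plaintext check misses the regression; the base64 check catches it
print(FAKE_PNG.decode('ascii') in inline_result(), contains_inline_screenshot(inline_result(), FAKE_PNG))
# prints: False True
```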
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
I'll get the CLA signed and will follow up once it's done.
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="browser_use/mcp/server.py">
<violation number="1" location="browser_use/mcp/server.py:1171">
P2: Screenshot path tracking is popped before session close succeeds, so a failed close can lose cleanup state and leave orphaned temp files.</violation>
</file>
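Here is a sketch of the ordering the review asks for. It assumes a hypothetical `ScreenshotTracker` that maps session IDs to screenshot paths. The session is closed first, and the paths are popped and deleted only after `close()` returns. Because the files are deleted on a successful close, the sketch also covers the earlier point that `delete=False` files were never cleaned up. This is not browser-use's actual server code.

```python
import os
from typing import Callable, Dict, List


class ScreenshotTracker:
    """Hypothetical per-session registry of screenshot files (not browser-use's actual code)."""

    def __init__(self) -> None:
        self._paths: Dict[str, List[str]] = {}

    def track(self, session_id: str, path: str) -> None:
        # Remember every screenshot written for this session
        self._paths.setdefault(session_id, []).append(path)

    def tracked(self, session_id: str) -> List[str]:
        return list(self._paths.get(session_id, []))

    def close_session(self, session_id: str, close: Callable[[], None]) -> None:
        # Close first: if close() raises, the paths stay registered for a later retry
        close()
        # Only after a successful close is the tracking state popped and the files deleted
        for path in self._paths.pop(session_id, []):
            try:
                os.remove(path)
            except FileNotFoundError:
                pass  # already gone; nothing left to clean up
```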
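For the wrong element being clicked, the fix should match elements more specifically by scoring each candidate against the description (sketched here with Playwright's sync API):

```python
from typing import Any, Optional

from playwright.sync_api import ElementHandle, Page


def find_element(page: Page, description: Any) -> Optional[ElementHandle]:
    """Return the element that best matches `description`.

    `description` is any object with optional `tag`, `text`, and `id` attributes.
    """
    best_match = None
    best_score = 0
    for elem in page.query_selector_all('*'):
        score = 0
        # Tag match (tagName is upper-case in HTML, so compare lower-cased)
        if description.tag and elem.evaluate('e => e.tagName.toLowerCase()') == description.tag.lower():
            score += 1
        # Text match: case-insensitive substring
        if description.text and description.text.lower() in (elem.text_content() or '').lower():
            score += 3
        # ID match: substring of the element's id attribute
        if description.id and description.id in (elem.get_attribute('id') or ''):
            score += 2
        # Skip elements that match nothing, so visibility alone cannot select an element
        if score == 0:
            continue
        # Prefer visible elements
        if elem.is_visible():
            score += 1
        # >= lets a later (typically more deeply nested) element win ties, so <html>/<body>,
        # whose text contains every substring, do not shadow the actual target
        if score >= best_score:
            best_score = score
            best_match = elem
    return best_match
```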
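For the MCP inline screenshot issue, return a file reference instead of inline base64. A `data:` URL still embeds the whole base64 payload, so the screenshot has to be written to disk first:

```python
import base64
import tempfile
from pathlib import Path


def format_screenshot_for_mcp(screenshot_b64: str) -> dict:
    # Write the decoded PNG to a temp file; delete=False means the caller must remove it later
    with tempfile.NamedTemporaryFile(suffix='.png', delete=False) as f:
        f.write(base64.b64decode(screenshot_b64))
    # Illustrative shape, not the MCP ImageContent schema
    return {
        'type': 'image',
        # file:// URL pointing at the saved PNG instead of a data: URL
        'url': Path(f.name).as_uri(),
        'mime_type': 'image/png',
    }
```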
`browser_get_state` and `browser_screenshot` were returning inline PNG blobs in MCP tool results. Clients that replay tool history back through Anthropic can trip `Could not process image` on every later turn once that screenshot payload lands in the conversation.

This stores MCP screenshots as temporary PNG files and returns the local path in the JSON payload instead of inlining base64 image data. I also taught the in-repo MCP client to lift image blocks into `ActionResult.images` so MCP image results still flow through the browser-use message pipeline without getting stringified into prompt text.

Tests:
- `pytest tests/ci/test_mcp_screenshot_results.py -q`
- `pytest tests/ci/test_file_system_llm_integration.py -q -k "image_stored_in_message_manager or agent_message_prompt_includes_images or image_end_to_end"`

Fixes #4742
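The server-side half described above (persist the screenshot, return only its path) can be sketched as follows. `screenshot_tool_result`, the `screenshot-` filename prefix, and the per-session directory are hypothetical. Only the `screenshot_path` key comes from this PR. The caller owns `screenshot_dir` and is responsible for removing it when the session ends.

```python
import json
import os
import shutil
import tempfile
from typing import Any, Dict


def screenshot_tool_result(state: Dict[str, Any], png_bytes: bytes, screenshot_dir: str) -> str:
    """Hypothetical helper: save the PNG to disk and return JSON that carries only its path."""
    # mkstemp creates a uniquely named file in screenshot_dir and returns an open descriptor
    fd, path = tempfile.mkstemp(prefix='screenshot-', suffix='.png', dir=screenshot_dir)
    with os.fdopen(fd, 'wb') as f:
        f.write(png_bytes)
    payload = dict(state)
    # The tool result references the file by path; no base64 image data is inlined
    payload['screenshot_path'] = path
    return json.dumps(payload)


# Usage: one temp dir per session makes cleanup a single directory removal
session_dir = tempfile.mkdtemp(prefix='mcp-screenshots-')
print(screenshot_tool_result({'url': 'https://example.com'}, b'png-bytes', session_dir))
shutil.rmtree(session_dir)
```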
Summary by cubic
Stop inlining screenshot base64 in MCP tool results, which avoids Anthropic “Could not process image” errors when replaying tool history. Screenshots are saved to temp files, returned as local paths, and cleaned up when the session closes successfully. Fixes #4742.
- `browser_get_state` and `browser_screenshot` write PNGs to a temp dir and return `screenshot_path` in JSON; no base64 is inlined.
- The in-repo MCP client lifts `types.ImageContent` into `ActionResult.images` (with correct extensions) and keeps text in `extracted_content`.

Written for commit f094a5b. Summary will update on new commits.
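The client-side half, lifting image blocks out of the text, can be sketched as below. This assumes MCP content arrives as plain dicts shaped like `TextContent` / `ImageContent` (`type`, `text`, `data`, `mimeType`). `LiftedResult`, `lift_mcp_content`, the `name`/`data` image keys, and the MIME-to-extension map are hypothetical and do not reproduce browser-use's actual `ActionResult` code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Assumed MIME-to-extension map; unknown types fall back to '.bin'
_EXTENSIONS = {'image/png': '.png', 'image/jpeg': '.jpg', 'image/webp': '.webp', 'image/gif': '.gif'}


@dataclass
class LiftedResult:
    """Hypothetical stand-in for ActionResult; field names mirror the summary above."""
    extracted_content: Optional[str] = None
    images: List[Dict[str, str]] = field(default_factory=list)


def lift_mcp_content(blocks: List[Dict[str, Any]]) -> LiftedResult:
    # Text blocks are joined into extracted_content; image blocks are kept separate
    texts: List[str] = []
    images: List[Dict[str, str]] = []
    for i, block in enumerate(blocks):
        if block.get('type') == 'text':
            texts.append(block['text'])
        elif block.get('type') == 'image':
            ext = _EXTENSIONS.get(block.get('mimeType', ''), '.bin')
            # The base64 data stays out of the prompt text
            images.append({'name': f'mcp_image_{i}{ext}', 'data': block['data']})
    return LiftedResult(extracted_content='\n'.join(texts) or None, images=images)
```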