fix(mcp): avoid inline screenshot blobs in tool results #4743

Open
onthebed wants to merge 3 commits into browser-use:main from onthebed:clawoss/fix/4742-mcp-screenshot-history

@onthebed onthebed commented Apr 25, 2026

browser_get_state and browser_screenshot were returning inline PNG blobs in MCP tool results. Clients that replay tool history back through Anthropic can hit a "Could not process image" error on every later turn once that screenshot payload lands in the conversation.

This stores MCP screenshots as temporary PNG files and returns the local path in the JSON payload instead of inlining base64 image data. I also taught the in-repo MCP client to lift image blocks into ActionResult.images so MCP image results still flow through the browser-use message pipeline without getting stringified into prompt text.
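
The file-backed approach described above might look roughly like the following. This is a minimal sketch and not the PR's actual code: `save_screenshot_to_temp`, `build_state_payload`, and the `screenshot-` file prefix are hypothetical names chosen for illustration, and the payload shape is assumed apart from a path-valued `screenshot_path` field.

```python
import base64
import json
import tempfile
from pathlib import Path


def save_screenshot_to_temp(screenshot_b64: str, temp_dir: Path) -> Path:
    """Decode a base64 PNG and write it to a uniquely named file in temp_dir."""
    png_bytes = base64.b64decode(screenshot_b64)
    # delete=False keeps the file on disk after the handle closes, so the
    # caller is responsible for removing it later (e.g. on session close).
    with tempfile.NamedTemporaryFile(
        dir=temp_dir, prefix="screenshot-", suffix=".png", delete=False
    ) as handle:
        handle.write(png_bytes)
    return Path(handle.name)


def build_state_payload(url: str, screenshot_b64: str, temp_dir: Path) -> str:
    """Return a JSON tool result that references the screenshot by path only."""
    path = save_screenshot_to_temp(screenshot_b64, temp_dir)
    # Hypothetical payload shape: the base64 data itself is never included.
    return json.dumps({"url": url, "screenshot_path": str(path)})
```

Because the file is created with `delete=False`, something else has to own its lifetime; otherwise the temp directory grows without bound.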

Tests:

  • pytest tests/ci/test_mcp_screenshot_results.py -q
  • pytest tests/ci/test_file_system_llm_integration.py -q -k "image_stored_in_message_manager or agent_message_prompt_includes_images or image_end_to_end"

Fixes #4742


Summary by cubic

Stop inlining screenshot base64 in MCP tool results. Screenshots are saved to temp files, returned as local paths, and cleaned up on successful session close to avoid Anthropic “Could not process image” errors when replaying tool history. Fixes #4742.

  • Bug Fixes
    • browser_get_state and browser_screenshot write PNGs to a temp dir and return screenshot_path in JSON; no base64 inlined.
    • Server stores screenshots in a dedicated temp dir, deletes them on successful close (warns if a file can’t be removed), and preserves files/state if close fails for retry.
    • Updated tool descriptions to reflect path-based screenshot results.
    • MCP client lifts types.ImageContent into ActionResult.images (with correct extensions) and keeps text in extracted_content.
    • Added tests for file-backed screenshots, cleanup on close, preservation on close failure, and client image extraction.

Written for commit f094a5b. Summary will update on new commits. Review in cubic
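
The client-side change in the summary (image content lifted into ActionResult.images, text kept in extracted_content) could be sketched as below. This is an illustration under assumptions, not the PR's code: `TextBlock` and `ImageBlock` are stand-ins for MCP content blocks, `split_tool_result` and the `EXTENSIONS` table are hypothetical, and the `{"name", "data"}` image dict shape is assumed.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:  # stand-in for an MCP text content block
    text: str


@dataclass
class ImageBlock:  # stand-in for an MCP image content block
    data: str  # base64-encoded image bytes
    mimeType: str


# Hypothetical MIME-to-extension table; unknown types fall back to "bin".
EXTENSIONS = {"image/png": "png", "image/jpeg": "jpg", "image/webp": "webp"}


def split_tool_result(blocks: list) -> tuple[str, list[dict[str, str]]]:
    """Separate text blocks from image blocks in a tool result."""
    texts: list[str] = []
    images: list[dict[str, str]] = []
    for block in blocks:
        if isinstance(block, ImageBlock):
            ext = EXTENSIONS.get(block.mimeType, "bin")
            name = f"mcp_image_{len(images) + 1}.{ext}"
            # Image data goes to a structured field, never into prompt text.
            images.append({"name": name, "data": block.data})
        elif isinstance(block, TextBlock):
            texts.append(block.text)
    return "\n".join(texts), images
```

The returned text would feed extracted_content and the image list would feed ActionResult.images, so base64 data is never stringified into the prompt.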

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor

3 issues found across 3 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="browser_use/mcp/server.py">

<violation number="1" location="browser_use/mcp/server.py:974">
P2: Screenshot files are persisted with `delete=False` but never cleaned up, causing unbounded temp-file growth and retention of sensitive captures.</violation>
</file>

<file name="tests/ci/test_mcp_screenshot_results.py">

<violation number="1" location="tests/ci/test_mcp_screenshot_results.py:51">
P2: This negative assertion checks for plaintext bytes instead of the actual base64 screenshot payload, so it can miss an inline screenshot regression.</violation>

<violation number="2" location="tests/ci/test_mcp_screenshot_results.py:72">
P2: This negative assertion checks for plaintext bytes instead of the actual base64 screenshot payload, so it can miss an inline screenshot regression.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

Comment thread browser_use/mcp/server.py
Comment thread tests/ci/test_mcp_screenshot_results.py Outdated
Comment thread tests/ci/test_mcp_screenshot_results.py Outdated
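
The two P2 findings on tests/ci/test_mcp_screenshot_results.py come down to what the negative assertion compares against. The following is a hedged sketch with made-up bytes and a hypothetical regressed payload, not the PR's test code:

```python
import base64
import json

raw_png = b"\x89PNG\r\n\x1a\n-not-a-real-image-"
encoded = base64.b64encode(raw_png).decode("ascii")

# A regressed result that still inlines the screenshot as base64.
regressed = json.dumps({"url": "https://example.com", "screenshot": encoded})

# Weak check: the raw bytes never appear in the JSON text, so this passes
# even though the screenshot is still inlined.
weak_check_passes = raw_png.decode("latin-1") not in regressed

# Stronger check: look for the base64 string the server would actually emit.
strong_check_passes = encoded not in regressed
```

Here the weak check passes on the regressed payload while the stronger check fails, which is the behaviour a regression test needs.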
@onthebed

Author

I'll get the CLA signed — will follow up once it's done.

@onthebed

Author

I'll get the CLA signed and follow up here once it's done.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="browser_use/mcp/server.py">

<violation number="1" location="browser_use/mcp/server.py:1171">
P2: Screenshot path tracking is popped before session close succeeds, so a failed close can lose cleanup state and leave orphaned temp files.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

Comment thread browser_use/mcp/server.py Outdated
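
The ordering problem in this finding can be illustrated with a small sketch. `ScreenshotRegistry` and its methods are hypothetical, and the close callback is synchronous here for brevity (the real server may well be async); the point is only that tracking state is dropped after the close succeeds, not before.

```python
import logging
from pathlib import Path
from typing import Callable

logger = logging.getLogger(__name__)


class ScreenshotRegistry:
    """Hypothetical tracker mapping session ids to screenshot files."""

    def __init__(self) -> None:
        self._paths: dict[str, list[Path]] = {}

    def track(self, session_id: str, path: Path) -> None:
        self._paths.setdefault(session_id, []).append(path)

    def tracked(self, session_id: str) -> list[Path]:
        return list(self._paths.get(session_id, []))

    def close_session(self, session_id: str, close: Callable[[], None]) -> None:
        # Close first. If this raises, the tracked paths stay registered,
        # so a retry can still find and delete the files.
        close()
        # Only now drop the tracking entry and remove the files.
        for path in self._paths.pop(session_id, []):
            try:
                path.unlink(missing_ok=True)
            except OSError as exc:
                logger.warning("Could not remove screenshot %s: %s", path, exc)
```

If the pop happened before `close()`, a failed close would forget the paths and leave the files orphaned.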
PHclaw commented Apr 29, 2026

For the wrong element being clicked:

The fix should match elements more specifically, for example by scoring every candidate against the description and taking the best one:

def find_element(page, description):
    # Score every element on the page against the description.
    # Assumes elements expose .tag, .text, .id and .is_visible().
    candidates = page.query_selector_all('*')

    best_match = None
    best_score = 0

    for elem in candidates:
        score = 0

        # Tag match
        if description.tag and elem.tag == description.tag:
            score += 1

        # Text match (case-insensitive substring; elem.text may be None)
        if description.text and description.text.lower() in (elem.text or '').lower():
            score += 3

        # ID match (substring; elem.id may be None)
        if description.id and description.id in (elem.id or ''):
            score += 2

        # Visibility is only a bonus: it must not turn an element that
        # matched nothing else into a candidate
        if score > 0 and elem.is_visible():
            score += 1

        if score > best_score:
            best_score = score
            best_match = elem

    return best_match if best_score > 0 else None

PHclaw commented Apr 29, 2026

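For the MCP inline screenshot issue:

Return a file reference instead of inline base64:

import base64
import tempfile

def format_screenshot_for_mcp(screenshot_b64: str) -> dict:
    # Decode the screenshot and write it to a temp file that outlives this call
    with tempfile.NamedTemporaryFile(suffix='.jpg', delete=False) as f:
        f.write(base64.b64decode(screenshot_b64))
    # Return a reference to the file, not the base64 payload itself
    return {
        'screenshot_path': f.name,
        'mime_type': 'image/jpeg'
    }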

Development

Successfully merging this pull request may close these issues.

Bug: Screenshot blob in tool result poisons conversation context → API 400 on all subsequent turns

4 participants