Description
When an MCP tool returns ImageContent in its response (e.g., screenshot tools, vision tools), Hermes's mcp_tool.py only processes TextContent blocks. ImageContent blocks are silently dropped, resulting in missing visual output and no indication to the user that an image was returned.
Root Cause
In mcp_tool.py, the response handling loop only checks for hasattr(block, "text"):
    for block in (result.content or []):
        if hasattr(block, "text"):
            parts.append(block.text)
Per the MCP spec, tool results can contain ContentBlock objects of types:
- TextContent (type="text"): handled ✓
- ImageContent (type="image"): NOT handled ✗
- EmbeddedResource: NOT handled ✗
When an MCP tool like Playwright's browser_take_screenshot or Anthropic's computer_use returns an image, the ImageContent block carries data (base64) and mimeType attributes instead of text. Such a block contributes nothing to parts: if it has no text attribute, the hasattr(block, "text") check fails and the block is skipped; if it exposes an empty default text="", the loop only appends an empty string. Either way the image data is never read, and the block is lost silently.
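The behavior can be reproduced outside Hermes with a minimal sketch. The two dataclasses below are stand-ins, not the real MCP SDK classes, and mirror only the field names mentioned above; collect_text is a hypothetical name for the current loop. The stand-in ImageContent is assumed to have no text attribute. If the real class exposes an empty text, the loop would append an empty string instead, and the image would be lost all the same.

```python
from dataclasses import dataclass


@dataclass
class TextContent:  # stand-in, not the real MCP SDK class
    text: str
    type: str = "text"


@dataclass
class ImageContent:  # stand-in, not the real MCP SDK class
    data: str
    mimeType: str
    type: str = "image"


def collect_text(content):
    """Mirror of the current loop: keeps only blocks that expose .text."""
    parts = []
    for block in (content or []):
        if hasattr(block, "text"):
            parts.append(block.text)
    return parts


content = [
    TextContent(text="Screenshot captured"),
    ImageContent(data="aGVsbG8=", mimeType="image/png"),
]
parts = collect_text(content)
# Blocks the loop never looked at, identified by their declared type.
dropped = [b.type for b in content if not hasattr(b, "text")]
print(parts)    # ['Screenshot captured']
print(dropped)  # ['image']
```

The caller receives only the text part and has no way to tell that an image was returned.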
Suggested Fix
Add handling for ImageContent blocks in the response loop:
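    import base64, mimetypes, os, uuid

    for block in (result.content or []):
        if hasattr(block, "text") and block.text:
            parts.append(block.text)
        elif getattr(block, "type", None) == "image":
            # ImageContent: save to disk and reference the path in the text output.
            # Derive the extension from mimeType; fall back to .png if unknown.
            ext = mimetypes.guess_extension(block.mimeType or "") or ".png"
            cache_dir = os.path.join(hermes_home, "image_cache")
            os.makedirs(cache_dir, exist_ok=True)  # the cache directory may not exist yet
            fpath = os.path.join(cache_dir, f"mcp-img-{uuid.uuid4().hex[:8]}{ext}")
            with open(fpath, "wb") as f:
                f.write(base64.b64decode(block.data))
            parts.append(f"[MCP tool returned an image: {fpath}]")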
This saves the image to disk and includes a file path reference so the agent can process it.
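The same idea could be packaged as a small helper that also covers EmbeddedResource, which the loop above still ignores. The following is a sketch under stated assumptions, not Hermes code: describe_block and _save are hypothetical names, the blocks are SimpleNamespace stand-ins rather than real MCP SDK objects, and the embedded resource is assumed to carry either a text field or a base64 blob field, as in the MCP schema.

```python
import base64
import mimetypes
import os
import tempfile
import uuid
from types import SimpleNamespace


def _save(b64_data, mime_type, cache_dir):
    """Decode base64 data into cache_dir and return the file path."""
    ext = mimetypes.guess_extension(mime_type or "") or ".bin"
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f"mcp-{uuid.uuid4().hex[:8]}{ext}")
    with open(path, "wb") as f:
        f.write(base64.b64decode(b64_data))
    return path


def describe_block(block, cache_dir):
    """Return text for one content block, or None if nothing usable is found."""
    kind = getattr(block, "type", None)
    if kind == "text":
        return block.text or None
    if kind == "image":
        path = _save(block.data, block.mimeType, cache_dir)
        return f"[MCP tool returned an image: {path}]"
    if kind == "resource":
        res = block.resource
        if getattr(res, "text", None) is not None:
            return res.text  # text resource: pass the contents through
        if getattr(res, "blob", None) is not None:
            path = _save(res.blob, getattr(res, "mimeType", None), cache_dir)
            return f"[MCP tool returned a resource: {path}]"
    return None


# Usage with stand-in blocks (hypothetical data, not real tool output).
cache = tempfile.mkdtemp()
blocks = [
    SimpleNamespace(type="text", text="done"),
    SimpleNamespace(type="image", mimeType="image/png",
                    data=base64.b64encode(b"\x89PNG").decode()),
    SimpleNamespace(type="resource", resource=SimpleNamespace(
        uri="file:///notes.txt", mimeType="text/plain", text="hello")),
]
parts = [p for p in (describe_block(b, cache) for b in blocks) if p]
```

Dispatching on the block's type field rather than on the presence of data and mimeType keeps other binary content types from being mislabeled as images.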
Environment
- Hermes: v0.9.0
- Affects: all platforms
- Commonly triggered by: Playwright MCP (screenshots), computer-use-mcp (screenshots), any MCP tool returning visual output
Impact
Without this fix, any MCP tool that returns visual output (screenshots, charts, diagrams) appears to return nothing, making vision-related MCP tools unusable.