honcho memory: sync_turn fails on multimodal (list) content — vision turns not recorded #30252

@zanphear

Summary

HonchoMemoryProvider.sync_turn errors when handed multimodal list content (the OpenAI vision shape [{"type":"text",...},{"type":"image_url",...}]). Logged once per vision turn at WARNING level; the chat completion succeeds, but the turn is silently dropped from Honcho's memory — the user representation never learns about anything visual the assistant was shown.

Versions

  • Hermes Agent v0.14.0 (2026.5.16)
  • plugins/memory/honcho/__init__.py @ line 1120 (sync_turn)
  • Reproducible against any OpenAI-compat upstream — verified on local llama.cpp endpoints serving Gemma 4 / Qwen 3.6

Repro

With memory.provider: honcho set, send a multimodal chat completion through the gateway:

{
  "model": "hermes-agent",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "what colour is this?"},
      {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
    ]
  }]
}

The reply lands fine, but the gateway log shows:

WARNING agent.memory_manager: Memory provider 'honcho' sync_turn failed:
expected string or bytes-like object, got 'list'
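For anyone scripting the repro, the request body above can be generated rather than hand-written. This is a minimal sketch: build_vision_payload is a hypothetical helper name, and the image bytes are a placeholder rather than a real PNG. The gateway URL is deployment-specific and is deliberately left out.

```python
import base64
import json


def build_vision_payload(prompt, image_bytes, mime="image/png", model="hermes-agent"):
    """Build an OpenAI-style chat completion body with one text part and one image part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime};base64,{encoded}"
    return {
        "model": model,
        "messages": [{
            "role": "user",
            # List-shaped content is what triggers the sync_turn failure.
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
    }


# Placeholder bytes (assumption); a real repro would read an actual image file.
payload = build_vision_payload("what colour is this?", b"\x89PNG placeholder")
body = json.dumps(payload)
```

POSTing body to the gateway's chat completions endpoint with memory.provider: honcho set should produce the warning above.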

Cause

plugins/memory/honcho/__init__.py:1132:

clean_user_content = sanitize_context(user_content or "").strip()

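user_content is the OpenAI multimodal list here. A non-empty list is truthy, so the user_content or "" guard passes it through unchanged and the call raises before a string is ever produced. The error text matches the TypeError that Python's re functions raise on non-string input, so the failure is most likely a regex call inside sanitize_context rather than .strip() itself. The exception is caught by agent/memory_manager.py at the provider.sync_turn(...) call site and logged at WARNING, but the turn never reaches Honcho.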
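The failure mode can be reproduced outside the plugin. The sketch below uses sanitize_context_stand_in, a hypothetical stand-in that is assumed (not confirmed) to resemble the real sanitize_context in that it runs a regex over its input.

```python
import re


def sanitize_context_stand_in(text):
    # Hypothetical stand-in, NOT the plugin's sanitize_context.
    # Assumption: the real function applies a regex to its argument.
    return re.sub(r"\s+", " ", text)


user_content = [
    {"type": "text", "text": "what colour is this?"},
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
]

# A non-empty list is truthy, so `or ""` returns the list itself.
guarded = user_content or ""

error_message = None
try:
    sanitize_context_stand_in(guarded).strip()
except TypeError as exc:
    # re rejects the list before .strip() is ever called.
    error_message = str(exc)

print(error_message)
```

The printed message begins with "expected string or bytes-like object"; newer Python versions append the offending type, as in the gateway log.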

Suggested fix

Normalise list-shaped content to a flat string before sanitisation. Text parts concatenate; image parts become a literal [image] placeholder so Honcho's representation still knows visuals were exchanged.

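def _flatten(c):
    """Collapse OpenAI-style multimodal content (a list of parts) into one string."""
    if not isinstance(c, list):
        return c  # plain strings and None pass through unchanged
    parts = []
    for item in c:
        if isinstance(item, str):
            parts.append(item)
        elif isinstance(item, dict):
            t = item.get("type")
            if t == "text":
                # "or" also covers an explicit "text": None
                parts.append(item.get("text") or "")
            elif t in ("image_url", "input_image"):
                parts.append("[image]")
            elif t:
                # Unknown part types keep a marker; typeless dicts are skipped
                # rather than recorded as "[None]".
                parts.append(f"[{t}]")
    return "\n".join(p for p in parts if p)

# Inside sync_turn, before sanitize_context is called:
user_content = _flatten(user_content)
assistant_content = _flatten(assistant_content)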

I'm carrying this as a local patch and the gateway log is now clean; Honcho's bill workspace gets the textual half of the turn plus an [image] marker.

Scope note

Only the Honcho plugin was investigated. Other memory providers with the same sync_turn(user_content: str, ...) signature (mem0, byterover, supermemory, retaindb, holographic, openviking) may have the same defect — could be worth a shared helper in agent/memory_manager.py if so. Haven't verified.
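If other providers do turn out to be affected, the shared-helper idea could look roughly like the sketch below. Every name here (flatten_content, sync_turn_normalised, RecordingProvider) is hypothetical, and the sync_turn signature is simplified to the two content arguments plus pass-through keyword arguments; the real memory manager may be structured differently.

```python
from typing import Any


def flatten_content(content: Any) -> str:
    """Reduce string, None, or list-of-parts content to a plain string."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    if not isinstance(content, list):
        return str(content)
    parts = []
    for item in content:
        if isinstance(item, str):
            parts.append(item)
        elif isinstance(item, dict):
            kind = item.get("type")
            if kind == "text":
                parts.append(str(item.get("text") or ""))
            elif kind in ("image_url", "input_image"):
                parts.append("[image]")
            elif kind:
                parts.append(f"[{kind}]")
    return "\n".join(p for p in parts if p)


def sync_turn_normalised(provider, user_content, assistant_content, **kwargs):
    """Hypothetical call-site wrapper: providers only ever see strings."""
    provider.sync_turn(
        flatten_content(user_content),
        flatten_content(assistant_content),
        **kwargs,
    )


class RecordingProvider:
    """Toy provider (assumption) that records what it is given."""

    def __init__(self):
        self.turns = []

    def sync_turn(self, user_content, assistant_content, **kwargs):
        self.turns.append((user_content, assistant_content))


provider = RecordingProvider()
sync_turn_normalised(
    provider,
    [
        {"type": "text", "text": "what colour is this?"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
    ],
    "It looks red.",
)
print(provider.turns)
```

Normalising at the call site would keep each provider's str-typed signature honest without patching every plugin separately.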

Happy to send a PR if useful — drop a note here.

Metadata

Labels: P3 (Low: cosmetic, nice to have), comp/plugins (Plugin system and bundled plugins), tool/memory (Memory tool and memory providers), type/bug (Something isn't working)
