Fix #521: inline multimodal read_file attachments#777
Conversation
- Extract _classify_attachment() replacing _is_image/_is_pdf/_is_inline_attachment to avoid computing os.path.splitext 4 times per read - Extract module-level MIME_TYPES constant from inline dict - Add substring guard before json.loads in _build_tool_result_messages to skip parsing for ~95% of text-file read_file results - Fix silent base64 stripping when attachment_kind is None by scoping hint/strip logic to known attachment types only - Extend MAX_TOOL_RESULT_CHARS truncation guard to all tool_messages, closing gap where inline user message with ~700KB data URL bypassed it - Replace nested ternary hint construction with explicit if/elif/else - Extract _is_openrouter property to deduplicate base_url checks - Add .strip() to multimodal part values for whitespace safety
2b1dc51 to
9984383
Compare
|
Thanks for the thorough work on this @kshitijk4poor — the cross-referencing with KiloCode and the provider gating show real attention to detail. However, we're going to pass on this approach for a few architectural reasons: 1. Synthetic user messages break message flow invariants 2. Context cost is too high 3. The existing flow already works well 4. Complexity in run_agent.py The PDF inline support being limited to "supported OpenRouter chat-completions paths" further underscores that this is too niche for the complexity it adds. Appreciate the contribution though — if you're interested in improving the image experience, enhancing the |
Summary
read_fileinline small image/PDF payloads into the next model turn instead of always bouncing the user to a separate vision stepWhat Changed
tools/file_operations.pyread_filerun_agent.pytools/file_tools.pyCross-Verification Against Original Implementation
This change is intentionally modeled after KiloCode's
readtool behavior and can be checked directly against these upstream references:The main architectural difference is intentional: KiloCode can return attachment objects directly from the tool result, while Hermes tool handlers still return JSON strings. Hermes therefore follows the same user-visible behavior through a different transport: sanitize the tool payload, then inject a synthetic multimodal follow-up message for the next turn.
Provider Behavior
Closes #521.