fix(models): handle binary file downloads in tool execution#190
Conversation
Download actions (e.g. googledrive_unified_download_file) serve raw binary
with the file's own MIME type and a Content-Disposition header, so
StackOneTool.execute() calling response.json() unconditionally raised
UnicodeDecodeError on the first non-UTF-8 byte.
Branch on Content-Type: parse JSON only for JSON media types (application/json
and +json suffixes); otherwise return the file as
{content: bytes, content_type, status_code, headers, file_name}, matching the
StackOne generated SDKs' download response shape. file_name is parsed from
Content-Disposition (incl. RFC 5987 filename*). Existing JSON behavior is
unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR updates StackOneTool.execute() to correctly handle file-download endpoints that return raw binary bodies by only JSON-parsing responses when the Content-Type indicates a JSON media type, and otherwise returning the raw bytes plus relevant metadata (including file_name parsed from Content-Disposition).
Changes:
- Added
Content-Typedetection to parse JSON only forapplication/json/+jsonresponses; otherwise return{content: bytes, content_type, status_code, headers, file_name}. - Added helper utilities for JSON media type detection and
Content-Dispositionfilename extraction (includingfilename*support). - Added tests covering binary bodies, missing
Content-Type, JSON with charset parameters, and the helper parsers.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/test_tool_calling.py | Adds regression and helper unit tests for binary download handling and header parsing. |
| stackone_ai/models.py | Implements Content-Type-based branching in execute() and adds helper functions for JSON detection and filename extraction. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| extended = re.search(r"filename\*\s*=\s*[^']*'[^']*'([^;]+)", value, re.IGNORECASE) | ||
| if extended: | ||
| return unquote(extended.group(1).strip()) |
There was a problem hiding this comment.
Fixed in 10d7802: declared charset is now honoured via unquote(..., encoding=charset) with a UTF-8 fallback on unknown charsets, and surrounding quotes are stripped off non-conformant quoted filename* values (.strip('"')).
There was a problem hiding this comment.
1 issue found across 2 files
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
Add a "File Downloads" section covering the bytes-plus-metadata dict that execute()/call() return for non-JSON responses, the Content-Type detection rule, and the not-JSON-serializable caveat for LLM-facing re-serialization. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
0 issues found across 1 file (changes from recent commits).
Requires human review: Auto-approval blocked by 1 unresolved issue from previous reviews.
Re-trigger cubic
| extended = re.search(r"filename\*\s*=\s*[^']*'[^']*'([^;]+)", value, re.IGNORECASE) | ||
| if extended: | ||
| return unquote(extended.group(1).strip()) | ||
| quoted = re.search(r'filename\s*=\s*"([^"]*)"', value, re.IGNORECASE) |
There was a problem hiding this comment.
Fixed in 10d7802 — charset is parsed from the filename* value and passed to unquote(..., encoding=...), falling back to UTF-8 on an unknown charset label.
There was a problem hiding this comment.
This s great catch by copilot but only in Python for parity we should optionally change this to the node too?
| # RFC 5987 extended form is percent-decoded and takes precedence. | ||
| ("attachment; filename=\"fallback.txt\"; filename*=UTF-8''na%C3%AFve.txt", "naïve.txt"), | ||
| ("attachment", None), |
There was a problem hiding this comment.
Added in 10d7802: ISO-8859-1 (non-UTF-8) charset, an unknown-charset fallback, and a quoted filename* case in test_filename_from_content_disposition.
The Content-Disposition filename* branch captured the charset in its regex but discarded it, so unquote() always decoded with UTF-8. RFC 5987 also permits ISO-8859-1, so an ISO-8859-1 percent-encoded filename decoded to mojibake despite the helper claiming RFC 5987 support. - Decode the extended value with its declared charset; fall back to UTF-8 for an unknown or empty charset label (LookupError) instead of raising. - Strip surrounding quotes off non-conformant quoted filename* values. - Add tests for ISO-8859-1, an unknown charset, and a quoted filename*. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
0 issues found across 2 files (changes from recent commits).
Requires human review: This change modifies core execution logic to handle binary responses differently, altering the return shape for non-JSON tool results, which could impact existing callers that rely on JSON-deserialized responses.
Re-trigger cubic
Summary
googledrive_unified_download_file) serve raw binary with the file's own MIME type and aContent-Dispositionheader, soStackOneTool.execute()callingresponse.json()unconditionally raisedUnicodeDecodeErroron the first non-UTF-8 byte.execute()now branches onContent-Type: JSON media types (application/jsonand+jsonsuffixes) are parsed as before; any other type returns the file as{content: bytes, content_type, status_code, headers, file_name}, matching the StackOne generated SDKs' download response shape.file_nameis parsed fromContent-Disposition(including RFC 5987filename*).Note:
contentholds raw bytes and is not JSON-serializable — callers that re-serialize tool results (e.g. for an LLM) should handle that key. The siblingstackone-ai-nodeSDK has the sameresponse.json()gap and warrants a matching fix.🤖 Generated with Claude Code
Summary by cubic
Fixes
StackOneTool.execute()failing on binary downloads by parsing JSON only when theContent-Typeis JSON and returning bytes + metadata for other types. Improves filename parsing to honor RFC 5987 charsets. Adds README docs explaining the file-download return shape and the raw-bytes caveat.application/jsonor+jsontypes; otherwise return{content: bytes, content_type, status_code, headers, file_name}.file_namefromContent-Disposition, including RFC 5987filename*; honor declared charsets (UTF-8/ISO-8859-1), fall back to UTF-8 on unknown labels, and strip quotes on non-conformant values.googledrive_unified_download_file) as well.Content-Type, JSON with charset, ISO-8859-1 and unknownfilename*charsets, and the helper parsers.Written for commit 10d7802. Summary will update on new commits.