tools: refactor tool call parsing and enable streaming #10415
ParthSareen merged 24 commits into main
Conversation
9530834 to 88d836f
Pull Request Overview
This PR adds support for Python tools and streaming in the server by enhancing tool call parsing and response handling.
- Introduces new tests (server/tools_test.go and server/model_test.go) to validate various tool call scenarios.
- Implements new helper functions in server/tools.go and server/model.go for parsing JSON tool call objects and extracting tool tokens from templates.
- Updates server/routes.go to improve tool call handling in streamed responses.
Reviewed Changes
Copilot reviewed 5 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| server/tools_test.go | Added comprehensive tests for tool call execution scenarios with different token formats. |
| server/tools.go | Introduced functions to parse JSON tool calls and route parsing based on token prefixes. |
| server/routes.go | Modified ChatHandler logic to correctly detect and process tool calls during streaming. |
| server/model_test.go | Refactored tests for tool token extraction, ensuring expected behavior across edge cases. |
| server/model.go | Added functions to extract text and tokens from templates, improving template-based parsing. |
Files not reviewed (2)
- server/testdata/tools/qwen2.5-coder.gotmpl: Language not supported
- server/testdata/tools/qwen2.5-coder.out: Language not supported
Comments suppressed due to low confidence (1)
server/tools_test.go:215
- [nitpick] Consider renaming `errCheck` to a more descriptive name such as `parseErrorOccurred` to improve readability.
  `errCheck := 1`
Pull Request Overview
This PR refactors the tool call parsing logic and enhances streaming support in the server. Key changes include updates to tool call parsing functions, adjustments to the streaming handler in ChatHandler, and improvements to template-based tool extraction and testing.
Reviewed Changes
Copilot reviewed 5 out of 7 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| server/tools_test.go | Added and updated test cases for various tool call scenarios. |
| server/tools.go | Refactored tool call parsing logic with dedicated functions for JSON parsing and routing. |
| server/routes.go | Revised ChatHandler to enable streaming content and integrate tool call parsing during stream. |
| server/model_test.go | Updated tests for tool token extraction and template-based tool call detection. |
| server/model.go | Improved the extraction of tool call tokens from templates and refactored related helper functions. |
Files not reviewed (2)
- server/testdata/tools/qwen2.5-coder.gotmpl: Language not supported
- server/testdata/tools/qwen2.5-coder.out: Language not supported
server/model.go (outdated excerpt)

```go
start := -1
end := -1
for i, r := range tokenText {
	if r == '<' || r == '[' {
```
This seems fragile. Why are we checking these fixed characters vs:
- cutting the prefix, and
- beginning partial parsing of tool calls?

I think `[` and `]` might be needed to handle arrays, though.
This function is to find the prefix 🙃
You can't split on whitespace, as a template might have something like `<tool_token>[`, which should yield just `<tool_token>`. I tried to make it capture enough cases, but I'm happy to just balance square brackets if that makes more sense.
jmorganca left a comment:
This is looking good but needs more cleanup and simplification.
Compatibility-wise it'll be the same as it is today: we stream out fully parsed tool calls rather than the content that is being parsed but immediately recognized as tool calls. This is partly why this PR took so long: there is no good way to recognize whether a model is going to make a tool call, given how models are trained, and a general solution that supports a wide variety of models is difficult. I'm sure we'll have more bumps through this big change (and I apologize in advance!), but please let me know if you do run into more issues with this.
Nicely done @ParthSareen! Great to experience the
Hi, first of all, thanks for this great work. I just compiled from the main branch and everything works nicely. Just one small problem: there seem to be no line breaks at all, or very few, streaming back from multiple qwen models if I provide the tool list. If I don't provide any tools, it returns fine with proper line breaks. This behavior never happened on 0.7.0 or the new 0.7.1, so I wonder if it is related to this PR. Thanks in advance.
Thanks! And yeah, I think I know what you're talking about. It's either the template or a particular section in the code for returning data. I'll make sure it's fixed before release: either I'll patch the branch or there will be new models to download. There will already be an update to llama4 and qwen2.5-coder.
Oh! That's wonderful! Just to make sure we're on the same page: I have a snippet from a web-searching agent I made. I already updated to the template in this PR (https://github.com/ollama/ollama/pull/10415/files#diff-f323d3e042fdba259d77406751678f2d96fe23987e80bf402aa3e354ca45798b); before updating there were almost no line breaks, and after updating there are some, but they still look very strange, as you can see above.
@ParthSareen Hi, I had more time to take a deeper look at your PR today. I found that if I comment out lines 141 to 143 of your modified tools.go, the line breaks are returned perfectly now. Maybe this replacement is done too broadly (it should only be applied to the tool-call section)? Is it really necessary for parsing the tool prefix? Sorry, but even after reading the logic here I didn't really understand why line breaks need to be removed for prefix detection. Is it for some edge cases? Anyway, hope this info helps! Also, just commenting them out seems to have no side effects for streaming tool calls with qwen3 and mistral-small.
@benhaotang thanks for also digging in! My gut was also pointing there, so thanks for confirming. It's for the case where a prefix has a
Root-cause analysis:
1. GitHub issues #26971 and #27925 confirm this is a known problem.
2. After calling bind_tools(), streaming breaks even when the tool list is empty.
3. This is a LangChain + Ollama integration issue, not a LangGraph issue.

Solution:
1. Separate the LLM instances for streaming and for tool calling.
2. self.llm is used for streaming responses (no tools bound).
3. self.llm_with_tools is used in scenarios that need tools.
4. The _chat method uses llm_with_tools.

Technical details:
- Problem source: langchain-ai/langchain#26971
- Ollama PR: ollama/ollama#10415
- This is being fixed upstream; our approach is a temporary workaround.

Tests:
- Added test_bind_tools_streaming.py to verify the problem and the workaround.
- Compared streaming behavior with and without bind_tools.

🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
Demo
Simple tool calling
weather.py.mp4
Search tool calling
search.mp4
Incremental Parsing
Breaking Changes
This PR also warrants a change to the `qwen2.5-coder` template due to incremental JSON parsing and the focus on tool prefixes. There is a possibility that other models will break, and this can happen in a couple of cases:

I'd say these are model-specific problems that pertain to training or templating.
Closes: #7014, #7886, #9632, ollama/ollama-python#463, #10712
Follow ups: