tools: refactor tool call parsing and enable streaming #10415

Merged
ParthSareen merged 24 commits into main from tool-parsing
May 23, 2025
Conversation

@ParthSareen
Member

@ParthSareen ParthSareen commented Apr 25, 2025

Demo

Simple tool calling

weather.py.mp4

Search tool calling

search.mp4

Incremental Parsing

  • Enables streaming, and eventually extending to other types of tools, e.g. Python
  • Dynamically finds the tool special token to use as a prefix check
  • Still allows JSON tool parsing as a fallback if the response starts with a JSON-parsable type
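The flow above, as I understand it, could be sketched roughly like this. This is a hypothetical illustration, not the actual server/tools.go code; parseToolCalls and its signature are invented for clarity:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// parseToolCalls sketches the incremental approach: only attempt to decode
// tool calls once the accumulated output starts with the tool prefix, or with
// a JSON-parsable type as a fallback; otherwise treat it as plain content
// that can be streamed out immediately.
func parseToolCalls(prefix, accumulated string) (calls []map[string]any, isTool bool) {
	s := strings.TrimSpace(accumulated)
	if prefix != "" && strings.HasPrefix(s, prefix) {
		s = strings.TrimSpace(strings.TrimPrefix(s, prefix))
	} else if !strings.HasPrefix(s, "{") && !strings.HasPrefix(s, "[") {
		// Neither the special token nor a JSON-parsable start: plain content.
		return nil, false
	}
	// Try a single tool-call object first, then an array of them.
	var one map[string]any
	if err := json.Unmarshal([]byte(s), &one); err == nil {
		return []map[string]any{one}, true
	}
	var many []map[string]any
	if err := json.Unmarshal([]byte(s), &many); err == nil {
		return many, true
	}
	return nil, false // possibly incomplete JSON; the caller keeps accumulating
}

func main() {
	calls, ok := parseToolCalls("<tool_call>", `<tool_call>{"name":"get_weather","arguments":{"city":"Toronto"}}`)
	fmt.Println(ok, calls[0]["name"]) // true get_weather
}
```

The real implementation additionally has to handle partially received JSON across stream chunks, which this sketch sidesteps by returning false until the JSON decodes cleanly.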

Breaking Changes

This PR also warrants a change to the qwen2.5-coder template due to incremental JSON parsing and focusing on tool prefixes.

There is a possibility that other models will break, in a couple of scenarios:

  1. The model has a tool prefix defined but does not output the prefix before making a tool call.
  2. The model has no prefix and does not output a JSON tool call right away. Previously we'd greedily parse over the accumulated content and extract all the JSON.

I'd say these are model-specific problems that should be addressed through training or templating.

Closes: #7014, #7886, #9632, ollama/ollama-python#463, #10712

Follow ups:

  • Template updates for qwen and llama4
  • Remove leading spaces in general
  • Consider setting done to true
  • Python function calling: server/python_tools: add python tool parsing logic #10453
  • Use full prefix instead of single token for tools
  • "Pipelining": sending JSON faster when a response contains multiple tool calls

@ParthSareen ParthSareen force-pushed the tool-parsing branch 3 times, most recently from 9530834 to 88d836f Compare May 1, 2025 23:30
@ParthSareen ParthSareen changed the title from "model: support python tools and streaming" to "server: support python tools and streaming" May 1, 2025
@ParthSareen ParthSareen requested a review from Copilot May 1, 2025 23:40

Copilot AI left a comment


Pull Request Overview

This PR adds support for Python tools and streaming in the server by enhancing tool call parsing and response handling.

  • Introduces new tests (server/tools_test.go and server/model_test.go) to validate various tool call scenarios.
  • Implements new helper functions in server/tools.go and server/model.go for parsing JSON tool call objects and extracting tool tokens from templates.
  • Updates server/routes.go to improve tool call handling in streamed responses.

Reviewed Changes

Copilot reviewed 5 out of 7 changed files in this pull request and generated 2 comments.

File summary:
  • server/tools_test.go: Added comprehensive tests for tool call execution scenarios with different token formats.
  • server/tools.go: Introduced functions to parse JSON tool calls and route parsing based on token prefixes.
  • server/routes.go: Modified ChatHandler logic to correctly detect and process tool calls during streaming.
  • server/model_test.go: Refactored tests for tool token extraction, ensuring expected behavior across edge cases.
  • server/model.go: Added functions to extract text and tokens from templates, improving template-based parsing.

Files not reviewed (2):
  • server/testdata/tools/qwen2.5-coder.gotmpl: Language not supported
  • server/testdata/tools/qwen2.5-coder.out: Language not supported
Comments suppressed due to low confidence (1)

server/tools_test.go:215

  • [nitpick] Consider renaming 'errCheck' to a more descriptive name such as 'parseErrorOccurred' to improve readability.
errCheck := 1

@ParthSareen ParthSareen changed the title from "server: support python tools and streaming" to "server: refactor tool call parsing and enable streaming" May 2, 2025
@ParthSareen ParthSareen changed the title from "server: refactor tool call parsing and enable streaming" to "server/tools: refactor tool call parsing and enable streaming" May 2, 2025
@ParthSareen ParthSareen requested a review from Copilot May 2, 2025 21:18
@ParthSareen ParthSareen marked this pull request as ready for review May 2, 2025 21:18

Copilot AI left a comment


Pull Request Overview

This PR refactors the tool call parsing logic and enhances streaming support in the server. Key changes include updates to tool call parsing functions, adjustments to the streaming handler in ChatHandler, and improvements to template-based tool extraction and testing.

Reviewed Changes

Copilot reviewed 5 out of 7 changed files in this pull request and generated 3 comments.

File summary:
  • server/tools_test.go: Added and updated test cases for various tool call scenarios.
  • server/tools.go: Refactored tool call parsing logic with dedicated functions for JSON parsing and routing.
  • server/routes.go: Revised ChatHandler to enable streaming content and integrate tool call parsing during the stream.
  • server/model_test.go: Updated tests for tool token extraction and template-based tool call detection.
  • server/model.go: Improved the extraction of tool call tokens from templates and refactored related helper functions.

Files not reviewed (2):
  • server/testdata/tools/qwen2.5-coder.gotmpl: Language not supported
  • server/testdata/tools/qwen2.5-coder.out: Language not supported

server/model.go (outdated):

	start := -1
	end := -1
	for i, r := range tokenText {
		if r == '<' || r == '[' {
Member


This seems fragile. Why are we checking these fixed characters vs:

  1. Cut the prefix
  2. Begin partial parsing of tool calls

I think [ and ] might be needed to handle arrays though

Member Author


This function is to find the prefix 🙃

You can't split over whitespace as a template might have something like <tool_token>[ which should just be <tool_token> I tried to make it to capture enough cases but happy to just balance square brackets if that makes more sense.
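For illustration, the kind of delimiter scan being discussed might look like the following. This is a hedged sketch under the assumptions above; extractToolToken is a hypothetical name, not the actual function in server/model.go:

```go
package main

import "fmt"

// extractToolToken scans template text for a special token delimited by
// <...> or [...], so that "<tool_call>[" yields "<tool_call>" rather than
// something produced by splitting on whitespace.
func extractToolToken(tokenText string) string {
	start := -1
	for i, r := range tokenText {
		switch r {
		case '<', '[':
			if start == -1 {
				start = i // remember where the candidate token opens
			}
		case '>', ']':
			if start != -1 {
				return tokenText[start : i+1] // close the first opened delimiter
			}
		}
	}
	return ""
}

func main() {
	fmt.Println(extractToolToken("<tool_call>[")) // <tool_call>
	fmt.Println(extractToolToken("[TOOL_CALL] \n [")) // [TOOL_CALL]
}
```

The trade-off under discussion is visible here: fixed delimiter characters capture common token shapes, while fully balancing brackets would be needed to distinguish a token from a JSON array opener in every template.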

Member

@jmorganca jmorganca left a comment


This is looking good but needs more cleanup and simplification

@ParthSareen
Member Author

Thank you, happy to hear! I used the documentation from OpenAI a lot for reference to ensure compatibility with multiple providers that support it. https://platform.openai.com/docs/guides/function-calling?api-mode=chat#streaming https://platform.openai.com/docs/api-reference/chat-streaming/streaming

In what way would the chunks and tool_call contents be streamed differently? I assume it will still need to be accumulated?

Thank you for all your work on this! <3

Compatibility-wise it'll be the same as it is today - which is we stream out fully parsed tool calls and not the content that's being parsed out but recognized as tool calls immediately. It's partially why this PR took so long - there is not a good way to recognize whether a model is going to make a tool call or not given how they were trained. And to have a general solution to support a wide variety of models is a bit difficult. I'm sure we'll have more bumps through this big change (and I apologize in advance!) but please let me know if you do end up running into more issues with this.

@ParthSareen ParthSareen merged commit e8b981f into main May 23, 2025
8 checks passed
@ParthSareen ParthSareen deleted the tool-parsing branch May 23, 2025 21:19
@digitalextremist

digitalextremist commented May 24, 2025

Nicely done @ParthSareen! Great to experience the Ollama team dynamic on this. Thank you all for the quality work!

@benhaotang
Contributor

benhaotang commented May 24, 2025

Hi, first of all, thanks for this great work. I just compiled from the main branch and everything works nicely. Just one small problem: there seem to be no line breaks at all, or very few, streaming back from multiple qwen models if I provide the tool list. If I don't provide any tools, it returns fine with proper line breaks. This behavior never happened on 0.7.0 or the new 0.7.1, so I wonder if it is related to this PR. Thanks in advance.

@ParthSareen
Member Author

Thanks! And yeah I think I know what you're talking about. It's either the template or a particular section in the code for returning data. Will make sure it's fixed before release. Either will patch the branch or have new models to download. There will be an update to llama4 and qwen2.5-coder already.

@benhaotang
Contributor

Oh! That's wonderful! Just to make sure that we are on the same page, here is a snippet that I got from a web-searching agent that I made:

Anthropic recently released **Claude 4**, a significant update to its AI model lineup, featuring two primary versions: **Claude Opus 4** and **Claude Sonnet 4**. Here's a breakdown of the key details:  ---  ### **1. Key Features of Claude 4** - **Claude Opus 4**: 
  - **Best-in-class coding model**: Excels at complex, long-running tasks and agent workflows (e.g., coding, problem-solving).   - **Extended thinking with tool use**: Can alternate between reasoning and tool use (e.g., web search) for deeper problem-solving.   - **Memory and continuity**: Improved ability to retain context over extended interactions, enabling tasks like multi-step reasoning or prolonged agent workflows.   - **Safety measures**: Released under stricter safeguards (AI Safety Level 3) due to concerns about potential misuse in sensitive areas like bioweapon development. Anthropic emphasizes caution in deployment.  - **Claude Sonnet 4**: 
  - **Enhanced reasoning and coding**: A major upgrade ......

I already updated to the template (https://github.com/ollama/ollama/pull/10415/files#diff-f323d3e042fdba259d77406751678f2d96fe23987e80bf402aa3e354ca45798b) in this PR, before updating there is almost no line breaks, after updating there are some, but just very strange as you can see above.

@benhaotang
Copy link
Contributor

benhaotang commented May 25, 2025

@ParthSareen Hi, I had more time to take a deeper look at your PR today. I found that if I comment out lines 141 to 143 of your modified tools.go, which are

	if strings.ContainsRune(s, '\n') {
		s = strings.ReplaceAll(s, "\n", " ")
	}

then the line breaks are returned perfectly. Maybe this replacement is applied too broadly (it should only be done for the tool call section)? Is it really necessary for parsing the tool prefix? Sorry, but even after reading the logic here I didn't really understand why line breaks need to be removed for prefix detection. Is it for some edge cases? Anyway, hope this info helps!

Also, just commenting them seems to have no side effects for streaming tool calls with qwen3 and mistral small

@ParthSareen
Member Author

@benhaotang thanks for also digging in! My gut also was pointing to that so thanks for confirming.

It's for the case where a prefix has a \n in it, e.g. [TOOL_CALL] \n [, where the model might output a space instead of a \n. To mitigate that we replace the newlines. However, I think I should be able to move this to just the prefix-checking portion rather than applying it globally in the parser. Thanks for the help 😄
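The narrower fix described here might look something like the following. This is a hypothetical sketch, not the patch that actually landed; hasToolPrefix is an invented name:

```go
package main

import (
	"fmt"
	"strings"
)

// hasToolPrefix normalizes newlines to spaces only while comparing against
// the tool prefix, instead of rewriting the whole accumulated string (which
// is what destroyed line breaks in regular streamed content).
func hasToolPrefix(content, prefix string) bool {
	// Normalize copies for the comparison only; the caller's content is untouched.
	c := strings.TrimSpace(strings.ReplaceAll(content, "\n", " "))
	p := strings.TrimSpace(strings.ReplaceAll(prefix, "\n", " "))
	return strings.HasPrefix(c, p)
}

func main() {
	// A model may emit a space where the template's prefix has a newline.
	fmt.Println(hasToolPrefix("[TOOL_CALL] [ {\"name\": \"f\"}", "[TOOL_CALL]\n[")) // true
	fmt.Println(hasToolPrefix("Line one\nLine two", "[TOOL_CALL]\n["))             // false
}
```

Because the normalization happens on throwaway copies, ordinary content such as "Line one\nLine two" keeps its newlines when it is streamed back to the client.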

halfcrazy pushed a commit to halfcrazy/ollama that referenced this pull request Jun 19, 2025
zuozuo added a commit to zuozuo/learn_mate that referenced this pull request Jun 24, 2025
Root cause analysis:
1. GitHub issues #26971 and #27925 confirm this is a known issue
2. After calling bind_tools(), streaming breaks even if the tool list is empty
3. This is a LangChain + Ollama integration issue, not a LangGraph issue

Solution:
1. Separate the LLM instances used for streaming and for tool calling
2. self.llm is used for streaming responses (no tools bound)
3. self.llm_with_tools is used when tools are needed
4. The _chat method uses llm_with_tools

Technical details:
- Issue source: langchain-ai/langchain#26971
- Ollama PR: ollama/ollama#10415
- This is being fixed upstream; our approach is a temporary workaround

Testing:
- Added test_bind_tools_streaming.py to verify the problem and the solution
- Compared streaming behavior with and without bind_tools

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Tobix99 pushed a commit to Tobix99/ollama that referenced this pull request Jun 26, 2025
jrschumacher pushed a commit to jrschumacher/ollama that referenced this pull request Dec 29, 2025


Development

Successfully merging this pull request may close these issues.

Better Tool Call parsing