tools: refactor tool call parsing and enable streaming #10415

Merged
ParthSareen merged 24 commits into main from tool-parsing
May 23, 2025
Conversation

@ParthSareen
Member

@ParthSareen ParthSareen commented Apr 25, 2025

Demo

Simple tool calling

weather.py.mp4

Search tool calling

search.mp4

Incremental Parsing

  • Enables streaming, and eventually extending to other types of tools, e.g. Python
  • Dynamically finds the tool special token to use as a prefix check
  • Still allows JSON tool parsing as a fallback if the response starts with a JSON-parsable type
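The flow above, as I understand it, could be sketched roughly like this. This is a hypothetical illustration, not the actual server/tools.go code; parseToolCalls and its signature are invented for clarity:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// parseToolCalls sketches the incremental approach: only attempt to decode
// tool calls once the accumulated output starts with the tool prefix, or with
// a JSON-parsable type as a fallback; otherwise treat it as plain content
// that can be streamed out immediately.
func parseToolCalls(prefix, accumulated string) (calls []map[string]any, isTool bool) {
	s := strings.TrimSpace(accumulated)
	if prefix != "" && strings.HasPrefix(s, prefix) {
		s = strings.TrimSpace(strings.TrimPrefix(s, prefix))
	} else if !strings.HasPrefix(s, "{") && !strings.HasPrefix(s, "[") {
		// Neither the special token nor a JSON-parsable start: plain content.
		return nil, false
	}
	// Try a single tool-call object first, then an array of them.
	var one map[string]any
	if err := json.Unmarshal([]byte(s), &one); err == nil {
		return []map[string]any{one}, true
	}
	var many []map[string]any
	if err := json.Unmarshal([]byte(s), &many); err == nil {
		return many, true
	}
	return nil, false // possibly incomplete JSON; the caller keeps accumulating
}

func main() {
	calls, ok := parseToolCalls("<tool_call>", `<tool_call>{"name":"get_weather","arguments":{"city":"Toronto"}}`)
	fmt.Println(ok, calls[0]["name"]) // true get_weather
}
```

The real implementation additionally has to handle partially received JSON across stream chunks, which this sketch sidesteps by returning false until the JSON decodes cleanly.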

Breaking Changes

This PR also warrants a change to the qwen2.5-coder template due to incremental JSON parsing and focusing on tool prefixes.

There is a possibility that other models will break, in a couple of scenarios:

  1. The model has a tool prefix defined but does not output the prefix before making a tool call.
  2. The model has no prefix and does not output a JSON tool call right away. Previously we'd greedily parse over the accumulated content and extract all the JSON.

I'd say these are model-specific problems that should be addressed through training or templating.

Closes: #7014, #7886, #9632, ollama/ollama-python#463, #10712

Follow ups:

  • Template updates for qwen and llama4
  • Remove leading spaces in general
  • Consider setting done to true
  • Python function calling: server/python_tools: add python tool parsing logic #10453
  • Use full prefix instead of single token for tools
  • "Pipelining": sending JSON faster when a response contains multiple tool calls

@ParthSareen ParthSareen force-pushed the tool-parsing branch 3 times, most recently from 9530834 to 88d836f Compare May 1, 2025 23:30
@ParthSareen ParthSareen changed the title from "model: support python tools and streaming" to "server: support python tools and streaming" May 1, 2025
@ParthSareen ParthSareen requested a review from Copilot May 1, 2025 23:40

Copilot AI left a comment


Pull Request Overview

This PR adds support for Python tools and streaming in the server by enhancing tool call parsing and response handling.

  • Introduces new tests (server/tools_test.go and server/model_test.go) to validate various tool call scenarios.
  • Implements new helper functions in server/tools.go and server/model.go for parsing JSON tool call objects and extracting tool tokens from templates.
  • Updates server/routes.go to improve tool call handling in streamed responses.

Reviewed Changes

Copilot reviewed 5 out of 7 changed files in this pull request and generated 2 comments.

File summary:
  • server/tools_test.go: Added comprehensive tests for tool call execution scenarios with different token formats.
  • server/tools.go: Introduced functions to parse JSON tool calls and route parsing based on token prefixes.
  • server/routes.go: Modified ChatHandler logic to correctly detect and process tool calls during streaming.
  • server/model_test.go: Refactored tests for tool token extraction, ensuring expected behavior across edge cases.
  • server/model.go: Added functions to extract text and tokens from templates, improving template-based parsing.

Files not reviewed (2):
  • server/testdata/tools/qwen2.5-coder.gotmpl: Language not supported
  • server/testdata/tools/qwen2.5-coder.out: Language not supported
Comments suppressed due to low confidence (1)

server/tools_test.go:215

  • [nitpick] Consider renaming 'errCheck' to a more descriptive name such as 'parseErrorOccurred' to improve readability.
errCheck := 1

@ParthSareen ParthSareen changed the title from "server: support python tools and streaming" to "server: refactor tool call parsing and enable streaming" May 2, 2025
@ParthSareen ParthSareen changed the title from "server: refactor tool call parsing and enable streaming" to "server/tools: refactor tool call parsing and enable streaming" May 2, 2025
@ParthSareen ParthSareen requested a review from Copilot May 2, 2025 21:18
@ParthSareen ParthSareen marked this pull request as ready for review May 2, 2025 21:18

Copilot AI left a comment


Pull Request Overview

This PR refactors the tool call parsing logic and enhances streaming support in the server. Key changes include updates to tool call parsing functions, adjustments to the streaming handler in ChatHandler, and improvements to template-based tool extraction and testing.

Reviewed Changes

Copilot reviewed 5 out of 7 changed files in this pull request and generated 3 comments.

File summary:
  • server/tools_test.go: Added and updated test cases for various tool call scenarios.
  • server/tools.go: Refactored tool call parsing logic with dedicated functions for JSON parsing and routing.
  • server/routes.go: Revised ChatHandler to enable streaming content and integrate tool call parsing during the stream.
  • server/model_test.go: Updated tests for tool token extraction and template-based tool call detection.
  • server/model.go: Improved the extraction of tool call tokens from templates and refactored related helper functions.

Files not reviewed (2):
  • server/testdata/tools/qwen2.5-coder.gotmpl: Language not supported
  • server/testdata/tools/qwen2.5-coder.out: Language not supported

server/model.go (outdated):

	start := -1
	end := -1
	for i, r := range tokenText {
		if r == '<' || r == '[' {
Member


This seems fragile. Why are we checking these fixed characters vs:

  1. Cut the prefix
  2. Begin partial parsing of tool calls

I think [ and ] might be needed to handle arrays though

Member Author


This function is to find the prefix 🙃

You can't split over whitespace as a template might have something like <tool_token>[ which should just be <tool_token> I tried to make it to capture enough cases but happy to just balance square brackets if that makes more sense.
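For illustration, the kind of delimiter scan being discussed might look like the following. This is a hedged sketch under the assumptions above; extractToolToken is a hypothetical name, not the actual function in server/model.go:

```go
package main

import "fmt"

// extractToolToken scans template text for a special token delimited by
// <...> or [...], so that "<tool_call>[" yields "<tool_call>" rather than
// something produced by splitting on whitespace.
func extractToolToken(tokenText string) string {
	start := -1
	for i, r := range tokenText {
		switch r {
		case '<', '[':
			if start == -1 {
				start = i // remember where the candidate token opens
			}
		case '>', ']':
			if start != -1 {
				return tokenText[start : i+1] // close the first opened delimiter
			}
		}
	}
	return ""
}

func main() {
	fmt.Println(extractToolToken("<tool_call>[")) // <tool_call>
	fmt.Println(extractToolToken("[TOOL_CALL] \n [")) // [TOOL_CALL]
}
```

The trade-off under discussion is visible here: fixed delimiter characters capture common token shapes, while fully balancing brackets would be needed to distinguish a token from a JSON array opener in every template.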

Member

@jmorganca jmorganca left a comment


This is looking good but needs more cleanup and simplification

@ParthSareen
Member Author

Thank you, happy to hear! I used the documentation from OpenAI a lot for reference to ensure compatibility with multiple providers that support it. https://platform.openai.com/docs/guides/function-calling?api-mode=chat#streaming https://platform.openai.com/docs/api-reference/chat-streaming/streaming

In what way would the chunks and tool_call contents be streamed differently? I assume it will still need to be accumulated?

Thank you for all your work on this! <3

Compatibility-wise it'll be the same as it is today - which is we stream out fully parsed tool calls and not the content that's being parsed out but recognized as tool calls immediately. It's partially why this PR took so long - there is not a good way to recognize whether a model is going to make a tool call or not given how they were trained. And to have a general solution to support a wide variety of models is a bit difficult. I'm sure we'll have more bumps through this big change (and I apologize in advance!) but please let me know if you do end up running into more issues with this.

@ParthSareen ParthSareen merged commit e8b981f into main May 23, 2025
8 checks passed
@ParthSareen ParthSareen deleted the tool-parsing branch May 23, 2025 21:19
@digitalextremist

digitalextremist commented May 24, 2025

Nicely done @ParthSareen! Great to experience the Ollama team dynamic on this. Thank you all for the quality work!

@benhaotang
Contributor

benhaotang commented May 24, 2025

Hi, first of all, thanks for this great work. I just compiled from the main branch and everything works nicely. Just one small problem: there seem to be no line breaks at all, or very few, streaming back from multiple qwen models if I provide the tool list. If I don't provide any tools, it returns fine with proper line breaks. This behavior never happened on 0.7.0 or the new 0.7.1, so I wonder if it is related to this PR. Thanks in advance.

@ParthSareen
Member Author

Thanks! And yeah I think I know what you're talking about. It's either the template or a particular section in the code for returning data. Will make sure it's fixed before release. Either will patch the branch or have new models to download. There will be an update to llama4 and qwen2.5-coder already.

@benhaotang
Contributor

Oh! That's wonderful! Just to make sure that we are on the same page, here is a snippet that I got from a web-searching agent that I made:

Anthropic recently released **Claude 4**, a significant update to its AI model lineup, featuring two primary versions: **Claude Opus 4** and **Claude Sonnet 4**. Here's a breakdown of the key details:  ---  ### **1. Key Features of Claude 4** - **Claude Opus 4**: 
  - **Best-in-class coding model**: Excels at complex, long-running tasks and agent workflows (e.g., coding, problem-solving).   - **Extended thinking with tool use**: Can alternate between reasoning and tool use (e.g., web search) for deeper problem-solving.   - **Memory and continuity**: Improved ability to retain context over extended interactions, enabling tasks like multi-step reasoning or prolonged agent workflows.   - **Safety measures**: Released under stricter safeguards (AI Safety Level 3) due to concerns about potential misuse in sensitive areas like bioweapon development. Anthropic emphasizes caution in deployment.  - **Claude Sonnet 4**: 
  - **Enhanced reasoning and coding**: A major upgrade ......

I already updated to the template (https://github.com/ollama/ollama/pull/10415/files#diff-f323d3e042fdba259d77406751678f2d96fe23987e80bf402aa3e354ca45798b) in this PR, before updating there is almost no line breaks, after updating there are some, but just very strange as you can see above.

@benhaotang
Copy link
Contributor

benhaotang commented May 25, 2025

@ParthSareen Hi, I had more time to take a deeper look at your PR today. I found that if I comment out lines 141 to 143 of your modified tools.go, which are

	if strings.ContainsRune(s, '\n') {
		s = strings.ReplaceAll(s, "\n", " ")
	}

then the line breaks are returned perfectly. Maybe this replacement is applied too broadly (it should only be done for the tool call section)? Is it really necessary for parsing the tool prefix? Sorry, but even after reading the logic here I didn't really understand why line breaks need to be removed for prefix detection. Is it for some edge cases? Anyway, hope this info helps!

Also, just commenting them seems to have no side effects for streaming tool calls with qwen3 and mistral small

@ParthSareen
Member Author

@benhaotang thanks for also digging in! My gut also was pointing to that so thanks for confirming.

It's for the case where a prefix has a \n in it, e.g. [TOOL_CALL] \n [, where the model might output a space instead of a \n. To mitigate that we replace the newlines. However, I think I should be able to move this to just the prefix-checking portion rather than applying it globally in the parser. Thanks for the help 😄
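The narrower fix described here might look something like the following. This is a hypothetical sketch, not the patch that actually landed; hasToolPrefix is an invented name:

```go
package main

import (
	"fmt"
	"strings"
)

// hasToolPrefix normalizes newlines to spaces only while comparing against
// the tool prefix, instead of rewriting the whole accumulated string (which
// is what destroyed line breaks in regular streamed content).
func hasToolPrefix(content, prefix string) bool {
	// Normalize copies for the comparison only; the caller's content is untouched.
	c := strings.TrimSpace(strings.ReplaceAll(content, "\n", " "))
	p := strings.TrimSpace(strings.ReplaceAll(prefix, "\n", " "))
	return strings.HasPrefix(c, p)
}

func main() {
	// A model may emit a space where the template's prefix has a newline.
	fmt.Println(hasToolPrefix("[TOOL_CALL] [ {\"name\": \"f\"}", "[TOOL_CALL]\n[")) // true
	fmt.Println(hasToolPrefix("Line one\nLine two", "[TOOL_CALL]\n["))             // false
}
```

Because the normalization happens on throwaway copies, ordinary content such as "Line one\nLine two" keeps its newlines when it is streamed back to the client.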

halfcrazy pushed a commit to halfcrazy/ollama that referenced this pull request Jun 19, 2025
zuozuo added a commit to zuozuo/learn_mate that referenced this pull request Jun 24, 2025
Root cause analysis:
1. GitHub issues #26971 and #27925 confirm this is a known issue
2. After calling bind_tools(), streaming breaks even if the tool list is empty
3. This is a LangChain + Ollama integration issue, not a LangGraph issue

Solution:
1. Separate the LLM instances used for streaming and for tool calling
2. self.llm is used for streaming responses (no tools bound)
3. self.llm_with_tools is used when tools are needed
4. The _chat method uses llm_with_tools

Technical details:
- Issue source: langchain-ai/langchain#26971
- Ollama PR: ollama/ollama#10415
- This is being fixed upstream; our approach is a temporary workaround

Testing:
- Added test_bind_tools_streaming.py to verify the problem and the solution
- Compared streaming behavior with and without bind_tools

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Tobix99 pushed a commit to Tobix99/ollama that referenced this pull request Jun 26, 2025
jrschumacher pushed a commit to jrschumacher/ollama that referenced this pull request Dec 29, 2025


Development

Successfully merging this pull request may close these issues.

Better Tool Call parsing