Closed
Description
When using tools with `stream=True` in the `chat` function, streaming does not work as expected: responses are not yielded incrementally, but only after all tool calls have completed.
Steps to Reproduce
- Use the following code to set up a chat with a tool-enabled model:
```python
import ollama

OLLAMA_CLIENT = ollama.Client(host="172.16.2.96:11434")

def get_location() -> str:
    """
    Get the current geographic location.

    Returns:
        str: Current geographic location.
    """
    return "Shanghai"

def get_weather(location: str) -> str:
    """
    Get the weather conditions for a specific location.

    Args:
        location (str): Geographic location.

    Returns:
        str: Weather conditions.
    """
    return "Sunny, temperature 25°C."

available_functions = {
    'get_location': get_location,
    'get_weather': get_weather,
}

def chat_generate(options):
    tool_calls = []
    for part in OLLAMA_CLIENT.chat(**options):
        yield part.message.content  # Expecting incremental streaming here
        options["messages"].append(part.message)
        if part.message.tool_calls:
            for tool in part.message.tool_calls:
                if function_to_call := available_functions.get(tool.function.name):
                    output = function_to_call(**tool.function.arguments)
                    tool_calls.append({
                        'role': 'tool', 'content': str(output), 'name': tool.function.name
                    })
    for result in tool_calls:
        print(result)
        options["messages"].append(result)
    if len(tool_calls) > 0:
        for result in chat_generate(options):
            yield result

for result in chat_generate({
    "model": "qwq",
    "messages": [
        {"role": "user", "content": "How is the weather now?"},
    ],
    "stream": True,
    "tools": [get_location, get_weather],
}):
    print(result)
```

- Run the script and observe the behavior.
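For what it's worth, the generator plumbing in the reproduction is not the bottleneck. A pure-Python stand-in for `OLLAMA_CLIENT.chat` (the `fake_chat` below is a hypothetical mock, not the real client) yields chunks one at a time through the same recursive shape, which suggests the buffering happens inside the client/server once tools are involved:

```python
from typing import Iterator

def fake_chat(messages) -> Iterator[str]:
    # Hypothetical stand-in for OLLAMA_CLIENT.chat: emits tokens one by one.
    yield from ("Sunny, ", "temperature ", "25°C.")

def chat_generate(messages, depth=0) -> Iterator[str]:
    # Same recursive shape as the reproduction above: stream, then recurse
    # once as if a round of tool calls had produced a follow-up request.
    for part in fake_chat(messages):
        yield part
    if depth == 0:
        yield from chat_generate(messages, depth + 1)

chunks = list(chat_generate([{"role": "user", "content": "How is the weather now?"}]))
print(chunks)  # six chunks, delivered one at a time
```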
Expected Behavior
The response should be streamed incrementally, even when tool calls are involved.
Actual Behavior
- The response is blocked until all tool calls are completed.
- No incremental output is yielded while waiting for the tool responses.
Environment
- Ollama Server Version: 0.5.12
- Ollama Python Client Version: 0.4.7
- Python: 3.8.5
- OS: Windows 11
Additional Information
It seems that `OLLAMA_CLIENT.chat` does not return partial results when tool calls are required. Is there a way to run the tool calls asynchronously while maintaining streaming behavior?
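One possible workaround (a sketch under assumptions, not a confirmed client feature) is to resolve tool calls in a blocking first round and stream only the follow-up request with `tools` removed, since streaming appears to work once no tools are attached. `resolve_then_stream` and `stub_chat` are hypothetical names; the stub only exercises the control flow without a server:

```python
from types import SimpleNamespace as NS
from typing import Callable, Iterator

def resolve_then_stream(chat_fn: Callable, options: dict,
                        available_functions: dict) -> Iterator[str]:
    # Round 1 (blocking): let the model decide which tools to call.
    response = chat_fn(**dict(options, stream=False))
    messages = list(options["messages"]) + [response.message]
    for tool in (response.message.tool_calls or []):
        if function_to_call := available_functions.get(tool.function.name):
            output = function_to_call(**tool.function.arguments)
            messages.append({"role": "tool", "content": str(output),
                             "name": tool.function.name})
    # Round 2 (streaming): request the final answer with tools removed.
    follow_up = dict(options, messages=messages, stream=True)
    follow_up.pop("tools", None)
    for part in chat_fn(**follow_up):
        yield part.message.content

# Tiny stub standing in for OLLAMA_CLIENT.chat, just to exercise the flow.
def stub_chat(**opts):
    if not opts.get("stream"):
        call = NS(function=NS(name="get_weather",
                              arguments={"location": "Shanghai"}))
        return NS(message=NS(content="", tool_calls=[call]))
    return iter(NS(message=NS(content=tok, tool_calls=None))
                for tok in ("Sunny, ", "temperature ", "25°C."))

chunks = list(resolve_then_stream(stub_chat, {
    "model": "qwq",
    "messages": [{"role": "user", "content": "How is the weather now?"}],
    "tools": ["get_weather"],
}, {"get_weather": lambda location: "Sunny, temperature 25°C."}))
print(chunks)
```

The trade-off is an extra round trip: the first request still blocks until the tool-call decision arrives, but the user-visible answer then streams incrementally.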