
Streaming Doesn't Work When Using Tools #463

@VacantHusky

Description

When using tools with stream=True in the chat function, the streaming output does not behave as expected: responses are not yielded incrementally; instead, the whole response appears to be returned only after all tool calls have completed.

Steps to Reproduce

  1. Use the following code to set up a chat with a tool-enabled model:
import ollama

OLLAMA_CLIENT = ollama.Client(host="172.16.2.96:11434")

def get_location() -> str:
    """
    Get the current geographic location.

    Returns:
    str: Current geographic location.
    """
    return "Shanghai"


def get_weather(location: str) -> str:
    """
    Get the weather conditions for a specific location.

    Args:
    location (str): Geographic location.

    Returns:
    str: Weather conditions.
    """
    return "Sunny, temperature 25°C."

available_functions = {
    'get_location': get_location,
    'get_weather': get_weather,
}

def chat_generate(options):
    tool_calls = []
    for part in OLLAMA_CLIENT.chat(**options):
        yield part.message.content  # Expecting incremental streaming here
        options["messages"].append(part.message)
        if part.message.tool_calls:
            for tool in part.message.tool_calls:
                if function_to_call := available_functions.get(tool.function.name):
                    output = function_to_call(**tool.function.arguments)
                    tool_calls.append({
                        'role': 'tool', 'content': str(output), 'name': tool.function.name
                    })

    for result in tool_calls:
        print(result)
        options["messages"].append(result)

    if len(tool_calls) > 0:
        for result in chat_generate(options):
            yield result

for result in chat_generate({
    "model": "qwq",
    "messages": [
        {"role": "user", "content": "How is the weather now?"},
    ],
    "stream": True,
    "tools": [get_location, get_weather]
}):
    print(result)
  2. Run the script and observe the behavior.

Expected Behavior

The response should be streamed incrementally, even when tool calls are involved.

Actual Behavior

  • The response is blocked until all tool calls have completed.
  • No incremental output is yielded while waiting for the tool responses.

Environment

  • Ollama Server Version: 0.5.12
  • Ollama Python Client Version: 0.4.7
  • Python: 3.8.5
  • OS: Windows 11

Additional Information

It seems that the OLLAMA_CLIENT.chat function does not return partial results when tool calls are required. Is there a way to execute tool calls while preserving streaming behavior?
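A possible workaround, assuming the buffering only happens when tools is passed, is to split the exchange into two phases: a non-streamed call that resolves the tool calls, then a streamed call (without tools) that generates the final answer with the tool results already in context. The sketch below mocks the client so the control flow can be shown without a running server; FakeClient and its canned responses are illustrative stand-ins, not part of the ollama API.

```python
from types import SimpleNamespace

# Hypothetical stand-in for ollama.Client, used only to illustrate control flow.
# A real client would be: client = ollama.Client(host="172.16.2.96:11434")
class FakeClient:
    def chat(self, model, messages, stream=False, tools=None):
        if tools:
            # Phase 1 (non-streamed): the model asks for a tool call.
            tool_call = SimpleNamespace(
                function=SimpleNamespace(name="get_weather",
                                         arguments={"location": "Shanghai"}))
            return SimpleNamespace(
                message=SimpleNamespace(content="", tool_calls=[tool_call]))
        # Phase 2 (streamed, no tools): yield incremental chunks.
        return iter([
            SimpleNamespace(message=SimpleNamespace(content=t, tool_calls=None))
            for t in ["Sunny, ", "25°C ", "in Shanghai."]])

def get_weather(location: str) -> str:
    return "Sunny, temperature 25°C."

available_functions = {"get_weather": get_weather}

def chat_two_phase(client, model, messages, tools):
    # Phase 1: resolve tool calls with a plain (non-streamed) request.
    response = client.chat(model=model, messages=messages, tools=tools)
    messages.append(response.message)
    for tool in response.message.tool_calls or []:
        if fn := available_functions.get(tool.function.name):
            messages.append({"role": "tool",
                             "content": str(fn(**tool.function.arguments)),
                             "name": tool.function.name})
    # Phase 2: stream the final answer; tools are omitted so the
    # server has no pending tool calls to buffer on.
    for part in client.chat(model=model, messages=messages, stream=True):
        yield part.message.content

chunks = list(chat_two_phase(
    FakeClient(), "qwq",
    [{"role": "user", "content": "How is the weather now?"}],
    [get_weather]))
print("".join(chunks))  # -> Sunny, 25°C in Shanghai.
```

The trade-off is one extra round trip: the tool-resolution phase is not streamed, but every token of the final answer is.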
