
[bot] Mistral: Chat and agent tool-use responses lack child TOOL spans #378

@braintrust-bot

Description

Summary

The Mistral integration captures tool_calls in the LLM span's output dictionary but does not create child SpanTypeAttribute.TOOL spans for individual tool calls. This is a tracing depth gap compared to the OpenAI, Anthropic, and Google GenAI integrations, which all decompose tool-use responses into dedicated child tool spans.

Mistral's Chat API and Agents API both support function calling / tool use. When a model response includes tool calls, users currently see them only as entries in the LLM span's output message — they cannot drill into individual tool invocations as separate spans in the Braintrust UI.

What is missing

The Mistral tracing module (py/src/braintrust/integrations/mistral/tracing.py) accumulates tool calls into the output message but never creates child spans:

Non-streaming path: Tool calls from response.choices[0].message.tool_calls are serialized into the output dict.

Streaming path (line ~599): _merge_tool_calls() accumulates streaming tool call deltas into the message dict — stored flat in the LLM span.

Span creation: Only SpanTypeAttribute.LLM spans are ever created. No SpanTypeAttribute.TOOL child spans exist anywhere in the file.

Both _CHAT_METADATA_KEYS and _AGENTS_METADATA_KEYS include parallel_tool_calls as metadata, confirming tool use is a recognized feature — but the span-level decomposition is missing.
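For context, the accumulation that _merge_tool_calls() performs in the streaming path can be sketched roughly as follows. This is a hypothetical re-implementation for illustration (the function name merge_tool_call_deltas and the exact field handling are assumptions, not the actual code), based on the OpenAI-compatible delta shape Mistral streams:

```python
# Hypothetical sketch of merging streaming tool-call deltas into complete
# tool calls, as _merge_tool_calls() does in mistral/tracing.py. Deltas
# arrive with an index, an id on the first fragment, and the function
# arguments split across fragments that must be concatenated.
def merge_tool_call_deltas(deltas):
    calls = {}  # index -> accumulated tool call
    for delta in deltas:
        idx = delta.get("index", 0)
        call = calls.setdefault(
            idx,
            {"id": "", "type": "function", "function": {"name": "", "arguments": ""}},
        )
        if delta.get("id"):
            call["id"] = delta["id"]
        fn = delta.get("function") or {}
        if fn.get("name"):
            call["function"]["name"] += fn["name"]
        if fn.get("arguments"):
            call["function"]["arguments"] += fn["arguments"]
    return [calls[i] for i in sorted(calls)]
```

The point of the issue is that after this merge, the completed tool calls are stored flat in the LLM span's output instead of also being emitted as child TOOL spans.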

Comparison with other integrations in this repo

| Integration | Tool call handling | Child TOOL spans? |
| --- | --- | --- |
| OpenAI (Chat Completions) | _log_response_tool_spans() in tracing.py | Yes |
| OpenAI (Responses API) | _log_response_tool_spans() in tracing.py | Yes |
| Anthropic | _log_server_tool_spans() in tracing.py | Yes |
| Google GenAI | _finalize_interaction_tool_spans() in tracing.py | Yes |
| Cohere | tool_calls stored in LLM span output dict | No (tracked separately) |
| Mistral | tool_calls stored in LLM span output dict | No |

What child TOOL spans should capture

For each tool call in the response:

  • Span name: Tool function name (e.g. tool: web_search)
  • Span type: SpanTypeAttribute.TOOL
  • Input: Tool call arguments / parameters
  • Output: (empty for client-side tool calls)
  • Metadata: tool_call_id, tool type, tool index

This applies to both client.chat.complete() / client.chat.stream() and client.agents.complete() / client.agents.stream() tool-use responses.
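The fields above can be sketched as plain span payloads. This is a minimal illustration, not the actual SDK code: a real fix would call the Braintrust span API with SpanTypeAttribute.TOOL under the parent LLM span, whereas here each span is modeled as a dict, and the helper name tool_span_payloads is an assumption:

```python
import json

# Sketch: derive one child TOOL span payload per tool call in an
# OpenAI-compatible response, per the fields listed above.
def tool_span_payloads(tool_calls):
    spans = []
    for index, call in enumerate(tool_calls):
        fn = call.get("function", {})
        try:
            args = json.loads(fn.get("arguments") or "{}")
        except json.JSONDecodeError:
            args = fn.get("arguments")  # keep the raw string if not valid JSON
        spans.append({
            "name": f"tool: {fn.get('name', 'unknown')}",
            "type": "tool",  # SpanTypeAttribute.TOOL in the SDK
            "input": args,
            "output": None,  # empty for client-side tool calls
            "metadata": {
                "tool_call_id": call.get("id"),
                "tool_type": call.get("type"),
                "tool_index": index,
            },
        })
    return spans
```

Parsing function.arguments from its JSON string into a dict before logging matches what the other integrations do for span inputs, and keeps the Braintrust UI's input view structured rather than a raw string.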

Braintrust docs status

Supported (partial) — the Mistral integration page documents chat completions and agents instrumentation, but does not mention tool span decomposition.

Upstream sources

  • Mistral Function Calling docs: https://docs.mistral.ai/capabilities/function_calling/
  • Mistral supports parallel tool calling via parallel_tool_calls parameter
  • Tool calls in responses follow the OpenAI-compatible format: id, type: "function", function.name, function.arguments
  • Agents API also supports tool use with the same response format
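Concretely, a single tool call in a Mistral response follows this OpenAI-compatible shape (values here are illustrative, not taken from a real response):

```json
{
  "id": "call_abc123",
  "type": "function",
  "function": {
    "name": "web_search",
    "arguments": "{\"query\": \"example\"}"
  }
}
```

Note that function.arguments is a JSON-encoded string, so span creation logic needs to parse it before logging structured input.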

Local files inspected

  • py/src/braintrust/integrations/mistral/tracing.py — no SpanTypeAttribute.TOOL usage; tool calls stored flat in output message via _merge_tool_calls() (line 599)
  • py/src/braintrust/integrations/mistral/patchers.py — Chat, Embeddings, FIM, Agents, Transcriptions, Speech, OCR, Conversations patchers defined; no tool-span-related logic
  • py/src/braintrust/integrations/openai/tracing.py — _log_response_tool_spans() creates child TOOL spans (for comparison)
  • py/src/braintrust/integrations/anthropic/tracing.py — _log_server_tool_spans() creates child TOOL spans (for comparison)
  • py/src/braintrust/integrations/google_genai/tracing.py — _finalize_interaction_tool_spans() creates child TOOL spans (for comparison)
