
[bot] Mistral: Chat and agent tool-use responses lack child TOOL spans #378

@braintrust-bot

Description

Summary

The Mistral integration captures tool_calls in the LLM span's output dictionary but does not create child SpanTypeAttribute.TOOL spans for individual tool calls. This is a tracing depth gap compared to the OpenAI, Anthropic, and Google GenAI integrations, which all decompose tool-use responses into dedicated child tool spans.

Mistral's Chat API and Agents API both support function calling / tool use. When a model response includes tool calls, users currently see them only as entries in the LLM span's output message — they cannot drill into individual tool invocations as separate spans in the Braintrust UI.

What is missing

The Mistral tracing module (py/src/braintrust/integrations/mistral/tracing.py) accumulates tool calls into the output message but never creates child spans:

Non-streaming path: Tool calls from response.choices[0].message.tool_calls are serialized into the output dict.

Streaming path (line ~599): _merge_tool_calls() accumulates streaming tool call deltas into the message dict — stored flat in the LLM span.

Span creation: Only SpanTypeAttribute.LLM spans are ever created. No SpanTypeAttribute.TOOL child spans exist anywhere in the file.

Both _CHAT_METADATA_KEYS and _AGENTS_METADATA_KEYS include parallel_tool_calls as metadata, confirming tool use is a recognized feature — but the span-level decomposition is missing.
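For context, the accumulation that _merge_tool_calls() performs in the streaming path can be sketched roughly as follows. This is a hypothetical re-implementation for illustration (the function name merge_tool_call_deltas and the exact field handling are assumptions, not the actual code), based on the OpenAI-compatible delta shape Mistral streams:

```python
# Hypothetical sketch of merging streaming tool-call deltas into complete
# tool calls, as _merge_tool_calls() does in mistral/tracing.py. Deltas
# arrive with an index, an id on the first fragment, and the function
# arguments split across fragments that must be concatenated.
def merge_tool_call_deltas(deltas):
    calls = {}  # index -> accumulated tool call
    for delta in deltas:
        idx = delta.get("index", 0)
        call = calls.setdefault(
            idx,
            {"id": "", "type": "function", "function": {"name": "", "arguments": ""}},
        )
        if delta.get("id"):
            call["id"] = delta["id"]
        fn = delta.get("function") or {}
        if fn.get("name"):
            call["function"]["name"] += fn["name"]
        if fn.get("arguments"):
            call["function"]["arguments"] += fn["arguments"]
    return [calls[i] for i in sorted(calls)]
```

The point of the issue is that after this merge, the completed tool calls are stored flat in the LLM span's output instead of also being emitted as child TOOL spans.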

Comparison with other integrations in this repo

| Integration | Tool call handling | Child TOOL spans? |
| --- | --- | --- |
| OpenAI (Chat Completions) | _log_response_tool_spans() in tracing.py | Yes |
| OpenAI (Responses API) | _log_response_tool_spans() in tracing.py | Yes |
| Anthropic | _log_server_tool_spans() in tracing.py | Yes |
| Google GenAI | _finalize_interaction_tool_spans() in tracing.py | Yes |
| Cohere | tool_calls stored in LLM span output dict | No (tracked separately) |
| Mistral | tool_calls stored in LLM span output dict | No |

What child TOOL spans should capture

For each tool call in the response:

  • Span name: Tool function name (e.g. tool: web_search)
  • Span type: SpanTypeAttribute.TOOL
  • Input: Tool call arguments / parameters
  • Output: (empty for client-side tool calls)
  • Metadata: tool_call_id, tool type, tool index

This applies to both client.chat.complete() / client.chat.stream() and client.agents.complete() / client.agents.stream() tool-use responses.
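The fields above can be sketched as plain span payloads. This is a minimal illustration, not the actual SDK code: a real fix would call the Braintrust span API with SpanTypeAttribute.TOOL under the parent LLM span, whereas here each span is modeled as a dict, and the helper name tool_span_payloads is an assumption:

```python
import json

# Sketch: derive one child TOOL span payload per tool call in an
# OpenAI-compatible response, per the fields listed above.
def tool_span_payloads(tool_calls):
    spans = []
    for index, call in enumerate(tool_calls):
        fn = call.get("function", {})
        try:
            args = json.loads(fn.get("arguments") or "{}")
        except json.JSONDecodeError:
            args = fn.get("arguments")  # keep the raw string if not valid JSON
        spans.append({
            "name": f"tool: {fn.get('name', 'unknown')}",
            "type": "tool",  # SpanTypeAttribute.TOOL in the SDK
            "input": args,
            "output": None,  # empty for client-side tool calls
            "metadata": {
                "tool_call_id": call.get("id"),
                "tool_type": call.get("type"),
                "tool_index": index,
            },
        })
    return spans
```

Parsing function.arguments from its JSON string into a dict before logging matches what the other integrations do for span inputs, and keeps the Braintrust UI's input view structured rather than a raw string.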

Braintrust docs status

Supported (partial) — the Mistral integration page documents chat completions and agents instrumentation, but does not mention tool span decomposition.

Upstream sources

  • Mistral Function Calling docs: https://docs.mistral.ai/capabilities/function_calling/
  • Mistral supports parallel tool calling via parallel_tool_calls parameter
  • Tool calls in responses follow the OpenAI-compatible format: id, type: "function", function.name, function.arguments
  • Agents API also supports tool use with the same response format
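Concretely, a single tool call in a Mistral response follows this OpenAI-compatible shape (values here are illustrative, not taken from a real response):

```json
{
  "id": "call_abc123",
  "type": "function",
  "function": {
    "name": "web_search",
    "arguments": "{\"query\": \"example\"}"
  }
}
```

Note that function.arguments is a JSON-encoded string, so span creation logic needs to parse it before logging structured input.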

Local files inspected

  • py/src/braintrust/integrations/mistral/tracing.py — no SpanTypeAttribute.TOOL usage; tool calls stored flat in output message via _merge_tool_calls() (line 599)
  • py/src/braintrust/integrations/mistral/patchers.py — Chat, Embeddings, FIM, Agents, Transcriptions, Speech, OCR, Conversations patchers defined; no tool-span-related logic
  • py/src/braintrust/integrations/openai/tracing.py — _log_response_tool_spans() creates child TOOL spans (for comparison)
  • py/src/braintrust/integrations/anthropic/tracing.py — _log_server_tool_spans() creates child TOOL spans (for comparison)
  • py/src/braintrust/integrations/google_genai/tracing.py — _finalize_interaction_tool_spans() creates child TOOL spans (for comparison)
