Add trace ingestion for Gateway endpoints #20358
Merged
TomeHirata merged 7 commits into mlflow:master on Feb 4, 2026
Pull request overview
This PR adds trace ingestion capabilities to MLflow Gateway endpoints, enabling automatic tracing and usage tracking for gateway invocations. It's part of a stacked PR series building on #20356.
Changes:
- Adds `experiment_id` and `usage_tracking` fields to gateway endpoints for configuring trace destinations
- Implements automatic trace creation with token usage tracking for all gateway invocations (chat, embeddings, passthrough)
- Adds `TracingProviderWrapper` to instrument provider calls with tracing spans
- Updates provider implementations (OpenAI, Anthropic, Gemini) to extract usage data from streaming responses
- Adds database migration and schema updates across all supported databases
- Implements frontend UI components for configuring usage tracking during endpoint creation
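To make the role of the two new endpoint fields concrete, here is a minimal sketch of how they could decide where a trace goes. The class `EndpointTracingConfig`, the function `resolve_trace_destination`, the boolean type of `usage_tracking`, and the rule "skip tracing unless both fields are set" are all assumptions for illustration; the PR's actual entity and logic may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EndpointTracingConfig:
    """Hypothetical stand-in for the endpoint fields named in this PR."""

    name: str
    experiment_id: Optional[str] = None
    # Assumed to be a simple on/off flag.
    usage_tracking: bool = False


def resolve_trace_destination(endpoint: EndpointTracingConfig) -> Optional[str]:
    """Return the experiment ID to log traces to, or None to skip tracing."""
    if not endpoint.usage_tracking:
        return None
    # With tracking on but no experiment configured, this sketch also skips.
    return endpoint.experiment_id


enabled = EndpointTracingConfig("chat", experiment_id="42", usage_tracking=True)
disabled = EndpointTracingConfig("chat", experiment_id="42")
print(resolve_trace_destination(enabled))
print(resolve_trace_destination(disabled))
```

Keeping the decision in one small function would let the chat, embeddings, and passthrough routes share the same rule.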
Reviewed changes
Copilot reviewed 37 out of 39 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| mlflow/store/db_migrations/versions/d0e1f2a3b4c5_add_experiment_id_to_endpoints.py | Database migration to add experiment_id and usage_tracking columns |
| mlflow/server/gateway_api.py | Gateway API changes to create traces for invocations |
| mlflow/gateway/providers/base.py | TracingProviderWrapper implementation for automatic instrumentation |
| mlflow/gateway/providers/*.py | Provider-specific changes to extract usage from streaming responses |
| mlflow/entities/gateway_endpoint.py | Entity updates to support new fields |
| mlflow/types/chat.py | Added usage field to ChatCompletionChunk for streaming usage |
| tests/gateway/providers/test_tracing.py | Comprehensive tests for tracing wrapper |
| Frontend files | UI components for usage tracking configuration |
Comments suppressed due to low confidence (1)
mlflow/gateway/providers/anthropic.py:507
- The usage data is only emitted in the final `message_delta` chunk (lines 502-507), not in other streaming chunks. This means that if there are multiple indices in the stream, each index will get the same accumulated usage data in the final `message_delta`. This could lead to usage being attributed to the wrong choice index, or to duplicated usage counts if consumed incorrectly. Consider documenting this behavior or ensuring usage is only emitted once per stream.

    if resp["type"] == "message_delta":
        # Capture output_tokens from message_delta
        if delta_usage := resp.get("usage"):
            usage_data["output_tokens"] = delta_usage.get("output_tokens")
        # Include accumulated usage in the response
        resp["_usage_data"] = usage_data
    for index in indices:
        yield AnthropicAdapter.model_to_chat_streaming(
            {**resp, "index": index},
            self.config,
        )
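One way to follow the suggestion of emitting usage only once per stream is sketched below. It is not the code in `anthropic.py`: the function `attach_usage_once`, the plain-dict chunk shape, and the event layout (input tokens on `message_start`, output tokens on a single `message_delta`) are assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, Iterator, List


def attach_usage_once(
    events: Iterable[Dict[str, Any]], indices: List[int]
) -> Iterator[Dict[str, Any]]:
    """Fan each event out per choice index, attaching usage to one chunk only."""
    usage: Dict[str, int] = {}
    for event in events:
        if event["type"] == "message_start":
            # Assumed shape: input tokens arrive with the initial message.
            start_usage = event.get("message", {}).get("usage") or {}
            if "input_tokens" in start_usage:
                usage["input_tokens"] = start_usage["input_tokens"]
        elif event["type"] == "message_delta":
            delta_usage = event.get("usage") or {}
            if "output_tokens" in delta_usage:
                usage["output_tokens"] = delta_usage["output_tokens"]
        for position, index in enumerate(indices):
            chunk: Dict[str, Any] = {"type": event["type"], "index": index}
            # Only the first index of the message_delta carries usage, so a
            # consumer that sums usage across chunks does not double count.
            if event["type"] == "message_delta" and position == 0:
                chunk["usage"] = dict(usage)
            yield chunk


events = [
    {"type": "message_start", "message": {"usage": {"input_tokens": 10}}},
    {"type": "content_block_delta"},
    {"type": "message_delta", "usage": {"output_tokens": 5}},
]
chunks = list(attach_usage_once(events, indices=[0, 1]))
print([c for c in chunks if "usage" in c])
```

The trade-off is that usage is then tied to the first choice index by convention, which would need to be documented for consumers.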
serena-ruan
reviewed
Jan 29, 2026
Add a TracingProviderWrapper that instruments gateway providers with MLflow tracing spans. This wrapper captures:
- Provider name and model information as span attributes
- Token usage from streaming and non-streaming responses
- Error handling with proper span status updates

Also adds experiment_id to GatewayEndpointConfig to support tracing to specific experiments, and gateway-specific metadata keys for filtering traces by endpoint.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
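The three things the commit message says the wrapper captures can be illustrated with a small, dependency-free sketch. The `Span` class below is a hypothetical in-memory stand-in, not MLflow's span type, and `TracingWrapperSketch`, its attribute keys, and the OpenAI-style `usage` keys are assumptions; the real `TracingProviderWrapper` in `mlflow/gateway/providers/base.py` may be structured differently.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Span:
    """Hypothetical in-memory span; not MLflow's span class."""

    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)
    status: str = "UNSET"


class TracingWrapperSketch:
    """Wraps a provider callable and records one span per call."""

    def __init__(
        self,
        provider_name: str,
        model: str,
        call: Callable[[Dict[str, Any]], Dict[str, Any]],
    ) -> None:
        self.provider_name = provider_name
        self.model = model
        self._call = call
        self.spans: List[Span] = []

    def chat(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        # Provider name and model are recorded as span attributes up front.
        span = Span(
            name=f"provider/{self.provider_name}",
            attributes={"provider": self.provider_name, "model": self.model},
        )
        self.spans.append(span)
        try:
            response = self._call(payload)
        except Exception as exc:
            # Mark the span as failed, then let the error propagate.
            span.status = "ERROR"
            span.attributes["error"] = repr(exc)
            raise
        # Copy token usage onto the span when the response includes it.
        usage = response.get("usage") or {}
        for key in ("prompt_tokens", "completion_tokens", "total_tokens"):
            if key in usage:
                span.attributes[f"usage.{key}"] = usage[key]
        span.status = "OK"
        return response


wrapper = TracingWrapperSketch(
    "example-provider",
    "example-model",
    lambda payload: {"usage": {"prompt_tokens": 3, "completion_tokens": 4, "total_tokens": 7}},
)
wrapper.chat({"messages": []})
print(wrapper.spans[0])
```

Re-raising after updating the span keeps the caller's error handling unchanged while still leaving a failed span behind for inspection.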
serena-ruan
approved these changes
Feb 3, 2026
Overall LGTM depending on the conclusion of https://github.com/mlflow/mlflow/pull/20358/changes#r2757178509
serena-ruan
reviewed
Feb 4, 2026
Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
🥞 Stacked PR
Related Issues/PRs
n/a
What changes are proposed in this pull request?
Generate a trace for each gateway invocation.
How is this PR tested?
Does this PR require documentation update?
Release Notes
Is this a user-facing change?
What component(s), interfaces, languages, and integrations does this PR affect?
Components
- area/tracking: Tracking Service, tracking client APIs, autologging
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
- area/gateway: MLflow AI Gateway client APIs, server, and third-party integrations
- area/prompts: MLflow prompt engineering features, prompt templates, and prompt management
- area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality
- area/projects: MLproject format, project running backends
- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/build: Build and test infrastructure for MLflow
- area/docs: MLflow documentation pages
How should the PR be classified in the release notes? Choose one:
- rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
- rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
- rn/feature - A new user-facing feature worth mentioning in the release notes
- rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
- rn/documentation - A user-facing documentation change worth mentioning in the release notes
Should this PR be included in the next patch release?
"Yes" should be selected for bug fixes, documentation updates, and other small changes. "No" should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.
What is a minor/patch release?
Bug fixes, doc updates and new features usually go into minor releases.
Bug fixes and doc updates usually go into patch releases.