Add trace ingestion for Gateway endpoints by TomeHirata · Pull Request #20358 · mlflow/mlflow

TomeHirata · 2026-01-27T10:13:34Z

🥞 Stacked PR

Use this link to review incremental changes.

stack/gateway-trace-ingestion [Files changed]
- stack/gateway-trace-api [Files changed]
  - stack/gateway-frontend-usage-ui [Files changed]

Related Issues/PRs

n/a

What changes are proposed in this pull request?

Generate a trace for gateway invocation

How is this PR tested?

Existing unit/integration tests
New unit/integration tests
Manual tests

Does this PR require documentation update?

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

How should the PR be classified in the release notes? Choose one:

rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?

Minor release: a release that increments the second part of the version number (e.g., 1.2.0 -> 1.3.0).
Bug fixes, doc updates and new features usually go into minor releases.
Patch release: a release that increments the third part of the version number (e.g., 1.2.0 -> 1.2.1).
Bug fixes and doc updates usually go into patch releases.

Yes (this PR will be cherry-picked and included in the next patch release)
No (this PR will be included in the next minor release)

github-actions · 2026-01-28T10:21:42Z

🛠 DevTools 🛠

Install mlflow from this PR

# mlflow
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/20358/merge
# mlflow-skinny
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/20358/merge#subdirectory=libs/skinny

For Databricks, use the following command:

%sh curl -LsSf https://raw.githubusercontent.com/mlflow/mlflow/HEAD/dev/install-skinny.sh | sh -s pull/20358/merge

Copilot

Pull request overview

This PR adds trace ingestion capabilities to MLflow Gateway endpoints, enabling automatic tracing and usage tracking for gateway invocations. It's part of a stacked PR series building on #20356.

Changes:

Adds experiment_id and usage_tracking fields to gateway endpoints for configuring trace destinations
Implements automatic trace creation with token usage tracking for all gateway invocations (chat, embeddings, passthrough)
Adds TracingProviderWrapper to instrument provider calls with tracing spans
Updates provider implementations (OpenAI, Anthropic, Gemini) to extract usage data from streaming responses
Adds database migration and schema updates across all supported databases
Implements frontend UI components for configuring usage tracking during endpoint creation

Reviewed changes

Copilot reviewed 37 out of 39 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
mlflow/store/db_migrations/versions/d0e1f2a3b4c5_add_experiment_id_to_endpoints.py	Database migration to add experiment_id and usage_tracking columns
mlflow/server/gateway_api.py	Gateway API changes to create traces for invocations
mlflow/gateway/providers/base.py	TracingProviderWrapper implementation for automatic instrumentation
mlflow/gateway/providers/*.py	Provider-specific changes to extract usage from streaming responses
mlflow/entities/gateway_endpoint.py	Entity updates to support new fields
mlflow/types/chat.py	Added usage field to ChatCompletionChunk for streaming usage
tests/gateway/providers/test_tracing.py	Comprehensive tests for tracing wrapper
Frontend files	UI components for usage tracking configuration

Comments suppressed due to low confidence (1)

mlflow/gateway/providers/anthropic.py:507

The usage data is only emitted in the final message_delta chunk (line 502-507), but not in other streaming chunks. This means if there are multiple indices in the stream, each index will get the same accumulated usage data in the final message_delta. This could lead to usage being attributed to the wrong choice index or duplicated usage counts if consumed incorrectly. Consider documenting this behavior or ensuring usage is only emitted once per stream.

            if resp["type"] == "message_delta":
                # Capture output_tokens from message_delta
                if delta_usage := resp.get("usage"):
                    usage_data["output_tokens"] = delta_usage.get("output_tokens")
                # Include accumulated usage in the response
                resp["_usage_data"] = usage_data
                for index in indices:
                    yield AnthropicAdapter.model_to_chat_streaming(
                        {**resp, "index": index},
                        self.config,
                    )

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

mlflow/server/gateway_api.py

github-actions · 2026-01-28T10:31:09Z

Documentation preview for 6650a0d is available at:

https://pr-20358--mlflow-docs-preview.netlify.app/docs/latest/

More info

Ignore this comment if this PR does not change the documentation.
The preview is updated when a new commit is pushed to this PR.
This comment was created by this workflow run.
The documentation was built by this workflow run.

mlflow/gateway/providers/openai.py

mlflow/gateway/providers/tracing.py

mlflow/tracing/constant.py

mlflow/gateway/providers/tracing.py

Add a TracingProviderWrapper that instruments gateway providers with MLflow tracing spans. This wrapper captures: - Provider name and model information as span attributes - Token usage from streaming and non-streaming responses - Error handling with proper span status updates Also adds experiment_id to GatewayEndpointConfig to support tracing to specific experiments, and gateway-specific metadata keys for filtering traces by endpoint. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

serena-ruan

Overall LGTM depending on the conclusion of https://github.com/mlflow/mlflow/pull/20358/changes#r2757178509

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

mlflow/gateway/providers/base.py

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

This was referenced Jan 27, 2026

link gateway and experiment #20356

Merged

Add usage section in endpoint page #20357

Merged

TomeHirata force-pushed the stack/gateway-trace-ingestion branch 7 times, most recently from f0c77e6 to c7f3a23 Compare January 28, 2026 07:36

TomeHirata added the team-review Trigger a team review request label Jan 28, 2026

TomeHirata marked this pull request as ready for review January 28, 2026 10:21

Copilot AI review requested due to automatic review settings January 28, 2026 10:21

github-actions bot requested review from daniellok-db and serena-ruan January 28, 2026 10:21

Copilot started reviewing on behalf of TomeHirata January 28, 2026 10:21 View session

Copilot AI reviewed Jan 28, 2026

View reviewed changes

mlflow/server/gateway_api.py Outdated Show resolved Hide resolved

github-actions bot added the rn/feature Mention under Features in Changelogs. label Jan 28, 2026

TomeHirata force-pushed the stack/gateway-trace-ingestion branch from c7f3a23 to cc324f9 Compare January 29, 2026 06:29