Add tracing integration to Gateway API endpoints #20495
TomeHirata merged 13 commits into mlflow:master
Pull request overview
This PR adds first-class tracing / usage-tracking support for MLflow Gateway endpoints by persisting a tracing destination (experiment) on endpoints, instrumenting provider calls with spans (including streaming token usage where available), and exposing usage-tracking configuration in the Gateway UI.
Changes:
- Add usage_tracking + experiment_id to Gateway endpoint persistence (DB schema, migrations, protos, entities, REST/SQL stores).
- Instrument Gateway request handling and providers with MLflow tracing spans, including streamed token-usage extraction for select providers.
- Extend UI and tests to configure usage tracking + select experiments and validate trace creation.
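To make the "instrument provider calls with spans" item concrete, here is a minimal, standard-library-only sketch of the wrapper idea. It is not the PR's TracingProviderWrapper and does not use MLflow's tracing API; `SpanRecord`, `EchoProvider`, and `TracedProvider` are hypothetical names invented for illustration, and the span is just an in-memory record.

```python
import asyncio
import time
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SpanRecord:
    """Hypothetical in-memory stand-in for a tracing span."""

    name: str
    attributes: dict = field(default_factory=dict)
    duration_s: float = 0.0
    error: Optional[str] = None


class EchoProvider:
    """Hypothetical provider used only for this illustration."""

    async def chat(self, payload: dict) -> dict:
        return {"content": payload["prompt"].upper(), "usage": {"total_tokens": 3}}


class TracedProvider:
    """Delegates to a provider and records one span per `chat` call."""

    def __init__(self, provider: Any, provider_name: str, sink: list):
        self._provider = provider
        self._provider_name = provider_name
        self._sink = sink

    async def chat(self, payload: dict) -> dict:
        span = SpanRecord(name="provider/chat", attributes={"provider": self._provider_name})
        start = time.perf_counter()
        try:
            response = await self._provider.chat(payload)
            # Attach token usage to the span when the provider reports it.
            span.attributes["usage"] = response.get("usage")
            return response
        except Exception as exc:
            span.error = repr(exc)
            raise
        finally:
            # The span is recorded whether the call succeeded or failed.
            span.duration_s = time.perf_counter() - start
            self._sink.append(span)


spans: list = []
result = asyncio.run(TracedProvider(EchoProvider(), "echo", spans).chat({"prompt": "hi"}))
```

Wrapping rather than subclassing keeps each provider unaware of tracing, which is presumably why the PR introduces a separate wrapper module instead of editing every provider method.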
Reviewed changes
Copilot reviewed 47 out of 49 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| tests/tracking/test_rest_tracking.py | Adds coverage for auto-creating an experiment when usage_tracking=True. |
| tests/store/tracking/test_rest_store.py | Updates REST store request expectations to include usage_tracking. |
| tests/store/tracking/test_gateway_sql_store.py | Extends SQL store endpoint creation test with usage-tracking fields. |
| tests/server/test_gateway_api.py | Updates gateway API tests for traced providers, streaming, and trace creation. |
| tests/resources/db/latest_schema.sql | Updates latest test DB schema with new endpoint columns. |
| tests/gateway/schemas/test_completions.py | Validates streaming completions schema with optional usage. |
| tests/gateway/schemas/test_chat.py | Validates streaming chat schema with optional usage. |
| tests/gateway/providers/test_tracing.py | Adds unit tests for provider tracing wrapper behavior. |
| tests/gateway/providers/test_togetherai.py | Updates TogetherAI streaming expectations to include usage: None / usage final chunk. |
| tests/gateway/providers/test_openai.py | Updates OpenAI streaming tests for stream_options.include_usage + optional usage field. |
| tests/gateway/providers/test_gemini.py | Updates Gemini streaming tests to include optional usage. |
| tests/gateway/providers/test_cohere.py | Updates Cohere streaming tests to include usage objects. |
| tests/gateway/providers/test_anthropic.py | Updates Anthropic streaming tests + validates usage extraction changes. |
| tests/db/schemas/sqlite.sql | Adds experiment_id and usage_tracking to SQLite test schema. |
| tests/db/schemas/postgresql.sql | Adds experiment_id and usage_tracking to Postgres test schema. |
| tests/db/schemas/mysql.sql | Adds experiment_id and usage_tracking to MySQL test schema. |
| tests/db/schemas/mssql.sql | Adds experiment_id and usage_tracking to MSSQL test schema. |
| mlflow/types/chat.py | Adds optional usage field to chat completion chunk type. |
| mlflow/tracing/constant.py | Adds gateway-related trace metadata keys + provider/model span attributes. |
| mlflow/store/tracking/gateway/sqlalchemy_mixin.py | Persists experiment_id + usage_tracking for endpoint create/update. |
| mlflow/store/tracking/gateway/rest_mixin.py | Extends REST store endpoint create/update payloads with usage-tracking fields. |
| mlflow/store/tracking/gateway/entities.py | Adds experiment_id to resolved endpoint config entity. |
| mlflow/store/tracking/gateway/config_resolver.py | Propagates experiment_id into runtime endpoint config. |
| mlflow/store/tracking/gateway/abstract_mixin.py | Updates abstract store contract/docs for usage tracking + experiment handling. |
| mlflow/store/tracking/dbmodels/models.py | Adds new endpoint columns to SQLAlchemy DB model and entity conversion. |
| mlflow/store/db_migrations/versions/d0e1f2a3b4c5_add_experiment_id_to_endpoints.py | Adds Alembic migration for new endpoint columns. |
| mlflow/server/js/src/lang/default/en.json | Adds UI strings for usage tracking + experiment selection + “View traces”. |
| mlflow/server/js/src/gateway/types.ts | Extends Gateway endpoint/request types with usage-tracking fields. |
| mlflow/server/js/src/gateway/pages/EndpointPage.tsx | Passes experiment id into edit form renderer to surface trace link. |
| mlflow/server/js/src/gateway/hooks/useExperimentsForSelect.ts | Adds query hook to fetch experiments for selection. |
| mlflow/server/js/src/gateway/hooks/useCreateEndpointForm.ts | Adds usage tracking + experiment selection to create-endpoint submit payload. |
| mlflow/server/js/src/gateway/components/endpoint-form/EndpointFormRenderer.tsx | Adds create-time usage tracking toggle + experiment selector UI. |
| mlflow/server/js/src/gateway/components/edit-endpoint/EditEndpointFormRenderer.tsx | Adds “View traces” link when endpoint has an experiment id. |
| mlflow/server/js/src/gateway/components/create-endpoint/ExperimentSelect.tsx | Adds experiment select component for the create form. |
| mlflow/server/handlers.py | Adds server-side experiment auto-creation logic for usage-tracked endpoints. |
| mlflow/server/gateway_api.py | Adds gateway span creation + traced streaming response wrapper; wraps providers for tracing. |
| mlflow/protos/service_pb2.pyi | Updates Python proto stubs for new endpoint fields. |
| mlflow/protos/service.proto | Adds experiment_id + usage_tracking to gateway endpoint create/update protos. |
| mlflow/gateway/utils.py | Updates SSE serialization to use model_dump_json() for pydantic v2. |
| mlflow/gateway/schemas/completions.py | Adds optional usage to streaming completions response schema. |
| mlflow/gateway/providers/tracing.py | Introduces TracingProviderWrapper for provider method span instrumentation. |
| mlflow/gateway/providers/openai.py | Adds streamed usage extraction + injects stream_options.include_usage. |
| mlflow/gateway/providers/litellm.py | Adds get_provider_name() override for more accurate tracing labels. |
| mlflow/gateway/providers/gemini.py | Adds streamed usage extraction from usageMetadata. |
| mlflow/gateway/providers/base.py | Adds get_provider_name() default to support tracing/metrics naming. |
| mlflow/gateway/providers/anthropic.py | Adds streamed usage aggregation across events and exposes it on chunks. |
| mlflow/entities/gateway_endpoint.py | Extends GatewayEndpoint entity (proto conversion) with new fields. |
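Several rows above mention extracting token usage from streamed responses. The following is a rough sketch of that pattern under stated assumptions: chunks are plain dicts with an optional "usage" key, and the last non-empty usage seen is reported once the stream ends (OpenAI, for example, sends usage in a final chunk when stream_options.include_usage is set). `fake_stream` and `stream_with_usage` are hypothetical names, not functions from the PR.

```python
import asyncio
from typing import Any, AsyncIterator, Callable, Optional


async def fake_stream() -> AsyncIterator[dict]:
    """Hypothetical provider stream: content chunks, then a usage-bearing chunk."""
    yield {"delta": "Hel", "usage": None}
    yield {"delta": "lo", "usage": None}
    yield {"delta": "", "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7}}


async def stream_with_usage(
    stream: AsyncIterator[dict],
    on_done: Callable[[Optional[dict]], Any],
) -> AsyncIterator[dict]:
    """Pass chunks through unchanged and report the last usage seen at the end."""
    usage: Optional[dict] = None
    try:
        async for chunk in stream:
            if chunk.get("usage") is not None:
                usage = chunk["usage"]
            yield chunk
    finally:
        # Runs on normal completion and on errors, so usage is always reported.
        on_done(usage)


async def main():
    recorded: list = []
    text = ""
    async for chunk in stream_with_usage(fake_stream(), recorded.append):
        text += chunk["delta"]
    return text, recorded


text, recorded = asyncio.run(main())
```

The consumer still sees every chunk as it arrives; only the bookkeeping is deferred to the end of the stream, which is where a span could be closed with the final token counts.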
Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
/review
✅ Review completed.
Review output: No issues found. The PR changes are well-structured. The code follows consistent patterns, has proper type hints, and the tests comprehensively cover the new tracing functionality, including both synchronous and streaming cases.
endpoint_type: EndpointType,
enable_tracing: bool = True,
) -> BaseProvider:
) -> tuple[BaseProvider, GatewayEndpointConfig]:
It's surprising that we have both EndpointConfig and GatewayEndpointConfig; it's confusing. I feel like we should rename EndpointConfig to ProviderConfig, but that's not related to this PR.
Could we rename this function, though, and probably use a NamedTuple for the return value to be more explicit?
Yeah, I think we'll consolidate them once we deprecate or integrate the legacy gateway workflow
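For reference, the NamedTuple suggestion in this thread could look roughly like the sketch below. The stand-in classes and the names `ProviderWithConfig` and `create_provider_with_config` are hypothetical; the PR's actual function name and return type are not shown in this thread.

```python
from dataclasses import dataclass
from typing import NamedTuple, Optional


class BaseProvider:
    """Stand-in for the gateway's provider base class (illustration only)."""


@dataclass
class GatewayEndpointConfig:
    """Stand-in for the resolved endpoint config (illustration only)."""

    endpoint_name: str
    experiment_id: Optional[str] = None


class ProviderWithConfig(NamedTuple):
    """Named return value, so callers need not remember tuple positions."""

    provider: BaseProvider
    config: GatewayEndpointConfig


def create_provider_with_config(endpoint_name: str) -> ProviderWithConfig:
    # Hypothetical factory: a real one would resolve these from the store.
    config = GatewayEndpointConfig(endpoint_name=endpoint_name, experiment_id="123")
    return ProviderWithConfig(provider=BaseProvider(), config=config)


resolved = create_provider_with_config("my-endpoint")
provider, config = resolved  # positional unpacking still works
```

A NamedTuple stays compatible with callers that unpack the plain tuple while letting new code write `resolved.config.experiment_id`.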
mlflow/gateway/tracing_utils.py
Outdated
from mlflow.store.tracking.gateway.entities import GatewayEndpointConfig


def traced_gateway_call(
nit for this function name, since it may or may not be traced depending on the config, maybe something like apply_gateway_tracing_config
maybe_traced_gateway_call?
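The naming point here is that tracing is applied only when the endpoint config asks for it. A minimal sketch of that "maybe traced" behavior, assuming tracing is enabled whenever the config carries an experiment id: `EndpointConfig`, `TRACE_LOG`, and the body of `maybe_traced_gateway_call` below are hypothetical and do not reflect the PR's implementation or signature.

```python
import functools
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class EndpointConfig:
    """Hypothetical config: tracing is on only when experiment_id is set."""

    endpoint_name: str
    experiment_id: Optional[str] = None


TRACE_LOG: list = []  # stand-in for a real trace backend


def maybe_traced_gateway_call(func: Callable[..., Any], config: EndpointConfig) -> Callable[..., Any]:
    """Return `func` unchanged when tracing is off, else a recording wrapper."""
    if config.experiment_id is None:
        return func

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        result = func(*args, **kwargs)
        TRACE_LOG.append(
            {
                "endpoint": config.endpoint_name,
                "experiment_id": config.experiment_id,
                "call": func.__name__,
            }
        )
        return result

    return wrapper


def handle(prompt: str) -> str:
    return prompt[::-1]


untraced = maybe_traced_gateway_call(handle, EndpointConfig("a"))
traced = maybe_traced_gateway_call(handle, EndpointConfig("b", experiment_id="42"))
untraced_out = untraced("abc")
traced_out = traced("abc")
```

Returning the original function untouched when tracing is disabled means untraced endpoints pay no wrapper overhead.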
serena-ruan
left a comment
Overall LGTM! Left some nits :)
Related Issues/PRs
n/a
What changes are proposed in this pull request?
Title