
link gateway and experiment #20356

Merged
TomeHirata merged 6 commits into mlflow:master from TomeHirata:stack/gateway-backend-trace-integration
Feb 2, 2026

Collaborator

@TomeHirata TomeHirata commented Jan 27, 2026

🥞 Stacked PR

Use this link to review incremental changes.


Related Issues/PRs

n/a

What changes are proposed in this pull request?

Link each gateway endpoint to an experiment ID. When usage tracking is enabled and no experiment ID is specified, an experiment is created automatically.
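
The linking rule can be sketched as a small function. This is only an illustration under stated assumptions, not the PR's code: `resolve_experiment_id` and the in-memory `experiments` dict are hypothetical stand-ins for the tracking store, and the `gateway/{name}` experiment name follows the naming this PR uses for auto-created experiments.

```python
from typing import Dict, Optional


def resolve_experiment_id(
    endpoint_name: str,
    usage_tracking: bool,
    experiment_id: Optional[str],
    experiments: Dict[str, str],
) -> Optional[str]:
    """Return the experiment ID to link to an endpoint.

    `experiments` maps experiment name -> ID and stands in for a real store
    (hypothetical; the real store API is not shown here).
    """
    # An explicitly supplied ID is passed through unchanged.
    if experiment_id is not None:
        return experiment_id
    # With usage tracking off, no experiment is created.
    if not usage_tracking:
        return None
    # Get-or-create keyed by name, so repeated calls reuse the same experiment.
    name = f"gateway/{endpoint_name}"
    if name not in experiments:
        experiments[name] = str(len(experiments) + 1)
    return experiments[name]


experiments: Dict[str, str] = {}
auto_id = resolve_experiment_id("chat", True, None, experiments)
explicit_id = resolve_experiment_id("chat", True, "42", experiments)
disabled_id = resolve_experiment_id("other", False, None, experiments)
```

Keying the lookup by name keeps the operation idempotent: creating the same endpoint name twice does not produce two experiments.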


How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/tracking: Tracking Service, tracking client APIs, autologging
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
  • area/gateway: MLflow AI Gateway client APIs, server, and third-party integrations
  • area/prompts: MLflow prompt engineering features, prompt templates, and prompt management
  • area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality
  • area/projects: MLproject format, project running backends
  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?
  • Minor release: a release that increments the second part of the version number (e.g., 1.2.0 -> 1.3.0).
    Bug fixes, doc updates and new features usually go into minor releases.
  • Patch release: a release that increments the third part of the version number (e.g., 1.2.0 -> 1.2.1).
    Bug fixes and doc updates usually go into patch releases.
  • Yes (this PR will be cherry-picked and included in the next patch release)
  • No (this PR will be included in the next minor release)

Copilot AI review requested due to automatic review settings January 27, 2026 09:46
@github-actions
Contributor

🛠 DevTools 🛠

Install mlflow from this PR

# mlflow
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/20356/merge
# mlflow-skinny
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/20356/merge#subdirectory=libs/skinny

For Databricks, use the following command:

%sh curl -LsSf https://raw.githubusercontent.com/mlflow/mlflow/HEAD/dev/install-skinny.sh | sh -s pull/20356/merge

Contributor

Copilot AI left a comment

Pull request overview

This PR links MLflow AI Gateway endpoints to MLflow Tracing by attaching an experiment_id to each endpoint, automatically creating experiments when needed, instrumenting gateway provider calls with tracing spans, and exposing trace navigation from the UI.

Changes:

  • Extend gateway endpoint schema (proto, entities, DB models, REST/SQL/abstract mixins, JS types) with an optional experiment_id, auto-creating an experiment per endpoint (gateway/{name}) when none is provided.
  • Add tracing instrumentation to gateway providers and HTTP handlers: wrap providers in TracingProviderWrapper, create top-level gateway traces per invocation, and propagate token usage and provider/model metadata into span attributes.
  • Update tests and UI to accommodate tracing: adapt provider-type tests to the tracing wrapper, ensure chat completions validation works with real endpoints, expose experiment_id in the React types and forms, and add a “View traces” link on the endpoint edit page when tracing is configured.
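
As a rough illustration of the wrapper idea in the second bullet, the sketch below delegates to a provider and records one span per call, copying token usage and provider/model metadata into the span attributes. Everything here is hypothetical: `RecordingTracer`, `EchoProvider`, and `TracingWrapper` are toy stand-ins, not MLflow's `TracingProviderWrapper` or its tracing API.

```python
from typing import Any, Dict, List


class RecordingTracer:
    """Hypothetical stand-in for a tracing backend: keeps finished spans in a list."""

    def __init__(self) -> None:
        self.spans: List[Dict[str, Any]] = []

    def record(self, name: str, attributes: Dict[str, Any]) -> None:
        self.spans.append({"name": name, "attributes": attributes})


class EchoProvider:
    """Toy provider whose response carries token usage (not a real gateway provider)."""

    def chat(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        text = payload["prompt"]
        n = len(text.split())
        return {"text": text.upper(), "usage": {"prompt_tokens": n, "completion_tokens": n}}


class TracingWrapper:
    """Delegates to the wrapped provider and records one span per chat call."""

    def __init__(self, provider: Any, tracer: RecordingTracer, provider_name: str, model: str) -> None:
        self._provider = provider
        self._tracer = tracer
        self._base_attrs = {"provider": provider_name, "model": model}

    def chat(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        attrs = dict(self._base_attrs)
        try:
            result = self._provider.chat(payload)
        except Exception as exc:
            # Record the failure on the span, then let the error propagate.
            attrs["error"] = repr(exc)
            self._tracer.record("chat", attrs)
            raise
        # Copy token usage from the response into the span attributes.
        attrs.update(result.get("usage", {}))
        self._tracer.record("chat", attrs)
        return result


tracer = RecordingTracer()
provider = TracingWrapper(EchoProvider(), tracer, provider_name="echo", model="echo-1")
response = provider.chat({"prompt": "hello gateway world"})
```

One consequence of wrapping is that `isinstance` checks against the concrete provider class no longer pass on the wrapper object, which is presumably why the provider-type tests had to be adapted.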

Reviewed changes

Copilot reviewed 16 out of 18 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| tests/server/test_gateway_api.py | Updates provider construction tests to account for TracingProviderWrapper and adds setup for chat-completions validation that now requires an actual endpoint. |
| mlflow/store/tracking/gateway/sqlalchemy_mixin.py | Adds experiment_id argument to create_gateway_endpoint / update_gateway_endpoint, with auto-creation of a gateway/{name} experiment and persistence into SqlGatewayEndpoint. |
| mlflow/store/tracking/gateway/rest_mixin.py | Wires experiment_id through REST client methods for creating and updating gateway endpoints, matching the extended proto. |
| mlflow/store/tracking/gateway/abstract_mixin.py | Extends the abstract gateway store interface to include experiment_id on create/update endpoint signatures and documents its tracing semantics. |
| mlflow/store/tracking/dbmodels/models.py | Adds an experiment_id column to SqlGatewayEndpoint and returns it via to_mlflow_entity() so Python entities see it. |
| mlflow/store/db_migrations/versions/d0e1f2a3b4c5_add_experiment_id_to_endpoints.py | Introduces an Alembic migration to add/drop the nullable experiment_id column on the endpoints table. |
| mlflow/server/js/src/gateway/types.ts | Extends TS types for endpoints and create/update payloads with optional experiment_id to keep the UI/client in sync with the backend. |
| mlflow/server/js/src/gateway/pages/EndpointPage.tsx | Passes the endpoint’s experiment_id into the edit form renderer so the UI can show trace links. |
| mlflow/server/js/src/gateway/hooks/useCreateEndpointForm.ts | Adds experimentId form state and includes it as experiment_id in the create-endpoint mutation, defaulting to auto-create when left blank. |
| mlflow/server/js/src/gateway/components/endpoint-form/EndpointFormRenderer.tsx | Adds an “Experiment” section in create mode for optional experiment ID input, with helper text explaining auto-creation behavior. |
| mlflow/server/js/src/gateway/components/edit-endpoint/EditEndpointFormRenderer.tsx | Adds a “Usage log” block linking to /experiments/{experimentId}/traces when the endpoint has an associated experiment. |
| mlflow/server/gateway_api.py | Imports tracing types, adds helpers for creating gateway traces and extracting provider/model info, wraps created providers with TracingProviderWrapper, and instruments all gateway/chat/passthrough routes to create traces and set outputs/token-usage where possible. |
| mlflow/protos/service_pb2.pyi | Updates Python type stubs to add experiment_id fields/slots/ctor args on GatewayEndpoint, CreateGatewayEndpoint, and UpdateGatewayEndpoint. |
| mlflow/protos/service.proto | Extends gateway endpoint and create/update RPC messages with an optional experiment_id field and associated comments. |
| mlflow/gateway/providers/base.py | Enhances FallbackProvider with internal per-attempt tracing spans and introduces TracingProviderWrapper to add spans around all provider methods (chat, embeddings, completions, passthrough, streaming). |
| mlflow/entities/gateway_endpoint.py | Adds experiment_id to the GatewayEndpoint entity and ensures it is serialized/deserialized to/from the extended proto. |
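
Because the new column is nullable, endpoints that existed before the migration remain valid and simply have no linked experiment. The sketch below illustrates that effect with the standard-library `sqlite3` module rather than Alembic. The table layout and the `TEXT` column type are assumptions made for the example; only the `endpoints` table name and the `experiment_id` column name come from the file summary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified, hypothetical layout; the real endpoints table has more columns.
conn.execute("CREATE TABLE endpoints (endpoint_id TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO endpoints VALUES ('e1', 'chat')")

# "Upgrade" step: add a nullable column. Existing rows get NULL.
conn.execute("ALTER TABLE endpoints ADD COLUMN experiment_id TEXT")
existing = conn.execute(
    "SELECT experiment_id FROM endpoints WHERE endpoint_id = 'e1'"
).fetchone()[0]

# An endpoint can be linked to an experiment after the fact.
conn.execute("UPDATE endpoints SET experiment_id = '7' WHERE endpoint_id = 'e1'")
linked = conn.execute(
    "SELECT experiment_id FROM endpoints WHERE endpoint_id = 'e1'"
).fetchone()[0]

# Column names after the upgrade, in declaration order.
columns = [row[1] for row in conn.execute("PRAGMA table_info(endpoints)")]
```

The corresponding downgrade would drop the column again; it is left out here because `DROP COLUMN` support depends on the SQLite version.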


@TomeHirata TomeHirata force-pushed the stack/gateway-backend-trace-integration branch from 090aad9 to 3bd4593 Compare January 27, 2026 10:11
@TomeHirata TomeHirata force-pushed the stack/gateway-backend-trace-integration branch 2 times, most recently from 49c039b to 55ae64a Compare January 27, 2026 10:32
@TomeHirata TomeHirata requested a review from B-Step62 January 27, 2026 10:33
@TomeHirata TomeHirata changed the title from "link gateway and trace" to "link gateway and experiment" Jan 27, 2026
@github-actions github-actions bot added the rn/feature label Jan 27, 2026
@github-actions
Contributor

github-actions bot commented Jan 27, 2026

Documentation preview for cd4753d is available at:

More info
  • Ignore this comment if this PR does not change the documentation.
  • The preview is updated when a new commit is pushed to this PR.
  • This comment was created by this workflow run.
  • The documentation was built by this workflow run.

@TomeHirata TomeHirata added the team-review label Jan 28, 2026
@TomeHirata TomeHirata force-pushed the stack/gateway-backend-trace-integration branch 2 times, most recently from b9b3cd4 to f9d8fe5 Compare January 29, 2026 06:28
Comment on lines +4278 to +4282
# Auto-create experiment if usage_tracking is enabled and experiment_id not provided
if usage_tracking and experiment_id is None:
    store = _get_tracking_store()
    experiment_name = f"gateway/{request_message.name}"
    experiment_id = _get_or_create_experiment_id(store, experiment_name)
Collaborator

What's the reason for moving the implementation from the store to the handler?

Collaborator Author

The handler is a higher-level logic layer that can orchestrate logic across multiple resources (experiment, gateway). I actually think the current separation of concerns is a better API design than putting the experiment creation in create_gateway_endpoint. Why do you want to move it back to the SQL store method?

Collaborator

Because we generally put implementation details inside the store method so that each store can have its own logic. In this case it only works for the SQL store, and the experiment is created on purpose, so there's no big difference. However, if the backend is FileStore, won't this create the experiment even though create_gateway_endpoint is not supported by it?

Collaborator Author

@TomeHirata TomeHirata Feb 2, 2026

"However, if the backend is FileStore this will create the experiment even if create_gateway_endpoint is not supported by it?"

Thanks. Even though we could add explicit validation in the handler, I think this is a good reason to give up a bit of the completeness of the abstraction. I moved the experiment creation back to the SQL store.
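
The concern raised in this thread can be made concrete with a toy model. In the sketch below all class and function names are hypothetical and the stores are in-memory stand-ins, not MLflow's stores. It compares creating the experiment in the handler before calling the store with creating it inside the store method, for a store that does not support gateway endpoints.

```python
from typing import Dict, Optional


class GatewayNotSupported(Exception):
    """Raised by stores that cannot persist gateway endpoints."""


class BaseStore:
    def __init__(self) -> None:
        self.experiments: Dict[str, str] = {}

    def get_or_create_experiment_id(self, name: str) -> str:
        if name not in self.experiments:
            self.experiments[name] = str(len(self.experiments) + 1)
        return self.experiments[name]

    def create_gateway_endpoint(
        self, name: str, usage_tracking: bool, experiment_id: Optional[str] = None
    ) -> Optional[str]:
        raise GatewayNotSupported(type(self).__name__)


class SqlLikeStore(BaseStore):
    def __init__(self) -> None:
        super().__init__()
        self.endpoints: Dict[str, Optional[str]] = {}

    def create_gateway_endpoint(
        self, name: str, usage_tracking: bool, experiment_id: Optional[str] = None
    ) -> Optional[str]:
        # Experiment creation lives next to the endpoint write.
        if usage_tracking and experiment_id is None:
            experiment_id = self.get_or_create_experiment_id(f"gateway/{name}")
        self.endpoints[name] = experiment_id
        return experiment_id


class FileLikeStore(BaseStore):
    """Inherits the unsupported create_gateway_endpoint from BaseStore."""


def handler_creates_experiment(store: BaseStore, name: str) -> Optional[str]:
    # Handler-level variant: the experiment exists before the store call can fail.
    experiment_id = store.get_or_create_experiment_id(f"gateway/{name}")
    return store.create_gateway_endpoint(name, True, experiment_id)


def store_creates_experiment(store: BaseStore, name: str) -> Optional[str]:
    # Store-level variant: an unsupported store fails before anything is created.
    return store.create_gateway_endpoint(name, True)


file_store_a = FileLikeStore()
try:
    handler_creates_experiment(file_store_a, "chat")
except GatewayNotSupported:
    pass
orphaned = dict(file_store_a.experiments)

file_store_b = FileLikeStore()
try:
    store_creates_experiment(file_store_b, "chat")
except GatewayNotSupported:
    pass
clean = dict(file_store_b.experiments)

sql_store = SqlLikeStore()
sql_experiment_id = store_creates_experiment(sql_store, "chat")
```

In the handler-level variant the unsupported store is left with an orphaned `gateway/chat` experiment, while the store-level variant leaves it untouched. Explicit validation in the handler would also avoid the orphan, at the cost of the handler knowing which stores support gateway endpoints.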

@TomeHirata TomeHirata force-pushed the stack/gateway-backend-trace-integration branch from 9ecca34 to 0f02e60 Compare February 2, 2026 05:42
@TomeHirata
Collaborator Author

/autoformat

TomeHirata and others added 3 commits February 2, 2026 15:57
Add experiment_id column to endpoints table to link Gateway endpoints
with MLflow experiments. This enables usage tracking and filtering
of Gateway metrics by experiment.

Also add a boolean usage_tracking field that controls whether trace
ingestion is enabled:
- When usage_tracking is True, an experiment is auto-created if not provided
- When usage_tracking is False, no experiment is created

Changes:
- Add migration for experiment_id and usage_tracking columns
- Update entity and model classes with new fields
- Add fields to proto definitions
- Update server handlers for new parameters
- Update store mixins for handling

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
@TomeHirata TomeHirata force-pushed the stack/gateway-backend-trace-integration branch from 9043396 to e8f06d9 Compare February 2, 2026 06:58
Collaborator

@serena-ruan serena-ruan left a comment

LGTM! Thanks for iterating on my comments!

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
@TomeHirata TomeHirata enabled auto-merge February 2, 2026 07:38
Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
@TomeHirata TomeHirata disabled auto-merge February 2, 2026 08:11
Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
@TomeHirata TomeHirata enabled auto-merge February 2, 2026 08:18
@TomeHirata TomeHirata added this pull request to the merge queue Feb 2, 2026
Merged via the queue into mlflow:master with commit 55caf20 Feb 2, 2026
57 checks passed
@TomeHirata TomeHirata deleted the stack/gateway-backend-trace-integration branch February 2, 2026 08:58

Labels

rn/feature: Mention under Features in Changelogs.
team-review: Trigger a team review request
