Add routing strategy and fallback configuration support for gateway endpoints by TomeHirata · Pull Request #19483 · mlflow/mlflow

TomeHirata · 2025-12-18T08:17:16Z

🥞 Stacked PR

Use this link to review incremental changes.

stack/gateway/fallback [Files changed]

Related Issues/PRs

n/a

What changes are proposed in this pull request?

This PR introduces traffic routing and fallback for gateway endpoints. There are two types of configuration:

Traffic routing strategy: how the first destination is picked from the primary model definitions. Currently, only a weight-based traffic split is supported.
Fallback strategy: how the request is propagated when the request to the first destination fails. Currently, sequential model fallback is supported.
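
To illustrate the traffic routing strategy described above, here is a minimal sketch of a weight-based pick. It is not the code from this PR: `WeightedModel` and `pick_primary` are hypothetical names, and the sketch assumes each primary model definition carries a non-negative numeric weight.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WeightedModel:
    # Hypothetical stand-in for a primary model definition with a traffic weight.
    name: str
    weight: float


def pick_primary(models: List[WeightedModel], rng: random.Random) -> WeightedModel:
    """Pick the first destination at random, proportionally to each weight."""
    if not models:
        raise ValueError("at least one primary model is required")
    weights = [m.weight for m in models]
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative and sum to a positive value")
    # random.Random.choices performs the weighted draw.
    return rng.choices(models, weights=weights, k=1)[0]
```

With weights of 90 and 10, roughly nine out of ten requests would go to the first model. Passing the random generator in explicitly keeps the selection reproducible in tests.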

How is this PR tested?

Existing unit/integration tests
New unit/integration tests
Manual tests


Copilot

Pull request overview

This PR adds routing strategy and fallback configuration support for MLflow gateway endpoints, enabling automatic failover across multiple model providers. The implementation includes database schema changes, new entity types, provider routing logic, and comprehensive test coverage.

Key Changes:

Introduces RoutingStrategy and FallbackStrategy enums with FallbackConfig entity for configuring multi-model failover
Adds FallbackProvider class that attempts providers sequentially until success or max attempts reached
Extends database schema with routing_strategy and fallback_config_json columns
Updates gateway API to support fallback routing and structured output formats across providers

Reviewed changes

Copilot reviewed 25 out of 27 changed files in this pull request and generated no comments.


| File | Description |
| --- | --- |
| mlflow/protos/service.proto | Adds RoutingStrategy and FallbackStrategy enums, FallbackConfig message for proto definitions |
| mlflow/protos/service_pb2.pyi | Type stub updates for new proto messages and enums |
| mlflow/java/client/src/main/java/org/mlflow/api/proto/Service.java | Java protobuf code generation for new routing and fallback types |
| mlflow/entities/gateway_endpoint.py | Implements RoutingStrategy, FallbackStrategy enums and FallbackConfig dataclass |
| mlflow/entities/__init__.py | Exports new routing and fallback entities |
| mlflow/store/tracking/dbmodels/models.py | Adds routing_strategy and fallback_config_json columns to SqlGatewayEndpoint model |
| mlflow/store/db_migrations/versions/c9d4e5f6a7b8_add_routing_strategy_to_endpoints.py | Database migration for new endpoint routing columns |
| mlflow/store/tracking/gateway/sqlalchemy_mixin.py | Updates create_gateway_endpoint to accept routing parameters |
| mlflow/store/tracking/gateway/abstract_mixin.py | Updates abstract interface for endpoint creation |
| mlflow/store/tracking/gateway/entities.py | Adds routing_strategy and fallback_config to GatewayEndpointConfig |
| mlflow/store/tracking/gateway/config_resolver.py | Propagates routing config from SQL entities to endpoint configs |
| mlflow/server/handlers.py | Updates handler to process routing_strategy and fallback_config from requests |
| mlflow/server/gateway_api.py | Implements provider creation logic with fallback support |
| mlflow/gateway/providers/base.py | Implements FallbackProvider class with retry logic for all endpoint types |
| mlflow/gateway/providers/anthropic.py | Adds structured output support and dynamic header construction |
| mlflow/gateway/providers/gemini.py | Adds structured output support and normalizes finish reasons |
| mlflow/gateway/providers/mistral.py | Adds chat method implementation |
| mlflow/gateway/providers/openai.py | No changes in diff (test updates only) |
| mlflow/types/chat.py | Adds ResponseFormat model for structured outputs |
| tests/store/tracking/test_gateway_sql_store.py | Comprehensive tests for fallback routing creation and retrieval |
| tests/server/test_gateway_api.py | Tests for FallbackProvider instantiation and routing logic |
| tests/gateway/providers/test_fallback.py | Full test suite for FallbackProvider with various scenarios |
| tests/gateway/providers/test_openai.py | Tests for structured output with additional parameters |
| tests/gateway/providers/test_mistral.py | Tests for Mistral structured output support |
| tests/gateway/providers/test_gemini.py | Tests for Gemini structured output and parameter support |
| tests/gateway/providers/test_anthropic.py | Tests for Anthropic structured output with beta headers |
| tests/ag2/test_ag2_autolog.py | Updates speaker_selection_method value to uppercase |


B-Step62 · 2025-12-22T06:19:09Z

mlflow/entities/gateway_endpoint.py

+        except ValueError:
+            return None


What is the reason for swallowing the exception here?

ProtoRoutingStrategy has ROUTING_STRATEGY_UNSPECIFIED, which does not exist in RoutingStrategy enum.

Got it, can we add some comment for the context?
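
For context on this thread, the pattern under discussion might look roughly like the sketch below. The two enum classes are hypothetical stand-ins (the real ones live in the generated protobuf module and in mlflow/entities/gateway_endpoint.py), and the member name `REQUEST_BASED_TRAFFIC_SPLIT` is an assumption made for illustration only.

```python
from enum import Enum, IntEnum
from typing import Optional


class ProtoRoutingStrategy(IntEnum):
    # Hypothetical stand-in for the generated protobuf enum.
    ROUTING_STRATEGY_UNSPECIFIED = 0
    REQUEST_BASED_TRAFFIC_SPLIT = 1  # assumed member name


class RoutingStrategy(str, Enum):
    # Hypothetical stand-in for the entity enum; it has no UNSPECIFIED member.
    REQUEST_BASED_TRAFFIC_SPLIT = "REQUEST_BASED_TRAFFIC_SPLIT"


def routing_strategy_from_proto(value: int) -> Optional[RoutingStrategy]:
    """Map a proto enum value to the entity enum, or None if there is no match."""
    try:
        return RoutingStrategy(ProtoRoutingStrategy(value).name)
    except ValueError:
        # ROUTING_STRATEGY_UNSPECIFIED exists only on the proto side, so the
        # lookup fails for it. Treat that as "no routing strategy configured".
        return None
```

Note that in this sketch an unknown numeric value is also mapped to `None`, which is one reason an explanatory comment at the `except` clause is worth having.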

B-Step62 · 2025-12-22T06:21:12Z

mlflow/entities/gateway_endpoint.py

+
+    Args:
+        strategy: The fallback strategy to use (e.g., FallbackStrategy.SEQUENTIAL).
+        max_attempts: Maximum number of models to try.


For my learning, what is the use case of setting max_attempts rather than allowing fallback to all configured models?

That's actually a good question, as I was also curious about it. Let me check with the managed gateway folks.

Is the fallback logic "try the primary once and, if it fails, go to the next" or is it "try the primary n times and, if it fails, go to the next and try that n times"?

It's the former, so it's unclear whether anyone would want to configure max_attempts < len(models).
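
The semantics settled on in this thread (each model is tried once, in order, and `max_attempts` caps how many models are tried) can be sketched as follows. This is an illustrative sketch rather than the `FallbackProvider` from the PR: `call_with_fallback` is a hypothetical helper, and providers are modeled as plain callables.

```python
from typing import Callable, Optional, Sequence


def call_with_fallback(
    providers: Sequence[Callable[[str], str]],
    prompt: str,
    max_attempts: Optional[int] = None,
) -> str:
    """Try each provider once, in order, until one succeeds."""
    if not providers:
        raise ValueError("at least one provider is required")
    # Without max_attempts, every configured provider may be tried.
    limit = len(providers) if max_attempts is None else min(max_attempts, len(providers))
    if limit < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[Exception] = None
    for provider in providers[:limit]:
        try:
            return provider(prompt)
        except Exception as e:
            # Remember the failure and move on to the next provider.
            last_error = e
    raise RuntimeError(f"all {limit} attempt(s) failed") from last_error
```

Under these semantics, setting `max_attempts` below the number of configured models simply cuts off the tail of the fallback chain, which is why its use case was questioned above.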


B-Step62 · 2025-12-22T06:30:23Z

mlflow/gateway/providers/base.py

+            except Exception as e:
+                last_error = e


Q: What is our strategy for logging and monitoring invocation errors that are masked by fallbacks?

We will introduce an OTel integration to monitor gateway endpoints. If multiple LLM calls happen due to fallback, we will generate multiple spans.

B-Step62 · 2025-12-22T06:34:38Z

mlflow/protos/service.proto

+  optional RoutingStrategy routing_strategy = 9;
+  // Fallback configuration (populated if routing_strategy is FALLBACK)
+  optional FallbackConfig fallback_config = 10;


Q: Is it possible that users want to use fallback in conjunction with another routing method? For example, gradually rolling out traffic from model A to model B while keeping model A as a fallback to ensure reliability.

It seems fallback can be used in conjunction with traffic routing such as a traffic split. I've updated the logic so that routing and fallback can be used together.
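
As an illustration of the rollout scenario raised in this thread, the sketch below combines a weighted pick with a fallback chain to produce the order in which models would be tried for one request. `build_attempt_order` is a hypothetical helper, and skipping a fallback that is identical to the chosen primary is an assumption of this sketch, not confirmed behavior of the PR.

```python
import random
from typing import Dict, List


def build_attempt_order(
    primary_weights: Dict[str, float],
    fallback_models: List[str],
    rng: random.Random,
) -> List[str]:
    """Return the models to try, in order, for a single request."""
    names = list(primary_weights)
    # Routing step: weighted pick among the primary models.
    first = rng.choices(names, weights=[primary_weights[n] for n in names], k=1)[0]
    # Fallback step: append the fallback chain, skipping the model already
    # chosen (an assumption of this sketch, to avoid retrying the same model).
    return [first] + [m for m in fallback_models if m != first]
```

For a rollout from model A to model B, one could pass weights such as `{"model-a": 0.9, "model-b": 0.1}` with `["model-a"]` as the fallback list, then shift the weights toward model B over time.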

…ndpoints This commit introduces a new routing strategy for gateway endpoints, allowing for fallback configurations. The `RoutingStrategy` and `FallbackStrategy` enums have been added, along with the `FallbackConfig` message to define fallback behavior. The `GatewayEndpoint` entity has been updated to include these new fields, and the necessary changes have been made across various components, including the database models, API handlers, and providers. Additionally, tests have been added to ensure the correct functionality of the fallback provider. Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

BenWilson2 · 2025-12-23T02:21:53Z

mlflow/server/gateway_api.py

+        )
+
+    model_configs_by_id = {m.model_definition_id: m for m in endpoint_config.models}
+


Maybe not super serious to worry about right now, but I'm curious what happens if the primary model I select has a very large context window and my fallback is a cheaper / older model with a much smaller context window?
Do we warn users about such a configuration? Allow it silently? Suggest matching?
My worry is that someone might configure their system like this and foot-gun themselves into runtime errors in session-based chat, where a fallback midway through a session, triggered by some unavoidable network packet loss, effectively breaks their app.

That's a good point. I expect that models with the same or a similar context window will typically be used as the primary and fallback models, to avoid sequential failures like that. But yes, we can think about providing per-model configuration under a gateway endpoint in case clients want granular control.
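
One way the warning suggested in this thread could work is sketched below. Nothing here exists in the PR: `ModelInfo`, its `context_window` field, and `context_window_warnings` are hypothetical, and the sketch assumes the context window size of each model is known at configuration time.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelInfo:
    # Hypothetical record; `context_window` is measured in tokens.
    name: str
    context_window: int


def context_window_warnings(primary: ModelInfo, fallbacks: List[ModelInfo]) -> List[str]:
    """Describe every fallback whose context window is smaller than the primary's."""
    return [
        f"fallback '{fb.name}' has a {fb.context_window}-token context window, "
        f"smaller than primary '{primary.name}' ({primary.context_window} tokens)"
        for fb in fallbacks
        if fb.context_window < primary.context_window
    ]
```

Returning messages instead of raising keeps such a configuration allowed while still surfacing the risk, which matches the "warn rather than block" option raised in the question.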

This commit introduces a new `LinkageType` enum to differentiate between PRIMARY and FALLBACK linkages in the `GatewayEndpointModelMapping`. It also adds a `fallback_order` field to specify the order of fallback attempts. The changes include updates to the database schema, API handlers, and various components to support these new fields, ensuring proper handling of fallback configurations in gateway endpoints. Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

Copilot

Pull request overview

Copilot reviewed 21 out of 23 changed files in this pull request and generated no new comments.


Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

github-actions · 2025-12-23T08:05:10Z

Documentation preview for 09977a2 is available at:

https://pr-19483--mlflow-docs-preview.netlify.app/docs/latest/


Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

B-Step62

LGTM overall, left a few minor comments and suggestions

B-Step62 · 2025-12-24T03:02:07Z

mlflow/entities/gateway_endpoint.py

+        except ValueError:
+            return None


Got it, can we add some comment for the context?


B-Step62 · 2025-12-24T03:06:45Z

mlflow/server/gateway_api.py

+            error_code=RESOURCE_DOES_NOT_EXIST,
+        )
+
+    model_definition_ids = fallback_config.model_definition_ids


Is it guaranteed that the model_definition_ids is sorted by the fallback order? If not, I think we need to sort it here.

I'll remove this function, since the models are sorted in _create_provider.
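
The ordering concern in this thread can be illustrated with a small sketch that sorts fallback mappings by their `fallback_order` before use. `ModelMapping` is a hypothetical stand-in for `GatewayEndpointModelMapping`, the linkage type is modeled as a plain string, and placing mappings without an order last is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelMapping:
    # Hypothetical stand-in for GatewayEndpointModelMapping.
    model_definition_id: str
    linkage_type: str  # "PRIMARY" or "FALLBACK"
    fallback_order: Optional[int] = None


def ordered_fallback_ids(mappings: List[ModelMapping]) -> List[str]:
    """Return fallback model definition IDs sorted by fallback_order."""
    fallbacks = [m for m in mappings if m.linkage_type == "FALLBACK"]
    # Mappings with no explicit order sort last (an assumption of this sketch).
    fallbacks.sort(key=lambda m: (m.fallback_order is None, m.fallback_order or 0))
    return [m.model_definition_id for m in fallbacks]
```

Sorting at the point where providers are built means callers do not have to rely on the storage layer returning mappings in any particular order.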


Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

This update modifies the fallback configuration for gateway endpoints by replacing the `model_definition_ids` field with `model_mappings`, which now holds a list of `GatewayEndpointModelMapping` objects. The changes also include updates to the related protobuf definitions, SQLAlchemy models, and REST API methods to accommodate the new structure. Additionally, tests have been updated to reflect these changes, ensuring proper functionality of the fallback logic. Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

This update simplifies the conditional checks in the `SqlAlchemyGatewayStoreMixin` class by removing explicit `None` checks for `fallback_model_definition_ids` and `model_definition_ids`. Additionally, test assertions have been updated to reflect changes in the expected number of model mappings for gateway endpoints, ensuring consistency across the tests. Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

…ndpoints (mlflow#19483) Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

This was referenced Dec 18, 2025

Enhance Anthropic and Gemini providers to support structured outputs #19452

Merged

Add support for top_k, frequency_penalty, and presence_penalty in Gemini and Mistral provider #19453

Merged

TomeHirata marked this pull request as ready for review December 18, 2025 08:18

Copilot AI review requested due to automatic review settings December 18, 2025 08:18

github-actions bot added the rn/none List under Small Changes in Changelogs. label Dec 18, 2025

Copilot started reviewing on behalf of TomeHirata December 18, 2025 08:18 View session

Copilot AI reviewed Dec 18, 2025

View reviewed changes

TomeHirata force-pushed the stack/gateway/fallback branch from 6bdb32c to d083bc1 Compare December 18, 2025 08:33

TomeHirata requested review from B-Step62 and BenWilson2 December 18, 2025 08:43

TomeHirata force-pushed the stack/gateway/fallback branch 3 times, most recently from 7520acf to 229b2aa Compare December 22, 2025 03:14

B-Step62 reviewed Dec 22, 2025

View reviewed changes

TomeHirata force-pushed the stack/gateway/fallback branch 3 times, most recently from 22b1dd7 to 1d2e3dc Compare December 22, 2025 08:28

github-actions bot assigned B-Step62 Dec 22, 2025

TomeHirata force-pushed the stack/gateway/fallback branch 2 times, most recently from d1aa116 to b9cf853 Compare December 22, 2025 15:21

TomeHirata added 7 commits December 23, 2025 10:10

type hint

c7d9d05

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

migration

83ea9f2

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

test

1cbae8f

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

test

cb27482

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

http status

f9f31d7

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

schema

88eff85

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

TomeHirata force-pushed the stack/gateway/fallback branch from b9cf853 to 88eff85 Compare December 23, 2025 01:12

BenWilson2 reviewed Dec 23, 2025

View reviewed changes

github-actions bot assigned BenWilson2 Dec 23, 2025

TomeHirata requested review from B-Step62, BenWilson2 and Copilot December 23, 2025 05:46

Copilot started reviewing on behalf of TomeHirata December 23, 2025 05:47 View session

Copilot AI reviewed Dec 23, 2025

View reviewed changes

TomeHirata added 2 commits December 23, 2025 14:52

lint

bff8dc8

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

test

7d2c694

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

github-actions bot added rn/feature Mention under Features in Changelogs. and removed rn/none List under Small Changes in Changelogs. labels Dec 23, 2025

inventory

accbf4e

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

test

72c3179

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

B-Step62 approved these changes Dec 24, 2025

View reviewed changes

TomeHirata added 7 commits December 24, 2025 13:32

comments

8b3f487

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

update fallback logic

a669c09

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

handler test

efa54bf

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

fix

dbae90e

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

use structured config

3ad94f3

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

test

0e3892c

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

TomeHirata enabled auto-merge December 24, 2025 09:01

TomeHirata disabled auto-merge December 25, 2025 00:18

TomeHirata enabled auto-merge December 25, 2025 00:38

TomeHirata added this pull request to the merge queue Dec 25, 2025

Merged via the queue into mlflow:master with commit 52460cd Dec 25, 2025
54 checks passed

TomeHirata deleted the stack/gateway/fallback branch December 25, 2025 01:11

omarfarhoud pushed a commit to omarfarhoud/mlflow that referenced this pull request Jan 20, 2026

Add routing strategy and fallback configuration support for gateway e…

866da2c

…ndpoints (mlflow#19483) Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

