Add routing strategy and fallback configuration support for gateway endpoints #19483

Merged
TomeHirata merged 20 commits into mlflow:master from TomeHirata:stack/gateway/fallback on Dec 25, 2025

@TomeHirata (Collaborator) commented Dec 18, 2025

🥞 Stacked PR

Use this link to review incremental changes.


Related Issues/PRs

n/a

What changes are proposed in this pull request?

This PR introduces traffic routing and fallback for gateway endpoints. There are two types of configuration:

  1. Traffic routing strategy: how to pick the first destination from the primary model definitions. Currently only a weight-based traffic split is supported.
  2. Fallback strategy: how the request is propagated when the request to the first destination fails. Currently only sequential model fallback is supported.
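
As a rough illustration of the weight-based split described in item 1, the sketch below picks one primary model with probability proportional to its weight. `WeightedModel` and `pick_first_destination` are hypothetical names used only for this example; they are not taken from the PR's code.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightedModel:
    # Hypothetical stand-in for a primary model definition with a traffic weight.
    name: str
    weight: float


def pick_first_destination(models, rng=random):
    """Pick one primary model with probability proportional to its weight."""
    if not models:
        raise ValueError("at least one primary model is required")
    weights = [m.weight for m in models]
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative and sum to a positive value")
    # random.choices performs weighted sampling with replacement; k=1 yields one pick.
    return rng.choices(models, weights=weights, k=1)[0]


primaries = [WeightedModel("model-a", 90), WeightedModel("model-b", 10)]
picked = pick_first_destination(primaries, random.Random(0))
```

Passing a seeded `random.Random` instance keeps the pick reproducible in tests; in normal use the default module-level generator would be enough.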

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/tracking: Tracking Service, tracking client APIs, autologging
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
  • area/gateway: MLflow AI Gateway client APIs, server, and third-party integrations
  • area/prompts: MLflow prompt engineering features, prompt templates, and prompt management
  • area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality
  • area/projects: MLproject format, project running backends
  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?
  • Minor release: a release that increments the second part of the version number (e.g., 1.2.0 -> 1.3.0).
    Bug fixes, doc updates and new features usually go into minor releases.
  • Patch release: a release that increments the third part of the version number (e.g., 1.2.0 -> 1.2.1).
    Bug fixes and doc updates usually go into patch releases.
  • Yes (this PR will be cherry-picked and included in the next patch release)
  • No (this PR will be included in the next minor release)

@TomeHirata TomeHirata marked this pull request as ready for review December 18, 2025 08:18
Copilot AI review requested due to automatic review settings December 18, 2025 08:18
@github-actions github-actions bot added the rn/none label (List under Small Changes in Changelogs.) Dec 18, 2025
Contributor

Copilot AI left a comment

Pull request overview

This PR adds routing strategy and fallback configuration support for MLflow gateway endpoints, enabling automatic failover across multiple model providers. The implementation includes database schema changes, new entity types, provider routing logic, and comprehensive test coverage.

Key Changes:

  • Introduces RoutingStrategy and FallbackStrategy enums with FallbackConfig entity for configuring multi-model failover
  • Adds FallbackProvider class that attempts providers sequentially until success or max attempts reached
  • Extends database schema with routing_strategy and fallback_config_json columns
  • Updates gateway API to support fallback routing and structured output formats across providers

Reviewed changes

Copilot reviewed 25 out of 27 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `mlflow/protos/service.proto` | Adds RoutingStrategy and FallbackStrategy enums, FallbackConfig message for proto definitions |
| `mlflow/protos/service_pb2.pyi` | Type stub updates for new proto messages and enums |
| `mlflow/java/client/src/main/java/org/mlflow/api/proto/Service.java` | Java protobuf code generation for new routing and fallback types |
| `mlflow/entities/gateway_endpoint.py` | Implements RoutingStrategy, FallbackStrategy enums and FallbackConfig dataclass |
| `mlflow/entities/__init__.py` | Exports new routing and fallback entities |
| `mlflow/store/tracking/dbmodels/models.py` | Adds routing_strategy and fallback_config_json columns to SqlGatewayEndpoint model |
| `mlflow/store/db_migrations/versions/c9d4e5f6a7b8_add_routing_strategy_to_endpoints.py` | Database migration for new endpoint routing columns |
| `mlflow/store/tracking/gateway/sqlalchemy_mixin.py` | Updates create_gateway_endpoint to accept routing parameters |
| `mlflow/store/tracking/gateway/abstract_mixin.py` | Updates abstract interface for endpoint creation |
| `mlflow/store/tracking/gateway/entities.py` | Adds routing_strategy and fallback_config to GatewayEndpointConfig |
| `mlflow/store/tracking/gateway/config_resolver.py` | Propagates routing config from SQL entities to endpoint configs |
| `mlflow/server/handlers.py` | Updates handler to process routing_strategy and fallback_config from requests |
| `mlflow/server/gateway_api.py` | Implements provider creation logic with fallback support |
| `mlflow/gateway/providers/base.py` | Implements FallbackProvider class with retry logic for all endpoint types |
| `mlflow/gateway/providers/anthropic.py` | Adds structured output support and dynamic header construction |
| `mlflow/gateway/providers/gemini.py` | Adds structured output support and normalizes finish reasons |
| `mlflow/gateway/providers/mistral.py` | Adds chat method implementation |
| `mlflow/gateway/providers/openai.py` | No changes in diff (test updates only) |
| `mlflow/types/chat.py` | Adds ResponseFormat model for structured outputs |
| `tests/store/tracking/test_gateway_sql_store.py` | Comprehensive tests for fallback routing creation and retrieval |
| `tests/server/test_gateway_api.py` | Tests for FallbackProvider instantiation and routing logic |
| `tests/gateway/providers/test_fallback.py` | Full test suite for FallbackProvider with various scenarios |
| `tests/gateway/providers/test_openai.py` | Tests for structured output with additional parameters |
| `tests/gateway/providers/test_mistral.py` | Tests for Mistral structured output support |
| `tests/gateway/providers/test_gemini.py` | Tests for Gemini structured output and parameter support |
| `tests/gateway/providers/test_anthropic.py` | Tests for Anthropic structured output with beta headers |
| `tests/ag2/test_ag2_autolog.py` | Updates speaker_selection_method value to uppercase |

@TomeHirata TomeHirata force-pushed the stack/gateway/fallback branch from 6bdb32c to d083bc1 Compare December 18, 2025 08:33
@TomeHirata TomeHirata force-pushed the stack/gateway/fallback branch 3 times, most recently from 7520acf to 229b2aa Compare December 22, 2025 03:14
Comment on lines +37 to +38
except ValueError:
    return None
Collaborator

What is the reason for swallowing the exception here?

Collaborator Author

@TomeHirata Dec 23, 2025

ProtoRoutingStrategy has ROUTING_STRATEGY_UNSPECIFIED, which does not exist in the RoutingStrategy enum.

Collaborator

Got it, can we add some comment for the context?
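
A minimal sketch of the pattern under discussion, with the kind of explanatory comment the reviewer asks for. The enum member `TRAFFIC_SPLIT` and the helper `routing_strategy_from_proto_name` are assumptions made for this example; only the `ROUTING_STRATEGY_UNSPECIFIED` name comes from the thread.

```python
from enum import Enum


class RoutingStrategy(Enum):
    # Hypothetical entity enum; the member name is an assumption.
    # It deliberately has no UNSPECIFIED member, mirroring the thread above.
    TRAFFIC_SPLIT = "TRAFFIC_SPLIT"


def routing_strategy_from_proto_name(name):
    """Map a proto enum name to the entity enum, or None if it has no counterpart."""
    try:
        return RoutingStrategy(name)
    except ValueError:
        # The proto enum defines ROUTING_STRATEGY_UNSPECIFIED, which has no
        # equivalent in RoutingStrategy, so an unknown name is treated as "not set".
        return None


unset = routing_strategy_from_proto_name("ROUTING_STRATEGY_UNSPECIFIED")
known = routing_strategy_from_proto_name("TRAFFIC_SPLIT")
```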


Args:
    strategy: The fallback strategy to use (e.g., FallbackStrategy.SEQUENTIAL).
    max_attempts: Maximum number of models to try.
Collaborator

For my learning, what is the use case of setting max_attempts rather than allowing fallback to all configured models?

Collaborator Author

That's actually a good question, as I was also curious about it. Let me check with the managed gateway folks.

Member

Is the fallback logic "try the primary once; if it fails, go to the next" or "try the primary n times; if it fails, go to the next and try it n times"?

Collaborator Author

It's the former, so it's unclear whether anyone would want to configure max_attempts < len(models).
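
To make the semantics concrete, here is a small sketch of "try each model once, in order, up to max_attempts". It is not the PR's `FallbackProvider`; `call_with_fallback` and the callable providers are hypothetical stand-ins.

```python
def call_with_fallback(providers, request, max_attempts=None):
    """Try each provider once, in order, until one succeeds or the limit is hit."""
    if not providers:
        raise ValueError("at least one provider is required")
    if max_attempts is not None and max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # With max_attempts unset, every configured provider gets exactly one try.
    limit = len(providers) if max_attempts is None else min(max_attempts, len(providers))
    last_error = None
    for provider in providers[:limit]:
        try:
            return provider(request)
        except Exception as e:
            # Remember the failure and move on to the next provider.
            last_error = e
    raise RuntimeError(f"all {limit} attempts failed") from last_error


def failing(request):
    raise ConnectionError("primary is down")


def healthy(request):
    return f"ok: {request}"


result = call_with_fallback([failing, healthy], "hi")
```

In this sketch, setting `max_attempts=1` with two providers means the second one is never tried, which is the `max_attempts < len(models)` case questioned above.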

Comment on lines +283 to +284
except Exception as e:
    last_error = e
Collaborator

Q: What is our strategy for logging and monitoring the invocation errors that are covered by fallbacks?

Collaborator Author

@TomeHirata Dec 22, 2025

We will introduce an OpenTelemetry (OTel) integration to monitor gateway endpoints. If multiple LLM calls happen due to fallback, we generate multiple spans.

Comment on lines +4227 to +4229
optional RoutingStrategy routing_strategy = 9;
// Fallback configuration (populated if routing_strategy is FALLBACK)
optional FallbackConfig fallback_config = 10;
Collaborator

Q: Is it possible that users want to use fallback in conjunction with another routing method? For example, gradually rolling out traffic from model A to model B while keeping model A as a fallback to ensure reliability.

Collaborator Author

@TomeHirata Dec 23, 2025

It seems fallback can be used in conjunction with traffic routing such as a traffic split. I've updated the logic so that routing and fallback can be used together.
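
One way to picture the combined behavior is as an ordered list of attempts: the first entry comes from the weighted split, and the fallback models follow in their configured order. The sketch below uses the reviewer's A-to-B rollout scenario; `build_attempt_order` is a hypothetical helper, and skipping a fallback that was already picked as the first destination is a design choice of this example, not something the PR confirms.

```python
import random


def build_attempt_order(primary_weights, fallbacks, rng=random):
    """Return model names in the order they would be attempted.

    primary_weights maps a primary model name to its traffic weight;
    fallbacks lists fallback model names in fallback order.
    """
    names = list(primary_weights)
    weights = [primary_weights[n] for n in names]
    first = rng.choices(names, weights=weights, k=1)[0]
    # Assumption: a model already chosen as the first destination is not retried.
    return [first] + [name for name in fallbacks if name != first]


# Gradual rollout from model-a to model-b, keeping model-a as the fallback.
order = build_attempt_order({"model-a": 80, "model-b": 20}, ["model-a"], random.Random(0))
```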

@TomeHirata TomeHirata force-pushed the stack/gateway/fallback branch 3 times, most recently from 22b1dd7 to 1d2e3dc Compare December 22, 2025 08:28
@TomeHirata TomeHirata force-pushed the stack/gateway/fallback branch 2 times, most recently from d1aa116 to b9cf853 Compare December 22, 2025 15:21
…ndpoints

This commit introduces a new routing strategy for gateway endpoints, allowing for fallback configurations. The `RoutingStrategy` and `FallbackStrategy` enums have been added, along with the `FallbackConfig` message to define fallback behavior. The `GatewayEndpoint` entity has been updated to include these new fields, and the necessary changes have been made across various components, including the database models, API handlers, and providers. Additionally, tests have been added to ensure the correct functionality of the fallback provider.

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
@TomeHirata TomeHirata force-pushed the stack/gateway/fallback branch from b9cf853 to 88eff85 Compare December 23, 2025 01:12
)

model_configs_by_id = {m.model_definition_id: m for m in endpoint_config.models}

Member

Maybe not serious enough to worry about right now, but I'm curious what happens if the primary model I select has a very large context window and my fallback is a cheaper or older model with a much smaller one.
Do we warn users about such a configuration? Allow it silently? Suggest matching?
My worry is that someone might configure their system like this and foot-gun themselves into runtime errors in session-based chat, where a fallback triggered midway through a session by some unavoidable network packet loss effectively breaks their app.

Collaborator Author

That's a good point. I expect models with the same or a similar context window to be used as the primary and fallback models, which avoids sequential failures like that. But yes, we can think about providing a configuration for the models under a gateway endpoint in case clients want granular control.
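
The PR does not add such a check. Purely as an illustration of the "warn users" option raised above, a configuration-time check could look like the sketch below. The function name, the input shape, and the token counts are all hypothetical.

```python
import warnings


def find_smaller_context_fallbacks(primary_window, fallback_windows):
    """Return names of fallback models whose context window is smaller than the primary's.

    fallback_windows maps a model name to its context window in tokens.
    """
    too_small = [
        name for name, window in fallback_windows.items() if window < primary_window
    ]
    for name in too_small:
        warnings.warn(
            f"fallback model {name!r} has a smaller context window than the primary; "
            "long sessions may fail after a fallback"
        )
    return too_small


# Illustrative numbers only; they do not describe any real model.
flagged = find_smaller_context_fallbacks(
    200_000, {"large-fallback": 200_000, "small-fallback": 8_000}
)
```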

This commit introduces a new `LinkageType` enum to differentiate between PRIMARY and FALLBACK linkages in the `GatewayEndpointModelMapping`. It also adds a `fallback_order` field to specify the order of fallback attempts. The changes include updates to the database schema, API handlers, and various components to support these new fields, ensuring proper handling of fallback configurations in gateway endpoints.

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 23 changed files in this pull request and generated no new comments.


Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
@github-actions github-actions bot added the rn/feature label (Mention under Features in Changelogs.) and removed the rn/none label (List under Small Changes in Changelogs.) Dec 23, 2025
Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
Contributor

github-actions bot commented Dec 23, 2025

Documentation preview for 09977a2 is available at:

More info
  • Ignore this comment if this PR does not change the documentation.
  • The preview is updated when a new commit is pushed to this PR.
  • This comment was created by this workflow run.
  • The documentation was built by this workflow run.

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
Collaborator

@B-Step62 left a comment

LGTM overall, left a few minor comments and suggestions

error_code=RESOURCE_DOES_NOT_EXIST,
)

model_definition_ids = fallback_config.model_definition_ids
Collaborator

Is it guaranteed that the model_definition_ids is sorted by the fallback order? If not, I think we need to sort it here.

Collaborator Author

I'll remove this function, since models are sorted in _create_provider.
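
For context on the ordering concern, here is a sketch of sorting fallback mappings by their `fallback_order` before use. `LinkageType` and `fallback_order` are named in the commit message above; the `ModelMapping` dataclass, the enum values, and `ordered_fallback_ids` are simplified assumptions for this example and do not reproduce `_create_provider`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LinkageType(Enum):
    # Member names follow the commit message; the string values are assumptions.
    PRIMARY = "PRIMARY"
    FALLBACK = "FALLBACK"


@dataclass
class ModelMapping:
    # Simplified, hypothetical stand-in for GatewayEndpointModelMapping.
    model_definition_id: str
    linkage_type: LinkageType
    fallback_order: Optional[int] = None


def ordered_fallback_ids(mappings):
    """Return fallback model definition IDs sorted by fallback_order."""
    fallbacks = [m for m in mappings if m.linkage_type is LinkageType.FALLBACK]
    # Mappings without an explicit order sort last rather than raising a TypeError.
    fallbacks.sort(key=lambda m: (m.fallback_order is None, m.fallback_order or 0))
    return [m.model_definition_id for m in fallbacks]


mappings = [
    ModelMapping("m-third", LinkageType.FALLBACK, fallback_order=2),
    ModelMapping("m-primary", LinkageType.PRIMARY),
    ModelMapping("m-second", LinkageType.FALLBACK, fallback_order=1),
]
fallback_ids = ordered_fallback_ids(mappings)
```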

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
This update modifies the fallback configuration for gateway endpoints by replacing the `model_definition_ids` field with `model_mappings`, which now holds a list of `GatewayEndpointModelMapping` objects. The changes also include updates to the related protobuf definitions, SQLAlchemy models, and REST API methods to accommodate the new structure. Additionally, tests have been updated to reflect these changes, ensuring proper functionality of the fallback logic.

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
@TomeHirata TomeHirata enabled auto-merge December 24, 2025 09:01
@TomeHirata TomeHirata disabled auto-merge December 25, 2025 00:18
This update simplifies the conditional checks in the `SqlAlchemyGatewayStoreMixin` class by removing explicit `None` checks for `fallback_model_definition_ids` and `model_definition_ids`. Additionally, test assertions have been updated to reflect changes in the expected number of model mappings for gateway endpoints, ensuring consistency across the tests.

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
@TomeHirata TomeHirata enabled auto-merge December 25, 2025 00:38
@TomeHirata TomeHirata added this pull request to the merge queue Dec 25, 2025
Merged via the queue into mlflow:master with commit 52460cd Dec 25, 2025
54 checks passed
@TomeHirata TomeHirata deleted the stack/gateway/fallback branch December 25, 2025 01:11
omarfarhoud pushed a commit to omarfarhoud/mlflow that referenced this pull request Jan 20, 2026
…ndpoints (mlflow#19483)

Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>

Labels

rn/feature Mention under Features in Changelogs.

4 participants