Add routing strategy and fallback configuration support for gateway endpoints #19483
TomeHirata merged 20 commits into mlflow:master
Pull request overview
This PR adds routing strategy and fallback configuration support for MLflow gateway endpoints, enabling automatic failover across multiple model providers. The implementation includes database schema changes, new entity types, provider routing logic, and comprehensive test coverage.
Key Changes:
- Introduces `RoutingStrategy` and `FallbackStrategy` enums with a `FallbackConfig` entity for configuring multi-model failover
- Adds a `FallbackProvider` class that attempts providers sequentially until success or max attempts reached
- Extends the database schema with routing_strategy and fallback_config_json columns
- Updates the gateway API to support fallback routing and structured output formats across providers
Reviewed changes
Copilot reviewed 25 out of 27 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| mlflow/protos/service.proto | Adds RoutingStrategy and FallbackStrategy enums, FallbackConfig message for proto definitions |
| mlflow/protos/service_pb2.pyi | Type stub updates for new proto messages and enums |
| mlflow/java/client/src/main/java/org/mlflow/api/proto/Service.java | Java protobuf code generation for new routing and fallback types |
| mlflow/entities/gateway_endpoint.py | Implements RoutingStrategy, FallbackStrategy enums and FallbackConfig dataclass |
| mlflow/entities/__init__.py | Exports new routing and fallback entities |
| mlflow/store/tracking/dbmodels/models.py | Adds routing_strategy and fallback_config_json columns to SqlGatewayEndpoint model |
| mlflow/store/db_migrations/versions/c9d4e5f6a7b8_add_routing_strategy_to_endpoints.py | Database migration for new endpoint routing columns |
| mlflow/store/tracking/gateway/sqlalchemy_mixin.py | Updates create_gateway_endpoint to accept routing parameters |
| mlflow/store/tracking/gateway/abstract_mixin.py | Updates abstract interface for endpoint creation |
| mlflow/store/tracking/gateway/entities.py | Adds routing_strategy and fallback_config to GatewayEndpointConfig |
| mlflow/store/tracking/gateway/config_resolver.py | Propagates routing config from SQL entities to endpoint configs |
| mlflow/server/handlers.py | Updates handler to process routing_strategy and fallback_config from requests |
| mlflow/server/gateway_api.py | Implements provider creation logic with fallback support |
| mlflow/gateway/providers/base.py | Implements FallbackProvider class with retry logic for all endpoint types |
| mlflow/gateway/providers/anthropic.py | Adds structured output support and dynamic header construction |
| mlflow/gateway/providers/gemini.py | Adds structured output support and normalizes finish reasons |
| mlflow/gateway/providers/mistral.py | Adds chat method implementation |
| mlflow/gateway/providers/openai.py | No changes in diff (test updates only) |
| mlflow/types/chat.py | Adds ResponseFormat model for structured outputs |
| tests/store/tracking/test_gateway_sql_store.py | Comprehensive tests for fallback routing creation and retrieval |
| tests/server/test_gateway_api.py | Tests for FallbackProvider instantiation and routing logic |
| tests/gateway/providers/test_fallback.py | Full test suite for FallbackProvider with various scenarios |
| tests/gateway/providers/test_openai.py | Tests for structured output with additional parameters |
| tests/gateway/providers/test_mistral.py | Tests for Mistral structured output support |
| tests/gateway/providers/test_gemini.py | Tests for Gemini structured output and parameter support |
| tests/gateway/providers/test_anthropic.py | Tests for Anthropic structured output with beta headers |
| tests/ag2/test_ag2_autolog.py | Updates speaker_selection_method value to uppercase |
except ValueError:
    return None
What is the reason for swallowing the exception here?
ProtoRoutingStrategy has ROUTING_STRATEGY_UNSPECIFIED, which does not exist in RoutingStrategy enum.
Got it, can we add some comment for the context?
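A minimal sketch of how the requested comment might sit next to the swallowed exception. The member list of `RoutingStrategy` and the helper name `routing_strategy_from_proto_name` are assumptions for illustration, not the code in this PR; only `ROUTING_STRATEGY_UNSPECIFIED` and the `except ValueError: return None` pattern come from the thread above.

```python
from enum import Enum
from typing import Optional


class RoutingStrategy(Enum):
    # Assumed member list; the real enum may define more values.
    FALLBACK = "FALLBACK"


def routing_strategy_from_proto_name(name: str) -> Optional[RoutingStrategy]:
    """Map a proto enum name to the entity enum (hypothetical helper)."""
    try:
        return RoutingStrategy(name)
    except ValueError:
        # ROUTING_STRATEGY_UNSPECIFIED exists only on the proto side and has
        # no counterpart in RoutingStrategy, so it is treated as "not set".
        return None
```

With this shape, an unspecified proto value yields `None` rather than raising, and the comment records why the exception is intentionally ignored.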
mlflow/entities/gateway_endpoint.py (outdated)
Args:
    strategy: The fallback strategy to use (e.g., FallbackStrategy.SEQUENTIAL).
    max_attempts: Maximum number of models to try.
For my learning, what is the use case of setting max_attempts rather than allowing fallback to all configured models?
That's actually a good question, as I was also curious about it. Let me check with the managed gateway folks.
Is the fallback logic "try the primary once; if it fails, go to the next" or "try the primary n times; if it fails, go to the next and try that n times"?
It's the former, so it's unclear whether anyone would want to configure max_attempts < len(models).
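The "try each model once, in order" behavior described above can be sketched roughly as follows. This is an illustrative stand-in, not the `FallbackProvider` implementation: the function name `call_with_fallback` and the representation of providers as plain callables are assumptions.

```python
from typing import Callable, Optional, Sequence


def call_with_fallback(
    providers: Sequence[Callable[[str], str]],
    prompt: str,
    max_attempts: Optional[int] = None,
) -> str:
    """Try each provider once, in order, until one succeeds (sketch)."""
    if not providers:
        raise ValueError("at least one provider is required")
    # Without max_attempts, every configured provider gets one try.
    limit = len(providers) if max_attempts is None else min(max_attempts, len(providers))
    if limit < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[Exception] = None
    for provider in providers[:limit]:
        try:
            return provider(prompt)
        except Exception as e:
            # Remember the failure and move on to the next provider.
            last_error = e
    raise RuntimeError(f"all {limit} attempts failed") from last_error
```

In this sketch, setting `max_attempts` below the number of providers simply truncates the chain, which is the configuration the thread questions the value of.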
except Exception as e:
    last_error = e
Q: What is our strategy for logging and monitoring the invocation errors that are covered by fallbacks?
We will introduce OpenTelemetry (OTel) integration to monitor gateway endpoints. If multiple LLM calls happen due to fallback, we generate multiple spans.
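To illustrate the "one span per LLM call" idea without depending on an OTel SDK, here is a sketch that records one entry per attempt. `AttemptRecord` and `call_and_record` are hypothetical names, and the records are only a stand-in for spans; the actual integration is not part of this PR.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class AttemptRecord:
    """Stand-in for a span: one record per provider call (hypothetical)."""
    provider_name: str
    succeeded: bool
    error: Optional[str] = None


def call_and_record(
    providers: Sequence[Tuple[str, Callable[[str], str]]],
    prompt: str,
) -> Tuple[str, List[AttemptRecord]]:
    attempts: List[AttemptRecord] = []
    for name, provider in providers:
        try:
            result = provider(prompt)
        except Exception as e:
            # A failure hidden by fallback still leaves a visible record.
            attempts.append(AttemptRecord(name, False, repr(e)))
            continue
        attempts.append(AttemptRecord(name, True))
        return result, attempts
    error = RuntimeError("all providers failed")
    error.attempts = attempts  # attach the records for the caller to inspect
    raise error
```

The point of the sketch is that errors covered by a successful fallback are not lost: each failed call produces its own record alongside the one for the call that succeeded.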
optional RoutingStrategy routing_strategy = 9;
// Fallback configuration (populated if routing_strategy is FALLBACK)
optional FallbackConfig fallback_config = 10;
Q: Is it possible that users want to use fallback in conjunction with another routing method? For example, gradually rolling out traffic from model A to model B while keeping model A as a fallback to ensure reliability.
It seems fallback can be used in conjunction with traffic routing such as a traffic split. I've updated the logic so that routing and fallback can be used together.
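A rough sketch of combining a traffic split with fallback, following the rollout example in the question. The function names, the weight dictionary, and the use of plain callables are assumptions for illustration and do not reflect the gateway's actual configuration schema.

```python
import random
from typing import Callable, Dict, List, Optional

Provider = Callable[[str], str]


def pick_primary(weights: Dict[str, float], rng: random.Random) -> str:
    """Choose a primary model name according to traffic-split weights."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def route(
    prompt: str,
    primaries: Dict[str, Provider],
    weights: Dict[str, float],
    fallbacks: List[Provider],
    rng: Optional[random.Random] = None,
) -> str:
    rng = rng or random.Random()
    chosen = pick_primary(weights, rng)
    last_error: Optional[Exception] = None
    # The traffic split picks the first provider; fallbacks follow in order.
    for provider in [primaries[chosen], *fallbacks]:
        try:
            return provider(prompt)
        except Exception as e:
            last_error = e
    raise RuntimeError("primary and all fallbacks failed") from last_error
```

For the rollout scenario, model B would receive a growing share of the weights while model A also appears in `fallbacks`, so a failed call to B is retried on A.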
…ndpoints This commit introduces a new routing strategy for gateway endpoints, allowing for fallback configurations. The `RoutingStrategy` and `FallbackStrategy` enums have been added, along with the `FallbackConfig` message to define fallback behavior. The `GatewayEndpoint` entity has been updated to include these new fields, and the necessary changes have been made across various components, including the database models, API handlers, and providers. Additionally, tests have been added to ensure the correct functionality of the fallback provider. Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
mlflow/server/gateway_api.py (outdated)
)

model_configs_by_id = {m.model_definition_id: m for m in endpoint_config.models}
Maybe not super serious to worry about right now, but I'm curious what happens if the primary model I select has a very large context window and my fallback is a cheaper or older model with a much smaller context window.
Do we warn users about such a configuration? Allow it silently? Suggest matching?
My worry is that someone might configure their system like this and foot-gun themselves into runtime errors in session-based chat, where a fallback midway through a session, triggered by some unavoidable network packet loss, effectively breaks their app.
That's a good point. I expect fallback models will typically have the same or a similar context window as the primary model, which avoids sequential failures like that. But yes, we can think about providing per-model configuration under a gateway endpoint in case clients want granular control.
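As a purely hypothetical illustration of the kind of check discussed here (nothing like it is part of this PR), a configuration-time helper could flag fallbacks whose context window is smaller than the primary's. The `ModelInfo` type and its `context_window` field are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelInfo:
    name: str
    context_window: int  # hypothetical field: maximum tokens the model accepts


def smaller_context_fallbacks(primary: ModelInfo, fallbacks: List[ModelInfo]) -> List[str]:
    """Return names of fallbacks that accept fewer tokens than the primary."""
    return [m.name for m in fallbacks if m.context_window < primary.context_window]
```

A caller could surface the returned names as a warning when the endpoint is created, leaving the decision to allow the configuration with the user.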
This commit introduces a new `LinkageType` enum to differentiate between PRIMARY and FALLBACK linkages in the `GatewayEndpointModelMapping`. It also adds a `fallback_order` field to specify the order of fallback attempts. The changes include updates to the database schema, API handlers, and various components to support these new fields, ensuring proper handling of fallback configurations in gateway endpoints. Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
Pull request overview
Copilot reviewed 21 out of 23 changed files in this pull request and generated no new comments.
B-Step62 left a comment
LGTM overall, left a few minor comments and suggestions
mlflow/server/gateway_api.py (outdated)
error_code=RESOURCE_DOES_NOT_EXIST,
)

model_definition_ids = fallback_config.model_definition_ids
Is it guaranteed that the model_definition_ids is sorted by the fallback order? If not, I think we need to sort it here.
I'll remove this function; the models are sorted in _create_provider.
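The ordering concern can be sketched as follows. The `LinkageType` values and the `fallback_order` field are named in the commit message above, but the `ModelMapping` dataclass and the helper `ordered_fallback_ids` are simplified stand-ins, not the actual `GatewayEndpointModelMapping` or `_create_provider` code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class LinkageType(Enum):
    PRIMARY = "PRIMARY"
    FALLBACK = "FALLBACK"


@dataclass
class ModelMapping:
    # Simplified stand-in for GatewayEndpointModelMapping.
    model_definition_id: str
    linkage_type: LinkageType
    fallback_order: Optional[int] = None


def ordered_fallback_ids(mappings: List[ModelMapping]) -> List[str]:
    """Return fallback model IDs sorted by fallback_order (unset order last)."""
    fallbacks = [m for m in mappings if m.linkage_type is LinkageType.FALLBACK]
    fallbacks.sort(key=lambda m: (m.fallback_order is None, m.fallback_order or 0))
    return [m.model_definition_id for m in fallbacks]
```

Sorting explicitly at the point where providers are built means the result does not depend on the order in which the store returns the mappings.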
Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
This update modifies the fallback configuration for gateway endpoints by replacing the `model_definition_ids` field with `model_mappings`, which now holds a list of `GatewayEndpointModelMapping` objects. The changes also include updates to the related protobuf definitions, SQLAlchemy models, and REST API methods to accommodate the new structure. Additionally, tests have been updated to reflect these changes, ensuring proper functionality of the fallback logic. Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
This update simplifies the conditional checks in the `SqlAlchemyGatewayStoreMixin` class by removing explicit `None` checks for `fallback_model_definition_ids` and `model_definition_ids`. Additionally, test assertions have been updated to reflect changes in the expected number of model mappings for gateway endpoints, ensuring consistency across the tests. Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
…ndpoints (mlflow#19483) Signed-off-by: Tomu Hirata <tomu.hirata@gmail.com>
🥞 Stacked PR
Use this link to review incremental changes.
Related Issues/PRs
n/a
What changes are proposed in this pull request?
This PR introduces traffic routing and fallback to the gateway endpoints. We will have two types of configuration:
How is this PR tested?
Does this PR require documentation update?
Release Notes
Is this a user-facing change?
What component(s), interfaces, languages, and integrations does this PR affect?
Components
- area/tracking: Tracking Service, tracking client APIs, autologging
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
- area/gateway: MLflow AI Gateway client APIs, server, and third-party integrations
- area/prompts: MLflow prompt engineering features, prompt templates, and prompt management
- area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality
- area/projects: MLproject format, project running backends
- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/build: Build and test infrastructure for MLflow
- area/docs: MLflow documentation pages
How should the PR be classified in the release notes? Choose one:
- rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
- rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
- rn/feature - A new user-facing feature worth mentioning in the release notes
- rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
- rn/documentation - A user-facing documentation change worth mentioning in the release notes
Should this PR be included in the next patch release?
- Yes should be selected for bug fixes, documentation updates, and other small changes.
- No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.
What is a minor/patch release?
Bug fixes, doc updates and new features usually go into minor releases.
Bug fixes and doc updates usually go into patch releases.