[Endpoints] [4/x] Abstract store interface#19005
Conversation
75dd05b to
2b4ec92
Compare
|
Documentation preview for ad08499 is available at: More info
|
2b4ec92 to
139d2d1
Compare
66167b9 to
bcecc8f
Compare
9f711e6 to
62710ce
Compare
| raise NotImplementedError(self.__class__.__name__) | ||
|
|
||
| # =================================================================================== | ||
| # Model Definitions Management |
There was a problem hiding this comment.
The term "model" is over-loaded. Is there any way to make it clear these APIs are for gateway models? Maybe we can define a mixin?
There was a problem hiding this comment.
Yep! I think all of the stores should have mixins for them in order to isolate these and make it even more clear for us that these are UI-only endpoints / methods. I'll refactor to use mixins for the ABC, sql store, and rest store.
3a572c2 to
45dcd4f
Compare
There was a problem hiding this comment.
Pull request overview
This PR introduces abstract store interface methods for secrets management and endpoint configuration as part of the MLflow Gateway/LLM endpoints feature. It provides comprehensive cryptographic infrastructure using envelope encryption (KEK/DEK pattern) following OWASP security best practices.
Key changes:
- Adds cryptography utilities with AES-256-GCM encryption and PBKDF2 key derivation
- Defines entity models for secrets, endpoints, model definitions, and bindings
- Extends abstract store interface with 15 new methods for secrets and endpoints management
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| mlflow/utils/cryptography.py | Core cryptographic utilities implementing envelope encryption with KEK/DEK pattern, following OWASP guidelines (600k PBKDF2 iterations) |
| mlflow/entities/gateway_secrets.py | GatewaySecret entity for encrypted API key storage |
| mlflow/entities/gateway_endpoint.py | Entity models for endpoints, model definitions, bindings, and runtime configurations |
| mlflow/entities/init.py | Exports new gateway-related entities |
| mlflow/store/tracking/abstract_store.py | Abstract interface methods for CRUD operations on secrets, model definitions, endpoints, and bindings |
| mlflow/cli/cryptography.py | CLI command for KEK rotation with safety confirmations and progress reporting |
| mlflow/cli/init.py | Registers crypto CLI group with optional import handling |
| tests/utils/test_cryptography.py | Comprehensive unit tests for cryptographic functions (606 lines) |
| tests/entities/test_gateway_secrets.py | Unit tests for GatewaySecret entity |
| tests/entities/test_gateway_endpoint.py | Unit tests for endpoint-related entities (423 lines) |
| tests/cli/test_cryptography.py | Extensive CLI tests covering success/failure scenarios (443 lines) |
| docs/api_reference/api_inventory.txt | Adds new entities to API documentation inventory |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
49a8458 to
903c1e8
Compare
| from mlflow.store.tracking.gateway.entities import GatewayEndpointConfig | ||
|
|
||
|
|
||
| class GatewayStoreMixin: |
There was a problem hiding this comment.
Did we split these methods as a Mixin because tracking store is getting heavy?
There was a problem hiding this comment.
Mostly to separate out the ambiguous terms "model" (in MLflow that means something different in our public APIs), but also, yeah, sqlalchemy_store.py is the Titanic of Python modules
There was a problem hiding this comment.
Make sense, we should probably do some refactoring of tracking stores using mixin
| """ | ||
| raise NotImplementedError(self.__class__.__name__) | ||
|
|
||
| def get_secret_info( |
There was a problem hiding this comment.
Can we make the resource name consistent?
| def get_secret_info( | |
| def get_secret( |
There was a problem hiding this comment.
I selected the 'info' part simply because I wanted to convey that this returns the metadata and not the actual secret. I'm not convinced that it's needed... curious to hear your thoughts on my reasoning (perhaps I'm just being too paranoid ;) )
There was a problem hiding this comment.
Got it. I proposed get_secret in order to be consistent about the terminology. Since we return GatewaySecret, the method should be called get_secret or get_gateway_secrete. If the GatewaySecret entity does not include actual secret, GatewaySecret might be a misleading name. So I'd call it GatewaySecretInfo. What do you think?
There was a problem hiding this comment.
Also, if we call this method get_secret_info, then can we change list_secrets into list_secret_info to be consistent?
There was a problem hiding this comment.
good call! Since that is metadata as well :)
| self, | ||
| resource_type: str, | ||
| resource_id: str, | ||
| ) -> GatewayEndpointConfig: |
There was a problem hiding this comment.
Is endpoint uniquely identified by resource_type and resource_id? If the composite key (endpoint_id, resource_type, resource_id) is unique, should the returned value list[GatewayEndpointConfig] or list[GatewayEndpoint]?
There was a problem hiding this comment.
This is a really great point. For the initial version I've been focused on the concept of multiple resources -> single endpoint. But there's no reason to restrict this since the DB model supports this without any issues.
There was a problem hiding this comment.
Got it, can we return multiple endpoints here to be future proof? We might want to have multiple endpoints attached to a single resource in the future. We can add the single restriction in the online-scorer specific logic in the downstream if needed.
There was a problem hiding this comment.
yep making the change is in progress right now ;)
903c1e8 to
5bdfd7f
Compare
There was a problem hiding this comment.
LGTM, #19005 (comment) is not blocking but I imagine it would be cleaner to move the method outside the store if it has special access requirement.
| """ | ||
| raise NotImplementedError(self.__class__.__name__) | ||
|
|
||
| def get_resource_endpoint_configs( |
There was a problem hiding this comment.
How to downstream invoke this API? Usually store methods are invoked through MlflowClient, but in this case, it is a bit weird because we don't want anything outside server to call this.
There was a problem hiding this comment.
Yeah having it isolated and not even in the store implementation makes sense. I put the entities that back the responses here into the store layer, so this seems like a logical location to consolidate that access
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
5bdfd7f to
ad08499
Compare
🥞 Stacked PR
Use this link to review incremental changes.
Related Issues/PRs
#xxxWhat changes are proposed in this pull request?
Add the abstract store interface for the endpoints and secrets management APIs
How is this PR tested?
Does this PR require documentation update?
Release Notes
Is this a user-facing change?
What component(s), interfaces, languages, and integrations does this PR affect?
Components
area/tracking: Tracking Service, tracking client APIs, autologgingarea/models: MLmodel format, model serialization/deserialization, flavorsarea/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registryarea/scoring: MLflow Model server, model deployment tools, Spark UDFsarea/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflowsarea/gateway: MLflow AI Gateway client APIs, server, and third-party integrationsarea/prompts: MLflow prompt engineering features, prompt templates, and prompt managementarea/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionalityarea/projects: MLproject format, project running backendsarea/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev serverarea/build: Build and test infrastructure for MLflowarea/docs: MLflow documentation pagesHow should the PR be classified in the release notes? Choose one:
rn/none- No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" sectionrn/breaking-change- The PR will be mentioned in the "Breaking Changes" sectionrn/feature- A new user-facing feature worth mentioning in the release notesrn/bug-fix- A user-facing bug fix worth mentioning in the release notesrn/documentation- A user-facing documentation change worth mentioning in the release notesShould this PR be included in the next patch release?
Yesshould be selected for bug fixes, documentation updates, and other small changes.Noshould be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.What is a minor/patch release?
Bug fixes, doc updates and new features usually go into minor releases.
Bug fixes and doc updates usually go into patch releases.