Skip to content

[Endpoints] [3/x] Entities base definitions#19004

Merged
BenWilson2 merged 2 commits intomlflow:masterfrom
BenWilson2:stack/endpoints/entities
Dec 4, 2025
Merged

[Endpoints] [3/x] Entities base definitions#19004
BenWilson2 merged 2 commits intomlflow:masterfrom
BenWilson2:stack/endpoints/entities

Conversation

@BenWilson2
Copy link
Member

@BenWilson2 BenWilson2 commented Nov 24, 2025

🥞 Stacked PR

Use this link to review incremental changes.


Related Issues/PRs

#xxx

What changes are proposed in this pull request?

Adds the base entities definitions for secrets and endpoints definitions. Note that proto interfaces are handled in a later PR in this stack.

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/tracking: Tracking Service, tracking client APIs, autologging
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
  • area/gateway: MLflow AI Gateway client APIs, server, and third-party integrations
  • area/prompts: MLflow prompt engineering features, prompt templates, and prompt management
  • area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality
  • area/projects: MLproject format, project running backends
  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?
  • Minor release: a release that increments the second part of the version number (e.g., 1.2.0 -> 1.3.0).
    Bug fixes, doc updates and new features usually go into minor releases.
  • Patch release: a release that increments the third part of the version number (e.g., 1.2.0 -> 1.2.1).
    Bug fixes and doc updates usually go into patch releases.
  • Yes (this PR will be cherry-picked and included in the next patch release)
  • No (this PR will be included in the next minor release)

@github-actions
Copy link
Contributor

github-actions bot commented Nov 26, 2025

Documentation preview for c017f38 is available at:

More info
  • Ignore this comment if this PR does not change the documentation.
  • The preview is updated when a new commit is pushed to this PR.
  • This comment was created by this workflow run.
  • The documentation was built by this workflow run.

@BenWilson2 BenWilson2 force-pushed the stack/endpoints/entities branch 2 times, most recently from c02afd1 to 384adf3 Compare December 3, 2025 01:52
last_updated_at: int
created_by: str | None = None
last_updated_by: str | None = None
endpoint_count: int = 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make it lazy load property? IIUC this requires querying the mapping table, but don't use referenced endpoint count for every use case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The endpoint count can be removed - we can infer this with the length of the mappings present in the model binding table. This is safe to delete.

mapping_id: str
endpoint_id: str
model_definition_id: str
model_definition: ModelDefinition | None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to store this within the mapping entity?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal here is to minimize the query complexity on the front-end. Without having this embedded within this entity (we use it directly in the UI when populating the model information card for an endpoint), we would have N queries for each model that attached to an endpoint and then perform client-side joins to populate the page.
While this wouldn't be my choice in design for SDK / cli APIs, I intentionally went with this approach here for the UI in order to minimize the front end complexity and to reduce page render time.

example: EndpointDetailsPage.tsx directly accesses the result of the return of this entity.

name: str
created_at: int
last_updated_at: int
model_mappings: list[EndpointModelMapping] = field(default_factory=list)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should also be queried lazily

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The primary use of this in the UI is to display critical information in order to populate the reference fields that are directly displayed. We don't really have a use case in the UI display for non-eager return of this in the main endpoint listing page.
Where we directly need this:

  1. Provider filtering (providers are mapped to models, not to endpoints as an endpoint can fallback to using a model from a different provider)
  2. Displayed model contents for each endpoint for link references to the models details page entries
  3. Rendering of named models and underlying provider model names

If scale is a concern, I think we can introduce pagination in the future (i.e. users create thousands of endpoints), but the alternative (lazy loading) for the UI would be many queries being issued in order to populate the required aspects of the list page, dramatically increasing the front end complexity and forcing spin-loading elements within the page.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification! It makes sense to include model mapping in the entity to show the model names on UI and allow filter based on it.

last_updated_at: Timestamp (milliseconds) when the binding was last updated.
created_by: User ID who created the binding.
last_updated_by: User ID who last updated the binding.
endpoint_name: Optional endpoint name, populated via JOIN for display.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
endpoint_name: Optional endpoint name, populated via JOIN for display.
endpoint_name: Optional endpoint name, populated via JOIN for display.

Where do we plan to display this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The endpoint name is displayed on the main list page, in the model mappings detail pages, and in the key mapping details pages.

created_by: str | None = None
last_updated_by: str | None = None
endpoint_name: str | None = None
model_mappings: list[EndpointModelMapping] = field(default_factory=list)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a part of a binding entity?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was originally intended to fulfill the 'filter the models for a single-model use resource_id' but it's overly complicated and doesn't need to be overloaded in this. We don't use this field in the UI and splitting out the resource-based binding information into a separate API is far simpler and more tailored to what is needed. I'm going to simplify this and then add the dedicated API for resources to fetch their binding information.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we name this gateway_endpoint.py or add a docstring at the top explain these models are used within Gateway? The term "endpoint" "model" "resource" is pretty generic without that context.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I'll preface these with gateway so that it's easier to see their relationships ontologically



@dataclass
class ModelConfig(_MlflowObject):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of these entities? I casually read through the remaining stack but didn't find a usage. If we need these, can we define this somewhere in more limited scope than general entities to avoid confusion with ModelDefinition and Endpoint entities?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the interfaces for the downstream access of decrypted secrets server-side. They won't be exposed in the rest interface (non-public). The sql store will have the implementation that returns these with an internal RPC.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move these entities into different module? All classes under mlflow/entities are for public interface. Also it doesn't need to be _MlflowObject subclass then cuz we probably want to be stricter about the print and serde method, rather than using its base methods.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep! Since these are entirely UI-only I'll migrate them to make it clear that they're isolated. Good call.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure these tests (test_endpoint.py and test_secrets.py) are meaningful.... they are basically only testing default dataclass constructor right...?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah in this PR they're nonsense stubs. The full test suite is added with the addition of the from_proto() / to_proto() methods in the rest branch.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove these tests? I saw rest branch adds more tests, but my point was we don't want to have non-useful tests, because that makes the code lengthy and can hide important details.

@BenWilson2 BenWilson2 force-pushed the stack/endpoints/entities branch 2 times, most recently from 14fce99 to 09aba4e Compare December 4, 2025 00:27
@BenWilson2 BenWilson2 requested a review from B-Step62 December 4, 2025 00:30
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
@BenWilson2 BenWilson2 force-pushed the stack/endpoints/entities branch from 09aba4e to 3c17d6c Compare December 4, 2025 01:54
Copilot AI review requested due to automatic review settings December 4, 2025 01:54
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces base entity definitions for gateway endpoints and secrets management, along with comprehensive cryptographic utilities for secure secret storage using envelope encryption. The implementation adds support for storing encrypted API keys and credentials with KEK/DEK encryption schemes.

Key Changes:

  • Adds cryptographic utilities for AES-256-GCM encryption with KEK/DEK envelope encryption
  • Introduces entity definitions for gateway endpoints, model definitions, and secrets
  • Implements CLI commands for KEK rotation operations
  • Provides comprehensive test coverage for cryptography and entity classes

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
mlflow/utils/cryptography.py Core cryptographic utilities implementing AES-256-GCM encryption, PBKDF2 key derivation, and envelope encryption with KEK/DEK pattern
mlflow/entities/gateway_secrets.py Entity definition for encrypted secrets with LLM provider credentials
mlflow/entities/secrets.py Duplicate/unused entity file with minor docstring differences
mlflow/entities/gateway_endpoint.py Entity definitions for endpoints, model definitions, mappings, and configurations
mlflow/entities/__init__.py Updated exports to include new gateway and secret entities
mlflow/cli/cryptography.py CLI commands for KEK rotation operations with comprehensive user guidance
mlflow/cli/__init__.py Integration of cryptography CLI commands
tests/utils/test_cryptography.py Comprehensive test coverage for cryptographic functions
tests/entities/test_gateway_secrets.py Test coverage for secret entity creation and attributes
tests/entities/test_gateway_endpoint.py Test coverage for endpoint and model definition entities
tests/cli/test_cryptography.py Test coverage for CLI commands with mocked database operations
docs/api_reference/api_inventory.txt API documentation inventory updates

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1 to +33
from dataclasses import dataclass

from mlflow.entities._mlflow_object import _MlflowObject


@dataclass
class GatewaySecret(_MlflowObject):
"""
Represents an encrypted secret for authenticating with LLM providers.

Store encrypted API keys and authentication credentials using envelope encryption.
The actual secret value is encrypted with a DEK (Data Encryption Key), which is itself
encrypted by a KEK (Key Encryption Key) for secure storage.

Args:
secret_id: Unique identifier for this secret.
secret_name: User-friendly name for the secret (must be unique).
masked_value: Masked version of the secret for display (e.g., "sk-...xyz123").
provider: LLM provider this secret is for (e.g., "openai", "anthropic").
created_at: Timestamp (milliseconds) when the secret was created.
last_updated_at: Timestamp (milliseconds) when the secret was last updated.
created_by: User ID who created the secret.
last_updated_by: User ID who last updated the secret.
"""

secret_id: str
secret_name: str
masked_value: str
created_at: int
last_updated_at: int
provider: str | None = None
created_by: str | None = None
last_updated_by: str | None = None
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor documentation inconsistency: Line 11 says "Store encrypted API keys..." while the identical class in gateway_secrets.py says "Secrets store encrypted API keys...". While mlflow/entities/secrets.py exists, it's not imported in __init__.py - only gateway_secrets.py is imported. This file appears to be unused and should likely be removed to avoid confusion, or the import should be updated to use this file instead.

Suggested change
from dataclasses import dataclass
from mlflow.entities._mlflow_object import _MlflowObject
@dataclass
class GatewaySecret(_MlflowObject):
"""
Represents an encrypted secret for authenticating with LLM providers.
Store encrypted API keys and authentication credentials using envelope encryption.
The actual secret value is encrypted with a DEK (Data Encryption Key), which is itself
encrypted by a KEK (Key Encryption Key) for secure storage.
Args:
secret_id: Unique identifier for this secret.
secret_name: User-friendly name for the secret (must be unique).
masked_value: Masked version of the secret for display (e.g., "sk-...xyz123").
provider: LLM provider this secret is for (e.g., "openai", "anthropic").
created_at: Timestamp (milliseconds) when the secret was created.
last_updated_at: Timestamp (milliseconds) when the secret was last updated.
created_by: User ID who created the secret.
last_updated_by: User ID who last updated the secret.
"""
secret_id: str
secret_name: str
masked_value: str
created_at: int
last_updated_at: int
provider: str | None = None
created_by: str | None = None
last_updated_by: str | None = None
# FILE REMOVED: This file was unused and duplicated functionality from gateway_secrets.py.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this file

@click.group("crypto", help="Commands for managing MLflow's cryptographic passphrase.")
def commands():
"""
MLflow cryptopgraphic management CLI. Allows for the management of the envelope
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a typo in the docstring: "cryptopgraphic" should be "cryptographic".

Suggested change
MLflow cryptopgraphic management CLI. Allows for the management of the envelope
MLflow cryptographic management CLI. Allows for the management of the envelope

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

@B-Step62 B-Step62 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good once #19004 (comment) and #19004 (comment) is addressed!

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
@BenWilson2 BenWilson2 force-pushed the stack/endpoints/entities branch from 3c17d6c to c017f38 Compare December 4, 2025 21:12
@BenWilson2 BenWilson2 added this pull request to the merge queue Dec 4, 2025
Merged via the queue into mlflow:master with commit 0944a18 Dec 4, 2025
50 checks passed
@BenWilson2 BenWilson2 deleted the stack/endpoints/entities branch December 4, 2025 21:55
endpoint_id: ID of the endpoint.
model_definition_id: ID of the model definition.
model_definition: The full model definition (populated via JOIN).
weight: Routing weight for traffic distribution (default 1).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BenWilson2 What does the unit of weight here? Is it percentage?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/tracking Tracking service, tracking client APIs, autologging rn/feature Mention under Features in Changelogs. team-review Trigger a team review request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants