[Endpoints] [2/x] Add cryptography implementation for secrets storage #19003

Closed
BenWilson2 wants to merge 1 commit into mlflow:master from BenWilson2:stack/endpoints/crypto

Member

@BenWilson2 BenWilson2 commented Nov 24, 2025

🥞 Stacked PR

Use this link to review incremental changes.


Related Issues/PRs

#xxx

What changes are proposed in this pull request?

Adds the encryption/decryption layer for endpoint secrets management.

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/tracking: Tracking Service, tracking client APIs, autologging
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
  • area/gateway: MLflow AI Gateway client APIs, server, and third-party integrations
  • area/prompts: MLflow prompt engineering features, prompt templates, and prompt management
  • area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality
  • area/projects: MLproject format, project running backends
  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?
  • Minor release: a release that increments the second part of the version number (e.g., 1.2.0 -> 1.3.0).
    Bug fixes, doc updates and new features usually go into minor releases.
  • Patch release: a release that increments the third part of the version number (e.g., 1.2.0 -> 1.2.1).
    Bug fixes and doc updates usually go into patch releases.
  • Yes (this PR will be cherry-picked and included in the next patch release)
  • No (this PR will be included in the next minor release)

Contributor

github-actions bot commented Nov 24, 2025

Documentation preview for 1da3509 is available at:

More info
  • Ignore this comment if this PR does not change the documentation.
  • The preview is updated when a new commit is pushed to this PR.
  • This comment was created by this workflow run.
  • The documentation was built by this workflow run.

github-actions bot added the area/tracking (Tracking service, tracking client APIs, autologging) and rn/feature (Mention under Features in Changelogs.) labels Nov 24, 2025
BenWilson2 force-pushed the stack/endpoints/crypto branch 2 times, most recently from 1775fa6 to bbb7755, November 26, 2025 03:39
BenWilson2 force-pushed the stack/endpoints/crypto branch from bbb7755 to edb2277, November 27, 2025 06:31
BenWilson2 force-pushed the stack/endpoints/crypto branch 2 times, most recently from e663668 to 272cf2d, December 2, 2025 02:14
BenWilson2 added the team-review (Trigger a team review request) label Dec 2, 2025
BenWilson2 force-pushed the stack/endpoints/crypto branch 3 times, most recently from 489066d to 79cefae, December 3, 2025 22:35
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
BenWilson2 force-pushed the stack/endpoints/crypto branch from 79cefae to 1da3509, December 4, 2025 01:54
Copilot AI review requested due to automatic review settings December 4, 2025 01:54
Contributor

Copilot AI left a comment

Pull request overview

This PR adds a comprehensive cryptography implementation for secure secrets storage in MLflow, including encryption/decryption using envelope encryption (KEK/DEK pattern) and a CLI tool for KEK rotation operations.

Key Changes:

  • Implements AES-256-GCM encryption with PBKDF2-derived KEK for secrets management
  • Adds envelope encryption pattern where each secret has a unique DEK wrapped by a master KEK
  • Provides CLI command mlflow crypto rotate-kek for secure KEK passphrase rotation
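
As a rough illustration of the KEK/DEK split described above, the sketch below derives a 256-bit KEK from a passphrase with PBKDF2-HMAC-SHA256 and generates a random per-secret DEK, using only the Python standard library. The iteration count, salt length, and function bodies are assumptions for illustration and are not taken from the PR's code. The AES-256-GCM step that wraps the DEK with the KEK is omitted here because it needs the third-party cryptography package.

```python
import hashlib
import secrets

# Assumed parameters for illustration only; the PR's real salt handling and
# iteration count are not shown in this conversation.
PBKDF2_ITERATIONS = 100_000
KEY_LENGTH = 32  # 32 bytes = 256 bits, the key size AES-256 requires


def derive_kek(passphrase: str, salt: bytes) -> bytes:
    """Derive a 256-bit key-encryption key from a passphrase (hypothetical helper)."""
    return hashlib.pbkdf2_hmac(
        "sha256",
        passphrase.encode("utf-8"),
        salt,
        PBKDF2_ITERATIONS,
        dklen=KEY_LENGTH,
    )


def generate_dek() -> bytes:
    """Generate a fresh random 256-bit data-encryption key for one secret."""
    return secrets.token_bytes(KEY_LENGTH)


salt = secrets.token_bytes(16)
kek = derive_kek("example-passphrase", salt)

# Each secret gets its own DEK; only the wrapped DEK would be stored.
dek_a = generate_dek()
dek_b = generate_dek()
```

The same passphrase and salt always yield the same KEK, which is what lets a server re-derive it at startup, while each DEK is independent random material.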

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| mlflow/utils/cryptography.py | Core cryptography module implementing KEK management, encryption/decryption functions, key wrapping/unwrapping, and secret masking utilities |
| mlflow/cli/cryptography.py | CLI commands for cryptographic operations including KEK rotation with database transaction support |
| mlflow/cli/__init__.py | Integrates cryptography CLI commands with optional import handling |
| tests/utils/test_cryptography.py | Comprehensive test suite covering encryption, decryption, key rotation, and edge cases |
| tests/cli/test_cryptography.py | CLI-specific tests including rotation workflows, error handling, and user interaction scenarios |

@click.group("crypto", help="Commands for managing MLflow's cryptographic passphrase.")
def commands():
    """
    MLflow cryptopgraphic management CLI. Allows for the management of the envelope
Copilot AI Dec 4, 2025

Spelling error: "cryptopgraphic" should be "cryptographic"

Suggested change
MLflow cryptopgraphic management CLI. Allows for the management of the envelope
MLflow cryptographic management CLI. Allows for the management of the envelope

Comment on lines +185 to +186
click.echo(
    f"\n✓ Successfully rotated {rotated_count} encryption keys "
Copilot AI Dec 4, 2025

Grammar issue with pluralization: when rotated_count is 1, the message will say "Successfully rotated 1 encryption keys" which is grammatically incorrect. Consider using singular/plural form based on count:

key_word = "key" if rotated_count == 1 else "keys"
click.echo(
    f"\n✓ Successfully rotated {rotated_count} encryption {key_word} "
    f"from KEK v{old_version} to v{new_version}\n"
)
Suggested change
click.echo(
    f"\n✓ Successfully rotated {rotated_count} encryption keys "
key_word = "key" if rotated_count == 1 else "keys"
click.echo(
    f"\n✓ Successfully rotated {rotated_count} encryption {key_word} "

# Step 6: Restart server
$ systemctl start mlflow-server
"""
old_passphrase = os.getenv("MLFLOW_CRYPTO_KEK_PASSPHRASE")
Collaborator

@TomeHirata TomeHirata Dec 4, 2025

q: we don't define this env var in mlflow.environment_variables because it's not used in the tracking server logic?

Member Author

It's not used client-side at all (that would be dangerous). This passphrase must only reside server-side :)

click.echo(
    f"\n✗ Failed to rotate encryption key {secret.secret_id}: {e}", err=True
)
session.rollback()
Collaborator

Isn't rollback handled by ManagedSessionMaker?
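
For context on this question, here is a minimal sketch of the general pattern being referred to: a session context manager that commits when the block succeeds and rolls back when it raises, so callers need no explicit rollback or commit. It uses the standard-library sqlite3 module and a hypothetical managed_session helper; it is not MLflow's ManagedSessionMaker, whose exact behavior is not shown in this conversation.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def managed_session(conn: sqlite3.Connection):
    """Commit if the block succeeds; roll back and re-raise if it fails."""
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE secrets (id TEXT PRIMARY KEY, wrapped_dek TEXT)")
conn.commit()

# Successful block: the insert is committed by the context manager.
with managed_session(conn) as session:
    session.execute("INSERT INTO secrets VALUES ('a', 'old')")

# Failing block: the update is rolled back without an explicit rollback call.
try:
    with managed_session(conn) as session:
        session.execute("UPDATE secrets SET wrapped_dek = 'new' WHERE id = 'a'")
        raise RuntimeError("simulated rotation failure")
except RuntimeError:
    pass

rows = conn.execute("SELECT wrapped_dek FROM secrets").fetchall()
```

If the session maker used in the PR behaves like this, the explicit session.rollback() and session.commit() calls in the rotation command would be redundant.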

"No changes were made. Fix the issue and re-run the command."
) from e

session.commit()
Collaborator

ditto


return result

if not isinstance(secret_value, str):
Collaborator

What type falls into this branch? The type hint only expects str or dict.

    return json.loads(plaintext)
except json.JSONDecodeError:
    return plaintext

Collaborator

Suggested change

new_wrapped_dek = wrap_dek(dek, new_kek)

return RotatedSecret(encrypted_value=encrypted_value, wrapped_dek=new_wrapped_dek)

Collaborator

Suggested change

)


def decrypt_secret(
Collaborator

Suggested change
def decrypt_secret(
def _decrypt_secret(

return f"{prefix}...{suffix}"


def encrypt_secret(
Collaborator

Suggested change
def encrypt_secret(
def _encrypt_secret(

return aad_str.encode("utf-8")


def mask_secret_value(secret_value: str | dict[str, Any]) -> str:
Collaborator

Suggested change
def mask_secret_value(secret_value: str | dict[str, Any]) -> str:
def _mask_secret_value(

) from e


def create_aad(secret_id: str, secret_name: str) -> bytes:
Collaborator

@TomeHirata TomeHirata Dec 4, 2025

Suggested change
def create_aad(secret_id: str, secret_name: str) -> bytes:
def _create_aad(secret_id: str, secret_name: str) -> bytes:

return self._kek_version


def generate_dek() -> bytes:
Collaborator

Shall we make this private? Same for other methods not used by other files

Suggested change
def generate_dek() -> bytes:
def _generate_dek() -> bytes:

def _check_cryptography_available():
    """Check if cryptography is installed and raise helpful error if not."""
    try:
        import cryptography  # noqa: F401
Collaborator

Can we use importlib.util.find_spec?
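
One way the suggestion could look, as a sketch only: importlib.util.find_spec locates a top-level package without importing it and returns None when the package is missing. The helper names and the error message below are hypothetical and are not taken from the PR.

```python
import importlib.util


def is_package_available(name: str) -> bool:
    """Return True if a top-level package can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def check_package_available(name: str) -> None:
    """Raise a helpful ImportError if the package is not installed."""
    if not is_package_available(name):
        raise ImportError(
            f"The '{name}' package is required for this feature. "
            f"Install it with: pip install {name}"
        )


# 'json' ships with Python, so this check passes silently.
check_package_available("json")
```

Compared with a try/except around an import statement, this avoids the unused-import lint suppression, at the cost of not detecting a package that is present but fails to import.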

@TomeHirata
Collaborator

Overall looks great, left some style comments

@BenWilson2 BenWilson2 closed this Dec 4, 2025

Labels

  • area/tracking: Tracking service, tracking client APIs, autologging
  • rn/feature: Mention under Features in Changelogs.
  • team-review: Trigger a team review request

3 participants