
Prevent symlink path traversal in local artifact store #18964

Merged
BenWilson2 merged 4 commits into mlflow:master from BenWilson2:fix-path-traversal
Dec 2, 2025

Conversation

BenWilson2 (Member) commented Nov 21, 2025

🛠 DevTools 🛠

Install mlflow from this PR

# mlflow
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/18964/merge
# mlflow-skinny
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/18964/merge#subdirectory=libs/skinny

For Databricks, use the following command:

%sh curl -LsSf https://raw.githubusercontent.com/mlflow/mlflow/HEAD/dev/install-skinny.sh | sh -s pull/18964/merge

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

Validates that an artifact path, after symlinks are resolved, stays within the base artifact directory. This prevents a symlink inside the artifact store from being used for a path traversal attack.
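For readers who want to see the idea in isolation, here is a minimal sketch of such a check. The function name and signature follow the code fragments quoted in the review on this page; everything else is an assumption for illustration, including the `ValueError` (the merged code may raise an MLflow-specific exception) and the temporary directory layout.

```python
import os
import pathlib
import tempfile


def validate_path_within_directory(base_dir: str, constructed_path: str) -> str:
    """Return constructed_path if it resolves to a location inside base_dir."""
    # resolve() follows symlinks and collapses ".." components.
    real_base_dir = pathlib.Path(base_dir).resolve()
    real_constructed_path = pathlib.Path(constructed_path).resolve()
    try:
        real_constructed_path.relative_to(real_base_dir)
    except ValueError:
        # Assumption: ValueError stands in for whatever error type the real code uses.
        raise ValueError(f"Path escapes base directory: {constructed_path}") from None
    return constructed_path


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    artifact_dir = root / "artifacts"
    secret_dir = root / "secrets_outside"
    artifact_dir.mkdir()
    secret_dir.mkdir()
    (secret_dir / "secret.txt").touch()
    (artifact_dir / "model.txt").touch()
    # A symlink planted inside the artifact directory that points outside it.
    os.symlink(secret_dir, artifact_dir / "link")

    # A regular file inside the artifact directory passes validation.
    safe = validate_path_within_directory(
        str(artifact_dir), str(artifact_dir / "model.txt")
    )

    # A path that goes through the symlink resolves outside and is rejected.
    try:
        validate_path_within_directory(
            str(artifact_dir), str(artifact_dir / "link" / "secret.txt")
        )
        blocked = False
    except ValueError:
        blocked = True

print(safe.endswith("model.txt"), blocked)
```

Both the base directory and the candidate path are resolved before comparison, so a base directory that is itself reached through a symlink does not cause false rejections.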

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/tracking: Tracking Service, tracking client APIs, autologging
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
  • area/gateway: MLflow AI Gateway client APIs, server, and third-party integrations
  • area/prompts: MLflow prompt engineering features, prompt templates, and prompt management
  • area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality
  • area/projects: MLproject format, project running backends
  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?
  • Minor release: a release that increments the second part of the version number (e.g., 1.2.0 -> 1.3.0).
    Bug fixes, doc updates and new features usually go into minor releases.
  • Patch release: a release that increments the third part of the version number (e.g., 1.2.0 -> 1.2.1).
    Bug fixes and doc updates usually go into patch releases.
  • Yes (this PR will be cherry-picked and included in the next patch release)
  • No (this PR will be included in the next minor release)

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
github-actions bot added the v3.6.1, area/models, and rn/bug-fix labels Nov 21, 2025

github-actions bot (Contributor) commented Nov 21, 2025

Documentation preview for 275de93 is available at:

More info
  • Ignore this comment if this PR does not change the documentation.
  • The preview is updated when a new commit is pushed to this PR.
  • This comment was created by this workflow run.
  • The documentation was built by this workflow run.

return path


def validate_path_within_directory(base_dir, constructed_path):
Member commented:

Let's add type hints.

Returns:
The constructed_path if validation passes.
"""
real_base_dir = os.path.realpath(base_dir)
Member commented:

Can we leverage pathlib?
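As a rough illustration of what this suggestion amounts to, the sketch below compares `os.path.realpath` with `pathlib.Path.resolve` on a symlink. Both are standard-library calls; the directory names are made up for the example.

```python
import os
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = pathlib.Path(tmp) / "target"
    target.mkdir()
    link = pathlib.Path(tmp) / "link"
    link.symlink_to(target, target_is_directory=True)

    # os.path.realpath returns a plain string.
    via_os_path = os.path.realpath(link)
    # Path.resolve returns a Path, which keeps later comparisons in pathlib.
    via_pathlib = link.resolve()

    same_location = pathlib.Path(via_os_path) == via_pathlib
    resolved_name = via_pathlib.name

print(same_location, resolved_name)
```

Both calls follow the symlink to the same location; the pathlib version simply avoids converting back and forth between strings and `Path` objects.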

secret_file.write_text("SECRET_CONTENT")
yield secret_dir
if secret_dir.exists():
    shutil.rmtree(str(secret_dir))
Member commented:

We can remove this. pytest removes the tmp directory.

secret_dir = tmp_path.parent / "secrets_outside"
secret_dir.mkdir(exist_ok=True)
secret_file = secret_dir / "secret.txt"
secret_file.write_text("SECRET_CONTENT")
harupy (Member) commented Nov 22, 2025:

Suggested change:

- secret_file.write_text("SECRET_CONTENT")
+ secret_file.touch()

since the contents don't matter.

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
BenWilson2 requested a review from harupy November 25, 2025 17:27
real_constructed_path = pathlib.Path(constructed_path).resolve()

try:
    real_constructed_path.relative_to(real_base_dir)
Member commented:

Can we use is_relative_to?
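For context, `is_relative_to` (available on pathlib paths since Python 3.9) returns a boolean instead of raising `ValueError` the way `relative_to` does. The sketch below uses `PurePosixPath` with invented paths so that it runs without touching the filesystem. It also shows why the check is only meaningful after `resolve()`: the comparison is lexical and does not collapse `..` components.

```python
from pathlib import PurePosixPath

# Hypothetical artifact root, used only for illustration.
base = PurePosixPath("/srv/artifacts")

inside = PurePosixPath("/srv/artifacts/run1/model.pkl").is_relative_to(base)
outside = PurePosixPath("/etc/passwd").is_relative_to(base)

# Lexical comparison only: ".." is not collapsed, so this unresolved path
# still counts as "relative to" the base even though it points elsewhere.
lexical_trap = PurePosixPath("/srv/artifacts/../etc/passwd").is_relative_to(base)

print(inside, outside, lexical_trap)
```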

):
artifact_dir = pathlib.Path(local_artifact_repo.artifact_dir)
symlink_path = artifact_dir / symlink_name
os.symlink(str(external_secret_dir), str(symlink_path))
Member commented:

Does os.symlink only accept strings?

Member commented:

Doesn't pathlib have a method to create a symlink?
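To answer both questions with a small sketch: `os.symlink` has accepted path-like objects since Python 3.6, and pathlib provides `Path.symlink_to`. The directory and link names below are invented for the example.

```python
import os
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    target = root / "target"
    target.mkdir()

    # os.symlink accepts Path objects directly, so str() wrappers are optional.
    os_link = root / "os_link"
    os.symlink(target, os_link)

    # pathlib spelling: call symlink_to() on the link, passing the target.
    pathlib_link = root / "pathlib_link"
    pathlib_link.symlink_to(target, target_is_directory=True)

    both_are_symlinks = os_link.is_symlink() and pathlib_link.is_symlink()
    same_target = os_link.resolve() == pathlib_link.resolve()

print(both_are_symlinks, same_target)
```

Note the argument order: `os.symlink(src, dst)` takes the target first, while `Path.symlink_to(target)` is called on the link itself.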


def validate_path_within_directory(base_dir: str, constructed_path: str) -> str:
    """
    Validates that the constructed path (after resolving symlinks) is within the base directory.
harupy (Member) commented Nov 26, 2025:

Currently, we blindly follow symlinks. I wonder if that's really OK.

BenWilson2 (Member, Author) replied:

I don't think we have much of an option, since symlinks are entirely justifiable to use within the scope of the artifact root directory. Blocking traversal outside of that root seems like effective prevention of attack vectors.
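The distinction drawn in this reply can be sketched as follows. `stays_within` is a hypothetical helper written for this illustration, not a function from the PR, and the directory names are assumptions. It requires Python 3.9 or later for `is_relative_to`.

```python
import pathlib
import tempfile


def stays_within(root: pathlib.Path, candidate: pathlib.Path) -> bool:
    """Hypothetical helper: True if candidate resolves to a location inside root."""
    return candidate.resolve().is_relative_to(root.resolve())


with tempfile.TemporaryDirectory() as tmp:
    base = pathlib.Path(tmp)
    artifact_root = base / "artifacts"
    (artifact_root / "run_a").mkdir(parents=True)
    outside = base / "outside"
    outside.mkdir()

    # A symlink whose target is still inside the artifact root: legitimate use.
    internal_link = artifact_root / "latest"
    internal_link.symlink_to(artifact_root / "run_a", target_is_directory=True)

    # A symlink whose target is outside the artifact root: the attack case.
    external_link = artifact_root / "escape"
    external_link.symlink_to(outside, target_is_directory=True)

    internal_ok = stays_within(artifact_root, internal_link)
    external_ok = stays_within(artifact_root, external_link)

print(internal_ok, external_ok)
```

Symlinks are still followed under this approach. Only the resolved destination decides whether access is allowed, which is why internal links keep working.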

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
BenWilson2 requested a review from harupy December 2, 2025 03:59
BenWilson2 added v3.7.0 and removed v3.6.1 labels Dec 2, 2025

harupy (Member) left a comment:

LGTM

BenWilson2 added this pull request to the merge queue Dec 2, 2025
Merged via the queue into mlflow:master with commit 6fd54de Dec 2, 2025
72 of 77 checks passed
BenWilson2 deleted the fix-path-traversal branch December 2, 2025 16:56
BenWilson2 added a commit to BenWilson2/mlflow that referenced this pull request Dec 4, 2025
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
BenWilson2 added a commit that referenced this pull request Dec 4, 2025
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

Labels

  • area/models: MLmodel format, model serialization/deserialization, flavors
  • rn/bug-fix: Mention under Bug Fixes in Changelogs.
  • team-review: Trigger a team review request
  • v3.7.0

2 participants