
Fix mlflow.spark.load_model to handle Unity Catalog Volumes paths correctly #18672

Merged
harupy merged 2 commits into mlflow:master from harupy:fix-dfs_tmpdir-spark on Nov 5, 2025

@harupy harupy commented Nov 5, 2025

🛠 DevTools 🛠

Install mlflow from this PR

# mlflow
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/18672/merge
# mlflow-skinny
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/18672/merge#subdirectory=libs/skinny

For Databricks, use the following command:

%sh curl -LsSf https://raw.githubusercontent.com/mlflow/mlflow/HEAD/dev/install-skinny.sh | sh -s pull/18672/merge

Related Issues/PRs

Fix #18668

What changes are proposed in this pull request?

This PR fixes a bug where mlflow.spark.load_model incorrectly prepends /dbfs/ to Unity Catalog Volumes paths when using dfs_tmpdir on Databricks clusters with Dedicated access mode.

The issue occurs because dbfs_hdfs_uri_to_fuse_path() was converting all paths to DBFS FUSE paths (e.g., /dbfs/...), but UC Volumes paths (e.g., /Volumes/...) should not be converted as they don't use the /dbfs prefix.

Changes:

  • Modified dbfs_hdfs_uri_to_fuse_path() in mlflow/utils/uri.py to detect UC Volumes paths and return them unchanged
  • Added type hints to the function signature for better code clarity
  • Updated the function's docstring to document the UC Volumes behavior

This brings load_model behavior in line with save_model, which already supports UC Volumes as dfs_tmpdir on Dedicated clusters.
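
The conversion rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in mlflow/utils/uri.py: the function name to_fuse_path_sketch, the constant names, and the ValueError are all assumptions made for this example.

```python
import posixpath

# Assumed constants for this sketch; the real names in mlflow/utils/uri.py may differ.
DBFS_URI_PREFIX = "dbfs:/"
DBFS_FUSE_PREFIX = "/dbfs/"
UC_VOLUMES_PREFIX = "/Volumes/"


def to_fuse_path_sketch(uri: str) -> str:
    """Hypothetical stand-in for dbfs_hdfs_uri_to_fuse_path()."""
    # UC Volumes paths do not use the /dbfs prefix, so return them unchanged.
    if uri.startswith(UC_VOLUMES_PREFIX):
        return uri
    # Treat an absolute POSIX path such as "/tmp/mlflow" as a DBFS path.
    if not uri.startswith(DBFS_URI_PREFIX) and uri == posixpath.abspath(uri):
        uri = "dbfs:" + uri
    if not uri.startswith(DBFS_URI_PREFIX):
        raise ValueError(f"Expected a DBFS URI or an absolute path, got: {uri!r}")
    # Swap the "dbfs:/" scheme prefix for the "/dbfs/" FUSE mount prefix.
    return DBFS_FUSE_PREFIX + uri[len(DBFS_URI_PREFIX):]
```

Without the first branch, a path such as /Volumes/... would fall through to the absolute-path case and come back with /dbfs prepended, which is the behavior this PR removes.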

How is this PR tested?

  • Existing unit/integration tests
  • Manual tests

The fix was manually tested on Databricks Runtime 17.3 LTS ML with a Dedicated cluster using UC Volumes as dfs_tmpdir.

Does this PR require documentation update?

  • No. You can skip the rest of this section.

Release Notes

Is this a user-facing change?

  • Yes. Give a description of this change to be included in the release notes for MLflow users.

Fixed mlflow.spark.load_model to properly handle Unity Catalog Volumes paths when using dfs_tmpdir on Databricks clusters with Dedicated access mode. Previously, the function incorrectly prepended /dbfs/ to UC Volumes paths, causing I/O errors.
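
For reference, a call that exercises the fixed code path might look like the sketch below. The wrapper load_spark_model_via_volume and its /Volumes/ guard are hypothetical helpers written for this example; only mlflow.spark.load_model and its dfs_tmpdir argument come from MLflow, and the model URI and volume path passed in are placeholders the caller must supply.

```python
def load_spark_model_via_volume(model_uri: str, dfs_tmpdir: str):
    """Hypothetical wrapper: load a Spark model using a UC Volume as dfs_tmpdir."""
    # Guard added for this example only; MLflow itself does not require it.
    if not dfs_tmpdir.startswith("/Volumes/"):
        raise ValueError(f"Expected a UC Volumes path, got: {dfs_tmpdir!r}")
    # Imported lazily so the helper can be defined without mlflow installed.
    import mlflow.spark

    # With this fix, dfs_tmpdir is used as given instead of becoming /dbfs/Volumes/...
    return mlflow.spark.load_model(model_uri, dfs_tmpdir=dfs_tmpdir)
```

On a Dedicated cluster this would be called with a registered model URI and a writable directory under /Volumes/.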

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs

How should the PR be classified in the release notes? Choose one:

  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes

Should this PR be included in the next patch release?

  • Yes (this PR will be cherry-picked and included in the next patch release)

…ction

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
@harupy harupy force-pushed the fix-dfs_tmpdir-spark branch from ae290ee to 06a52c0 November 5, 2025 02:30
@github-actions github-actions bot added the v3.6.0 label Nov 5, 2025
@harupy harupy changed the title from "Fix mlflow.spark.load_model to not prepend dbfs:/" to "Fix mlflow.spark.load_model to not prepend dbfs:/ for UC volumes" Nov 5, 2025
@github-actions github-actions bot added the rn/bug-fix label (Mention under Bug Fixes in Changelogs.) Nov 5, 2025

harupy commented Nov 5, 2025

/resolve


🚀 Resolve workflow started


✅ Workflow completed successfully.

Claude Output

Summary

I've successfully resolved the PR review comment! Here's what was done:

PR Context

Review Comment Addressed

File: mlflow/utils/uri.py:414
Reviewer: harupy
Comment: "Update the Returns description."

Changes Made

Updated the docstring for the dbfs_hdfs_uri_to_fuse_path function to accurately reflect that UC Volumes paths are returned unchanged, rather than always returning a DBFS FUSE-style path.

Before:

Returns:
    A DBFS FUSE-style path, e.g. "/dbfs/my-directory"

After:

Returns:
    A DBFS FUSE-style path, e.g. "/dbfs/my-directory". For UC Volumes paths
    (e.g., "/Volumes/..."), returns the path unchanged.

Commit Details

  • ✅ Committed locally with DCO sign-off (commit 87fd660)
  • ✅ All pre-commit hooks passed
  • ✅ Ready to push (not pushed per instructions)

The change clarifies the function's behavior for UC Volumes paths, which was added in this PR but not reflected in the documentation.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: mlflow-app[bot] <mlflow-app[bot]@users.noreply.github.com>
@harupy harupy changed the title from "Fix mlflow.spark.load_model to not prepend dbfs:/ for UC volumes" to "Fix mlflow.spark.load_model to handle Unity Catalog Volumes paths correctly" Nov 5, 2025
@github-actions github-actions bot added the area/model-registry, area/models, and area/scoring labels Nov 5, 2025
@harupy harupy requested a review from WeichenXu123 November 5, 2025 13:14
@WeichenXu123 WeichenXu123 left a comment

LGTM!

@harupy harupy added this pull request to the merge queue Nov 5, 2025
Merged via the queue into mlflow:master with commit 59c979f Nov 5, 2025
52 of 54 checks passed
@harupy harupy deleted the fix-dfs_tmpdir-spark branch November 5, 2025 13:47
B-Step62 pushed a commit to B-Step62/mlflow that referenced this pull request Nov 7, 2025
…orrectly (mlflow#18672)

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Signed-off-by: mlflow-app[bot] <mlflow-app[bot]@users.noreply.github.com>
Co-authored-by: mlflow-app[bot] <mlflow-app[bot]@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
@github-actions github-actions bot added v3.6.1 and removed v3.6.0 labels Nov 8, 2025
B-Step62 pushed a commit to B-Step62/mlflow that referenced this pull request Nov 11, 2025
…orrectly (mlflow#18672)

B-Step62 pushed a commit that referenced this pull request Nov 11, 2025
…orrectly (#18672)

@B-Step62 B-Step62 added v3.6.0 and removed v3.6.1 labels Nov 11, 2025
serena-ruan pushed a commit that referenced this pull request Nov 27, 2025
…orrectly (#18672)

serena-ruan pushed a commit to serena-ruan/mlflow that referenced this pull request Dec 1, 2025
…orrectly (mlflow#18672)

B-Step62 pushed a commit to B-Step62/mlflow that referenced this pull request Dec 5, 2025
…orrectly (mlflow#18672)

B-Step62 pushed a commit that referenced this pull request Dec 5, 2025
…orrectly (#18672)


Labels

  • area/model-registry: Model registry, model registry APIs, and the fluent client calls for model registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • rn/bug-fix: Mention under Bug Fixes in Changelogs.
  • v3.6.0

Development

Successfully merging this pull request may close these issues.

[BUG] Not able to use a UC Volume as dfs_tmpdir in mlflow.spark.load_model when running on a Databricks Dedicated cluster

3 participants