Support get_dataset(name=...) in OSS environments by alkispoly-db · Pull Request #20423 · mlflow/mlflow

alkispoly-db · 2026-01-28T22:58:37Z

Related Issues/PRs

N/A - Feature enhancement

What changes are proposed in this pull request?

This PR adds support for retrieving datasets by name in non-Databricks (OSS) environments. Previously, get_dataset() in OSS only accepted dataset_id, while Databricks environments used name (UC table name). Now both parameters work in OSS.

Changes:

- Add _validate_non_databricks_get_params to validate that exactly one of name or dataset_id is provided (not both, not neither)
- Add _resolve_dataset_by_name helper that uses search_datasets to find the dataset by name
  - Uses smart quoting (double quotes for names without ", single quotes for names without ')
  - Uses pattern matching for clean result handling
  - Raises appropriate errors for not found / multiple matches
- Update get_dataset to resolve name to dataset_id when name is provided
- Move MlflowClient, MlflowException, and error codes to top-level imports

How is this PR tested?

Existing unit/integration tests
New unit/integration tests

Added 7 new tests:

test_get_dataset_by_name_oss - basic name lookup works
test_get_dataset_by_name_not_found - raises when name doesn't exist
test_get_dataset_by_name_multiple_matches - raises error on duplicate names
test_get_dataset_both_name_and_id_error - rejects both params
test_get_dataset_neither_name_nor_id_error - rejects neither
test_get_dataset_name_with_single_quote - handles names with '
test_get_dataset_name_with_double_quote - handles names with "

All 76 tests pass: uv run pytest tests/genai/datasets/test_fluent.py -v

Does this PR require documentation update?

No. You can skip the rest of this section.

The docstring for get_dataset() has been updated to reflect the new behavior.

Release Notes

Is this a user-facing change?

Yes. Give a description of this change to be included in the release notes for MLflow users.

get_dataset() now supports retrieving datasets by name in non-Databricks environments. Users can use either get_dataset(name="my_dataset") or get_dataset(dataset_id="d-xxx") to retrieve a dataset.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

area/tracking: Tracking Service, tracking client APIs, autologging

How should the PR be classified in the release notes? Choose one:

rn/feature - A new user-facing feature worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes (this PR will be cherry-picked and included in the next patch release)
No (this PR will be included in the next minor release)

github-actions · 2026-01-28T22:58:49Z

🛠 DevTools 🛠

Install mlflow from this PR

# mlflow
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/20423/merge
# mlflow-skinny
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/20423/merge#subdirectory=libs/skinny

For Databricks, use the following command:

%sh curl -LsSf https://raw.githubusercontent.com/mlflow/mlflow/HEAD/dev/install-skinny.sh | sh -s pull/20423/merge

github-actions · 2026-01-28T23:07:40Z

Documentation preview for 9ab0ac0 is available at:

https://pr-20423--mlflow-docs-preview.netlify.app/docs/latest/

More info

Ignore this comment if this PR does not change the documentation.
The preview is updated when a new commit is pushed to this PR.
This comment was created by this workflow run.
The documentation was built by this workflow run.

Previously, get_dataset() in non-Databricks environments only accepted dataset_id, while Databricks environments used name (UC table name). This change adds support for retrieving datasets by name in OSS by resolving the name to dataset_id via search_datasets.

Changes:

- Add _validate_non_databricks_get_params to allow either name or dataset_id
- Add _resolve_dataset_by_name helper using search_datasets with smart quoting
- Update get_dataset to resolve name to dataset_id when name is provided
- Add 7 new tests covering name lookup, error cases, and special characters
- Update mock_client fixture to patch MlflowClient at module level

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>

mlflow/genai/datasets/__init__.py

harupy

LGTM once https://github.com/mlflow/mlflow/pull/20423/changes#r2741222323 is resolved!


github-actions bot added the area/tracking (Tracking service, tracking client APIs, autologging) and rn/feature (Mention under Features in Changelogs) labels Jan 28, 2026

alkispoly-db force-pushed the mlflow-get-dataset-oss branch from 817a043 to 3c4ec7e January 28, 2026 23:07

alkispoly-db force-pushed the mlflow-get-dataset-oss branch from 3c4ec7e to 594934c January 28, 2026 23:10

alkispoly-db force-pushed the mlflow-get-dataset-oss branch from 594934c to d57b7d0 January 28, 2026 23:16

alkispoly-db requested a review from harupy January 28, 2026 23:17

harupy reviewed Jan 29, 2026

mlflow/genai/datasets/__init__.py (outdated, resolved)

harupy approved these changes Jan 29, 2026

harupy reviewed Jan 29, 2026

mlflow/genai/datasets/__init__.py (outdated, resolved)

github-actions bot assigned harupy Jan 29, 2026

Update mlflow/genai/datasets/__init__.py

9ab0ac0

Co-authored-by: Harutaka Kawamura <hkawamura0130@gmail.com>
Signed-off-by: Alkis Polyzotis <80279913+alkispoly-db@users.noreply.github.com>
Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>

alkispoly-db force-pushed the mlflow-get-dataset-oss branch from df8a5c5 to 9ab0ac0 January 29, 2026 19:08

alkispoly-db enabled auto-merge January 29, 2026 19:12

alkispoly-db added this pull request to the merge queue Jan 29, 2026

Merged via the queue into mlflow:master with commit 245fe39 Jan 29, 2026
47 checks passed

alkispoly-db deleted the mlflow-get-dataset-oss branch January 29, 2026 19:46

alkispoly-db merged 2 commits into mlflow:master from alkispoly-db:mlflow-get-dataset-oss
