
Support get_dataset(name=...) in OSS environments #20423

Merged

alkispoly-db merged 2 commits into mlflow:master from alkispoly-db:mlflow-get-dataset-oss on Jan 29, 2026



@alkispoly-db
Collaborator

Related Issues/PRs

N/A - Feature enhancement

What changes are proposed in this pull request?

This PR adds support for retrieving datasets by name in non-Databricks (OSS) environments. Previously, get_dataset() in OSS accepted only dataset_id, while Databricks environments used name (the Unity Catalog table name). OSS now accepts either parameter, though not both at once.
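A minimal sketch of the either/or parameter rule, for illustration only. validate_get_params is a hypothetical stand-in for _validate_non_databricks_get_params, not its real signature, and it raises a plain ValueError where the actual helper presumably raises an MlflowException with an error code.

```python
from typing import Optional


def validate_get_params(name: Optional[str], dataset_id: Optional[str]) -> None:
    """Hypothetical check: exactly one of `name` or `dataset_id` must be given."""
    if name is not None and dataset_id is not None:
        # Both identifiers supplied: ambiguous, so reject.
        raise ValueError("Specify either 'name' or 'dataset_id', not both.")
    if name is None and dataset_id is None:
        # Neither identifier supplied: nothing to look up.
        raise ValueError("Either 'name' or 'dataset_id' must be provided.")


# Exactly one identifier: both calls are accepted.
validate_get_params(name="my_dataset", dataset_id=None)
validate_get_params(name=None, dataset_id="d-xxx")
```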

Changes:

  • Add _validate_non_databricks_get_params to validate that exactly one of name or dataset_id is provided (not both, not neither)
  • Add _resolve_dataset_by_name helper that uses search_datasets to find the dataset by name
    • Uses smart quoting (double quotes for names without ", single quotes for names without ')
    • Uses pattern matching for clean result handling
    • Raises an error when no dataset, or more than one dataset, matches the name
  • Update get_dataset to resolve name to dataset_id when name is provided
  • Move MlflowClient, MlflowException, and error codes to top-level imports

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests

Added 7 new tests:

  • test_get_dataset_by_name_oss - basic name lookup works
  • test_get_dataset_by_name_not_found - raises when name doesn't exist
  • test_get_dataset_by_name_multiple_matches - raises error on duplicate names
  • test_get_dataset_both_name_and_id_error - rejects both params
  • test_get_dataset_neither_name_nor_id_error - rejects neither
  • test_get_dataset_name_with_single_quote - handles names with '
  • test_get_dataset_name_with_double_quote - handles names with "

All 76 tests pass: uv run pytest tests/genai/datasets/test_fluent.py -v

Does this PR require documentation update?

  • No. You can skip the rest of this section.

The docstring for get_dataset() has been updated to reflect the new behavior.

Release Notes

Is this a user-facing change?

  • Yes. Give a description of this change to be included in the release notes for MLflow users.

get_dataset() now supports retrieving datasets by name in non-Databricks environments. Users can use either get_dataset(name="my_dataset") or get_dataset(dataset_id="d-xxx") to retrieve a dataset.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/tracking: Tracking Service, tracking client APIs, autologging

How should the PR be classified in the release notes? Choose one:

  • rn/feature - A new user-facing feature worth mentioning in the release notes

Should this PR be included in the next patch release?

  • Yes (this PR will be cherry-picked and included in the next patch release)
  • No (this PR will be included in the next minor release)

@github-actions
Contributor

🛠 DevTools 🛠

Install mlflow from this PR

# mlflow
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/20423/merge
# mlflow-skinny
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/20423/merge#subdirectory=libs/skinny

For Databricks, use the following command:

%sh curl -LsSf https://raw.githubusercontent.com/mlflow/mlflow/HEAD/dev/install-skinny.sh | sh -s pull/20423/merge

@github-actions github-actions bot added area/tracking Tracking service, tracking client APIs, autologging rn/feature Mention under Features in Changelogs. labels Jan 28, 2026
@alkispoly-db alkispoly-db force-pushed the mlflow-get-dataset-oss branch from 817a043 to 3c4ec7e on January 28, 2026 23:07
@github-actions
Contributor

github-actions bot commented Jan 28, 2026

Documentation preview for 9ab0ac0 is available at:

More info
  • Ignore this comment if this PR does not change the documentation.
  • The preview is updated when a new commit is pushed to this PR.
  • This comment was created by this workflow run.
  • The documentation was built by this workflow run.

@alkispoly-db alkispoly-db force-pushed the mlflow-get-dataset-oss branch from 3c4ec7e to 594934c on January 28, 2026 23:10
Previously, get_dataset() in non-Databricks environments only accepted
dataset_id, while Databricks environments used name (UC table name).
This change adds support for retrieving datasets by name in OSS by
resolving the name to dataset_id via search_datasets.

Changes:
- Add _validate_non_databricks_get_params to allow either name or dataset_id
- Add _resolve_dataset_by_name helper using search_datasets with smart quoting
- Update get_dataset to resolve name to dataset_id when name is provided
- Add 7 new tests covering name lookup, error cases, and special characters
- Update mock_client fixture to patch MlflowClient at module level

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
@alkispoly-db alkispoly-db force-pushed the mlflow-get-dataset-oss branch from 594934c to d57b7d0 on January 28, 2026 23:16
@alkispoly-db alkispoly-db requested a review from harupy January 28, 2026 23:17
Member

@harupy harupy left a comment


Co-authored-by: Harutaka Kawamura <hkawamura0130@gmail.com>
Signed-off-by: Alkis Polyzotis <80279913+alkispoly-db@users.noreply.github.com>
Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
@alkispoly-db alkispoly-db force-pushed the mlflow-get-dataset-oss branch from df8a5c5 to 9ab0ac0 on January 29, 2026 19:08
@alkispoly-db alkispoly-db added this pull request to the merge queue Jan 29, 2026
Merged via the queue into mlflow:master with commit 245fe39 Jan 29, 2026
47 checks passed
@alkispoly-db alkispoly-db deleted the mlflow-get-dataset-oss branch January 29, 2026 19:46

2 participants