Support get_dataset(name=...) in OSS environments #20423
Merged
alkispoly-db merged 2 commits into mlflow:master on Jan 29, 2026
Previously, get_dataset() in non-Databricks environments only accepted dataset_id, while Databricks environments used name (UC table name). This change adds support for retrieving datasets by name in OSS by resolving the name to dataset_id via search_datasets.

Changes:
- Add _validate_non_databricks_get_params to allow either name or dataset_id
- Add _resolve_dataset_by_name helper using search_datasets with smart quoting
- Update get_dataset to resolve name to dataset_id when name is provided
- Add 7 new tests covering name lookup, error cases, and special characters
- Update mock_client fixture to patch MlflowClient at module level

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
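The name-resolution flow described in this commit message can be illustrated with a minimal, self-contained sketch. It is not the code from the PR: the helper names (validate_get_params, quote_name, resolve_dataset_id, get_dataset_id), the Dataset stand-in class, the in-memory fake_search backend, and the `name = '...'` filter syntax are all assumptions made for illustration, and built-in exceptions are used where the real code would presumably raise MlflowException.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Dataset:
    """Hypothetical stand-in for a dataset record; not MLflow's own class."""
    dataset_id: str
    name: str


def validate_get_params(name: Optional[str], dataset_id: Optional[str]) -> None:
    """Require exactly one of name / dataset_id (not both, not neither)."""
    if name is not None and dataset_id is not None:
        raise ValueError("Provide either 'name' or 'dataset_id', not both.")
    if name is None and dataset_id is None:
        raise ValueError("Provide one of 'name' or 'dataset_id'.")


def quote_name(name: str) -> str:
    """Double-quote names containing ', single-quote all other names.

    Names containing both quote characters are not handled in this sketch.
    """
    return f'"{name}"' if "'" in name else f"'{name}'"


def resolve_dataset_id(name: str, search: Callable[[str], List[Dataset]]) -> str:
    """Resolve a name to a dataset_id, failing on zero or multiple matches."""
    # Assumed filter syntax; the real search_datasets filter may differ.
    matches = search(f"name = {quote_name(name)}")
    if not matches:
        raise LookupError(f"No dataset named {name!r}.")
    if len(matches) > 1:
        raise LookupError(f"Multiple datasets named {name!r}; use dataset_id.")
    return matches[0].dataset_id


def get_dataset_id(search, name=None, dataset_id=None) -> str:
    """Validate the arguments, then return the id directly or via the name."""
    validate_get_params(name, dataset_id)
    if dataset_id is not None:
        return dataset_id
    return resolve_dataset_id(name, search)


# Toy in-memory backend standing in for search_datasets.
STORE = [
    Dataset("d-1", "qa_set"),
    Dataset("d-2", "bob's set"),
    Dataset("d-3", "dup"),
    Dataset("d-4", "dup"),
]


def fake_search(filter_string: str) -> List[Dataset]:
    # Drop the "name = " prefix, then strip the outer pair of quotes.
    wanted = filter_string[len("name = "):][1:-1]
    return [d for d in STORE if d.name == wanted]
```

With this toy backend, get_dataset_id(fake_search, name="qa_set") resolves through the search path, while get_dataset_id(fake_search, dataset_id="d-1") skips the search entirely. Passing both arguments, or neither, fails validation before any search runs.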
harupy (Member) reviewed and approved these changes on Jan 29, 2026:
LGTM once https://github.com/mlflow/mlflow/pull/20423/changes#r2741222323 is resolved!
Co-authored-by: Harutaka Kawamura <hkawamura0130@gmail.com>
Signed-off-by: Alkis Polyzotis <80279913+alkispoly-db@users.noreply.github.com>
Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
Related Issues/PRs
N/A - Feature enhancement

What changes are proposed in this pull request?
This PR adds support for retrieving datasets by name in non-Databricks (OSS) environments. Previously, get_dataset() in OSS only accepted dataset_id, while Databricks environments used name (UC table name). Now OSS accepts either parameter, but not both at once.

Changes:
- Add _validate_non_databricks_get_params to validate that either name OR dataset_id is provided (not both, not neither)
- Add _resolve_dataset_by_name helper that uses search_datasets to find the dataset by name
- Smart quoting of the name in the search filter (double quotes for names containing ', single quotes for names without ')
- Update get_dataset to resolve name to dataset_id when name is provided
- MlflowClient, MlflowException, and error codes are now top-level imports

How is this PR tested?
Added 7 new tests:
- test_get_dataset_by_name_oss - basic name lookup works
- test_get_dataset_by_name_not_found - raises when name doesn't exist
- test_get_dataset_by_name_multiple_matches - raises error on duplicate names
- test_get_dataset_both_name_and_id_error - rejects both params
- test_get_dataset_neither_name_nor_id_error - rejects neither
- test_get_dataset_name_with_single_quote - handles names with '
- test_get_dataset_name_with_double_quote - handles names with "

All 76 tests pass:
uv run pytest tests/genai/datasets/test_fluent.py -v

Does this PR require documentation update?
The docstring for get_dataset() has been updated to reflect the new behavior.

Release Notes
Is this a user-facing change?
get_dataset() now supports retrieving datasets by name in non-Databricks environments. Users can use either get_dataset(name="my_dataset") or get_dataset(dataset_id="d-xxx") to retrieve a dataset.

What component(s), interfaces, languages, and integrations does this PR affect?
Components
- area/tracking: Tracking Service, tracking client APIs, autologging

How should the PR be classified in the release notes? Choose one:
- rn/feature - A new user-facing feature worth mentioning in the release notes

Should this PR be included in the next patch release?