Add workspace database schema#18909

Merged
B-Step62 merged 1 commit into mlflow:orgnization-support from mprahl:workspaces-db-model
Nov 21, 2025

@mprahl
Collaborator

@mprahl mprahl commented Nov 19, 2025

🛠 DevTools 🛠

Open in GitHub Codespaces

Install mlflow from this PR

# mlflow
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/18909/merge
# mlflow-skinny
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/18909/merge#subdirectory=libs/skinny

For Databricks, use the following command:

%sh curl -LsSf https://raw.githubusercontent.com/mlflow/mlflow/HEAD/dev/install-skinny.sh | sh -s pull/18909/merge

Related Issues/PRs

First split from #18869.

#1464
#5844
Design document: https://docs.google.com/document/d/1IbFfceCmWV3knJfc0ninn58EXLVbHUh1meV_pknOM40/edit

What changes are proposed in this pull request?

This adds the required workspace columns and the workspace catalog table, with the default workspace precreated. All workspace columns default to "default" for now. We may remove the defaults once the tracking store and model registry store are workspace-aware, so that application logic that fails to set the workspace is caught instead of silently falling back to the default.

Some model registry store changes were needed to account for the new composite foreign key.
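
As a rough illustration of the schema shape described above, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the migration from this PR: the table names, column names, and the `description` column are assumptions made for the example. It shows a workspace column that defaults to "default", a workspace catalog table with the default workspace precreated, and a composite foreign key from model versions to registered models.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# All table and column names below are hypothetical, chosen for illustration.
conn.executescript("""
CREATE TABLE workspaces (
    name TEXT PRIMARY KEY,
    description TEXT
);
-- The default workspace is created together with the catalog table.
INSERT INTO workspaces (name, description) VALUES ('default', 'Default workspace');

CREATE TABLE experiments (
    experiment_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    workspace TEXT NOT NULL DEFAULT 'default'
);

CREATE TABLE registered_models (
    workspace TEXT NOT NULL DEFAULT 'default',
    name TEXT NOT NULL,
    PRIMARY KEY (workspace, name)
);

CREATE TABLE model_versions (
    workspace TEXT NOT NULL DEFAULT 'default',
    name TEXT NOT NULL,
    version INTEGER NOT NULL,
    PRIMARY KEY (workspace, name, version),
    -- Composite foreign key: a version must belong to a model in the same workspace.
    FOREIGN KEY (workspace, name) REFERENCES registered_models (workspace, name)
);
""")

# Writes that do not mention a workspace fall back to the column default.
conn.execute("INSERT INTO experiments (name) VALUES ('exp-a')")
conn.execute("INSERT INTO registered_models (name) VALUES ('model-a')")
conn.execute("INSERT INTO model_versions (name, version) VALUES ('model-a', 1)")

experiment_workspace = conn.execute(
    "SELECT workspace FROM experiments WHERE name = 'exp-a'"
).fetchone()[0]

# A version pointing at a (workspace, name) pair with no matching model is rejected.
try:
    conn.execute(
        "INSERT INTO model_versions (workspace, name, version) "
        "VALUES ('team-x', 'model-a', 1)"
    )
    cross_workspace_rejected = False
except sqlite3.IntegrityError:
    cross_workspace_rejected = True
```

The column default is what lets workspace-unaware code keep working, which is also why removing it later would surface any code path that forgets to set the workspace.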

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/tracking: Tracking Service, tracking client APIs, autologging
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
  • area/gateway: MLflow AI Gateway client APIs, server, and third-party integrations
  • area/prompts: MLflow prompt engineering features, prompt templates, and prompt management
  • area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality
  • area/projects: MLproject format, project running backends
  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?
  • Minor release: a release that increments the second part of the version number (e.g., 1.2.0 -> 1.3.0).
    Bug fixes, doc updates and new features usually go into minor releases.
  • Patch release: a release that increments the third part of the version number (e.g., 1.2.0 -> 1.2.1).
    Bug fixes and doc updates usually go into patch releases.
  • Yes (this PR will be cherry-picked and included in the next patch release)
  • No (this PR will be included in the next minor release)

@github-actions github-actions bot added the area/model-registry, area/tracking, and rn/none labels Nov 19, 2025
@mprahl
Collaborator Author

mprahl commented Nov 19, 2025

FYI @B-Step62

Collaborator

@B-Step62 B-Step62 left a comment

@mprahl The change looks good to me. Can we point the PR to a feature branch here?

https://github.com/mlflow/mlflow/tree/orgnization-support

For context, we release new versions frequently (at least once a month), and releases are often cut from master. While the initial implementation is more or less ready on your side, we will likely still need multiple weeks to safely land all of the changes and test them thoroughly. We want to avoid shipping partial changes in a release before the feature is fully functional.

Drift between master and the feature branch would be a bit annoying, but I believe it is not too bad if we sync them regularly.

@B-Step62
Collaborator

@dbczumar @BenWilson2 Could you give the PR another pair of eyes? Thanks!

@github-actions
Contributor

github-actions bot commented Nov 19, 2025

Documentation preview for 1f0b624 is available at:

More info
  • Ignore this comment if this PR does not change the documentation.
  • The preview is updated when a new commit is pushed to this PR.
  • This comment was created by this workflow run.
  • The documentation was built by this workflow run.

@mprahl mprahl changed the base branch from master to orgnization-support November 19, 2025 14:30
@mprahl
Collaborator Author

mprahl commented Nov 19, 2025

@mprahl The change looks good to me. Can we point the PR to a feature branch here?

https://github.com/mlflow/mlflow/tree/orgnization-support

For context, we release new versions frequently (at least once a month), and releases are often cut from master. While the initial implementation is more or less ready on your side, we will likely still need multiple weeks to safely land all of the changes and test them thoroughly. We want to avoid shipping partial changes in a release before the feature is fully functional.

Drift between master and the feature branch would be a bit annoying, but I believe it is not too bad if we sync them regularly.

Thanks for the quick review! I just changed the base branch of the PR.

This adds the required workspace columns and the workspace catalog table
with the default workspace precreated. All workspace columns default to
"default" for now and we may choose to remove the defaults once the
tracking store and model registry store are made workspace aware to
catch application logic issues not properly setting the workspace.

Some model registry store changes were needed to account for the new
composite foreign key.

Signed-off-by: mprahl <mprahl@users.noreply.github.com>
@mprahl mprahl force-pushed the workspaces-db-model branch from c0691c6 to 1f0b624 Compare November 19, 2025 14:35
@mprahl
Collaborator Author

mprahl commented Nov 19, 2025

@B-Step62 I just rebased on the new branch. Could you please approve the workflows again?

@B-Step62 B-Step62 self-assigned this Nov 19, 2025
Member

@BenWilson2 BenWilson2 left a comment

Would it be more appropriate for the fields added to each table to be workspace IDs instead of the unique workspace name?
I have a feeling that an inner join from the active workspace (via a name -> ID mapping table) to the associated tables on the ID might be better than access filtering on a string value for the required updates to the backend APIs. Curious to hear your thoughts on this one, particularly regarding DB performance.

Test coverage looks great - thank you for validation of upgrade / downgrade.
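
To make the two options in this comment concrete, the following sketch contrasts them using Python's built-in `sqlite3` module. Every table and column name here is hypothetical and not taken from the PR. Option A stores the workspace name on each row and filters on it; option B stores a workspace ID and resolves the name through an inner join on a mapping table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables for illustration only.
conn.executescript("""
-- Option A: each row stores the workspace name directly.
CREATE TABLE experiments_by_name (
    experiment_id TEXT PRIMARY KEY,
    workspace TEXT NOT NULL DEFAULT 'default'
);
CREATE INDEX idx_experiments_by_name_workspace ON experiments_by_name (workspace);

-- Option B: each row stores a workspace ID resolved through a mapping table.
CREATE TABLE workspaces (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE experiments_by_id (
    experiment_id TEXT PRIMARY KEY,
    workspace_id INTEGER NOT NULL REFERENCES workspaces (id)
);

INSERT INTO workspaces (id, name) VALUES (1, 'default'), (2, 'team-a');
INSERT INTO experiments_by_name VALUES ('e1', 'default'), ('e2', 'team-a'), ('e3', 'team-a');
INSERT INTO experiments_by_id VALUES ('e1', 1), ('e2', 2), ('e3', 2);
""")


def list_by_name(workspace):
    """Option A: filter directly on the string column."""
    rows = conn.execute(
        "SELECT experiment_id FROM experiments_by_name "
        "WHERE workspace = ? ORDER BY experiment_id",
        (workspace,),
    )
    return [row[0] for row in rows]


def list_by_id(workspace):
    """Option B: inner join through the name -> ID mapping table."""
    rows = conn.execute(
        "SELECT e.experiment_id FROM experiments_by_id AS e "
        "INNER JOIN workspaces AS w ON w.id = e.workspace_id "
        "WHERE w.name = ? ORDER BY e.experiment_id",
        (workspace,),
    )
    return [row[0] for row in rows]
```

Both functions return the same rows for a given workspace. The sketch does not measure performance; it only shows the query shape each option implies.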

@mprahl
Collaborator Author

mprahl commented Nov 20, 2025

Would it be more appropriate for the fields added to each table to be workspace IDs instead of the unique workspace name? I have a feeling that an inner join from the active workspace (via a name -> ID mapping table) to the associated tables on the ID might be better than access filtering on a string value for the required updates to the backend APIs. Curious to hear your thoughts on this one, particularly regarding DB performance.

Test coverage looks great - thank you for validation of upgrade / downgrade.

I'll repost from my Slack comment for visibility:

The reason why I don't leverage foreign keys to a workspace ID is to allow workspace providers outside of the database (e.g. Kubernetes namespaces) to be compatible with the SQLAlchemy tracking and model registry stores. When using a workspace provider, the workspaces won't be stored in the database to avoid having to keep things in sync from the source of truth (e.g. Kubernetes).
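
The trade-off described here can be sketched with Python's built-in `sqlite3` module. The table names and the `k8s-namespace-a` workspace are hypothetical. The sketch assumes a workspace that is known only to an external provider and therefore has no row in the database: a column with a foreign key to the workspace table rejects it, while a plain string column accepts it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables for illustration only.
conn.executescript("""
CREATE TABLE workspaces (name TEXT PRIMARY KEY);
INSERT INTO workspaces (name) VALUES ('default');

-- Variant with a foreign key to the workspace catalog table.
CREATE TABLE experiments_with_fk (
    name TEXT PRIMARY KEY,
    workspace TEXT NOT NULL DEFAULT 'default' REFERENCES workspaces (name)
);

-- Variant with a plain string column and no foreign key.
CREATE TABLE experiments_without_fk (
    name TEXT PRIMARY KEY,
    workspace TEXT NOT NULL DEFAULT 'default'
);
""")

# Hypothetical workspace that exists only in an external provider,
# for example a Kubernetes namespace, and has no row in `workspaces`.
external_workspace = "k8s-namespace-a"


def try_insert(table):
    """Return True if a row for the external workspace is accepted."""
    try:
        conn.execute(
            f"INSERT INTO {table} (name, workspace) VALUES (?, ?)",
            ("exp-1", external_workspace),
        )
        return True
    except sqlite3.IntegrityError:
        return False


accepted_with_fk = try_insert("experiments_with_fk")
accepted_without_fk = try_insert("experiments_without_fk")
```

With the foreign key in place, every externally managed workspace would need a mirrored row in the database, which is the synchronization burden the comment above wants to avoid.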

Member

@BenWilson2 BenWilson2 left a comment

Thanks for the justification. No concerns here!

@B-Step62 B-Step62 merged commit 5b68933 into mlflow:orgnization-support Nov 21, 2025
67 of 69 checks passed
mprahl added a commit to opendatahub-io/mlflow that referenced this pull request Nov 21, 2025
Signed-off-by: mprahl <mprahl@users.noreply.github.com>
mprahl added a commit to mprahl/mlflow that referenced this pull request Nov 27, 2025
Signed-off-by: mprahl <mprahl@users.noreply.github.com>
B-Step62 pushed a commit that referenced this pull request Dec 9, 2025
Signed-off-by: mprahl <mprahl@users.noreply.github.com>
mprahl added a commit to mprahl/mlflow that referenced this pull request Jan 16, 2026
Signed-off-by: mprahl <mprahl@users.noreply.github.com>
B-Step62 pushed a commit to B-Step62/mlflow that referenced this pull request Jan 19, 2026
Signed-off-by: mprahl <mprahl@users.noreply.github.com>
ahadas pushed a commit to ahadas/mlflow that referenced this pull request Jan 27, 2026
Signed-off-by: mprahl <mprahl@users.noreply.github.com>
B-Step62 pushed a commit to B-Step62/mlflow that referenced this pull request Feb 3, 2026
Signed-off-by: mprahl <mprahl@users.noreply.github.com>
mprahl added a commit to mprahl/mlflow that referenced this pull request Feb 3, 2026
Signed-off-by: mprahl <mprahl@users.noreply.github.com>
Labels

  • area/model-registry: Model registry, model registry APIs, and the fluent client calls for model registry
  • area/tracking: Tracking service, tracking client APIs, autologging
  • rn/none: List under Small Changes in Changelogs.

3 participants