Fix the demo generation and deletion when workspaces are enabled #20733
B-Step62 merged 2 commits into mlflow:master
Conversation
Pull request overview
This PR makes demo generation/deletion and the Settings page workspace-aware so that demo operations correctly target the active workspace even when child threads are spawned during demo generation.
Changes:
- Wrap `generate_all_demos()` in `WorkspaceContext(get_request_workspace())` and add a regression test ensuring workspace propagation to child threads.
- Make `/settings` workspace-scoped in routing utilities and navigation (no longer treated as a global route).
- Update the Settings "Clear all demo data" action to use `fetchEndpointRaw` (so workspace headers are included) and add a UI test.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `mlflow/demo/__init__.py` | Wrap demo generation in `WorkspaceContext` to propagate the workspace to child threads. |
| `tests/demo/test_generate.py` | Add test verifying workspace visibility inside threads spawned by demo generators. |
| `mlflow/server/js/src/workspaces/utils/WorkspaceUtils.ts` | Treat settings as workspace-scoped by removing it from "always global" routes. |
| `mlflow/server/js/src/workspaces/utils/WorkspaceUtils.test.ts` | Update routing tests to reflect settings being workspace-scoped. |
| `mlflow/server/js/src/shared/web-shared/model-trace-explorer/RoutingUtils.tsx` | Align shared routing logic so settings is not treated as always-global. |
| `mlflow/server/js/src/shared/web-shared/genai-traces-table/utils/RoutingUtils.tsx` | Align shared routing logic so settings is not treated as always-global. |
| `mlflow/server/js/src/settings/SettingsPage.tsx` | Use `fetchEndpointRaw` for demo delete so the workspace header is included. |
| `mlflow/server/js/src/settings/SettingsPage.test.tsx` | Add test asserting demo delete uses `fetchEndpointRaw` with the correct endpoint/method. |
| `mlflow/server/js/src/experiment-tracking/route-defs.ts` | Remove the `globalRoute` designation for settings. |
| `mlflow/server/js/src/common/components/MlflowSidebar.tsx` | Allow workspace prefixing for the Settings link. |
```python
# Propagate the workspace to the environment so that child threads spawned during
# demo generation (e.g. by the evaluation harness's ThreadPoolExecutor) can resolve
# the active workspace via the MLFLOW_WORKSPACE env-var fallback. The ContextVar
# set by the server middleware is thread-local and is invisible to new threads.
with WorkspaceContext(get_request_workspace()):
    for name in generator_names:
```
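The behavior the comment describes can be demonstrated in isolation: a `ContextVar` set in the request thread is invisible to a newly spawned thread, while an environment variable is not. This is a minimal sketch with illustrative names (`active_workspace` and `_workspace_var` are stand-ins, not MLflow APIs):

```python
import os
import threading
from contextvars import ContextVar

_workspace_var: ContextVar = ContextVar("workspace", default=None)

def active_workspace():
    # Prefer the request-scoped ContextVar, fall back to the env var.
    return _workspace_var.get() or os.environ.get("MLFLOW_WORKSPACE")

_workspace_var.set("team-a")               # what the server middleware does
os.environ["MLFLOW_WORKSPACE"] = "team-a"  # what WorkspaceContext temporarily does

seen = {}

def _worker():
    # A new thread starts with a fresh context, so the ContextVar is unset here
    # and only the env-var fallback can resolve the workspace.
    seen["ctxvar"] = _workspace_var.get()
    seen["resolved"] = active_workspace()

t = threading.Thread(target=_worker)
t.start()
t.join()
os.environ.pop("MLFLOW_WORKSPACE", None)   # WorkspaceContext unsets on exit
```

Inside `_worker`, `seen["ctxvar"]` comes back `None` while `seen["resolved"]` is `"team-a"`, which is exactly why the env-var fallback is needed for threads spawned during generation.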
WorkspaceContext mutates the process-wide MLFLOW_WORKSPACE env var for the entire duration of demo generation. Since generate_all_demos() is used by the /mlflow/demo/generate handler and demo generation can be long-running, concurrent requests in the same process could observe the wrong workspace and create/delete demo data in an unintended workspace. Consider avoiding env-var mutation for request handling (e.g., propagate the ContextVar into spawned threads via contextvars.copy_context() / executor wrappers, or serialize demo generation with a lock/queue so only one workspace can be active at a time in-process).
The env var race isn't a real concern as uvicorn workers are separate processes with isolated environments. Within a single worker, requests are handled sequentially so the with WorkspaceContext(...) block completes before another request can mutate the env var.
contextvars.copy_context() wouldn't help here either, since ThreadPoolExecutor.submit() doesn't propagate the calling context into worker threads. That's the exact problem this change solves. You'd need to wrap every executor.submit() call across all downstream code (evaluation harness, etc.), which is far more invasive than the single-line WorkspaceContext wrapper.
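For reference, the invasive alternative discussed here would require an explicit `copy_context()` wrapper around every submit, since `ThreadPoolExecutor.submit()` runs the callable in the worker thread's own (empty) context. A hedged sketch — `submit_with_context` is a hypothetical helper, not MLflow code:

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

workspace = contextvars.ContextVar("workspace", default=None)

def submit_with_context(executor, fn, *args, **kwargs):
    # Snapshot the caller's context and run fn inside that snapshot in the worker.
    ctx = contextvars.copy_context()
    return executor.submit(ctx.run, fn, *args, **kwargs)

workspace.set("team-a")
with ThreadPoolExecutor(max_workers=1) as pool:
    # Plain submit: the worker thread's context does not contain the value.
    plain = pool.submit(workspace.get).result()
    # Wrapped submit: the snapshot carries the value into the worker.
    wrapped = submit_with_context(pool, workspace.get).result()
```

Every downstream `executor.submit()` call would need this wrapper, which is the invasiveness being objected to above.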
Agreed that the context var doesn't help here. But I think we still have a concurrency problem when FastAPI mounts the Flask handler, since it runs the sync handler within a thread pool. Shall we add a simple lock here to prevent the demo experiment from being generated concurrently within a single process? It should not affect the user experience much.
Good catch! Added a threading lock on the call in the handler.
Documentation preview for 989a0d4 is available.
mlflow/demo/__init__.py
Outdated
```python
    refresh: bool = False,
    features: list[str] | None = None,
) -> list[DemoResult]:
    from mlflow.utils.workspace_context import WorkspaceContext, get_request_workspace
```
Can we move this import to the module level?
tests/demo/test_generate.py
Outdated
```python
def test_generate_all_demos_propagates_workspace_to_child_threads():
    """Verify that the workspace set by server middleware is visible in child threads.
```
I think the expectation is obvious from the test name; can we remove or shorten the docstring?
TomeHirata left a comment
Left two nits, otherwise LGTM
B-Step62 left a comment
One clarification question about the concurrency problem here: https://github.com/mlflow/mlflow/pull/20733/changes#r2797550436
Force-pushed from 0e02513 to c2705e7.
```python
class ThreadSpawningGenerator(BaseDemoGenerator):
    name = DemoFeature.TRACES

    def generate(self) -> DemoResult:
        def _worker():
```
generate_all_demos() always calls generator.store_version(). Since this test generator subclasses BaseDemoGenerator, that will execute BaseDemoGenerator.store_version() and instantiate a real tracking store, which can create ./mlruns and introduce filesystem side effects in a unit test. Consider overriding store_version() on ThreadSpawningGenerator (no-op) or patching the tracking store / setting a temp tracking URI within the test.
Addressed the comment.
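The suggested fix can be sketched as follows. The stub stands in for MLflow's actual `BaseDemoGenerator` (the real interface may differ); the point is only the no-op `store_version()` override that keeps the unit test free of tracking-store side effects:

```python
import threading

class BaseDemoGenerator:
    """Stand-in stub for mlflow's BaseDemoGenerator (assumed interface)."""

    def store_version(self):
        raise RuntimeError("the real implementation would create ./mlruns")

class ThreadSpawningGenerator(BaseDemoGenerator):
    def generate(self):
        # Spawn a child thread, mimicking generators that use worker threads.
        seen = []

        def _worker():
            seen.append(threading.current_thread().name)

        t = threading.Thread(target=_worker)
        t.start()
        t.join()
        return seen

    def store_version(self):
        # No-op override: avoid instantiating a real tracking store in a unit test.
        pass

gen = ThreadSpawningGenerator()
result = gen.generate()
gen.store_version()  # safe: does not touch a tracking store or the filesystem
```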
Force-pushed from c2705e7 to 1117c22.
The issue is that some threads were created which didn't keep the workspace context var. By using the WorkspaceContext context manager, it sets the workspace env var temporarily and then unsets it.
Signed-off-by: mprahl <mprahl@users.noreply.github.com>

Now that this page has a button to clear the demo data, it needs to know the workspace it's in.
Signed-off-by: mprahl <mprahl@users.noreply.github.com>
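The set-then-unset behavior described in the commit message is roughly equivalent to a context manager like the following sketch. `workspace_env` is an illustrative stand-in, not MLflow's actual `WorkspaceContext`:

```python
import os
from contextlib import contextmanager

@contextmanager
def workspace_env(workspace):
    # Set MLFLOW_WORKSPACE for the duration of the block, then restore
    # whatever value (or absence) was there before.
    prev = os.environ.get("MLFLOW_WORKSPACE")
    os.environ["MLFLOW_WORKSPACE"] = workspace
    try:
        yield
    finally:
        if prev is None:
            os.environ.pop("MLFLOW_WORKSPACE", None)
        else:
            os.environ["MLFLOW_WORKSPACE"] = prev

before = os.environ.get("MLFLOW_WORKSPACE")
with workspace_env("team-a"):
    inside = os.environ["MLFLOW_WORKSPACE"]
after = os.environ.get("MLFLOW_WORKSPACE")
```

Because this mutates process-wide state, it is also why the review thread above worries about concurrent requests in the same process.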
Force-pushed from 1117c22 to 989a0d4.
```python
# Serialize demo generation so concurrent requests (e.g. FastAPI running Flask
# handlers in a thread pool) cannot race on the process-wide MLFLOW_WORKSPACE
# env var that WorkspaceContext temporarily sets during generate_all_demos.
_demo_generate_lock = threading.Lock()
```
_demo_generate_lock serializes demo generation, but /mlflow/demo/delete can still run concurrently with /mlflow/demo/generate in the same process, which can lead to partially-generated or partially-deleted demo data if a delete happens mid-generation. Consider reusing this lock in _delete_demo() as well (or using a shared lock around both endpoints) so generate/delete are mutually exclusive per process.
```tsx
  </span>
</MlflowSidebarLink>
<MlflowSidebarLink
  disableWorkspacePrefix
```
[not in the scope of this PR] The current settings page is mostly "user"-level. We should think about how to support different scopes of settings (user, workspace, admin).
…low#20733) Signed-off-by: mprahl <mprahl@users.noreply.github.com>
…low#20733) Signed-off-by: mprahl <mprahl@users.noreply.github.com>
Related Issues/PRs
#xxx

What changes are proposed in this pull request?
How is this PR tested?
Does this PR require documentation update?
Does this PR require updating the MLflow Skills repository?
Release Notes
Is this a user-facing change?
What component(s), interfaces, languages, and integrations does this PR affect?
Components
- area/tracking: Tracking Service, tracking client APIs, autologging
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
- area/gateway: MLflow AI Gateway client APIs, server, and third-party integrations
- area/prompts: MLflow prompt engineering features, prompt templates, and prompt management
- area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality
- area/projects: MLproject format, project running backends
- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/build: Build and test infrastructure for MLflow
- area/docs: MLflow documentation pages

How should the PR be classified in the release notes? Choose one:
- rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
- rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
- rn/feature - A new user-facing feature worth mentioning in the release notes
- rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
- rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?
Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.
Bug fixes, doc updates and new features usually go into minor releases.
Bug fixes and doc updates usually go into patch releases.