
[UI Telemetry] Implement SharedWorker for batching frontend logs #19292

Merged
daniellok-db merged 26 commits into mlflow:master from daniellok-db:stack/service-worker
Dec 18, 2025

Conversation

@daniellok-db
Collaborator

@daniellok-db daniellok-db commented Dec 9, 2025

🥞 Stacked PR

Use this link to review incremental changes.


Related Issues/PRs

#xxx

What changes are proposed in this pull request?

This PR implements the SharedWorker code that runs in the browser. The next PR contains the client code that is actually used by the frontend.

The worker is fairly simple: it receives logs via postMessage from the client, runs a few validation checks, enriches the data with a session ID, and then queues the logs to be dispatched periodically.
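
The receive, validate, enrich, and queue steps described above could look roughly like the sketch below. It is not the code from this PR: the record fields (`eventName`, `timestampMs`), the helper names, and the specific validation rules are assumptions made for illustration, and the session ID is passed in as a parameter so the logic can run outside a browser.

```typescript
// Hypothetical record shapes; the PR's real types live in its own `types` module.
interface IncomingLog {
  eventName: string;
  timestampMs: number;
}

interface EnrichedLog extends IncomingLog {
  sessionId: string;
}

// Validation step: accept only objects that carry the expected field types.
function isIncomingLog(data: unknown): data is IncomingLog {
  if (typeof data !== 'object' || data === null) {
    return false;
  }
  const candidate = data as Record<string, unknown>;
  const eventName = candidate['eventName'];
  const timestampMs = candidate['timestampMs'];
  return (
    typeof eventName === 'string' &&
    eventName.length > 0 &&
    typeof timestampMs === 'number' &&
    Number.isFinite(timestampMs)
  );
}

// Handles one postMessage payload: validate, enrich with the session ID, queue.
// Returns true if the log was queued and false if it was dropped.
function handleMessage(data: unknown, sessionId: string, queue: EnrichedLog[]): boolean {
  if (!isIncomingLog(data)) {
    return false;
  }
  queue.push({ eventName: data.eventName, timestampMs: data.timestampMs, sessionId });
  return true;
}
```

Inside an actual SharedWorker, `handleMessage` would be called from the `onmessage` handler of each port received in the worker's `onconnect` event, with `sessionId` generated once per worker instance.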

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

Bug bashed manually

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/tracking: Tracking Service, tracking client APIs, autologging
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
  • area/gateway: MLflow AI Gateway client APIs, server, and third-party integrations
  • area/prompts: MLflow prompt engineering features, prompt templates, and prompt management
  • area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality
  • area/projects: MLproject format, project running backends
  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?
  • Minor release: a release that increments the second part of the version number (e.g., 1.2.0 -> 1.3.0).
    Bug fixes, doc updates and new features usually go into minor releases.
  • Patch release: a release that increments the third part of the version number (e.g., 1.2.0 -> 1.2.1).
    Bug fixes and doc updates usually go into patch releases.
  • Yes (this PR will be cherry-picked and included in the next patch release)
  • No (this PR will be included in the next minor release)

@daniellok-db daniellok-db force-pushed the stack/service-worker branch 3 times, most recently from c2cc1fe to 53c8f83 Compare December 9, 2025 14:15
@daniellok-db daniellok-db force-pushed the stack/service-worker branch 2 times, most recently from f0fb458 to 3bc5eb5 Compare December 10, 2025 04:02
@daniellok-db daniellok-db changed the title from "full-ui" to "[UI Telemetry] Implement SharedWorker for batching frontend logs" Dec 10, 2025
@daniellok-db daniellok-db marked this pull request as ready for review December 10, 2025 05:11
Copilot AI review requested due to automatic review settings December 10, 2025 05:11
@github-actions
Contributor

@daniellok-db Thank you for the contribution! Could you fix the following issue(s)?

⚠ DCO check

The DCO check failed. Please sign off your commit(s) by following the instructions here. See https://github.com/mlflow/mlflow/blob/master/CONTRIBUTING.md#sign-your-work for more details.

Contributor

Copilot AI left a comment


Pull request overview

This PR implements a SharedWorker-based system for batching and uploading frontend telemetry logs in MLflow. The implementation includes:

  • Backend handlers for fetching telemetry configuration and receiving batched telemetry records
  • Frontend SharedWorker that batches events and uploads them every 15 seconds
  • Configuration management with caching and fallback values
  • Proto definitions for UI telemetry messages

Key Changes

  • Added UI-specific telemetry configuration endpoints with caching
  • Implemented SharedWorker for cross-tab telemetry batching
  • Extended telemetry Record schema to support custom session/installation IDs
  • Added comprehensive test coverage for new functionality

Reviewed changes

Copilot reviewed 16 out of 18 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| mlflow/telemetry/utils.py | Added fetch_ui_telemetry_config() function with fallback support |
| mlflow/telemetry/constant.py | Added UI config URLs and fallback config constant |
| mlflow/telemetry/schemas.py | Extended Record to support optional installation_id and session_id |
| mlflow/server/handlers.py | Added GET/POST handlers for UI telemetry with TTL caching |
| mlflow/server/__init__.py | Registered new telemetry routes |
| mlflow/server/js/src/telemetry/worker/TelemetryLogger.worker.ts | Implemented SharedWorker for batching telemetry |
| mlflow/server/js/src/telemetry/worker/LogQueue.ts | Implemented queue with 15-second flush interval |
| mlflow/server/js/craco.config.js | Configured webpack to build worker as separate entry point |
| tests/telemetry/test_*.py | Added comprehensive tests for new functionality |
| tests/server/test_handlers.py | Added tests for GET/POST telemetry handlers |


@github-actions
Copy link
Contributor

github-actions bot commented Dec 10, 2025

Documentation preview for d06c455 is available at:

More info
  • Ignore this comment if this PR does not change the documentation.
  • The preview is updated when a new commit is pushed to this PR.
  • This comment was created by this workflow run.
  • The documentation was built by this workflow run.

@daniellok-db daniellok-db force-pushed the stack/service-worker branch 3 times, most recently from 05a93cb to 19de120 Compare December 10, 2025 09:24
Comment on lines +333 to +336
// serve SharedWorker file at top-level, it seems to be more
// stable than if it's contained in `static/js/...`. previously
// i was running into issues with webpack path resolution
return 'TelemetryLogger.[name].[contenthash].worker.js';
Collaborator Author


i think it is possible not to include contenthash here so that the file can be stable across hot reloads, but i was running into some issue where the yarn dev server was OOMing. i assume there were a lot of references that could not be cleaned up.

this way when the hash changes, we can manually terminate the orphaned workers. anyway it is a dev only issue as the bundle is static post-build.

import { UI_TELEMETRY_ENDPOINT } from './constants';
import { type TelemetryRecord } from './types';

const FLUSH_INTERVAL_MS = 15000; // 15 seconds
Collaborator Author


to be discussed: what's the right flush interval? UI sessions can be longer lived, 15 seconds may be a bit too short. the tradeoff here is log fidelity vs. request volume. the longer the interval, the more logs are lost when the browser is closed, but we can save some request overhead.

Collaborator


Could we start with 60 seconds? cc @B-Step62
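
To make the tradeoff in this thread concrete, here is a minimal sketch of a batching queue whose flush interval is a constructor argument, so 15 or 60 seconds becomes a one-line change. It is not the PR's `LogQueue`: the class name, the injected `send` callback, and the choice to drop a batch when sending fails are all assumptions for illustration.

```typescript
// Hypothetical names throughout; only the 15 s and 60 s values come from the discussion.
type SendBatch<T> = (batch: T[]) => Promise<void>;

class BatchQueue<T> {
  private pending: T[] = [];
  private timer: ReturnType<typeof setInterval> | null = null;
  private readonly send: SendBatch<T>;
  private readonly flushIntervalMs: number;

  constructor(send: SendBatch<T>, flushIntervalMs: number) {
    this.send = send;
    this.flushIntervalMs = flushIntervalMs;
  }

  get size(): number {
    return this.pending.length;
  }

  enqueue(item: T): void {
    this.pending.push(item);
  }

  // Hands everything queued so far to `send` as a single batch.
  async flush(): Promise<void> {
    if (this.pending.length === 0) {
      return;
    }
    const batch = this.pending;
    this.pending = [];
    try {
      await this.send(batch);
    } catch {
      // Assumption: telemetry is best-effort, so a failed batch is dropped.
    }
  }

  // Starts periodic flushing; anything still queued when the browser closes is lost.
  start(): void {
    if (this.timer === null) {
      this.timer = setInterval(() => {
        void this.flush();
      }, this.flushIntervalMs);
    }
  }

  stop(): void {
    if (this.timer !== null) {
      clearInterval(this.timer);
      this.timer = null;
    }
  }
}
```

With this shape, a longer interval means fewer calls to `send` but a larger `pending` array at risk at any given moment, which is the fidelity versus request volume tradeoff raised above.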

Comment on lines +37 to +39
if (!config || (config.disable_ui_telemetry ?? true)) {
return;
}
Collaborator Author


if config couldn't be fetched, or if for some reason we cannot read the disable_ui_telemetry field, then drop the log
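
The fail-closed behavior of that condition can be checked in isolation. The sketch below wraps the same expression in a standalone function; the function name `shouldDropLog` and the one-field shape given to `TelemetryConfig` here are assumptions, since the real type is not shown in this thread.

```typescript
// Assumed minimal shape; the PR's actual TelemetryConfig may have more fields.
interface TelemetryConfig {
  disable_ui_telemetry?: boolean;
}

// Same condition as the reviewed lines: drop unless telemetry is explicitly enabled.
function shouldDropLog(config: TelemetryConfig | null): boolean {
  return !config || (config.disable_ui_telemetry ?? true);
}

console.log(shouldDropLog(null)); // config fetch failed
console.log(shouldDropLog({})); // field missing
console.log(shouldDropLog({ disable_ui_telemetry: false })); // explicitly enabled
```

Only the last call returns `false`: a missing config or a missing field is treated the same as telemetry being disabled.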

private config: Promise<TelemetryConfig | null> = fetchConfig();
private sessionId = crypto.randomUUID();
private logQueue: LogQueue = new LogQueue();
private samplingValue: number = Math.random() * 100;
Collaborator Author


Math.random() generates a random float in the range [0, 1), so samplingValue falls in [0, 100) and we can specify a pretty granular ui_rollout_percent in the config, like 12.345678.
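
One way such a sampling value might be compared against the configured percentage is sketched below. The thread does not show the comparison itself, so the strict less-than check and the function name `isInRollout` are assumptions.

```typescript
// Assumed comparison: a worker is in the rollout when its sampling value,
// drawn once from [0, 100), is strictly below the configured percentage.
function isInRollout(samplingValue: number, rolloutPercent: number): boolean {
  return samplingValue < rolloutPercent;
}

// Drawn once per worker, mirroring `Math.random() * 100` in the reviewed line.
const samplingValue = Math.random() * 100;
console.log(isInRollout(samplingValue, 12.345678));
```

Under this assumption, a rollout of 0 excludes every worker and a rollout of 100 includes every worker, because the sampling value never reaches 100.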

@github-actions github-actions bot added area/uiux Front-end, user experience, plotting, JavaScript, JavaScript dev server rn/none List under Small Changes in Changelogs. labels Dec 10, 2025
@daniellok-db daniellok-db added this pull request to the merge queue Dec 18, 2025
Merged via the queue into mlflow:master with commit 712cd0c Dec 18, 2025
63 of 65 checks passed
@daniellok-db daniellok-db deleted the stack/service-worker branch December 18, 2025 02:45
WeichenXu123 pushed a commit to WeichenXu123/mlflow that referenced this pull request Dec 19, 2025

Labels

area/uiux Front-end, user experience, plotting, JavaScript, JavaScript dev server rn/none List under Small Changes in Changelogs. v3.8.0


5 participants