
[Online Scoring][6.3/x] Add exclusive job flag and trace scorer job definition#19714

Merged
dbczumar merged 135 commits into mlflow:master from dbczumar:stack/pr6c-exclusive-flag-job
Jan 9, 2026

Conversation


@dbczumar dbczumar commented Jan 1, 2026

🥞 Stacked PR

Use this link to review incremental changes.


Related Issues/PRs

#xxx

What changes are proposed in this pull request?

Introduce server job for online trace scoring

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/tracking: Tracking Service, tracking client APIs, autologging
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
  • area/gateway: MLflow AI Gateway client APIs, server, and third-party integrations
  • area/prompts: MLflow prompt engineering features, prompt templates, and prompt management
  • area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality
  • area/projects: MLproject format, project running backends
  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?
  • Minor release: a release that increments the second part of the version number (e.g., 1.2.0 -> 1.3.0).
    Bug fixes, doc updates and new features usually go into minor releases.
  • Patch release: a release that increments the third part of the version number (e.g., 1.2.0 -> 1.2.1).
    Bug fixes and doc updates usually go into patch releases.
  • Yes (this PR will be cherry-picked and included in the next patch release)
  • No (this PR will be included in the next minor release)

@dbczumar force-pushed the stack/pr6c-exclusive-flag-job branch from 7decb5f to a1c1303 on January 1, 2026 at 18:39
@dbczumar force-pushed the stack/pr6c-exclusive-flag-job branch from a1c1303 to 2e6a8d7 on January 1, 2026 at 19:00
@dbczumar force-pushed the stack/pr6c-exclusive-flag-job branch 16 times, most recently from 7d932d7 to 85c26d4, on January 2, 2026 at 18:53
The second handler for IS NOT NULL (checking for separate NOT and NULL
tokens) is unreachable with sqlparse >= 0.5.3, which always parses
"NOT NULL" as a single keyword token. Testing confirms this pattern
holds across various whitespace configurations.

Signed-off-by: dbczumar <corey.zumar@databricks.com>
Add OnlineTraceScoringProcessor that orchestrates online scoring:
- Fetches traces since last checkpoint
- Applies sampling to select scorers per trace
- Runs scoring in parallel using ThreadPoolExecutor
- Logs assessments back to traces
- Updates checkpoint after processing

Signed-off-by: dbczumar <corey.zumar@databricks.com>
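Pieced together from the bullets above, the control flow might look like this minimal sketch (every name here is an illustrative stand-in, not MLflow's actual API):

```python
from concurrent.futures import ThreadPoolExecutor

def process_traces(fetch_traces_since, select_scorers, log_assessment,
                   save_checkpoint, checkpoint, max_workers=8):
    """Hypothetical outline of the OnlineTraceScoringProcessor steps."""
    traces = fetch_traces_since(checkpoint)                       # 1. fetch since last checkpoint
    work = [(t, s) for t in traces for s in select_scorers(t)]    # 2. sampling picks scorers per trace
    with ThreadPoolExecutor(max_workers=max_workers) as pool:     # 3. score in parallel
        futures = {pool.submit(scorer, trace): (trace, scorer)
                   for trace, scorer in work}
        for fut in futures:
            trace, _scorer = futures[fut]
            log_assessment(trace, fut.result())                   # 4. log assessments back
    if traces:
        save_checkpoint(max(t["timestamp_ms"] for t in traces))   # 5. advance the checkpoint
```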
Signed-off-by: dbczumar <corey.zumar@databricks.com>
Signed-off-by: dbczumar <corey.zumar@databricks.com>
Verify that metadata.mlflow.sourceRun IS NULL filter is applied
to exclude traces generated during evaluation runs.

Signed-off-by: dbczumar <corey.zumar@databricks.com>
Move the EvalItem import inside _execute_scoring() to avoid pulling
in pandas at module load time. This allows OnlineTraceScoringProcessor
to be exported from __init__.py without breaking the skinny client.

Signed-off-by: dbczumar <corey.zumar@databricks.com>
Move evaluation module imports inside _execute_scoring method to
prevent pulling in pandas at module load time, which would break
the skinny client.

Signed-off-by: dbczumar <corey.zumar@databricks.com>
Update test patches to target mlflow.genai.evaluation.harness module
directly instead of trace_processor module, since imports are now lazy.

Signed-off-by: dbczumar <corey.zumar@databricks.com>
Signed-off-by: dbczumar <corey.zumar@databricks.com>
When traces from multiple filters have overlapping timestamps, they are
sorted alphabetically by trace_id at each timestamp. Updated test to
expect correct interleaved ordering rather than sequential blocks.

Signed-off-by: dbczumar <corey.zumar@databricks.com>
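A toy example of the ordering described above: ties on timestamp break alphabetically by trace_id, so overlapping ranges interleave rather than forming sequential blocks.

```python
# (trace_id, timestamp) pairs from two hypothetical filters
staging = [("tr-a", 100), ("tr-c", 200)]
prod = [("tr-b", 100), ("tr-d", 200)]

# Sort by (timestamp, trace_id): equal timestamps fall back to trace_id order.
merged = sorted(staging + prod, key=lambda t: (t[1], t[0]))
# -> [("tr-a", 100), ("tr-b", 100), ("tr-c", 200), ("tr-d", 200)]
```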
Calculate overlap pairs and checkpoint values based on MAX_TRACES_PER_JOB
constant rather than hardcoding values. The test now works correctly
regardless of the MAX_TRACES_PER_JOB value.

Signed-off-by: dbczumar <corey.zumar@databricks.com>
Replace hardcoded timestamp ranges with dynamic calculations:
- Staging filter starts at 0, covers MAX_TRACES_PER_JOB range
- Prod filter starts at 0.8*MAX_TRACES_PER_JOB, creating 20% overlap
- All expected values computed from these fractions

Test now works correctly regardless of MAX_TRACES_PER_JOB value.

Signed-off-by: dbczumar <corey.zumar@databricks.com>
Changed loop to iterate over tasks.items() instead of tasks.values()
to access trace_id for better debugging when a task has no trace.

Signed-off-by: dbczumar <corey.zumar@databricks.com>
Add exclusive flag support to the job decorator to prevent duplicate
job execution for the same parameters. Add run_online_trace_scorer_job
that uses OnlineTraceScoringProcessor to score individual traces.

Signed-off-by: dbczumar <corey.zumar@databricks.com>
…ng job

Signed-off-by: dbczumar <corey.zumar@databricks.com>
…ent module

Signed-off-by: dbczumar <corey.zumar@databricks.com>
- Add unit test for _compute_exclusive_lock_key in test_utils.py
- Add integration tests for exclusive job behavior in test_jobs.py

Signed-off-by: dbczumar <corey.zumar@databricks.com>
Verify the job correctly instantiates OnlineTraceScoringProcessor
and calls process_traces.

Signed-off-by: dbczumar <corey.zumar@databricks.com>
The custom JSON serializer was failing when Huey tried to serialize lock
values (plain strings) because it expected all data to be Message objects
with _asdict(). Updated serializer to handle both Message objects and
plain data, and improved deserializer to reconstruct Messages only when
appropriate.

Signed-off-by: dbczumar <corey.zumar@databricks.com>
…lain data"

This reverts commit 11f8272.

Signed-off-by: dbczumar <corey.zumar@databricks.com>
The JsonSerializer needs to handle two types of data:
1. Message objects (task data) with ._asdict() method
2. Plain data like lock values (e.g., '1' for exclusive locks)

Without this fix, exclusive jobs fail when Huey tries to serialize
lock values during lock acquisition.

Signed-off-by: dbczumar <corey.zumar@databricks.com>
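A standalone sketch of the branching described above, with a namedtuple standing in for Huey's Message (the real serializer plugs into Huey; this only illustrates the two-case logic):

```python
import json
from collections import namedtuple

# Stand-in for huey's Message, which is namedtuple-based (hence ._asdict()).
Message = namedtuple("Message", ["id", "name", "args"])

class JsonSerializer:
    def serialize(self, data):
        if hasattr(data, "_asdict"):
            # Case 1: a Message-like task object
            payload = {"__message__": data._asdict()}
        else:
            # Case 2: plain data, e.g. the '1' value Huey stores for a lock
            payload = {"__plain__": data}
        return json.dumps(payload).encode()

    def deserialize(self, raw):
        payload = json.loads(raw.decode())
        if "__message__" in payload:
            # Reconstruct a Message only when the envelope says it was one
            return Message(**payload["__message__"])
        return payload["__plain__"]
```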
) as mock_create,
):
online_scorers = [make_online_scorer_dict(Completeness())]
run_online_trace_scorer_job(experiment_id="exp1", online_scorers=online_scorers)
Collaborator

Could we verify the exclusiveness on the scorer job directly as well, since it contains objects as params?

Collaborator Author

Totally - done!

Comment on lines +16 to +21
"name": scorer.name,
"experiment_id": "exp1",
"serialized_scorer": json.dumps(scorer.model_dump()),
"sample_rate": sample_rate,
"filter_string": None,
}
Collaborator

asdict(OnlineScorer) should convert to {"name": ..., "serialized_scorer": ..., "online_config": ...} instead?

Collaborator Author

Yeah, this works / is better! thanks! :D

)
def run_online_trace_scorer_job(
experiment_id: str,
online_scorers: list[dict[str, Any]],
Collaborator

Just to clarify, we expect the input to be list of OnlineScorer's dictionary format with asdict?

Collaborator Author

Absolutely (added some information about that to the param docstring)

Collaborator

@serena-ruan serena-ruan left a comment

Overall LGTM!

Allow the exclusive parameter in @job decorator to accept a list of
parameter names that determine exclusivity, rather than just a boolean.
This enables fine-grained control over which parameters are considered
when determining if a job should be exclusive.

- Change exclusive from bool to bool | list[str] in job decorator
- Update _compute_exclusive_lock_key to accept exclusive_params
- When exclusive is a list, only hash those specific parameters
- Update run_online_trace_scorer_job to use exclusive=['experiment_id']
- This prevents simultaneous scoring jobs for the same experiment while
  allowing different scorer configurations to queue properly
- Add comprehensive tests for parameter-specific exclusivity

Signed-off-by: dbczumar <corey.zumar@databricks.com>
Add integration test that verifies run_online_trace_scorer_job uses
exclusive=["experiment_id"] correctly. Test submits two jobs with same
experiment_id but different scorers and verifies only one runs while the
other is canceled due to the exclusive lock.

Also simplify test helper by inlining OnlineScorer dict creation using
asdict() instead of maintaining a helper function.

Signed-off-by: dbczumar <corey.zumar@databricks.com>
Move duplicated test utilities (_setup_job_runner, wait_job_finalize, etc.)
from test_jobs.py and test_online_scoring_jobs.py into a shared helpers.py
module for better code reuse across the test suite.

Signed-off-by: dbczumar <corey.zumar@databricks.com>
Signed-off-by: dbczumar <corey.zumar@databricks.com>
Signed-off-by: dbczumar <corey.zumar@databricks.com>

Labels

area/evaluation MLflow Evaluation rn/none List under Small Changes in Changelogs.
