Implement automatic discovery for builtin scorers by alkispoly-db · Pull Request #19443 · mlflow/mlflow

alkispoly-db · 2025-12-17T00:31:49Z

🛠 DevTools 🛠

Install mlflow from this PR

# mlflow
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/19443/merge
# mlflow-skinny
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/19443/merge#subdirectory=libs/skinny

For Databricks, use the following command:

%sh curl -LsSf https://raw.githubusercontent.com/mlflow/mlflow/HEAD/dev/install-skinny.sh | sh -s pull/19443/merge

Related Issues/PRs

N/A

What changes are proposed in this pull request?

This PR implements automatic discovery for builtin scorers, replacing the hardcoded list in get_all_scorers() with a dynamic introspection-based approach using Python's __subclasses__() method.

Key changes:

Added _get_all_concrete_builtin_scorers() helper function that recursively discovers all concrete BuiltInScorer subclasses
Updated get_all_scorers() to use automatic discovery instead of maintaining a hardcoded list
Added proper exception handling for scorers requiring constructor arguments (catches both TypeError and pydantic.ValidationError)
Removed unused is_databricks_uri import
Updated tests to expect 16+ scorers instead of the previous 9/11
Added test_builtin_scorer_discovery() to verify the discovery mechanism

Benefits:

Zero maintenance: New scorers are automatically discovered without code changes
Complete coverage: Now discovers all 16 instantiable scorers (vs previous 9 OSS / 11 Databricks)
Adds missing scorers: Includes 5+ scorers that were missing from the hardcoded list:
- Fluency
- Summarization
- ConversationalSafety
- ConversationalToolCallEfficiency
- ConversationalRoleAdherence
Future-proof: Adding new scorers requires no changes to get_all_scorers()

How is this PR tested?

Existing unit/integration tests
New unit/integration tests
Manual tests

Testing:

Updated test_get_all_scorers_oss to verify 16+ scorers are discovered
Added test_builtin_scorer_discovery to validate discovery of all expected scorer classes
Verified CLI command mlflow scorers list --builtin returns all 16 scorers
All tests pass with both OSS and Databricks tracking URIs

Does this PR require documentation update?

No. You can skip the rest of this section.

The function docstring has been updated to explain the automatic discovery mechanism. No external documentation changes needed.

Release Notes

Is this a user-facing change?

Yes. Give a description of this change to be included in the release notes for MLflow users.

mlflow scorers list --builtin now automatically discovers and returns all builtin scorers (16 total) instead of a hardcoded subset (9 OSS / 11 Databricks). This includes 5+ previously unavailable scorers: Fluency, Summarization, ConversationalSafety, ConversationalToolCallEfficiency, and ConversationalRoleAdherence. The get_all_scorers() function now uses automatic discovery, eliminating the need to manually maintain the scorer list.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows

How should the PR be classified in the release notes? Choose one:

rn/feature - A new user-facing feature worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes (this PR will be cherry-picked and included in the next patch release)
No (this PR will be included in the next minor release)

This is a feature enhancement that increases the number of available scorers, making it more appropriate for a minor release.

Replace hardcoded list in get_all_scorers() with automatic discovery using Python's __subclasses__() introspection. This eliminates the need to manually maintain the scorer list when adding new scorers. Key changes: - Add _get_all_concrete_builtin_scorers() helper function that recursively discovers all concrete BuiltInScorer subclasses - Update get_all_scorers() to use automatic discovery instead of hardcoded list - Handle scorers requiring constructor args (e.g., Guidelines) by catching both TypeError and pydantic.ValidationError - Remove unused is_databricks_uri import - Update tests to expect 16+ scorers instead of 9/11 - Add test_builtin_scorer_discovery() to verify discovery mechanism Benefits: - Zero maintenance: new scorers automatically discovered - Complete coverage: discovers all 16 instantiable scorers (vs 9/11) - Adds 5+ scorers previously missing from hardcoded list: Fluency, Summarization, ConversationalSafety, ConversationalToolCallEfficiency, ConversationalRoleAdherence - Future-proof: no code changes needed for new scorers 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>

Changes based on ALKIS comments: - Move inspect import to top-level (no local imports) - Consolidate tests: remove redundant test_get_all_scorers_oss - Rewrite test to use public API get_all_scorers() instead of private _get_all_concrete_builtin_scorers() - Improve test to verify exact set of expected scorers - Simplify docstring and comments The new test_get_all_scorers() is more comprehensive: - Tests the public API directly (better encapsulation) - Verifies exact count and exact set of scorers - Explicitly checks Guidelines is excluded - No dependency on tracking URI (simpler) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>

github-actions · 2025-12-17T00:43:33Z

Documentation preview for 03ede60 is available at:

https://pr-19443--mlflow-docs-preview.netlify.app/docs/latest/

More info

Ignore this comment if this PR does not change the documentation.
The preview is updated when a new commit is pushed to this PR.
This comment was created by this workflow run.
The documentation was built by this workflow run.

Address ALKIS feedback: - Remove hardcoded count (expected_count = 16) in favor of directly comparing scorer_class_names with expected_scorers - Remove all explanatory comments from test body - Remove custom assertion messages - rely on pytest introspection - Convert scorer_names check to simpler set comparison for duplicates The test is now more concise and focuses purely on comparing the actual set of discovered scorers against the expected set. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>

The assertions checking that BuiltInScorer, BuiltInSessionLevelScorer, and Guidelines are not in scorer_class_names are redundant since the equality check `scorer_class_names == expected_scorers` already guarantees these classes are excluded. This further simplifies the test to its essential validation. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>

The test_list_builtin_scorers_shows_all_available_scorers test was trying to patch is_databricks_uri which no longer exists in builtin_scorers.py after our refactoring. Since get_all_scorers() now returns all scorers regardless of environment (no Databricks-specific conditional logic), the parameterized test checking different behaviors is obsolete. Simplified the test to verify that the CLI returns the same scorers as get_all_scorers() without mocking or parameterization. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>

AveshCSingh

LGTM

github-actions bot added area/evaluation MLflow Evaluation rn/feature Mention under Features in Changelogs. labels Dec 17, 2025

alkispoly-db and others added 2 commits December 17, 2025 00:46

alkispoly-db requested a review from AveshCSingh December 17, 2025 00:50

AveshCSingh approved these changes Dec 17, 2025

View reviewed changes

alkispoly-db added this pull request to the merge queue Dec 17, 2025

Merged via the queue into mlflow:master with commit bc45779 Dec 17, 2025
47 of 49 checks passed

alkispoly-db deleted the mlflow-auto-builtin branch December 17, 2025 18:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement automatic discovery for builtin scorers#19443

Implement automatic discovery for builtin scorers#19443
alkispoly-db merged 5 commits intomlflow:masterfrom
alkispoly-db:mlflow-auto-builtin

alkispoly-db commented Dec 17, 2025 •

edited by github-actions bot

Loading

Uh oh!

github-actions bot commented Dec 17, 2025 •

edited

Loading

Uh oh!

AveshCSingh left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

alkispoly-db commented Dec 17, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Install mlflow from this PR

Related Issues/PRs

What changes are proposed in this pull request?

How is this PR tested?

Does this PR require documentation update?

Release Notes

Is this a user-facing change?

What component(s), interfaces, languages, and integrations does this PR affect?

How should the PR be classified in the release notes? Choose one:

Should this PR be included in the next patch release?

Uh oh!

github-actions bot commented Dec 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AveshCSingh left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

alkispoly-db commented Dec 17, 2025 •

edited by github-actions bot

Loading

github-actions bot commented Dec 17, 2025 •

edited

Loading