Implement automatic discovery for builtin scorers #19443
Merged
alkispoly-db merged 5 commits into mlflow:master on Dec 17, 2025
Replace hardcoded list in get_all_scorers() with automatic discovery using Python's __subclasses__() introspection. This eliminates the need to manually maintain the scorer list when adding new scorers.

Key changes:
- Add _get_all_concrete_builtin_scorers() helper function that recursively discovers all concrete BuiltInScorer subclasses
- Update get_all_scorers() to use automatic discovery instead of hardcoded list
- Handle scorers requiring constructor args (e.g., Guidelines) by catching both TypeError and pydantic.ValidationError
- Remove unused is_databricks_uri import
- Update tests to expect 16+ scorers instead of 9/11
- Add test_builtin_scorer_discovery() to verify discovery mechanism

Benefits:
- Zero maintenance: new scorers automatically discovered
- Complete coverage: discovers all 16 instantiable scorers (vs 9/11)
- Adds 5+ scorers previously missing from hardcoded list: Fluency, Summarization, ConversationalSafety, ConversationalToolCallEfficiency, ConversationalRoleAdherence
- Future-proof: no code changes needed for new scorers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
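The discovery approach described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code merged in the PR: `BaseScorer`, `SessionScorer`, and the toy scorer classes are hypothetical stand-ins, and the use of `inspect.isabstract` to decide what counts as "concrete" is an assumption. Because the stand-ins are plain classes rather than pydantic models, only `TypeError` is caught here; the commit message states that the real code also catches `pydantic.ValidationError`.

```python
import inspect
from abc import ABC, abstractmethod


class BaseScorer(ABC):
    """Hypothetical stand-in for BuiltInScorer."""

    @abstractmethod
    def score(self, text: str) -> float: ...


class SessionScorer(BaseScorer, ABC):
    """Hypothetical abstract intermediate class; it must not be returned."""


class Fluency(BaseScorer):
    def score(self, text: str) -> float:
        return 1.0


class Safety(SessionScorer):
    def score(self, text: str) -> float:
        return 1.0


class Guidelines(BaseScorer):
    """Needs a constructor argument, so default instantiation fails."""

    def __init__(self, guidelines: str):
        self.guidelines = guidelines

    def score(self, text: str) -> float:
        return 1.0


def concrete_subclasses(base: type) -> list[type]:
    """Recursively collect every non-abstract subclass of `base`."""
    found = []
    for sub in base.__subclasses__():
        if not inspect.isabstract(sub):
            found.append(sub)
        # Recurse even through abstract classes to reach their children.
        found.extend(concrete_subclasses(sub))
    return found


def get_all_scorers() -> list[BaseScorer]:
    """Instantiate each discovered class, skipping those that need arguments."""
    scorers = []
    for cls in concrete_subclasses(BaseScorer):
        try:
            scorers.append(cls())
        except TypeError:
            # Raised when required constructor arguments are missing.
            continue
    return scorers
```

In this sketch `Guidelines` is discovered as a concrete subclass but dropped at instantiation time, which mirrors the behavior the commit message describes for scorers that require constructor arguments.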
Changes based on ALKIS comments:
- Move inspect import to top-level (no local imports)
- Consolidate tests: remove redundant test_get_all_scorers_oss
- Rewrite test to use public API get_all_scorers() instead of private _get_all_concrete_builtin_scorers()
- Improve test to verify exact set of expected scorers
- Simplify docstring and comments

The new test_get_all_scorers() is more comprehensive:
- Tests the public API directly (better encapsulation)
- Verifies exact count and exact set of scorers
- Explicitly checks Guidelines is excluded
- No dependency on tracking URI (simpler)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
Address ALKIS feedback:
- Remove hardcoded count (expected_count = 16) in favor of directly comparing scorer_class_names with expected_scorers
- Remove all explanatory comments from test body
- Remove custom assertion messages - rely on pytest introspection
- Convert scorer_names check to simpler set comparison for duplicates

The test is now more concise and focuses purely on comparing the actual set of discovered scorers against the expected set.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
The assertions checking that BuiltInScorer, BuiltInSessionLevelScorer, and Guidelines are not in scorer_class_names are redundant since the equality check `scorer_class_names == expected_scorers` already guarantees these classes are excluded. This further simplifies the test to its essential validation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
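The reasoning in this commit, that a set equality check makes separate exclusion assertions unnecessary, can be shown with a small sketch. The names in `expected_scorers` below are a hypothetical subset chosen for illustration, not the full set used in the real test.

```python
# Hypothetical expected set; the real test lists every instantiable scorer.
expected_scorers = {"Fluency", "Summarization"}


def matches_expected(scorer_class_names: set[str]) -> bool:
    """Return True only when the discovered names equal the expected set."""
    return scorer_class_names == expected_scorers


# An exact match passes.
exact = matches_expected({"Fluency", "Summarization"})

# Any extra class (e.g. one that should have been excluded) breaks equality,
# so a separate "not in" assertion adds nothing.
with_extra = matches_expected({"Fluency", "Summarization", "Guidelines"})
```

Set equality fails on both missing and unexpected members, which is why the explicit exclusion checks could be dropped.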
The test_list_builtin_scorers_shows_all_available_scorers test was trying to patch is_databricks_uri which no longer exists in builtin_scorers.py after our refactoring. Since get_all_scorers() now returns all scorers regardless of environment (no Databricks-specific conditional logic), the parameterized test checking different behaviors is obsolete.

Simplified the test to verify that the CLI returns the same scorers as get_all_scorers() without mocking or parameterization.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
Related Issues/PRs
N/A
What changes are proposed in this pull request?
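This PR implements automatic discovery for builtin scorers, replacing the hardcoded list in `get_all_scorers()` with a dynamic introspection-based approach using Python's `__subclasses__()` method.
Key changes:
- Add `_get_all_concrete_builtin_scorers()` helper function that recursively discovers all concrete `BuiltInScorer` subclasses
- Update `get_all_scorers()` to use automatic discovery instead of maintaining a hardcoded list
- Handle scorers that require constructor arguments (by catching both `TypeError` and `pydantic.ValidationError`)
- Remove unused `is_databricks_uri` import
- Add `test_builtin_scorer_discovery()` to verify the discovery mechanism
Benefits:
- Zero maintenance: new scorers are automatically discovered by `get_all_scorers()`
- Complete coverage: discovers all 16 instantiable scorers (vs 9/11)
- Adds 5+ scorers previously missing from the hardcoded list: Fluency, Summarization, ConversationalSafety, ConversationalToolCallEfficiency, ConversationalRoleAdherence
- Future-proof: no code changes needed for new scorers
How is this PR tested?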
Testing:
- Update `test_get_all_scorers_oss` to verify 16+ scorers are discovered
- Add `test_builtin_scorer_discovery` to validate discovery of all expected scorer classes
- `mlflow scorers list --builtin` returns all 16 scorers
Does this PR require documentation update?
The function docstring has been updated to explain the automatic discovery mechanism. No external documentation changes needed.
Release Notes
Is this a user-facing change?
`mlflow scorers list --builtin` now automatically discovers and returns all builtin scorers (16 total) instead of a hardcoded subset (9 OSS / 11 Databricks). This includes 5+ previously unavailable scorers: Fluency, Summarization, ConversationalSafety, ConversationalToolCallEfficiency, and ConversationalRoleAdherence. The `get_all_scorers()` function now uses automatic discovery, eliminating the need to manually maintain the scorer list.
What component(s), interfaces, languages, and integrations does this PR affect?
Components
`area/evaluation`: MLflow model evaluation features, evaluation metrics, and evaluation workflows
How should the PR be classified in the release notes? Choose one:
`rn/feature` - A new user-facing feature worth mentioning in the release notes
Should this PR be included in the next patch release?
This is a feature enhancement that increases the number of available scorers, making it more appropriate for a minor release.