
Add description field to all built-in scorers#18547

Merged
BenWilson2 merged 5 commits into mlflow:master from alkispoly-db:mlflow-builtin-descriptions on Nov 3, 2025

Conversation

@alkispoly-db (Collaborator) commented Oct 28, 2025

🛠 DevTools 🛠

Open in GitHub Codespaces

Install mlflow from this PR

# mlflow
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/18547/merge
# mlflow-skinny
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/18547/merge#subdirectory=libs/skinny

For Databricks, use the following command:

%sh curl -LsSf https://raw.githubusercontent.com/mlflow/mlflow/HEAD/dev/install-skinny.sh | sh -s pull/18547/merge

Related Issues/PRs

N/A

What changes are proposed in this pull request?

This PR adds a description field to all 9 built-in scorer classes in MLflow to improve discoverability and documentation. Each scorer now includes a concise, human-readable description that explains what it evaluates:

  • RetrievalRelevance: Evaluate whether each retrieved context chunk is relevant to the input request
  • RetrievalSufficiency: Evaluate whether the information in the last retrieval is sufficient to generate the expected facts
  • RetrievalGroundedness: Assess whether the facts in the response are implied by the retrieval information (no hallucinations)
  • Guidelines: Evaluate whether the agent's response follows specific constraints or instructions
  • ExpectationsGuidelines: Evaluate whether responses follow row-specific constraints
  • RelevanceToQuery: Ensure responses directly address the user's input
  • Safety: Ensure responses do not contain harmful, offensive, or toxic content
  • Correctness: Check whether the response matches expected facts
  • Equivalence: Compare outputs against expected outputs for semantic equivalence
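The resulting pattern can be sketched with a minimal, self-contained example. Note that the class and field names below are simplified stand-ins to illustrate the shape of the change, not the actual MLflow implementation:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative sketch only: these classes are hypothetical stand-ins
# for MLflow's scorer classes, not the real mlflow API.
@dataclass
class Scorer:
    name: str
    description: Optional[str] = None  # custom scorers still default to None

@dataclass
class Safety(Scorer):
    # Built-in scorers now ship with a concise default description.
    name: str = "safety"
    description: str = (
        "Ensure responses do not contain harmful, offensive, or toxic content"
    )

if __name__ == "__main__":
    print(Safety().description)
```

Because the description lives on the class as a default, UIs and documentation tooling can surface it without instantiating scorers with any extra configuration.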

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

Ran the full test suite for builtin_scorers: 58 tests passed, 4 skipped. All existing tests continue to pass with no breaking changes.

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

Added description fields to all built-in scorers (RetrievalRelevance, RetrievalSufficiency, RetrievalGroundedness, Guidelines, ExpectationsGuidelines, RelevanceToQuery, Safety, Correctness, Equivalence) to improve discoverability and make the API more self-documenting.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • `area/tracking`: Tracking Service, tracking client APIs, autologging
  • `area/models`: MLmodel format, model serialization/deserialization, flavors
  • `area/model-registry`: Model Registry service, APIs, and the fluent client calls for Model Registry
  • `area/scoring`: MLflow Model server, model deployment tools, Spark UDFs
  • `area/evaluation`: MLflow model evaluation features, evaluation metrics, and evaluation workflows
  • `area/gateway`: MLflow AI Gateway client APIs, server, and third-party integrations
  • `area/prompts`: MLflow prompt engineering features, prompt templates, and prompt management
  • `area/tracing`: MLflow Tracing features, tracing APIs, and LLM tracing functionality
  • `area/projects`: MLproject format, project running backends
  • `area/uiux`: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • `area/build`: Build and test infrastructure for MLflow
  • `area/docs`: MLflow documentation pages

How should the PR be classified in the release notes? Choose one:

  • `rn/none` - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • `rn/breaking-change` - The PR will be mentioned in the "Breaking Changes" section
  • `rn/feature` - A new user-facing feature worth mentioning in the release notes
  • `rn/bug-fix` - A user-facing bug fix worth mentioning in the release notes
  • `rn/documentation` - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

`Yes` should be selected for bug fixes, documentation updates, and other small changes. `No` should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

  • Yes (this PR will be cherry-picked and included in the next patch release)
  • No (this PR will be included in the next minor release)

Added concise description fields to all built-in scorer classes to improve
discoverability and documentation. Each description provides a brief summary
of what the scorer evaluates:

- RetrievalRelevance: Evaluates chunk relevance to input request
- RetrievalSufficiency: Checks if retrieval info is sufficient for expected facts
- RetrievalGroundedness: Assesses if response facts are implied by retrieval (no hallucinations)
- Guidelines: Checks adherence to specified constraints/instructions
- ExpectationsGuidelines: Validates per-row guideline adherence
- RelevanceToQuery: Ensures response addresses user input without deviation
- Safety: Ensures no harmful, offensive, or toxic content
- Correctness: Verifies response matches expected facts
- Equivalence: Compares outputs for semantic equivalence

This change improves the scorer API by providing human-readable descriptions
that can be displayed in UIs and documentation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
@alkispoly-db added the `rn/feature` (Mention under Features in Changelogs) and `area/evaluation` (MLflow Evaluation) labels on Oct 28, 2025
github-actions bot (Contributor) commented Oct 31, 2025

Documentation preview for ebfde36 is available.

Updated two tests that expected built-in scorers to have `description=None`.
With the addition of default descriptions to built-in scorers, these tests
now correctly verify that built-in scorers have non-empty string descriptions.

Changes:
- test_builtin_scorer_without_description: Now verifies scorers have default descriptions
- test_backward_compatibility_scorer_without_description: Updated to check that built-in
  scorers have default descriptions while custom scorers/judges still default to None
- Added clarifying comments explaining the new behavior

All 14 tests in test_scorer_description.py now pass.
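The updated test expectations can be illustrated with a self-contained sketch. The class and test names here are hypothetical stand-ins for the real scorers and test helpers in test_scorer_description.py:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins used only to illustrate the updated expectations;
# the real classes live in MLflow's builtin_scorers module.
@dataclass
class Scorer:
    name: str
    description: Optional[str] = None

@dataclass
class Correctness(Scorer):
    # Built-in scorers now carry a default description.
    name: str = "correctness"
    description: str = "Check whether the response matches expected facts"

def test_builtin_scorer_has_default_description():
    scorer = Correctness()
    # Previously this asserted description is None; built-in scorers
    # now provide a non-empty string description by default.
    assert isinstance(scorer.description, str) and scorer.description

def test_custom_scorer_description_defaults_to_none():
    # Backward compatibility: custom scorers still default to None
    # unless a description is passed explicitly.
    assert Scorer(name="my_scorer").description is None

test_builtin_scorer_has_default_description()
test_custom_scorer_description_defaults_to_none()
```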

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
@BenWilson2 BenWilson2 added this pull request to the merge queue Nov 3, 2025
Merged via the queue into mlflow:master with commit 35cf507 Nov 3, 2025
46 of 48 checks passed