[OSS GenAI Eval #1] Add test coverage to validate assessments. by B-Step62 · Pull Request #16731 · mlflow/mlflow

B-Step62 · 2025-07-15T08:02:17Z

🛠 DevTools 🛠

Install mlflow from this PR

# mlflow
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/16731/merge
# mlflow-skinny
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/16731/merge#subdirectory=skinny

For Databricks, use the following command:

%sh curl -LsSf https://raw.githubusercontent.com/mlflow/mlflow/HEAD/dev/install-skinny.sh | sh -s pull/16731/merge

What changes are proposed in this pull request?

Before implementing the OSS GenAI evaluation logic, this PR enriches the test cases so that they validate detailed assessment results.

How is this PR tested?

Existing unit/integration tests
New unit/integration tests
Manual tests

Does this PR require documentation update?

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

Interface

area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?

Minor release: a release that increments the second part of the version number (e.g., 1.2.0 -> 1.3.0).
Bug fixes, doc updates and new features usually go into minor releases.
Patch release: a release that increments the third part of the version number (e.g., 1.2.0 -> 1.2.1).
Bug fixes and doc updates usually go into patch releases.
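
The minor/patch distinction above can be sketched in a few lines. This is only an illustration, not part of MLflow's release tooling: the function name `classify_release` is hypothetical, the "major" label is an assumption added for completeness (the template only defines minor and patch), and the sketch handles plain `X.Y.Z` versions without pre-release suffixes.

```python
def classify_release(old: str, new: str) -> str:
    """Return which part of an X.Y.Z version was incremented.

    Hypothetical helper for illustration only; assumes plain
    three-part numeric versions such as "1.2.0".
    """
    old_parts = tuple(int(p) for p in old.split("."))
    new_parts = tuple(int(p) for p in new.split("."))
    if len(old_parts) != 3 or len(new_parts) != 3:
        raise ValueError("expected versions of the form X.Y.Z")
    if new_parts <= old_parts:
        raise ValueError("new version must be greater than old version")

    # The first part that differs determines the release type.
    # "major" is an assumption; the template names only minor and patch.
    labels = ("major", "minor", "patch")
    for label, old_part, new_part in zip(labels, old_parts, new_parts):
        if old_part != new_part:
            return label
    raise AssertionError("unreachable: versions differ by construction")


# The two examples given in the template text.
print(classify_release("1.2.0", "1.3.0"))  # second part changed: minor
print(classify_release("1.2.0", "1.2.1"))  # third part changed: patch
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating "1.10.0" as older than "1.9.0".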

Yes (this PR will be cherry-picked and included in the next patch release)
No (this PR will be included in the next minor release)

tests/genai/evaluate/test_evaluation.py

harupy

LGTM! https://github.com/mlflow/mlflow/pull/16731/files#r2206982127 is not a blocker


github-actions bot added the area/evaluation (MLflow Evaluation) and rn/none (List under Small Changes in Changelogs.) labels Jul 15, 2025

B-Step62 force-pushed the pr-1-genai-eval-oss-test-enhancement branch from 13d5bae to 745c73e Compare July 15, 2025 08:30

B-Step62 mentioned this pull request Jul 15, 2025

[OSS GenAI Eval #2] Add evaluation harness that support static dataset evaluation. #16732

Merged

41 tasks

B-Step62 requested a review from harupy July 15, 2025 08:43

harupy reviewed Jul 15, 2025

tests/genai/evaluate/test_evaluation.py (outdated, resolved)

harupy reviewed Jul 15, 2025

tests/genai/evaluate/test_evaluation.py (outdated, resolved)

harupy approved these changes Jul 15, 2025


B-Step62 force-pushed the genai-oss-eval branch from a7a33f1 to 7996b05 Compare July 16, 2025 08:58

B-Step62 added 2 commits July 16, 2025 17:59

Enrich evaluation tests to validate detailed assessments

b0d1af7

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

comments

72280fc

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

B-Step62 force-pushed the pr-1-genai-eval-oss-test-enhancement branch from 745c73e to 72280fc Compare July 16, 2025 09:01

B-Step62 merged commit 0f2dca5 into mlflow:genai-oss-eval Jul 16, 2025
26 of 43 checks passed

B-Step62 deleted the pr-1-genai-eval-oss-test-enhancement branch July 16, 2025 09:12

B-Step62 added a commit to B-Step62/mlflow that referenced this pull request Aug 6, 2025

[OSS GenAI Eval #1] Add test coverage to validate assessments. (mlflo…

e02e1ab

…w#16731) Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

B-Step62 merged 2 commits into mlflow:genai-oss-eval from B-Step62:pr-1-genai-eval-oss-test-enhancement


2 participants
