
Fix scorers issue in metaprompting #20173

Merged
chenmoneygithub merged 6 commits into mlflow:master from chenmoneygithub:metaprompting-fix-2
Jan 22, 2026

Conversation

@chenmoneygithub (Contributor) commented Jan 21, 2026

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

We need to raise an explicit exception when only one of train_data and scorers is set in mlflow.genai.optimize_prompts(). Additionally, we allow scorers=None for a better developer experience.
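As a rough sketch of the check described in this paragraph (illustrative only; the function name and error message are invented here, not the actual MLflow source):

```python
# Illustrative sketch only -- not the actual MLflow implementation.
def validate_data_and_scorers(train_data, scorers):
    has_train_data = train_data is not None and len(train_data) > 0
    has_scorers = scorers is not None and len(scorers) > 0
    # Raise only when exactly one of the two is set: both-None is the
    # zero-shot path, both-set is the scored optimization path.
    if has_train_data != has_scorers:
        raise ValueError(
            "`train_data` and `scorers` must be set together: pass both for "
            "scored optimization, or neither for zero-shot metaprompting."
        )
    return has_train_data and has_scorers
```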

When the metaprompting optimizer receives a dataset but no scorers, metaprompting still works; the generated metaprompt looks like the example below:


You are an expert prompt engineer. Your task is to improve the following prompts to achieve better performance.
CURRENT PROMPTS: Prompt name: medical_section_classifier Template: Classify this medical research paper sentence into one of these sections: CONCLUSIONS, RESULTS, METHODS, OBJECTIVE, BACKGROUND.
Sentence: {{sentence}}
EVALUATION EXAMPLES: Below are examples showing how the current prompts performed. Study these to identify patterns in what worked and what failed.
Example 1: Input: {"sentence": "The emergence of HIV as a chronic condition means that people living with HIV are required to take more responsibility for the self-management of their condition , including making physical , emotional and social adjustments ."} Output: BACKGROUND Expected: {'expected_response': 'BACKGROUND'}
Example 2: Input: {"sentence": "This paper describes the design and evaluation of Positive Outlook , an online program aiming to enhance the self-management skills of gay men living with HIV ."} Output: METHODS Expected: {'expected_response': 'BACKGROUND'}
Example 3: Input: {"sentence": "This study is designed as a randomised controlled trial in which men living with HIV in Australia will be assigned to either an intervention group or usual care control group ."} Output: METHODS Expected: {'expected_response': 'METHODS'}
Example 4: Input: {"sentence": "The intervention group will participate in the online group program ` Positive Outlook ' ."} Output: METHODS Expected: {'expected_response': 'METHODS'}
Example 5: Input: {"sentence": "The program is based on self-efficacy theory and uses a self-management approach to enhance skills , confidence and abilities to manage the psychosocial issues associated with HIV in daily life ."} Output: BACKGROUND Expected: {'expected_response': 'METHODS'}
Example 6: Input: {"sentence": "Participants will access the program for a minimum of 90 minutes per week over seven weeks ."} Output: METHODS Expected: {'expected_response': 'METHODS'}
Example 7: Input: {"sentence": "Primary outcomes are domain specific self-efficacy , HIV related quality of life , and outcomes of health education ."} Output: METHODS Expected: {'expected_response': 'METHODS'}
Example 8: Input: {"sentence": "Secondary outcomes include : depression , anxiety and stress ; general health and quality of life ; adjustment to HIV ; and social support ."} Output: METHODS
Reason: It describes the outcomes being measured in the study (secondary outcomes), which is part of the study design/methods rather than results, objectives, background, or conclusions. Expected: {'expected_response': 'METHODS'}
Example 9: Input: {"sentence": "Data collection will take place at baseline , completion of the intervention ( or eight weeks post randomisation ) and at 12 week follow-up ."} Output: METHODS Expected: {'expected_response': 'METHODS'}

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/tracking: Tracking Service, tracking client APIs, autologging
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
  • area/gateway: MLflow AI Gateway client APIs, server, and third-party integrations
  • area/prompts: MLflow prompt engineering features, prompt templates, and prompt management
  • area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality
  • area/projects: MLproject format, project running backends
  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?
  • Minor release: a release that increments the second part of the version number (e.g., 1.2.0 -> 1.3.0).
    Bug fixes, doc updates and new features usually go into minor releases.
  • Patch release: a release that increments the third part of the version number (e.g., 1.2.0 -> 1.2.1).
    Bug fixes and doc updates usually go into patch releases.
  • Yes (this PR will be cherry-picked and included in the next patch release)
  • No (this PR will be included in the next minor release)

Copilot AI review requested due to automatic review settings January 21, 2026 03:56
@github-actions (bot)

🛠 DevTools 🛠

Install mlflow from this PR

# mlflow
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/20173/merge
# mlflow-skinny
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/20173/merge#subdirectory=libs/skinny

For Databricks, use the following command:

%sh curl -LsSf https://raw.githubusercontent.com/mlflow/mlflow/HEAD/dev/install-skinny.sh | sh -s pull/20173/merge

@github-actions (bot)

@chenmoneygithub Thank you for the contribution! Could you fix the following issue(s)?

⚠ DCO check

The DCO check failed. Please sign off your commit(s) by following the instructions in the contributing guide. See https://github.com/mlflow/mlflow/blob/master/CONTRIBUTING.md#sign-your-work for more details.

@github-actions bot added the area/prompts (MLflow Prompt Registry and Optimization) and rn/bug-fix (Mention under Bug Fixes in Changelogs) labels Jan 21, 2026
Copilot AI left a comment

Pull request overview

This PR enhances the prompt optimization API to support zero-shot mode by making the scorers parameter optional and adding validation to ensure train_data and scorers are set together (both provided or both None/empty).

Changes:

  • Made scorers parameter optional (defaults to None) in optimize_prompts()
  • Added validation to ensure train_data and scorers are mutually required
  • Updated validate_train_data() to handle None scorers for zero-shot mode

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
mlflow/genai/optimize/optimize.py Made scorers optional, added mutual validation, set eval_fn to None in zero-shot mode
mlflow/genai/optimize/util.py Updated validate_train_data to accept None scorers
tests/genai/optimize/test_optimize.py Updated MockPromptOptimizer to handle None eval_fn, added validation tests


@github-actions bot commented Jan 21, 2026

Documentation preview for 2aad211 is available at:

More info
  • Ignore this comment if this PR does not change the documentation.
  • The preview is updated when a new commit is pushed to this PR.

has_train_data = train_data is not None and len(train_data) > 0
has_scorers = scorers is not None and len(scorers) > 0

if has_train_data and not has_scorers:
Collaborator:

Isn't it possible to run few-shot metaprompting if tracing data exists and scorers is None?

Contributor Author:

Technically yes, but it did not work well in my earlier experiments, potentially because no new information gets generated.

However, for the model-switching use case, where the inference model differs from the model that generated the traces, this setup (train_data + no scorers) does work. Since we use the same API to cover both scenarios, let me remove this validation.


if train_data is None or len(train_data) == 0:
# Validate that train_data and scorers are set together
has_train_data = train_data is not None and len(train_data) > 0
Collaborator:

nit: do we allow users to pass train_data=None? The type hint does not support None.

Contributor Author:

EvaluationDatasetTypes could be None:

        EvaluationDatasetTypes = (
            pd.DataFrame
            | pyspark.sql.dataframe.DataFrame
            | list[dict]
            | list[Trace]
            | ManagedEvaluationDataset
            | EntityEvaluationDataset
            | ConversationSimulator
            | None
        )

I went with this approach because `"EvaluationDatasetTypes" | None` is invalid: the `|` union operator cannot be applied to a string forward reference.

metric_fn = create_metric_from_scorers(scorers, aggregation)
eval_fn = _build_eval_fn(predict_fn, metric_fn)
# Create metric function only if scorers are provided (few-shot mode)
if has_scorers:
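A hedged sketch of the guarded construction in this hunk; `create_metric_from_scorers` and `_build_eval_fn` are stubbed with toy bodies here, since only the control flow (skip evaluation entirely in zero-shot mode) is the point:

```python
# Toy stand-ins for the helpers named in the diff; these bodies are
# invented for the sketch and do not reflect the MLflow implementation.
def create_metric_from_scorers(scorers, aggregation=None):
    return lambda output: sum(scorer(output) for scorer in scorers) / len(scorers)

def _build_eval_fn(predict_fn, metric_fn):
    return lambda inputs: metric_fn(predict_fn(inputs))

def build_eval_fn(predict_fn, scorers, aggregation=None):
    has_scorers = scorers is not None and len(scorers) > 0
    if not has_scorers:
        # Zero-shot mode: no scorers means nothing to score, so skip
        # building the metric and evaluation functions entirely.
        return None
    metric_fn = create_metric_from_scorers(scorers, aggregation)
    return _build_eval_fn(predict_fn, metric_fn)
```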
Collaborator:

What happens if users don't pass a dataset and scorers, and use GEPA? Maybe we should add validation in each optimizer, since the required fields may vary across optimizers?

Contributor Author:

I realized the old code was a bit broken, so I refactored it to make validation work better and to ensure that metaprompting with train_data but without scorers works well.

@chenmoneygithub chenmoneygithub added this pull request to the merge queue Jan 22, 2026
Merged via the queue into mlflow:master with commit 25833b7 Jan 22, 2026
46 of 47 checks passed
@chenmoneygithub chenmoneygithub deleted the metaprompting-fix-2 branch January 22, 2026 05:05
harupy pushed a commit to harupy/mlflow that referenced this pull request Jan 28, 2026
harupy pushed a commit that referenced this pull request Jan 28, 2026

Labels

area/prompts (MLflow Prompt Registry and Optimization), rn/bug-fix (Mention under Bug Fixes in Changelogs), v3.9.0

3 participants