
Support metaprompting in mlflow.genai.optimize_prompts()#19762

Merged
chenmoneygithub merged 14 commits into mlflow:master from chenmoneygithub:metaprompting
Jan 12, 2026

Conversation

@chenmoneygithub
Contributor

@chenmoneygithub chenmoneygithub commented Jan 6, 2026

🛠 DevTools 🛠


Install mlflow from this PR

# mlflow
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/19762/merge
# mlflow-skinny
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/19762/merge#subdirectory=libs/skinny

For Databricks, use the following command:

%sh curl -LsSf https://raw.githubusercontent.com/mlflow/mlflow/HEAD/dev/install-skinny.sh | sh -s pull/19762/merge

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

Support metaprompting in mlflow.genai.optimize_prompts(). There are two modes:

  • zero-shot metaprompting when no train_data is provided.
  • few-shot metaprompting when train_data is provided.

Zero-shot is less useful on the SDK side, but it will be useful in the optimization UI. I will update the tutorial in a separate PR to avoid a gigantic PR.
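The mode selection described above can be sketched as a small hypothetical helper (this illustrates the selection rule only, not MLflow's actual internals; the function name is made up for the example):

```python
def select_mode(train_data: list[dict]) -> str:
    """Illustrative rule: empty train_data triggers zero-shot metaprompting,
    otherwise few-shot metaprompting that learns from evaluation results."""
    return "zero-shot" if len(train_data) == 0 else "few-shot"


print(select_mode([]))                                  # zero-shot
print(select_mode([{"inputs": {"question": "1+1?"}}]))  # few-shot
```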

End-to-end tested with the script below:

import random

import litellm
from datasets import load_dataset

import mlflow
from mlflow.genai.optimize.optimizers import MetaPromptOptimizer
from mlflow.genai.scorers import Correctness


def load_aime_data(num_train=10, num_test=32, seed=42):
    """Load AIME dataset from HuggingFace and split into train/test."""
    dataset = load_dataset("gneubig/aime-1983-2024", split="train")

    # Shuffle with fixed seed for reproducibility
    indices = list(range(len(dataset)))
    random.seed(seed)
    random.shuffle(indices)

    # Split into train and test
    train_indices = indices[:num_train]
    test_indices = indices[num_train : num_train + num_test]

    train_dataset = dataset.select(train_indices)
    test_dataset = dataset.select(test_indices)

    # Convert to MLflow format
    def convert_to_mlflow_format(dataset):
        mlflow_data = []
        for example in dataset:
            mlflow_data.append(
                {
                    "inputs": {"question": example["Question"]},
                    "expectations": {"expected_response": str(example["Answer"])},
                }
            )
        return mlflow_data

    return convert_to_mlflow_format(train_dataset), convert_to_mlflow_format(
        test_dataset
    )


def run_benchmark(
    reflection_model="gpt-4o",
    eval_model="gpt-4o-mini",
    num_train_examples=10,
    num_test_examples=32,
):
    """
    Benchmark MetaPromptOptimizer on AIME dataset.

    Args:
        reflection_model: Model to use for prompt optimization
        eval_model: Model to use for predictions
        num_train_examples: Number of training examples
        num_test_examples: Number of test examples for final evaluation
    """

    print("=" * 80)
    print("MetaPromptOptimizer Benchmark on AIME")
    print("=" * 80)

    # Load data
    print(
        f"\nLoading {num_train_examples} train examples and {num_test_examples} test examples..."
    )
    train_data, test_data = load_aime_data(
        num_train=num_train_examples, num_test=num_test_examples
    )

    print(f"Train size: {len(train_data)}")
    print(f"Test size: {len(test_data)}")

    try:
        prompt = mlflow.genai.load_prompt("prompts:/aime_solver/6")
    except Exception:
        prompt = mlflow.genai.register_prompt(
            name="aime_solver",
            template="Solve the following math problem. Provide only the numerical answer:\n\n{{question}}",
        )

    print(f"\nInitial prompt: {prompt.template}")

    # Define predict function
    def predict_fn(question: str) -> str:
        """Prediction function that uses the registered prompt."""
        prompt = mlflow.genai.load_prompt("prompts:/aime_solver/6")
        formatted_prompt = prompt.format(question=question)
        response = litellm.completion(
            model=f"openai/{eval_model}",
            messages=[{"role": "user", "content": formatted_prompt}],
            temperature=1.0,
        )
        return response.choices[0].message.content

    # Optimize prompts
    print("\n" + "-" * 80)
    print("Optimizing Prompts")
    print("-" * 80)

    result = mlflow.genai.optimize_prompts(
        predict_fn=predict_fn,
        train_data=train_data,
        prompt_uris=[prompt.uri],
        optimizer=MetaPromptOptimizer(
            reflection_model=f"openai:/{reflection_model}",  # use the function parameter instead of a hardcoded model
            lm_kwargs={"temperature": 1.0, "max_tokens": 4096},
        ),
        scorers=[Correctness(model="openai:/gpt-5-mini")],
    )

    # result = mlflow.genai.optimize_prompts(
    #     predict_fn=predict_fn,
    #     train_data=[],
    #     prompt_uris=[prompt.uri],
    #     optimizer=MetaPromptOptimizer(
    #         reflection_model="openai:/gpt-5-mini",
    #         lm_kwargs={"temperature": 1.0, "max_tokens": 4096},
    #     ),
    #     scorers=[],
    # )

    print(f"\n{'=' * 80}")
    print("Optimization Results")
    print("=" * 80)
    print(f"Initial prompt:\n{prompt.template}\n")
    print(f"Optimized prompt:\n{result.optimized_prompts[0].template}\n")
    print(
        f"Training score improvement: {result.initial_eval_score:.4f} → {result.final_eval_score:.4f}"
    )

    # Check if prompt changed
    if result.optimized_prompts[0].template == prompt.template:
        print("\n⚠️  Note: The optimized prompt is identical to the initial prompt.")
        print("No improvement was found during optimization.")


def run_evaluation(prompt_version: str, data, eval_model: str):
    """
    Evaluate the performance of a prompt on a dataset.
    """

    def predict_fn(question: str) -> str:
        prompt = mlflow.genai.load_prompt(f"prompts:/aime_solver/{prompt_version}")
        formatted_prompt = prompt.format(question=question)
        response = litellm.completion(
            model=f"openai/{eval_model}",
            messages=[{"role": "user", "content": formatted_prompt}],
            temperature=1.0,
        )
        return response.choices[0].message.content

    return mlflow.genai.evaluate(
        data=data,
        predict_fn=predict_fn,
        scorers=[Correctness(model=f"openai:/{eval_model}")],  # match the "<provider>:/<model>" format used above
    )


def main():
    # Configure MLflow tracking
    mlflow.set_tracking_uri("http://localhost:5000")
    mlflow.set_experiment("metaprompt-optimizer-aime")

    print("Starting benchmark...")

    run_benchmark(
        reflection_model="gpt-5-mini",
        eval_model="gpt-5-nano",
        num_train_examples=12,
        num_test_examples=24,
    )


if __name__ == "__main__":
    main()

A sample output is shown below:

================================================================================
Optimization Results
================================================================================
Initial prompt:
Solve the following math problem. Provide only the numerical answer:

{{question}}

Optimized prompt:
You are an expert contest mathematician. Solve the following math problem and output only the final numerical answer. Do not show any solution steps, reasoning, or explanations — perform all reasoning internally. Before answering, verify your result by recomputing or checking algebra/arithmetic.

Rules for the output (must be followed exactly):
- If the answer is an integer, output only the digits of the integer (no commas, no words, no punctuation).
- If the answer is a simplified fraction, output it as numerator/denominator in lowest terms (e.g., 3/7).
- If the problem asks for a specific expression (such as m+n), output that final integer only.
- Do not output any extra text, labels, or whitespace — only the exact answer characters.

Screenshot of the associated MLflow run:

[screenshot]

Screenshot of the trace of metaprompting with few-shot data:

[screenshot]

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/tracking: Tracking Service, tracking client APIs, autologging
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
  • area/gateway: MLflow AI Gateway client APIs, server, and third-party integrations
  • area/prompts: MLflow prompt engineering features, prompt templates, and prompt management
  • area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality
  • area/projects: MLproject format, project running backends
  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?
  • Minor release: a release that increments the second part of the version number (e.g., 1.2.0 -> 1.3.0).
    Bug fixes, doc updates and new features usually go into minor releases.
  • Patch release: a release that increments the third part of the version number (e.g., 1.2.0 -> 1.2.1).
    Bug fixes and doc updates usually go into patch releases.
  • Yes (this PR will be cherry-picked and included in the next patch release)
  • No (this PR will be included in the next minor release)

Copilot AI review requested due to automatic review settings January 6, 2026 04:46
@chenmoneygithub chenmoneygithub marked this pull request as draft January 6, 2026 04:46
@github-actions
Contributor

github-actions bot commented Jan 6, 2026

@chenmoneygithub Thank you for the contribution! Could you fix the following issue(s)?

⚠ DCO check

The DCO check failed. Please sign off your commit(s) by following the instructions here. See https://github.com/mlflow/mlflow/blob/master/CONTRIBUTING.md#sign-your-work for more details.

Copilot AI (Contributor) left a comment

Pull request overview

This PR introduces metaprompting support to MLflow's prompt optimization capabilities by adding a new MetaPromptOptimizer class. The optimizer uses LLMs to iteratively improve prompts through either zero-shot mode (applying general best practices without evaluation data) or few-shot mode (learning from evaluation feedback on training examples).

Key changes:

  • New MetaPromptOptimizer class with automatic mode detection based on training data availability
  • Support for custom guidelines to guide the optimization process
  • Comprehensive test suite covering initialization, template validation, sampling, and integration scenarios
  • Support for separate validation datasets to prevent overfitting
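The template-variable validation mentioned above can be illustrated with a minimal sketch (a hypothetical check, not MLflow's actual implementation; the function names are made up): an improved prompt must preserve every {{variable}} placeholder present in the original.

```python
import re


def template_variables(template: str) -> set[str]:
    # Extract {{variable}} placeholders from a double-brace template
    return set(re.findall(r"\{\{\s*(\w+)\s*\}\}", template))


def variables_preserved(original: str, improved: str) -> bool:
    # The improved prompt must keep every placeholder the original had
    return template_variables(original) <= template_variables(improved)


print(variables_preserved("Solve: {{question}}", "Answer only: {{question}}"))  # True
print(variables_preserved("Solve: {{question}}", "Answer only."))               # False
```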

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 12 comments.

File Description
mlflow/genai/optimize/optimizers/metaprompt_optimizer.py New optimizer implementation with zero-shot and few-shot metaprompting modes, template variable validation, and MLflow tracking integration
tests/genai/optimize/optimizers/test_metaprompt_optimizer.py Comprehensive test suite covering initialization, template variables, sampling, meta-prompt building, LLM invocation, and integration scenarios
mlflow/genai/optimize/optimizers/__init__.py Exports the new MetaPromptOptimizer class
mlflow/genai/optimize/optimize.py Minor formatting improvements for better code readability (line breaks in function signatures)


@github-actions
Contributor

github-actions bot commented Jan 6, 2026

Documentation preview for 9c3284f is available at:


@chenmoneygithub chenmoneygithub marked this pull request as ready for review January 7, 2026 04:15
@github-actions
Contributor

github-actions bot commented Jan 7, 2026

@chenmoneygithub Thank you for the contribution! Could you fix the following issue(s)?

⚠ DCO check

The DCO check failed. Please sign off your commit(s) by following the instructions here. See https://github.com/mlflow/mlflow/blob/master/CONTRIBUTING.md#sign-your-work for more details.

@github-actions github-actions bot added area/prompts MLflow Prompt Registry and Optimization rn/feature Mention under Features in Changelogs. labels Jan 7, 2026
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.



    )
    # Check if train_data is empty (for zero-shot optimization)
    if len(train_data) == 0:
        # Zero-shot mode: no training data provided
Contributor Author

Zero-shot is less useful on the SDK side, since people have easier ways to do zero-shot metaprompting/optimization, but this will be the backbone for the UI solution.

@chenmoneygithub
Contributor Author

@copilot redo the review from beginning, please cover all commits not just the commits since your last review.

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.



@chenmoneygithub chenmoneygithub changed the title [WIP] Support metaprompting in mlflow.genai.optimize_prompts() Support metaprompting in mlflow.genai.optimize_prompts() Jan 8, 2026

Args:
reflection_model: Name of the model to use for prompt optimization.
Format: "<provider>:/<model>" (e.g., "openai:/gpt-4o",
Collaborator

can we use newer models?

Contributor Author

for sure, done!

registered regardless of performance improvement.

Args:
reflection_model: Name of the model to use for prompt optimization.
Collaborator

Shall we call it prompt_model or optimizer_model? Metaprompting does not reflect eval results.

Contributor Author

I also thought about this. Few-shot metaprompting does use some "reflection" while zero-shot does not, so prompt_model fits better here semantically. However, I also want to keep some consistency with the GepaPromptOptimizer so that users don't need to learn two concepts when picking up optimizers, so I decided to keep it as reflection_model. Please let me know if this makes sense, and I'm happy to make changes!

Collaborator

Since the algorithm is totally different, I think it's fine not to keep the same naming. Not a blocker though.

# Validate prompt names match
self._validate_prompt_names(target_prompts, improved_prompts)

# Validate template variables are preserved in improved prompts
Collaborator

ditto

Contributor Author

done!

"[python]": {
"editor.defaultFormatter": "charliermarsh.ruff",
"editor.formatOnSave": true,
"editor.formatOnSave": false,
Member

Let's revert this. This is unrelated to this PR.

Member

btw why do we need this change?

@chenmoneygithub (Contributor Author) Jan 9, 2026

oh geez, I meant to edit it locally. command + shift + P put .vscode/settings.json as the first option.

@harupy (Member) left a comment

Left a few more comments, otherwise LGTM


Automatically detects optimization mode based on training data:
- Zero-shot: No evaluation data - applies general prompt engineering best practices
- Few-shot: Has evaluation data - learns from feedback on examples
@TomeHirata (Collaborator) Jan 9, 2026

Is feedback necessary, or can users pass just inputs/outputs?

Contributor Author

For this implementation, feedback is always present. But "feedback" is a bit misleading; it should be "evaluation results". Changed.

"""
_logger.info("Applying zero-shot prompt optimization with best practices")

# Build meta-prompt
Collaborator

nit: I think we don't need this comment

Contributor Author

done!


content = None # Initialize to avoid NameError in exception handler

with mlflow.start_span(name="metaprompt_reflection", span_type=SpanType.LLM) as span:
Collaborator

Do you think we should always enable tracing? Or should we conditionally enable it when enable_tracking=True?

Contributor Author

Good call. It makes sense to me to skip tracing the metaprompting call as well when enable_tracking=False. Changed!

@TomeHirata (Collaborator) left a comment

Left some comments, otherwise LGTM

@chenmoneygithub chenmoneygithub added this pull request to the merge queue Jan 12, 2026
Merged via the queue into mlflow:master with commit 07ee84f Jan 12, 2026
49 of 50 checks passed
@chenmoneygithub chenmoneygithub deleted the metaprompting branch January 12, 2026 21:20
debu-sinha pushed a commit to debu-sinha/mlflow that referenced this pull request Jan 15, 2026

Labels

area/prompts MLflow Prompt Registry and Optimization rn/feature Mention under Features in Changelogs. v3.9.0


4 participants