Support metaprompting in mlflow.genai.optimize_prompts() #19762
chenmoneygithub merged 14 commits into mlflow:master from
Conversation
@chenmoneygithub Thank you for the contribution! Could you fix the following issue(s)?

⚠ DCO check: The DCO check failed. Please sign off your commit(s) by following the instructions here. See https://github.com/mlflow/mlflow/blob/master/CONTRIBUTING.md#sign-your-work for more details.
Pull request overview
This PR introduces metaprompting support to MLflow's prompt optimization capabilities by adding a new MetaPromptOptimizer class. The optimizer uses LLMs to iteratively improve prompts through either zero-shot mode (applying general best practices without evaluation data) or few-shot mode (learning from evaluation feedback on training examples).
Key changes:
- New MetaPromptOptimizer class with automatic mode detection based on training data availability
- Support for custom guidelines to guide the optimization process
- Comprehensive test suite covering initialization, template validation, sampling, and integration scenarios
- Support for separate validation datasets to prevent overfitting
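The automatic mode detection described above can be sketched as follows. This is a minimal illustration of the "pick zero-shot when no training data is available" rule; `choose_mode` is a hypothetical helper, not the class's actual method:

```python
def choose_mode(train_data) -> str:
    """Pick the optimization mode from the presence of evaluation data."""
    # Zero-shot: no training data; rely on general prompt-engineering best practices.
    # Few-shot: learn from evaluation feedback on the provided examples.
    if not train_data:
        return "zero-shot"
    return "few-shot"

print(choose_mode([]))  # zero-shot
print(choose_mode([{"inputs": {"question": "What is MLflow?"}}]))  # few-shot
```

Treating both `None` and an empty dataset as zero-shot keeps the public API forgiving: callers who skip the `train_data` argument entirely get the same behavior as callers who pass an empty list.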
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| mlflow/genai/optimize/optimizers/metaprompt_optimizer.py | New optimizer implementation with zero-shot and few-shot metaprompting modes, template variable validation, and MLflow tracking integration |
| tests/genai/optimize/optimizers/test_metaprompt_optimizer.py | Comprehensive test suite covering initialization, template variables, sampling, meta-prompt building, LLM invocation, and integration scenarios |
| mlflow/genai/optimize/optimizers/init.py | Exports the new MetaPromptOptimizer class |
| mlflow/genai/optimize/optimize.py | Minor formatting improvements for better code readability (line breaks in function signatures) |
Documentation preview for 9c3284f is available at: More info
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.
```python
# Check if train_data is empty (for zero-shot optimization)
if len(train_data) == 0:
    # Zero-shot mode: no training data provided
```
Zero-shot is less useful on the SDK side, since people have easier ways to do zero-shot metaprompting/optimization, but this will be the backbone for the UI solution.
@copilot redo the review from the beginning; please cover all commits, not just the commits since your last review.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
```python
Args:
    reflection_model: Name of the model to use for prompt optimization.
        Format: "<provider>:/<model>" (e.g., "openai:/gpt-4o",
```
can we use newer models?
for sure, done!
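The `"<provider>:/<model>"` format from the docstring can be split into its two parts with a one-liner. A minimal sketch; `parse_model_uri` is a hypothetical helper for illustration, not MLflow's actual URI parser:

```python
def parse_model_uri(model: str) -> tuple[str, str]:
    """Split a "<provider>:/<model>" string into (provider, model name)."""
    provider, sep, name = model.partition(":/")
    if not sep or not provider or not name:
        raise ValueError(f"Expected '<provider>:/<model>', got: {model!r}")
    return provider, name

print(parse_model_uri("openai:/gpt-4o"))  # ('openai', 'gpt-4o')
```

`str.partition` splits on the first `:/` only, so model names containing slashes remain intact.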
```python
        registered regardless of performance improvement.

Args:
    reflection_model: Name of the model to use for prompt optimization.
```
Shall we call it as prompt_model or optimizer_model? Metaprompting does not reflect eval results.
I also thought about this: few-shot metaprompting does use some "reflection" while zero-shot does not, so prompt_model fits better here semantically. However, I also want to keep some consistency with the GepaPromptOptimizer so that users don't need to learn two concepts when picking up optimizers, so I decided to keep it as reflection_model. Please let me know if this makes sense, and happy to make changes!
Since the algorithm is totally different, I think it's fine not to keep the same naming. Not a blocker though.
```python
# Validate prompt names match
self._validate_prompt_names(target_prompts, improved_prompts)

# Validate template variables are preserved in improved prompts
```
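A check like "template variables are preserved in improved prompts" could be sketched with `string.Formatter`, which parses `str.format`-style placeholders. These helper names are hypothetical, for illustration only, not the PR's implementation:

```python
from string import Formatter

def template_variables(template: str) -> set[str]:
    # Collect the named {placeholders} in a str.format-style template.
    return {field for _, field, _, _ in Formatter().parse(template) if field}

def validate_variables_preserved(original: str, improved: str) -> None:
    # The LLM-rewritten prompt must keep every variable from the original,
    # otherwise downstream .format(...) calls would break.
    missing = template_variables(original) - template_variables(improved)
    if missing:
        raise ValueError(f"Improved prompt dropped template variables: {sorted(missing)}")

validate_variables_preserved(
    "Summarize {document} in {language}.",
    "You are an expert writer. Summarize {document} concisely in {language}.",
)  # passes: both variables preserved
```

The improved prompt may add new text freely; only dropping an existing variable is an error.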
.vscode/settings.json (Outdated)

```diff
  "[python]": {
    "editor.defaultFormatter": "charliermarsh.ruff",
-   "editor.formatOnSave": true,
+   "editor.formatOnSave": false,
```
Let's revert this. This is unrelated to this PR.
oh geez, I meant to edit it locally. command + shift + P put .vscode/settings.json as the first option.
harupy left a comment:
Left a few more comments, otherwise LGTM
```python
Automatically detects optimization mode based on training data:
- Zero-shot: No evaluation data - applies general prompt engineering best practices
- Few-shot: Has evaluation data - learns from feedback on examples
```
Is feedback necessary, or can users pass just inputs/outputs?
For this implementation, feedback is always present. But "feedback" is a bit misleading; it should be "evaluation results". Changed.
| """ | ||
| _logger.info("Applying zero-shot prompt optimization with best practices") | ||
|
|
||
| # Build meta-prompt |
nit: I think we don't need this comment
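In few-shot mode, building the meta-prompt means embedding the evaluated examples into the text sent to the LLM. A hedged sketch of what that could look like; `render_examples` and the record keys (`inputs`, `outputs`, `evaluation_results`) are hypothetical names, not the PR's actual code:

```python
def render_examples(records: list[dict]) -> str:
    """Format evaluation records into a meta-prompt section (hypothetical shape)."""
    lines = []
    for i, rec in enumerate(records, start=1):
        lines.append(f"Example {i}:")
        lines.append(f"  Inputs: {rec['inputs']}")
        lines.append(f"  Output: {rec['outputs']}")
        lines.append(f"  Evaluation results: {rec['evaluation_results']}")
    return "\n".join(lines)

section = render_examples([
    {
        "inputs": {"question": "What is MLflow?"},
        "outputs": "An open source ML lifecycle platform.",
        "evaluation_results": {"correctness": 1.0},
    }
])
print(section)
```

The rendered section would then be concatenated with the optimization instructions and any custom guidelines before invoking the reflection model.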
```python
content = None  # Initialize to avoid NameError in exception handler

with mlflow.start_span(name="metaprompt_reflection", span_type=SpanType.LLM) as span:
```
Do you think we should always enable tracing? Or should we conditionally enable it when enable_tracking=True?
good call, it makes sense to me to skip tracing the metaprompting call as well when enable_tracking=False, changed!
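The "skip tracing when enable_tracking=False" behavior fits the no-op context manager pattern. A minimal sketch, assuming `start_span` stands in for a zero-argument call like `lambda: mlflow.start_span(name="metaprompt_reflection", span_type=SpanType.LLM)`; the `reflection_span` wrapper itself is hypothetical:

```python
from contextlib import nullcontext

def reflection_span(enable_tracking: bool, start_span):
    # Open the LLM span only when tracking is enabled; otherwise return a
    # no-op context manager so the calling `with` block is unchanged.
    return start_span() if enable_tracking else nullcontext()
```

This keeps a single `with reflection_span(...)` call site rather than branching the LLM-invocation code on the flag.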
TomeHirata left a comment:
Left some comments, otherwise LGTM
🛠 DevTools 🛠
Install mlflow from this PR
For Databricks, use the following command:
Related Issues/PRs
#xxx

What changes are proposed in this pull request?
Support metaprompting in mlflow.genai.optimize_prompts(). There are two modes:
Zero-shot is less useful on the SDK side, but will be useful on the optimization UI. I will update the tutorial in a separate PR to avoid a gigantic PR.
e2e tested with the script below:
A sample output is shown below:
Screenshot of the associated MLflow run:
Screenshot for the trace of metaprompting with few-shot data:
How is this PR tested?
Does this PR require documentation update?
Release Notes
Is this a user-facing change?
What component(s), interfaces, languages, and integrations does this PR affect?
Components
- area/tracking: Tracking Service, tracking client APIs, autologging
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
- area/gateway: MLflow AI Gateway client APIs, server, and third-party integrations
- area/prompts: MLflow prompt engineering features, prompt templates, and prompt management
- area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality
- area/projects: MLproject format, project running backends
- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/build: Build and test infrastructure for MLflow
- area/docs: MLflow documentation pages

How should the PR be classified in the release notes? Choose one:

- rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
- rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
- rn/feature - A new user-facing feature worth mentioning in the release notes
- rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
- rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

- Yes should be selected for bug fixes, documentation updates, and other small changes.
- No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?
Bug fixes, doc updates and new features usually go into minor releases.
Bug fixes and doc updates usually go into patch releases.