Prompt Optimization backend PR 2: Add CreatePromptOptimizationJob and CancelPromptOptimizationJob#20115
Conversation
🛠 DevTools 🛠
Install mlflow from this PR. For Databricks, use the following command:

@chenmoneygithub Thank you for the contribution! Could you fix the following issue(s)?
⚠ DCO check: The DCO check failed. Please sign off your commit(s) by following the instructions here. See https://github.com/mlflow/mlflow/blob/master/CONTRIBUTING.md#sign-your-work for more details.
Pull request overview
This PR adds backend support for prompt optimization jobs, introducing two new API endpoints: CreatePromptOptimizationJob and CancelPromptOptimizationJob. The implementation enables asynchronous prompt optimization with support for different optimizer types (GEPA, MetaPrompt) and includes both few-shot and zero-shot optimization modes.
Changes:
- Added new protobuf definitions for prompt optimization job APIs including JobStatus, OptimizerType, and PromptOptimizationJob message types
- Implemented server-side handlers for creating and canceling prompt optimization jobs with parameter validation and MLflow run tracking
- Extended optimization logic to support dataset entities and zero-shot optimization when no training data is provided
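The asynchronous create-job flow described above can be sketched roughly as follows. This is an illustrative outline only; the callables (`validate`, `create_run`, `submit_job`) and the returned dict shape are assumptions, not the actual handler code from this PR:

```python
import time

# Sketch of the create-job flow: validate the request, create the MLflow run
# up front so run_id is immediately available, then submit the async job.
# All collaborators here are assumed stand-ins, not real MLflow internals.
def create_prompt_optimization_job(params, validate, create_run, submit_job):
    validate(params)  # e.g. optimizer type, prompt URI, scorer names
    start_time = int(time.time() * 1000)
    run_id = create_run(
        run_name=f"optimize_prompt_{params['optimizer_type']}_{start_time}",
        start_time=start_time,
    )
    job_id = submit_job(run_id=run_id, **params)  # returns without blocking
    return {"job_id": job_id, "run_id": run_id, "status": "PENDING"}
```

The job later resumes the pre-created run when it starts executing, which is why the run is created before submission rather than inside the job.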
Reviewed changes
Copilot reviewed 9 out of 12 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| mlflow/protos/prompt_optimization.proto | New protobuf definitions for job status, optimizer types, and prompt optimization job entities |
| mlflow/protos/prompt_optimization_pb2.py | Generated Python protobuf code for prompt optimization messages |
| mlflow/protos/prompt_optimization_pb2.pyi | Generated Python type stubs for protobuf messages |
| mlflow/protos/service.proto | Added RPC definitions for createPromptOptimizationJob and cancelPromptOptimizationJob endpoints |
| mlflow/protos/service_pb2.pyi | Generated type stubs for new RPC messages |
| mlflow/server/handlers.py | Implemented _create_prompt_optimization_job and _cancel_prompt_optimization_job handlers with validation, run creation, and job submission |
| mlflow/genai/optimize/optimize.py | Added support for converting dataset entities to dataframes for optimization |
| mlflow/genai/optimize/job.py | Updated to support optional dataset_id for zero-shot optimization |
| tests/server/test_handlers.py | Added comprehensive tests for job creation, cancellation, and error cases |
Documentation preview for 737555f is available.
```python
    )
    job_result.dump(result_dump_path)
except Exception as e:
    _logger.error(
```
Right now critical errors are also hidden from the user, which makes it really hard to debug, so I am adding this change. @WeichenXu123 Please let me know if this makes sense.
```python
    dataset_id=dataset_id,
    scorer_names=scorer_names,
)
return asdict(job_result)
```
This is somehow required by the MLflow job framework.

@WeichenXu123 Is the requirement that all job results need to be dicts (or JSON-serializable)?

If so, shall we update PromptOptimizationJobResult to a TypedDict?
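To make the trade-off concrete, here is a minimal sketch of the two shapes being discussed. The fields are made up for illustration; the real PromptOptimizationJobResult differs:

```python
from dataclasses import dataclass, asdict
from typing import TypedDict

@dataclass
class PromptOptimizationJobResult:  # simplified stand-in, fields are illustrative
    optimized_prompt_uri: str
    score: float

# asdict() yields the JSON-serializable dict the job framework appears to expect
payload = asdict(PromptOptimizationJobResult("prompts:/my-prompt/2", 0.91))

# TypedDict alternative: instances are already plain dicts, so no conversion step
class PromptOptimizationJobResultDict(TypedDict):
    optimized_prompt_uri: str
    score: float

payload2: PromptOptimizationJobResultDict = {
    "optimized_prompt_uri": "prompts:/my-prompt/2",
    "score": 0.91,
}
```

With a TypedDict, the `asdict(job_result)` call becomes unnecessary, at the cost of losing dataclass conveniences like default values and methods.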
Pull request overview
Copilot reviewed 15 out of 18 changed files in this pull request and generated no new comments.
```proto
optional string experiment_id = 4;

// URI of the source prompt that optimization started from (e.g., "prompts:/my-prompt/1").
optional string source_prompt_uri = 5;
```
What is the relationship between source_prompt_uri here and PromptOptimizationJobConfig.target_prompt_uri?
This was a mistake; target_prompt_uri is meaningless. Changed!
```proto
// List of scorer names. Can be built-in scorer class names
// (e.g., "Correctness", "Safety") or registered scorer names.
repeated string scorers = 4;
```
Q: how would we support Guidelines or ExpectationsGuidelines, which accept parameters?
Yes, good question. There are two ways we can support it:
- Require users to wrap the scorer in a custom scorer and register it in the MLflow experiment; in this approach the registered scorer never needs an arg.
- Change the scorer field from a string to a dict so that it can take args.

At first look, option 2 is more flexible. However, this backend API is only supposed to be invoked by the MLflow UI, and in the UI it's a bit odd to let users configure args for their scorer, because optimization -> scorer -> scorer args involves many hops, which is not trivial to understand IMO. Please let me know your thoughts!
Got it, I think option 1 makes sense to start with. When we release this feature, let's make sure this expected flow is clearly documented!
mlflow/server/handlers.py
Outdated
```python
        tags=[InputTag(key="mlflow.data.context", value="optimization")],
    )
    tracking_store.log_inputs(run_id=run_id, datasets=[dataset_input])
except Exception as e:
```
Which call in the try block do we expect to fail?
Mostly get_genai_dataset, if there is a network issue or the id is invalid (less likely when requested from the UI). Happy to delete the try-except as well!
I see. Maybe we should handle different exceptions differently? If dataset_id is invalid, we should raise an exception immediately, but if it's a temporary network error, we can still accept the request.
Makes sense! On second thought, I feel this try block is a bit redundant: it's not common for get_dataset to fail since we specify the dataset_id through the UI, and it makes sense to stop handling the request if there is a dataset loading error.
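The per-exception handling discussed above could be sketched like this. The exception types and helper names are hypothetical, standing in for whatever MLflow's client actually raises:

```python
# Hypothetical exception types standing in for the real ones
class DatasetNotFoundError(Exception):
    pass

class TransientNetworkError(Exception):
    pass

def load_dataset_or_skip(get_dataset, dataset_id, warnings):
    """Fail fast on a bad dataset_id; tolerate transient network errors."""
    try:
        return get_dataset(dataset_id)
    except DatasetNotFoundError:
        raise  # invalid id: reject the request immediately
    except TransientNetworkError as e:
        warnings.append(f"Skipping dataset input logging: {e}")
        return None  # still accept the request, just without input logging
```

As the thread concludes, if dataset loading failures are rare in practice, dropping the catch-all and letting the request fail may be the simpler design.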
mlflow/server/handlers.py
Outdated
```python
# Create MLflow run upfront so run_id is immediately available
# The job will resume this run when it starts executing
from mlflow.tracking.context.default_context import _get_user
```
Could we move this import to the module level?
Yes, done! There were a few circular-import cases, but this one is fine.
```java
@@ -250963,6 +250963,3680 @@ public org.mlflow.api.proto.Service.GetSecretsConfig getDefaultInstanceForType()
}

public interface CreatePromptOptimizationJobOrBuilder extends
```
Are these Java proto classes useful?
They're auto-generated by the proto generation script; I actually have little knowledge of the current status of the MLflow Java server.
MLflow does not have a Java server; the Java proto classes are for the MLflow Java client, but the job optimization APIs are only called from the UI, so these should be unused.
We might need to clean up these unused Java proto classes and update the proto generation script.
This issue does not block merging the PR.
mlflow/server/handlers.py
Outdated
```python
    user_id=_get_user(),
    start_time=int(time.time() * 1000),
    tags=[],
    run_name=None,
```
Why not set a meaningful run name, like optimize_prompt_xxx?
good idea, changed!
```python
    Raises:
        MlflowException: If the proto value is unspecified or unsupported.
    """
    if proto_value == OPTIMIZER_TYPE_UNSPECIFIED:
```
Let's use a match statement.
Actually, a match statement makes it less clean, since each case needs to be:

case x if x == OPTIMIZER_TYPE_GEPA:

because OPTIMIZER_TYPE_GEPA is a variable, not a literal. Here is Claude Code's judgment:

When matching against literal values (strings like "gzip", "deflate"), match-case works directly. But for variables and constants, Python's match-case treats bare names as capture patterns, not value comparisons: `case OPTIMIZER_TYPE_GEPA:` is interpreted as "capture the value into a new variable named OPTIMIZER_TYPE_GEPA", not "compare against the existing constant". Matching against constants therefore requires a guard (`case x if x == OPTIMIZER_TYPE_GEPA:`), which is more verbose than the original if-elif. So the if-elif-else code is the right choice here, since we're matching against imported constants, not literal values.
```proto
// List of scorer names. Can be built-in scorer class names
// (e.g., "Correctness", "Safety") or registered scorer names.
repeated string scorers = 3;
```
Another question: how do we distinguish the built-in Correctness scorer from a custom scorer with name=Correctness? Maybe we need a flag to differentiate them?
Good question. Right now the built-in scorer is chosen over the custom scorer if there is a name conflict. My honest preference is to forbid users from registering scorers under built-in names, and throw an exception when they try.
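The "forbid built-in names" preference could be enforced with a check like this sketch. The built-in name set and the dict-shaped registry are assumptions for illustration:

```python
# Illustrative subset of built-in scorer names (assumption, not the full list)
BUILTIN_SCORER_NAMES = {"Correctness", "Safety"}

def register_scorer(registry: dict, name: str, scorer_fn) -> None:
    # Reject names that would shadow a built-in scorer, per the preference above
    if name in BUILTIN_SCORER_NAMES:
        raise ValueError(
            f"'{name}' is a built-in scorer name; please pick a different name."
        )
    registry[name] = scorer_fn
```

Rejecting the conflict at registration time removes the ambiguity entirely, so the lookup side never needs a disambiguation flag.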
TomeHirata
left a comment
LGTM once these comments are addressed!
```python
    start_time=start_time,
    tags=[],
    run_name=f"optimize_prompt_{optimizer_type}_{start_time}",
)
```
Why not attach the prompt name/version in the run name? Like optimize_prompt_{optimizer_type}_{prompt_name}_{prompt_version}_{start_time}.
Good call, changed!
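The resulting run-name construction might look like this sketch, assuming the "prompts:/name/version" URI format mentioned earlier in the thread; the parsing helper is hypothetical:

```python
import time

def build_run_name(optimizer_type: str, prompt_uri: str) -> str:
    # "prompts:/my-prompt/1" -> prompt_name "my-prompt", version "1"
    prompt_name, version = prompt_uri.removeprefix("prompts:/").rsplit("/", 1)
    start_time = int(time.time() * 1000)
    return f"optimize_prompt_{optimizer_type}_{prompt_name}_{version}_{start_time}"
```

Including the prompt name and version makes runs for different prompts distinguishable at a glance in the experiment view, while the timestamp suffix keeps repeated runs unique.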
WeichenXu123
left a comment
LGTM except one comment about run name.
```python
@catch_mlflow_exception
@_disable_if_artifacts_only
def _create_prompt_optimization_job():
```
Follow-up task: let's add permission validation rules for the newly added endpoints in mlflow/server/auth/__init__.py.
Related Issues/PRs
#xxx

What changes are proposed in this pull request?
Add CreatePromptOptimizationJob and CancelPromptOptimizationJob.
For testing purposes, please first clone this PR and spin up the MLflow server:
Then create a dataset via the following script, which will output a dataset id (or you can copy it from the MLflow UI):
Then copy the dataset id into the script below:
Feel free to change the optimizer type to play with the GEPA and MetaPrompt optimizers. I put a breakpoint in the second script so that you can wait a while before trying job cancellation.
How is this PR tested?
Does this PR require documentation update?
Release Notes
Is this a user-facing change?
What component(s), interfaces, languages, and integrations does this PR affect?
Components
area/tracking: Tracking Service, tracking client APIs, autologging
area/models: MLmodel format, model serialization/deserialization, flavors
area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
area/scoring: MLflow Model server, model deployment tools, Spark UDFs
area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
area/gateway: MLflow AI Gateway client APIs, server, and third-party integrations
area/prompts: MLflow prompt engineering features, prompt templates, and prompt management
area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality
area/projects: MLproject format, project running backends
area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/build: Build and test infrastructure for MLflow
area/docs: MLflow documentation pages

How should the PR be classified in the release notes? Choose one:
rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?
Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?
Bug fixes, doc updates and new features usually go into minor releases.
Bug fixes and doc updates usually go into patch releases.