Conversation
e64ae77 to
10116cf
Compare
Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>
10116cf to
ea7af3e
Compare
|
Documentation preview for d41151a is available at: More info
|
Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>
| @@ -0,0 +1 @@ | |||
| """Adapters for different LLM providers used by judges.""" | |||
There was a problem hiding this comment.
Removed this docstring, but kept the init around
|
|
||
| from databricks.rag_eval import context, env_vars | ||
|
|
||
| env_vars.RAG_EVAL_EVAL_SESSION_CLIENT_NAME.set(f"mlflow-judge-optimizer-v{VERSION}") |
There was a problem hiding this comment.
Is this session name always correct? It seems this function is called by the _invoke_databricks_judge function as well, which is not related to judge optimizer.
There was a problem hiding this comment.
Good catch, I set a default here and set it specifically to the judge optimizer when called from an optimization function.
There was a problem hiding this comment.
Personally the most confusing part of the original implementation was we have three very similar functions:
_invoke_databricks_model_invoke_databricks_judge_invoke_databricks_judge_model
Is there any way to get out of this state?
There was a problem hiding this comment.
I did a rename of these to make it clearer so now it's much clearer -
_invoke_databricks_serving_endpoint_invoke_databricks_serving_endpoint_judge_invoke_databricks_default_judge
There was a problem hiding this comment.
Yea, it looks better. To make it even clearer, shall we separate managed judge and serving endpoint to different adapters? They are both Databricks, but the invocation logic is totally different.
There was a problem hiding this comment.
Good point! This will be even nicer once we do the refactor below to have a generic Adapter which is extended by all of these.
| _check_databricks_agents_installed, | ||
| _invoke_databricks_judge, | ||
| _invoke_databricks_judge_model, | ||
| _invoke_databricks_model, | ||
| _parse_databricks_judge_response, | ||
| _parse_databricks_model_response, | ||
| _record_judge_model_usage_failure_databricks_telemetry, | ||
| _record_judge_model_usage_success_databricks_telemetry, |
There was a problem hiding this comment.
[Non blocking] I love the adapter abstraction. However, it looks a bit of code smell if we need to expose all these private utilities after splitting them. Can we update adapter abstraction such that we don't need to expose them here?
Ideally we can refactor the logic to sth like this:
- Each adapter exposes one
invokepublic API. - Adapters also expose a helper function like
is_applicable(provider, model, ...)(or some factory returns which adapter to use). - The top level
invoke_judge_modelshould not contain any adapter specific logic.
Feel free to address this in a follow-up to reduce the size of change tho.
There was a problem hiding this comment.
I cleaned up this file, but it makes sense to have some common abstraction. Let me do this in a follow-up so it's easier to review.
|
/review ✅ Review completed. Review OutputPerfect! I've completed the PR review. Let me provide a summary: PR Review SummaryI've reviewed PR #18622 "Refactor Judge utils" which splits the large Issues FoundI identified 4 style guide violations and added review comments for each:
All issues are minor style improvements that would make the code more concise and aligned with the project's Python style guide. The refactoring itself appears well-structured, maintains backward compatibility through the |
| if isinstance(content, list): | ||
| text_content = None | ||
| for item in content: | ||
| if isinstance(item, dict) and item.get("type") == "text": | ||
| text_content = item.get("text") | ||
| break |
There was a problem hiding this comment.
Per the Python style guide, consider using next() instead of a loop-and-break pattern:
| if isinstance(content, list): | |
| text_content = None | |
| for item in content: | |
| if isinstance(item, dict) and item.get("type") == "text": | |
| text_content = item.get("text") | |
| break | |
| # Handle reasoning response (list of content items) | |
| if isinstance(content, list): | |
| text_content = next( | |
| (item.get("text") for item in content if isinstance(item, dict) and item.get("type") == "text"), | |
| None | |
| ) |
This is more concise and functional.
🤖 Generated with Claude Code
| assistant_msg = None | ||
| assistant_idx = None | ||
| for i, msg in enumerate(pruned_messages): | ||
| if msg.role == "assistant" and msg.tool_calls: | ||
| assistant_msg = msg | ||
| assistant_idx = i | ||
| break |
There was a problem hiding this comment.
Per the Python style guide, consider using next() with enumerate() instead of a loop-and-break pattern:
| assistant_msg = None | |
| assistant_idx = None | |
| for i, msg in enumerate(pruned_messages): | |
| if msg.role == "assistant" and msg.tool_calls: | |
| assistant_msg = msg | |
| assistant_idx = i | |
| break | |
| # Find first assistant message with tool calls | |
| result = next( | |
| ((i, msg) for i, msg in enumerate(pruned_messages) if msg.role == "assistant" and msg.tool_calls), | |
| None | |
| ) | |
| if result is None: | |
| break # No more tool calls to remove | |
| assistant_idx, assistant_msg = result |
This is more concise and functional while still capturing both the index and message.
🤖 Generated with Claude Code
| def _is_litellm_available() -> bool: | ||
| """Check if LiteLLM is available for import.""" |
There was a problem hiding this comment.
Per the Python style guide, this docstring is redundant as it merely restates what the function name already conveys. Consider removing it:
| def _is_litellm_available() -> bool: | |
| """Check if LiteLLM is available for import.""" | |
| def _is_litellm_available() -> bool: | |
| try: |
🤖 Generated with Claude Code
Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>
B-Step62
left a comment
There was a problem hiding this comment.
Looks good once #18622 (comment) is addressed. Let's refactor the logic a bit more in the follow-up to make it easier to maintain and prevent mistake/regression.
Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>
Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>
Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>
Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>
🛠 DevTools 🛠
Install mlflow from this PR
For Databricks, use the following command:
Related Issues/PRs
#xxxWhat changes are proposed in this pull request?
Refactoring the judge utils to split up the file (1200+ lines) to make further changes.
How is this PR tested?
Does this PR require documentation update?
Release Notes
Is this a user-facing change?
What component(s), interfaces, languages, and integrations does this PR affect?
Components
area/tracking: Tracking Service, tracking client APIs, autologgingarea/models: MLmodel format, model serialization/deserialization, flavorsarea/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registryarea/scoring: MLflow Model server, model deployment tools, Spark UDFsarea/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflowsarea/gateway: MLflow AI Gateway client APIs, server, and third-party integrationsarea/prompts: MLflow prompt engineering features, prompt templates, and prompt managementarea/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionalityarea/projects: MLproject format, project running backendsarea/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev serverarea/build: Build and test infrastructure for MLflowarea/docs: MLflow documentation pagesHow should the PR be classified in the release notes? Choose one:
rn/none- No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" sectionrn/breaking-change- The PR will be mentioned in the "Breaking Changes" sectionrn/feature- A new user-facing feature worth mentioning in the release notesrn/bug-fix- A user-facing bug fix worth mentioning in the release notesrn/documentation- A user-facing documentation change worth mentioning in the release notesShould this PR be included in the next patch release?
Yesshould be selected for bug fixes, documentation updates, and other small changes.Noshould be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.What is a minor/patch release?
Bug fixes, doc updates and new features usually go into minor releases.
Bug fixes and doc updates usually go into patch releases.