Refactor Judge utils by smoorjani · Pull Request #18622 · mlflow/mlflow

smoorjani · 2025-11-03T01:15:23Z

🛠 DevTools 🛠

Install mlflow from this PR

# mlflow
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/18622/merge
# mlflow-skinny
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/18622/merge#subdirectory=libs/skinny

For Databricks, use the following command:

%sh curl -LsSf https://raw.githubusercontent.com/mlflow/mlflow/HEAD/dev/install-skinny.sh | sh -s pull/18622/merge

What changes are proposed in this pull request?

Refactor the judge utils by splitting the single 1200+ line file into smaller modules, so that further changes are easier to make.

Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

github-actions · 2025-11-03T20:02:52Z

Documentation preview for d41151a is available at:

https://pr-18622--mlflow-docs-preview.netlify.app/docs/latest/

B-Step62 · 2025-11-04T00:52:42Z

mlflow/genai/judges/adapters/__init__.py

@@ -0,0 +1 @@
+"""Adapters for different LLM providers used by judges."""


Do we need this file?

I removed the docstring but kept the __init__.py file.

B-Step62 · 2025-11-04T00:54:11Z

mlflow/genai/judges/adapters/databricks_adapter.py

+
+    from databricks.rag_eval import context, env_vars
+
+    env_vars.RAG_EVAL_EVAL_SESSION_CLIENT_NAME.set(f"mlflow-judge-optimizer-v{VERSION}")


Is this session name always correct? This function also appears to be called by _invoke_databricks_judge, which is not related to the judge optimizer.

Good catch. I now set a default session name here and override it with the judge optimizer name only when the function is called from an optimization function.

B-Step62 · 2025-11-04T00:59:48Z

mlflow/genai/judges/adapters/databricks_serving_endpoint_adapter.py

Personally, the most confusing part of the original implementation was that we have three very similar functions:

_invoke_databricks_model

_invoke_databricks_judge

_invoke_databricks_judge_model

Is there any way to get out of this state?

I renamed these to make the distinction clearer:

_invoke_databricks_serving_endpoint

_invoke_databricks_serving_endpoint_judge

_invoke_databricks_default_judge

Yeah, that looks better. To make it even clearer, shall we separate the managed judge and the serving endpoint into different adapters? They are both Databricks, but the invocation logic is totally different.

Good point! This will be even nicer once we do the refactor below to have a generic Adapter which is extended by all of these.

B-Step62 · 2025-11-04T01:02:02Z

mlflow/genai/judges/utils/__init__.py

+    _check_databricks_agents_installed,
+    _invoke_databricks_judge,
+    _invoke_databricks_judge_model,
+    _invoke_databricks_model,
+    _parse_databricks_judge_response,
+    _parse_databricks_model_response,
+    _record_judge_model_usage_failure_databricks_telemetry,
+    _record_judge_model_usage_success_databricks_telemetry,


[Non-blocking] I love the adapter abstraction. However, it is a bit of a code smell if we need to expose all these private utilities after splitting them. Can we update the adapter abstraction so that we don't need to expose them here?

Ideally we can refactor the logic to something like this:

Each adapter exposes one public invoke API.

Adapters also expose a helper function like is_applicable(provider, model, ...) (or a factory that returns which adapter to use).

The top-level invoke_judge_model should not contain any adapter-specific logic.

Feel free to address this in a follow-up to reduce the size of the change, though.

I cleaned up this file, but it makes sense to have some common abstraction. Let me do this in a follow-up so it's easier to review.
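The three points in the proposal above could look roughly like the following. This is only a sketch of one possible design, since the PR defers the abstraction to a follow-up: the class names, the `is_applicable` rules, and the placeholder return strings are all hypothetical.

```python
from abc import ABC, abstractmethod


class JudgeAdapter(ABC):
    """Hypothetical base class; not part of this PR."""

    @classmethod
    @abstractmethod
    def is_applicable(cls, provider: str, model: str) -> bool:
        """Return True if this adapter can serve the given provider and model."""

    @abstractmethod
    def invoke(self, provider: str, model: str, prompt: str) -> str:
        """The single public entry point for calling the judge model."""


class ManagedJudgeAdapter(JudgeAdapter):
    @classmethod
    def is_applicable(cls, provider: str, model: str) -> bool:
        # Assumed rule for illustration only.
        return provider == "databricks" and model == "default"

    def invoke(self, provider: str, model: str, prompt: str) -> str:
        return f"[managed judge] {prompt}"  # placeholder for the real call


class ServingEndpointAdapter(JudgeAdapter):
    @classmethod
    def is_applicable(cls, provider: str, model: str) -> bool:
        # Assumed rule: any other Databricks model names a serving endpoint.
        return provider == "databricks"

    def invoke(self, provider: str, model: str, prompt: str) -> str:
        return f"[serving endpoint {model}] {prompt}"  # placeholder


# Order matters: more specific adapters are checked first.
_ADAPTERS = [ManagedJudgeAdapter, ServingEndpointAdapter]


def get_adapter(provider: str, model: str) -> JudgeAdapter:
    """Factory that picks the first adapter claiming the provider/model pair."""
    adapter_cls = next(
        (a for a in _ADAPTERS if a.is_applicable(provider, model)), None
    )
    if adapter_cls is None:
        raise ValueError(f"No adapter for provider={provider!r}, model={model!r}")
    return adapter_cls()


def invoke_judge_model(provider: str, model: str, prompt: str) -> str:
    # No adapter-specific logic here: selection and invocation are delegated.
    return get_adapter(provider, model).invoke(provider, model, prompt)
```

Under this design the utils package would only need to export `invoke_judge_model`, and the private helpers could stay inside each adapter module.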

mlflow/genai/judges/utils/parsing_utils.py

harupy · 2025-11-05T06:26:54Z

/review

🚀 Review workflow started

✅ Review completed.

Review Output

Perfect! I've completed the PR review. Let me provide a summary:

PR Review Summary

I've reviewed PR #18622 "Refactor Judge utils" which splits the large judge/utils.py file (1200+ lines) into a more organized module structure with adapters and utility submodules.

Issues Found

I identified 4 style guide violations and added review comments for each:

databricks_adapter.py:220-225 - Loop-and-break pattern that should use next() for finding text content in reasoning responses
litellm_adapter.py:318-324 - Loop-and-break pattern that should use next() with enumerate() for finding assistant messages with tool calls
litellm_adapter.py:42-43 - Redundant docstring that merely restates the function name _is_litellm_available
invocation_utils.py:42-43 - Redundant docstring that merely restates the function name _is_litellm_available

All issues are minor style improvements that would make the code more concise and aligned with the project's Python style guide. The refactoring itself appears well-structured, maintains backward compatibility through the __init__.py exports, and the test files follow proper testing practices with mock assertions and parametrization.

github-actions · 2025-11-05T06:29:38Z

mlflow/genai/judges/adapters/databricks_adapter.py

+    if isinstance(content, list):
+        text_content = None
+        for item in content:
+            if isinstance(item, dict) and item.get("type") == "text":
+                text_content = item.get("text")
+                break


Per the Python style guide, consider using next() instead of a loop-and-break pattern:

Suggested change

-    if isinstance(content, list):
-        text_content = None
-        for item in content:
-            if isinstance(item, dict) and item.get("type") == "text":
-                text_content = item.get("text")
-                break
+    # Handle reasoning response (list of content items)
+    if isinstance(content, list):
+        text_content = next(
+            (item.get("text") for item in content if isinstance(item, dict) and item.get("type") == "text"),
+            None
+        )

This is more concise and functional.
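As a standalone illustration of the pattern, `next()` takes a generator expression plus a default that is returned when nothing matches. The helper name `first_text` and the sample payload are hypothetical and chosen only to mirror the shape of the snippet under review.

```python
from typing import Any, Optional


def first_text(content: Any) -> Optional[str]:
    """Return the text of the first {"type": "text"} item, or None if absent."""
    if not isinstance(content, list):
        return None
    return next(
        (
            item.get("text")
            for item in content
            if isinstance(item, dict) and item.get("type") == "text"
        ),
        None,  # default returned when no item matches
    )


# Hypothetical reasoning-style response: a non-text item precedes the text item.
sample = [
    {"type": "reasoning", "summary": "thinking"},
    {"type": "text", "text": "yes"},
    {"type": "text", "text": "ignored"},
]
print(first_text(sample))
```

The generator is lazy, so iteration stops at the first match just as the original `break` did.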

🤖 Generated with Claude Code

github-actions · 2025-11-05T06:29:46Z

mlflow/genai/judges/adapters/litellm_adapter.py

+        assistant_msg = None
+        assistant_idx = None
+        for i, msg in enumerate(pruned_messages):
+            if msg.role == "assistant" and msg.tool_calls:
+                assistant_msg = msg
+                assistant_idx = i
+                break


Per the Python style guide, consider using next() with enumerate() instead of a loop-and-break pattern:

Suggested change

-        assistant_msg = None
-        assistant_idx = None
-        for i, msg in enumerate(pruned_messages):
-            if msg.role == "assistant" and msg.tool_calls:
-                assistant_msg = msg
-                assistant_idx = i
-                break
+        # Find first assistant message with tool calls
+        result = next(
+            ((i, msg) for i, msg in enumerate(pruned_messages) if msg.role == "assistant" and msg.tool_calls),
+            None
+        )
+        if result is None:
+            break  # No more tool calls to remove
+        assistant_idx, assistant_msg = result

This is more concise and functional while still capturing both the index and message.
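One detail of this variant is worth spelling out: because the default is `None`, the result has to be checked before it is unpacked into index and message. A minimal sketch, where the `Message` dataclass and `find_first_tool_call` are hypothetical stand-ins for the real message type and surrounding loop:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    """Hypothetical stand-in for the real chat message type."""

    role: str
    tool_calls: list = field(default_factory=list)


def find_first_tool_call(messages: list) -> Optional[tuple]:
    """Return (index, message) for the first assistant message with tool calls."""
    return next(
        (
            (i, msg)
            for i, msg in enumerate(messages)
            if msg.role == "assistant" and msg.tool_calls
        ),
        None,
    )


history = [
    Message("user"),
    Message("assistant"),              # assistant turn without tool calls
    Message("assistant", ["lookup"]),  # first match
]

result = find_first_tool_call(history)
if result is not None:  # unpacking None would raise TypeError
    idx, msg = result
    print(idx, msg.tool_calls)
```

An empty `tool_calls` list is falsy, so assistant messages without tool calls are skipped by the same condition used in the review suggestion.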

🤖 Generated with Claude Code

github-actions · 2025-11-05T06:30:01Z

mlflow/genai/judges/utils/invocation_utils.py

+def _is_litellm_available() -> bool:
+    """Check if LiteLLM is available for import."""


Per the Python style guide, this docstring is redundant as it merely restates what the function name already conveys. Consider removing it:

Suggested change

-def _is_litellm_available() -> bool:
-    """Check if LiteLLM is available for import."""
+def _is_litellm_available() -> bool:
     try:

🤖 Generated with Claude Code
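For context, an availability check of this kind is commonly written as a try/except around the import. The thread shows only the first lines of `_is_litellm_available`, so the following is a generic sketch with a hypothetical helper name rather than the PR's implementation.

```python
import importlib


def is_module_available(name: str) -> bool:
    # Hypothetical generic helper; the name says what it does, so no docstring.
    try:
        importlib.import_module(name)
        return True
    except ImportError:  # also covers ModuleNotFoundError, its subclass
        return False


print(is_module_available("json"))
```

Note that this approach actually imports the module, so a package that is installed but fails during import is also reported as unavailable.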

B-Step62

Looks good once #18622 (comment) is addressed. Let's refactor the logic a bit more in the follow-up to make it easier to maintain and to prevent mistakes and regressions.

github-actions bot added the area/evaluation (MLflow Evaluation), rn/none (List under Small Changes in Changelogs), and v3.5.2 labels Nov 3, 2025

smoorjani force-pushed the refactor-judge-utils branch from e64ae77 to 10116cf November 3, 2025 16:41

pure refactor off of master

ea7af3e

Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

smoorjani force-pushed the refactor-judge-utils branch from 10116cf to ea7af3e November 3, 2025 19:51

fix unit tests

8101347

Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

smoorjani requested review from B-Step62, dbczumar and harupy November 3, 2025 21:51

fixes

6951393

Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

B-Step62 reviewed Nov 4, 2025

smoorjani added 3 commits November 3, 2025 17:59

.

6b2b995

Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

address pr feedback

d148bdf

Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

rename

cf82e9a

Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

smoorjani requested a review from B-Step62 November 4, 2025 19:35

fix test

5eb825f

Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

github-actions bot reviewed Nov 5, 2025

mlflow deleted a comment from github-actions bot Nov 5, 2025

smoorjani and others added 2 commits November 5, 2025 17:17

address review comments

757ceae

Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

Merge branch 'master' into refactor-judge-utils

181f4d7

B-Step62 approved these changes Nov 6, 2025

address PR feedback

c2bb79a

Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

smoorjani enabled auto-merge November 6, 2025 18:29

.

d41151a

Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

smoorjani added this pull request to the merge queue Nov 6, 2025

Merged via the queue into mlflow:master with commit 3a2ee2d Nov 6, 2025
64 of 66 checks passed

smoorjani deleted the refactor-judge-utils branch November 6, 2025 21:06

B-Step62 added the v3.6.0 label Nov 7, 2025

B-Step62 pushed a commit to B-Step62/mlflow that referenced this pull request Nov 7, 2025

Refactor Judge utils (mlflow#18622)

931d999

Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

github-actions bot added v3.6.1 and removed v3.6.0 labels Nov 8, 2025

B-Step62 pushed a commit to B-Step62/mlflow that referenced this pull request Nov 11, 2025

Refactor Judge utils (mlflow#18622)

5a34fff

Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

B-Step62 pushed a commit that referenced this pull request Nov 11, 2025

Refactor Judge utils (#18622)

286d06d

Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

B-Step62 added v3.6.0 and removed v3.6.1 labels Nov 11, 2025
