
Fix error initialization in Feedback #19340

Merged
alkispoly-db merged 7 commits into mlflow:master from alkispoly-db:mlflow-json-output-bug
Dec 13, 2025

Conversation


@alkispoly-db alkispoly-db commented Dec 11, 2025

🛠 DevTools 🛠

Install mlflow from this PR

# mlflow
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/19340/merge
# mlflow-skinny
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/19340/merge#subdirectory=libs/skinny

For Databricks, use the following command:

%sh curl -LsSf https://raw.githubusercontent.com/mlflow/mlflow/HEAD/dev/install-skinny.sh | sh -s pull/19340/merge

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

This PR fixes an intermittent bug in assessment logging where Feedback objects with string error messages cause 'str' object has no attribute 'to_proto' errors during protobuf serialization.

Root Cause: The Databricks managed judge adapter passes string error messages to the Feedback constructor (e.g., error="Empty response from Databricks judge"). However, Feedback.__init__() only converted Exception objects to AssessmentError, allowing strings to pass through. When FeedbackValue.to_proto() later called error.to_proto() on the string, it crashed.

Solution: Fix the issue at the source by handling string errors in Feedback.__init__():

  • Convert string errors to AssessmentError objects with error code "ASSESSMENT_ERROR"
  • Continue converting Exception objects to AssessmentError with proper error code and stack trace
  • Reject invalid error types (int, list, dict) with clear error messages

Additional Changes:

  • Removed defensive validation code that was added as a workaround (286 lines deleted)
  • Removed unnecessary try-catch around individual assessment logging
  • Updated type hint for error parameter to include str

This approach is cleaner and more maintainable than defensive validation throughout the codebase.
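As a rough sketch, the conversion described above could look like the following. This is a simplified stand-in, not the PR's actual code: the `AssessmentError` fields and the error code chosen for `Exception` inputs are assumptions.

```python
import traceback
from dataclasses import dataclass
from typing import Optional


@dataclass
class AssessmentError:  # simplified stand-in for mlflow.entities.AssessmentError
    error_code: str
    error_message: str
    stack_trace: Optional[str] = None


def coerce_error(error):
    """Normalize the `error` argument the way the fix describes."""
    if error is None or isinstance(error, AssessmentError):
        return error  # AssessmentError objects pass through unchanged
    if isinstance(error, str):
        # String messages get the generic "ASSESSMENT_ERROR" code
        return AssessmentError(error_code="ASSESSMENT_ERROR", error_message=error)
    if isinstance(error, Exception):
        # Exceptions keep their class name as the code plus a stack trace
        # (the exact error code used by the real implementation is an assumption)
        return AssessmentError(
            error_code=type(error).__name__,
            error_message=str(error),
            stack_trace="".join(
                traceback.format_exception(type(error), error, error.__traceback__)
            ),
        )
    # Invalid types (int, list, dict, ...) are rejected with a clear message
    raise TypeError("'error' must be an Exception, AssessmentError, or string")
```

Checking for `AssessmentError` before `Exception` keeps the pass-through behavior correct even if `AssessmentError` were ever made an `Exception` subclass.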

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests (4 comprehensive tests added)

New Tests in tests/entities/test_assessment.py:

  1. test_feedback_converts_string_error_to_assessment_error - Verifies string-to-AssessmentError conversion
  2. test_feedback_converts_exception_error_to_assessment_error - Verifies Exception-to-AssessmentError with stack trace
  3. test_feedback_passes_through_assessment_error - Verifies AssessmentError objects pass through unchanged
  4. test_feedback_rejects_invalid_error_types - Verifies rejection of int, list, dict error types
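For illustration, test (1) might look roughly like the following. The `Feedback` and `AssessmentError` classes below are minimal stand-ins so the sketch runs without mlflow installed; the real test imports them from `mlflow.entities`.

```python
from dataclasses import dataclass


@dataclass
class AssessmentError:  # minimal stand-in for mlflow.entities.AssessmentError
    error_code: str
    error_message: str


class Feedback:  # minimal stand-in for mlflow.entities.Feedback
    def __init__(self, name, error=None):
        if isinstance(error, str):
            # The behavior under test: string errors become AssessmentError objects
            error = AssessmentError("ASSESSMENT_ERROR", error)
        self.name, self.error = name, error


def test_feedback_converts_string_error_to_assessment_error():
    feedback = Feedback(name="quality", error="Empty response from Databricks judge")
    assert isinstance(feedback.error, AssessmentError)
    assert feedback.error.error_code == "ASSESSMENT_ERROR"
    assert feedback.error.error_message == "Empty response from Databricks judge"
```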

Does this PR require documentation update?

  • No. You can skip the rest of this section.

Release Notes

Is this a user-facing change?

  • Yes. Give a description of this change to be included in the release notes for MLflow users.

Fixed assessment logging failures when using Databricks managed judges. Error messages from judges are now properly converted to AssessmentError objects, preventing serialization failures during evaluation.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
  • area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality

How should the PR be classified in the release notes? Choose one:

  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes

Should this PR be included in the next patch release?

  • Yes (this PR will be cherry-picked and included in the next patch release)

🤖 Generated with Claude Code

Adds defensive type validation to handle cases where external packages
(like databricks-agents) return assessment source fields as strings
instead of proper AssessmentSource objects. This prevents 'str' object
has no attribute 'to_proto' errors during protobuf serialization.

The fix:
- Validates assessment types before logging
- Converts string sources to AssessmentSource objects automatically
- Handles nested source fields in feedback and expectation objects
- Logs warnings for malformed assessments without breaking evaluation
- Includes comprehensive test coverage (5 new unit tests)

This resolves an intermittent heisenbug where trace evaluation would
fail unpredictably, causing missing assessment data in both table and
JSON output formats.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
@github-actions github-actions bot added the v3.7.1, area/evaluation (MLflow Evaluation), area/tracing (MLflow Tracing and its integrations), and rn/bug-fix (Mention under Bug Fixes in Changelogs) labels Dec 11, 2025

github-actions bot commented Dec 11, 2025

Documentation preview for d591aec is available.


alkispoly-db and others added 3 commits December 12, 2025 23:58
Extends the defensive validation from commit 5053be1 by adding
low-level type checking directly in Assessment.to_proto(). This protects
all code paths that convert assessments to proto format, not just the
evaluation harness.

Key improvements:
- Validates source, feedback.source, and expectation.source fields
- Automatically converts string sources to AssessmentSource objects
- Logs warnings when malformed fields are detected and fixed
- Raises clear TypeErrors for truly invalid field types

This fixes the heisenbug where 'str' object has no attribute 'to_proto'
errors occurred during trace conversion to DataFrame, which affected both
table and json output formats in `mlflow traces evaluate`.

Protected code paths:
- Trace conversion to DataFrame (trace.to_pandas_dataframe_row)
- Evaluation harness (genai.evaluate)
- Direct assessment logging (mlflow.log_assessment)
- All backend stores (Databricks, file, SQL, REST)

The validation happens at the lowest level (Assessment.to_proto) ensuring
defense-in-depth regardless of how assessments are created or retrieved.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
Fixed AttributeError when Feedback objects with string error messages
were serialized to protobuf. The Databricks managed judge adapter was
passing string error messages to Feedback(), but Feedback.__init__()
only converted Exception objects to AssessmentError, causing
'str' object has no attribute 'to_proto' errors during serialization.

Changes:
- Added string-to-AssessmentError conversion in Feedback.__init__()
- Updated type hint to accept str in error parameter
- Removed defensive validation code from harness (173 lines)
- Added comprehensive tests for all error type scenarios

This simplifies the codebase by fixing the issue at the source rather
than using defensive validation throughout the logging pipeline.

Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
Removed 5 tests from test_eval.py that tested the defensive source
validation functionality which was removed in the previous commit.
These tests are no longer relevant since the fix now handles string
errors at the source in Feedback.__init__() rather than using
defensive validation.

Removed tests:
- test_validate_assessment_types_with_string_source
- test_validate_assessment_types_with_proper_source
- test_validate_assessment_types_with_invalid_feedback
- test_validate_assessment_types_with_expectation_string_source
- test_log_assessments_with_string_source

Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
@alkispoly-db alkispoly-db force-pushed the mlflow-json-output-bug branch from 3ef48b0 to e143dc3 on December 13, 2025 01:04
Simplified test_feedback_converts_exception_error_to_assessment_error
by directly passing the exception to Feedback instead of using a
try/except block. The test still verifies that Exception objects are
properly converted to AssessmentError with error code and stack trace.

Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
@alkispoly-db alkispoly-db changed the title from "Fix assessment logging failure from malformed source fields" to "Fix error initialization in Feedback" Dec 13, 2025

@AveshCSingh AveshCSingh left a comment


LGTM. Thank you for fixing this!

Updated tests to reflect the new behavior where string errors in Feedback
are automatically converted to AssessmentError objects. Tests were checking
for string errors, but now need to verify AssessmentError objects with the
correct error_message field.

Fixed tests:
- test_sanitize_scorer_feedback_preserves_empty_string
- test_databricks_model_handles_errors_gracefully (3 error scenarios)

Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
@alkispoly-db alkispoly-db requested a review from harupy December 13, 2025 04:26
@harupy harupy requested a review from Copilot December 13, 2025 05:10

Copilot AI left a comment


Pull request overview

This PR fixes an intermittent bug in assessment logging where Feedback objects initialized with string error messages would cause 'str' object has no attribute 'to_proto' errors during protobuf serialization. The fix ensures that all error types (strings, Exceptions, and AssessmentError objects) are properly converted to AssessmentError instances in the Feedback.__init__() method.

Key Changes

  • Modified Feedback.__init__() to convert string errors to AssessmentError objects with error code "ASSESSMENT_ERROR"
  • Updated the type hint for the error parameter to include str type
  • Added validation to reject invalid error types (int, list, dict) with clear error messages
  • Added comprehensive test coverage for string, Exception, and AssessmentError handling

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
mlflow/entities/assessment.py Core fix: Added string error handling and type validation in Feedback.__init__(), updated type hints
tests/entities/test_assessment.py Added 4 comprehensive tests covering string-to-AssessmentError conversion, Exception handling, AssessmentError pass-through, and invalid type rejection
tests/genai/judges/test_make_judge.py Updated existing tests to verify that string errors from Databricks judges are properly converted to AssessmentError objects
tests/genai/judges/test_builtin.py Updated test to verify string errors are converted to AssessmentError with proper error_message and error_code fields


- Remove unused logging import and _logger variable from assessment.py
- Remove unnecessary blank line in Feedback.from_proto()
- Move AssessmentError import to top of test_make_judge.py for consistency

Addresses feedback from @harupy and @Copilot

Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
@alkispoly-db alkispoly-db requested a review from harupy December 13, 2025 06:25
Comment on lines +610 to +627
def test_feedback_rejects_invalid_error_types():
# Test with integer
with pytest.raises(
MlflowException, match="'error' must be an Exception, AssessmentError, or string"
):
Feedback(name="test", error=123)

# Test with list
with pytest.raises(
MlflowException, match="'error' must be an Exception, AssessmentError, or string"
):
Feedback(name="test", error=["error"])

# Test with dict
with pytest.raises(
MlflowException, match="'error' must be an Exception, AssessmentError, or string"
):
Feedback(name="test", error={"error": "message"})
Member
let's use pytest.mark.parametrize
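A parametrized version along the lines of this suggestion might look like the sketch below. `Feedback` and `MlflowException` are minimal stand-ins here so the sketch runs without mlflow installed; the real test would import them from `mlflow.entities` and `mlflow.exceptions`.

```python
import pytest


class MlflowException(Exception):  # stand-in for mlflow.exceptions.MlflowException
    pass


class Feedback:  # stand-in for mlflow.entities.Feedback, showing only the type check
    def __init__(self, name, error=None):
        if error is not None and not isinstance(error, (str, Exception)):
            raise MlflowException(
                "'error' must be an Exception, AssessmentError, or string"
            )
        self.name, self.error = name, error


@pytest.mark.parametrize("invalid_error", [123, ["error"], {"error": "message"}])
def test_feedback_rejects_invalid_error_types(invalid_error):
    with pytest.raises(
        MlflowException, match="'error' must be an Exception, AssessmentError, or string"
    ):
        Feedback(name="test", error=invalid_error)
```

This collapses the three near-identical `pytest.raises` blocks into one test body, and pytest reports each invalid value as a separate test case.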

Collaborator Author

Apologies for missing this -- will send you a follow-up PR.


@harupy harupy left a comment


@alkispoly-db alkispoly-db added this pull request to the merge queue Dec 13, 2025
Merged via the queue into mlflow:master with commit 6eb876c Dec 13, 2025
52 of 54 checks passed
@alkispoly-db alkispoly-db deleted the mlflow-json-output-bug branch December 13, 2025 09:48

Labels

area/evaluation (MLflow Evaluation), area/tracing (MLflow Tracing and its integrations), rn/bug-fix (Mention under Bug Fixes in Changelogs), v3.7.1

