Fix error initialization in Feedback #19340
Adds defensive type validation to handle cases where external packages (like databricks-agents) return assessment source fields as strings instead of proper AssessmentSource objects. This prevents `'str' object has no attribute 'to_proto'` errors during protobuf serialization.

The fix:
- Validates assessment types before logging
- Converts string sources to AssessmentSource objects automatically
- Handles nested source fields in feedback and expectation objects
- Logs warnings for malformed assessments without breaking evaluation
- Includes comprehensive test coverage (5 new unit tests)

This resolves an intermittent heisenbug where trace evaluation would fail unpredictably, causing missing assessment data in both table and JSON output formats.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
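The string-to-AssessmentSource conversion described in this commit can be sketched as follows. Note the classes and function name here are minimal stand-ins for illustration; the real `AssessmentSource` lives in `mlflow.entities` and carries additional fields:

```python
from dataclasses import dataclass


@dataclass
class AssessmentSource:
    # Minimal stand-in for mlflow.entities.AssessmentSource.
    source_type: str


def normalize_source(source):
    """Coerce a bare string (as returned by some external packages)
    into an AssessmentSource before protobuf serialization."""
    if isinstance(source, AssessmentSource):
        return source
    if isinstance(source, str):
        # e.g. databricks-agents may hand back "LLM_JUDGE" as a plain string
        return AssessmentSource(source_type=source)
    raise TypeError(f"Unsupported source type: {type(source).__name__}")
```

Proper `AssessmentSource` objects pass through untouched, so the check is safe to run on every assessment before logging.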
Extends the defensive validation from commit 5053be1 by adding low-level type checking directly in `Assessment.to_proto()`. This protects all code paths that convert assessments to proto format, not just the evaluation harness.

Key improvements:
- Validates source, feedback.source, and expectation.source fields
- Automatically converts string sources to AssessmentSource objects
- Logs warnings when malformed fields are detected and fixed
- Raises clear TypeErrors for truly invalid field types

This fixes the heisenbug where `'str' object has no attribute 'to_proto'` errors occurred during trace conversion to DataFrame, which affected both table and JSON output formats in `mlflow traces evaluate`.

Protected code paths:
- Trace conversion to DataFrame (trace.to_pandas_dataframe_row)
- Evaluation harness (genai.evaluate)
- Direct assessment logging (mlflow.log_assessment)
- All backend stores (Databricks, file, SQL, REST)

The validation happens at the lowest level (Assessment.to_proto), ensuring defense-in-depth regardless of how assessments are created or retrieved.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
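A rough sketch of the to_proto-level check described in this commit — warn and fix string sources, reject everything else. All names here are illustrative stand-ins, not mlflow's actual proto plumbing:

```python
import logging

logger = logging.getLogger(__name__)


class AssessmentSource:
    """Stand-in for mlflow.entities.AssessmentSource."""

    def __init__(self, source_type: str):
        self.source_type = source_type

    def to_proto(self):
        # The real method returns a protobuf message; a dict stands in here.
        return {"source_type": self.source_type}


def source_field_to_proto(source):
    """Validate-then-convert, mirroring the low-level check in to_proto()."""
    if isinstance(source, str):
        # Malformed field from an external package: fix it and warn.
        logger.warning("source was a str; converting to AssessmentSource")
        source = AssessmentSource(source_type=source)
    elif not isinstance(source, AssessmentSource):
        # Truly invalid types get a clear TypeError.
        raise TypeError(f"Invalid source field type: {type(source).__name__}")
    return source.to_proto()
```

Placing the check inside the serialization path means every caller — harness, stores, direct logging — gets the same protection for free.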
Fixed AttributeError when Feedback objects with string error messages were serialized to protobuf. The Databricks managed judge adapter was passing string error messages to `Feedback()`, but `Feedback.__init__()` only converted Exception objects to AssessmentError, causing `'str' object has no attribute 'to_proto'` errors during serialization.

Changes:
- Added string-to-AssessmentError conversion in `Feedback.__init__()`
- Updated the type hint to accept str in the error parameter
- Removed defensive validation code from the harness (173 lines)
- Added comprehensive tests for all error type scenarios

This simplifies the codebase by fixing the issue at the source rather than using defensive validation throughout the logging pipeline.

Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
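The constructor-level fix can be sketched like this, using self-contained stand-ins: the real classes are `mlflow.entities.Feedback` and `AssessmentError`, and mlflow raises `MlflowException` rather than the plain `TypeError` used here:

```python
import traceback
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class AssessmentError:
    # Stand-in for mlflow.entities.AssessmentError.
    error_code: str
    error_message: str
    stack_trace: Optional[str] = None


class Feedback:
    def __init__(
        self,
        name: str,
        error: Union[str, Exception, AssessmentError, None] = None,
    ):
        self.name = name
        if error is None or isinstance(error, AssessmentError):
            # AssessmentError objects (and None) pass through unchanged.
            self.error = error
        elif isinstance(error, str):
            # The fix: bare strings become AssessmentError objects.
            self.error = AssessmentError("ASSESSMENT_ERROR", error)
        elif isinstance(error, Exception):
            # Exceptions keep their class name and stack trace.
            self.error = AssessmentError(
                type(error).__name__,
                str(error),
                "".join(
                    traceback.format_exception(
                        type(error), error, error.__traceback__
                    )
                ),
            )
        else:
            raise TypeError(
                "'error' must be an Exception, AssessmentError, or string"
            )
```

With this in place, `FeedbackValue.to_proto()` only ever sees `AssessmentError` (or `None`), so the `'str' object has no attribute 'to_proto'` crash cannot recur.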
Removed 5 tests from test_eval.py that tested the defensive source validation functionality removed in the previous commit. These tests are no longer relevant since the fix now handles string errors at the source in `Feedback.__init__()` rather than using defensive validation.

Removed tests:
- test_validate_assessment_types_with_string_source
- test_validate_assessment_types_with_proper_source
- test_validate_assessment_types_with_invalid_feedback
- test_validate_assessment_types_with_expectation_string_source
- test_log_assessments_with_string_source

Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
Simplified test_feedback_converts_exception_error_to_assessment_error by directly passing the exception to Feedback instead of using a try/except block. The test still verifies that Exception objects are properly converted to AssessmentError with error code and stack trace. Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
AveshCSingh left a comment:
LGTM. Thank you for fixing this!
Updated tests to reflect the new behavior where string errors in Feedback are automatically converted to AssessmentError objects. Tests were checking for string errors, but now need to verify AssessmentError objects with the correct error_message field.

Fixed tests:
- test_sanitize_scorer_feedback_preserves_empty_string
- test_databricks_model_handles_errors_gracefully (3 error scenarios)

Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
Pull request overview
This PR fixes an intermittent bug in assessment logging where Feedback objects initialized with string error messages would cause 'str' object has no attribute 'to_proto' errors during protobuf serialization. The fix ensures that all error types (strings, Exceptions, and AssessmentError objects) are properly converted to AssessmentError instances in the Feedback.__init__() method.
Key Changes
- Modified `Feedback.__init__()` to convert string errors to `AssessmentError` objects with error code `"ASSESSMENT_ERROR"`
- Updated the type hint for the `error` parameter to include `str`
- Added validation to reject invalid error types (int, list, dict) with clear error messages
- Added comprehensive test coverage for string, Exception, and AssessmentError handling
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `mlflow/entities/assessment.py` | Core fix: added string error handling and type validation in `Feedback.__init__()`, updated type hints |
| `tests/entities/test_assessment.py` | Added 4 comprehensive tests covering string-to-AssessmentError conversion, Exception handling, AssessmentError pass-through, and invalid type rejection |
| `tests/genai/judges/test_make_judge.py` | Updated existing tests to verify that string errors from Databricks judges are properly converted to `AssessmentError` objects |
| `tests/genai/judges/test_builtin.py` | Updated test to verify string errors are converted to `AssessmentError` with proper `error_message` and `error_code` fields |
- Remove unused logging import and `_logger` variable from assessment.py
- Remove unnecessary blank line in `Feedback.from_proto()`
- Move `AssessmentError` import to top of test_make_judge.py for consistency

Addresses feedback from @harupy and @Copilot

Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
```python
def test_feedback_rejects_invalid_error_types():
    # Test with integer
    with pytest.raises(
        MlflowException, match="'error' must be an Exception, AssessmentError, or string"
    ):
        Feedback(name="test", error=123)

    # Test with list
    with pytest.raises(
        MlflowException, match="'error' must be an Exception, AssessmentError, or string"
    ):
        Feedback(name="test", error=["error"])

    # Test with dict
    with pytest.raises(
        MlflowException, match="'error' must be an Exception, AssessmentError, or string"
    ):
        Feedback(name="test", error={"error": "message"})
```
let's use pytest.mark.parametrize
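The suggested parametrized rewrite could look like the sketch below. The `Feedback` and `MlflowException` classes here are minimal stand-ins so the pattern runs on its own; the real ones come from `mlflow.entities` and `mlflow.exceptions`:

```python
import pytest


# Stand-ins for illustration; the real test imports these from mlflow.
class MlflowException(Exception):
    pass


class Feedback:
    def __init__(self, name, error=None):
        if error is not None and not isinstance(error, (str, Exception)):
            raise MlflowException(
                "'error' must be an Exception, AssessmentError, or string"
            )
        self.name, self.error = name, error


@pytest.mark.parametrize("invalid_error", [123, ["error"], {"error": "message"}])
def test_feedback_rejects_invalid_error_types(invalid_error):
    # One assertion per parametrized case replaces the three copied blocks.
    with pytest.raises(
        MlflowException, match="'error' must be an Exception, AssessmentError, or string"
    ):
        Feedback(name="test", error=invalid_error)
```

Each invalid value becomes its own test case in pytest's report, so a failure pinpoints exactly which type slipped through.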
Apologies for missing this -- will send you a follow-up PR.
harupy left a comment:
LGTM once https://github.com/mlflow/mlflow/pull/19340/files#r2616202724 is addressed!
Related Issues/PRs
#xxx

What changes are proposed in this pull request?
This PR fixes an intermittent bug in assessment logging where `Feedback` objects with string error messages cause `'str' object has no attribute 'to_proto'` errors during protobuf serialization.

Root Cause: The Databricks managed judge adapter passes string error messages to the `Feedback` constructor (e.g., `error="Empty response from Databricks judge"`). However, `Feedback.__init__()` only converted `Exception` objects to `AssessmentError`, allowing strings to pass through. When `FeedbackValue.to_proto()` later called `error.to_proto()` on the string, it crashed.

Solution: Fix the issue at the source by handling string errors in `Feedback.__init__()`:
- Strings are converted to `AssessmentError` objects with error code `"ASSESSMENT_ERROR"`
- `Exception` objects are still converted to `AssessmentError` with the proper error code and stack trace

Additional Changes:
- Updated the type hint for the `error` parameter to include `str`

This approach is cleaner and more maintainable than defensive validation throughout the codebase.
How is this PR tested?
New tests in `tests/entities/test_assessment.py`:
- `test_feedback_converts_string_error_to_assessment_error` - Verifies string-to-AssessmentError conversion
- `test_feedback_converts_exception_error_to_assessment_error` - Verifies Exception-to-AssessmentError conversion with stack trace
- `test_feedback_passes_through_assessment_error` - Verifies AssessmentError objects pass through unchanged
- `test_feedback_rejects_invalid_error_types` - Verifies rejection of int, list, and dict error types

Does this PR require documentation update?
Release Notes
Is this a user-facing change?
Fixed assessment logging failures when using Databricks managed judges. Error messages from judges are now properly converted to AssessmentError objects, preventing serialization failures during evaluation.
What component(s), interfaces, languages, and integrations does this PR affect?
Components
- `area/evaluation`: MLflow model evaluation features, evaluation metrics, and evaluation workflows
- `area/tracing`: MLflow Tracing features, tracing APIs, and LLM tracing functionality

How should the PR be classified in the release notes? Choose one:
- `rn/bug-fix` - A user-facing bug fix worth mentioning in the release notes

Should this PR be included in the next patch release?
🤖 Generated with Claude Code