
Fix error initialization in Feedback #19340

Merged
alkispoly-db merged 7 commits into mlflow:master from alkispoly-db:mlflow-json-output-bug
Dec 13, 2025

Conversation


@alkispoly-db alkispoly-db commented Dec 11, 2025

🛠 DevTools 🛠

Install mlflow from this PR

# mlflow
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/19340/merge
# mlflow-skinny
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/19340/merge#subdirectory=libs/skinny

For Databricks, use the following command:

%sh curl -LsSf https://raw.githubusercontent.com/mlflow/mlflow/HEAD/dev/install-skinny.sh | sh -s pull/19340/merge

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

This PR fixes an intermittent bug in assessment logging where Feedback objects with string error messages cause 'str' object has no attribute 'to_proto' errors during protobuf serialization.

Root Cause: The Databricks managed judge adapter passes string error messages to the Feedback constructor (e.g., error="Empty response from Databricks judge"). However, Feedback.__init__() only converted Exception objects to AssessmentError, allowing strings to pass through. When FeedbackValue.to_proto() later called error.to_proto() on the string, it crashed.

Solution: Fix the issue at the source by handling string errors in Feedback.__init__():

  • Convert string errors to AssessmentError objects with error code "ASSESSMENT_ERROR"
  • Continue converting Exception objects to AssessmentError with proper error code and stack trace
  • Reject invalid error types (int, list, dict) with clear error messages

Additional Changes:

  • Removed defensive validation code that was added as a workaround (286 lines deleted)
  • Removed unnecessary try-catch around individual assessment logging
  • Updated type hint for error parameter to include str

This approach is cleaner and more maintainable than defensive validation throughout the codebase.
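As a rough sketch, the conversion described above could look like the following. This is a simplified stand-in, not the PR's actual code: the `AssessmentError` fields and the error code chosen for `Exception` inputs are assumptions.

```python
import traceback
from dataclasses import dataclass
from typing import Optional


@dataclass
class AssessmentError:  # simplified stand-in for mlflow.entities.AssessmentError
    error_code: str
    error_message: str
    stack_trace: Optional[str] = None


def coerce_error(error):
    """Normalize the `error` argument the way the fix describes."""
    if error is None or isinstance(error, AssessmentError):
        return error  # AssessmentError objects pass through unchanged
    if isinstance(error, str):
        # String messages get the generic "ASSESSMENT_ERROR" code
        return AssessmentError(error_code="ASSESSMENT_ERROR", error_message=error)
    if isinstance(error, Exception):
        # Exceptions keep their class name as the code plus a stack trace
        # (the exact error code used by the real implementation is an assumption)
        return AssessmentError(
            error_code=type(error).__name__,
            error_message=str(error),
            stack_trace="".join(
                traceback.format_exception(type(error), error, error.__traceback__)
            ),
        )
    # Invalid types (int, list, dict, ...) are rejected with a clear message
    raise TypeError("'error' must be an Exception, AssessmentError, or string")
```

Checking for `AssessmentError` before `Exception` keeps the pass-through behavior correct even if `AssessmentError` were ever made an `Exception` subclass.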

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests (4 comprehensive tests added)

New Tests in tests/entities/test_assessment.py:

  1. test_feedback_converts_string_error_to_assessment_error - Verifies string-to-AssessmentError conversion
  2. test_feedback_converts_exception_error_to_assessment_error - Verifies Exception-to-AssessmentError with stack trace
  3. test_feedback_passes_through_assessment_error - Verifies AssessmentError objects pass through unchanged
  4. test_feedback_rejects_invalid_error_types - Verifies rejection of int, list, dict error types
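For illustration, test (1) might look roughly like the following. The `Feedback` and `AssessmentError` classes below are minimal stand-ins so the sketch runs without mlflow installed; the real test imports them from `mlflow.entities`.

```python
from dataclasses import dataclass


@dataclass
class AssessmentError:  # minimal stand-in for mlflow.entities.AssessmentError
    error_code: str
    error_message: str


class Feedback:  # minimal stand-in for mlflow.entities.Feedback
    def __init__(self, name, error=None):
        if isinstance(error, str):
            # The behavior under test: string errors become AssessmentError objects
            error = AssessmentError("ASSESSMENT_ERROR", error)
        self.name, self.error = name, error


def test_feedback_converts_string_error_to_assessment_error():
    feedback = Feedback(name="quality", error="Empty response from Databricks judge")
    assert isinstance(feedback.error, AssessmentError)
    assert feedback.error.error_code == "ASSESSMENT_ERROR"
    assert feedback.error.error_message == "Empty response from Databricks judge"
```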

Does this PR require documentation update?

  • No. You can skip the rest of this section.

Release Notes

Is this a user-facing change?

  • Yes. Give a description of this change to be included in the release notes for MLflow users.

Fixed assessment logging failures when using Databricks managed judges. Error messages from judges are now properly converted to AssessmentError objects, preventing serialization failures during evaluation.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
  • area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality

How should the PR be classified in the release notes? Choose one:

  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes

Should this PR be included in the next patch release?

  • Yes (this PR will be cherry-picked and included in the next patch release)

🤖 Generated with Claude Code

Adds defensive type validation to handle cases where external packages
(like databricks-agents) return assessment source fields as strings
instead of proper AssessmentSource objects. This prevents 'str' object
has no attribute 'to_proto' errors during protobuf serialization.

The fix:
- Validates assessment types before logging
- Converts string sources to AssessmentSource objects automatically
- Handles nested source fields in feedback and expectation objects
- Logs warnings for malformed assessments without breaking evaluation
- Includes comprehensive test coverage (5 new unit tests)

This resolves an intermittent heisenbug where trace evaluation would
fail unpredictably, causing missing assessment data in both table and
JSON output formats.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
@github-actions github-actions bot added the v3.7.1, area/evaluation (MLflow Evaluation), area/tracing (MLflow Tracing and its integrations), and rn/bug-fix (Mention under Bug Fixes in Changelogs) labels Dec 11, 2025

github-actions bot commented Dec 11, 2025

Documentation preview for d591aec is available.


alkispoly-db and others added 3 commits December 12, 2025 23:58
Extends the defensive validation from commit 5053be1 by adding
low-level type checking directly in Assessment.to_proto(). This protects
all code paths that convert assessments to proto format, not just the
evaluation harness.

Key improvements:
- Validates source, feedback.source, and expectation.source fields
- Automatically converts string sources to AssessmentSource objects
- Logs warnings when malformed fields are detected and fixed
- Raises clear TypeErrors for truly invalid field types

This fixes the heisenbug where 'str' object has no attribute 'to_proto'
errors occurred during trace conversion to DataFrame, which affected both
table and json output formats in `mlflow traces evaluate`.

Protected code paths:
- Trace conversion to DataFrame (trace.to_pandas_dataframe_row)
- Evaluation harness (genai.evaluate)
- Direct assessment logging (mlflow.log_assessment)
- All backend stores (Databricks, file, SQL, REST)

The validation happens at the lowest level (Assessment.to_proto) ensuring
defense-in-depth regardless of how assessments are created or retrieved.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
Fixed AttributeError when Feedback objects with string error messages
were serialized to protobuf. The Databricks managed judge adapter was
passing string error messages to Feedback(), but Feedback.__init__()
only converted Exception objects to AssessmentError, causing
'str' object has no attribute 'to_proto' errors during serialization.

Changes:
- Added string-to-AssessmentError conversion in Feedback.__init__()
- Updated type hint to accept str in error parameter
- Removed defensive validation code from harness (173 lines)
- Added comprehensive tests for all error type scenarios

This simplifies the codebase by fixing the issue at the source rather
than using defensive validation throughout the logging pipeline.

Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
Removed 5 tests from test_eval.py that tested the defensive source
validation functionality which was removed in the previous commit.
These tests are no longer relevant since the fix now handles string
errors at the source in Feedback.__init__() rather than using
defensive validation.

Removed tests:
- test_validate_assessment_types_with_string_source
- test_validate_assessment_types_with_proper_source
- test_validate_assessment_types_with_invalid_feedback
- test_validate_assessment_types_with_expectation_string_source
- test_log_assessments_with_string_source

Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
@alkispoly-db alkispoly-db force-pushed the mlflow-json-output-bug branch from 3ef48b0 to e143dc3 on December 13, 2025 01:04
Simplified test_feedback_converts_exception_error_to_assessment_error
by directly passing the exception to Feedback instead of using a
try/except block. The test still verifies that Exception objects are
properly converted to AssessmentError with error code and stack trace.

Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
@alkispoly-db alkispoly-db changed the title from "Fix assessment logging failure from malformed source fields" to "Fix error initialization in Feedback" Dec 13, 2025

@AveshCSingh AveshCSingh left a comment


LGTM. Thank you for fixing this!

Updated tests to reflect the new behavior where string errors in Feedback
are automatically converted to AssessmentError objects. Tests were checking
for string errors, but now need to verify AssessmentError objects with the
correct error_message field.

Fixed tests:
- test_sanitize_scorer_feedback_preserves_empty_string
- test_databricks_model_handles_errors_gracefully (3 error scenarios)

Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
@alkispoly-db alkispoly-db requested a review from harupy December 13, 2025 04:26
@harupy harupy requested a review from Copilot December 13, 2025 05:10

Copilot AI left a comment


Pull request overview

This PR fixes an intermittent bug in assessment logging where Feedback objects initialized with string error messages would cause 'str' object has no attribute 'to_proto' errors during protobuf serialization. The fix ensures that all error types (strings, Exceptions, and AssessmentError objects) are properly converted to AssessmentError instances in the Feedback.__init__() method.

Key Changes

  • Modified Feedback.__init__() to convert string errors to AssessmentError objects with error code "ASSESSMENT_ERROR"
  • Updated the type hint for the error parameter to include str type
  • Added validation to reject invalid error types (int, list, dict) with clear error messages
  • Added comprehensive test coverage for string, Exception, and AssessmentError handling

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
mlflow/entities/assessment.py Core fix: Added string error handling and type validation in Feedback.__init__(), updated type hints
tests/entities/test_assessment.py Added 4 comprehensive tests covering string-to-AssessmentError conversion, Exception handling, AssessmentError pass-through, and invalid type rejection
tests/genai/judges/test_make_judge.py Updated existing tests to verify that string errors from Databricks judges are properly converted to AssessmentError objects
tests/genai/judges/test_builtin.py Updated test to verify string errors are converted to AssessmentError with proper error_message and error_code fields


- Remove unused logging import and _logger variable from assessment.py
- Remove unnecessary blank line in Feedback.from_proto()
- Move AssessmentError import to top of test_make_judge.py for consistency

Addresses feedback from @harupy and @Copilot

Signed-off-by: Alkis Polyzotis <alkis.polyzotis@databricks.com>
@alkispoly-db alkispoly-db requested a review from harupy December 13, 2025 06:25
Comment on lines +610 to +627
def test_feedback_rejects_invalid_error_types():
# Test with integer
with pytest.raises(
MlflowException, match="'error' must be an Exception, AssessmentError, or string"
):
Feedback(name="test", error=123)

# Test with list
with pytest.raises(
MlflowException, match="'error' must be an Exception, AssessmentError, or string"
):
Feedback(name="test", error=["error"])

# Test with dict
with pytest.raises(
MlflowException, match="'error' must be an Exception, AssessmentError, or string"
):
Feedback(name="test", error={"error": "message"})
Member
let's use pytest.mark.parametrize
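A parametrized version along the lines of this suggestion might look like the sketch below. `Feedback` and `MlflowException` are minimal stand-ins here so the sketch runs without mlflow installed; the real test would import them from `mlflow.entities` and `mlflow.exceptions`.

```python
import pytest


class MlflowException(Exception):  # stand-in for mlflow.exceptions.MlflowException
    pass


class Feedback:  # stand-in for mlflow.entities.Feedback, showing only the type check
    def __init__(self, name, error=None):
        if error is not None and not isinstance(error, (str, Exception)):
            raise MlflowException(
                "'error' must be an Exception, AssessmentError, or string"
            )
        self.name, self.error = name, error


@pytest.mark.parametrize("invalid_error", [123, ["error"], {"error": "message"}])
def test_feedback_rejects_invalid_error_types(invalid_error):
    with pytest.raises(
        MlflowException, match="'error' must be an Exception, AssessmentError, or string"
    ):
        Feedback(name="test", error=invalid_error)
```

This collapses the three near-identical `pytest.raises` blocks into one test body, and pytest reports each invalid value as a separate test case.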

Collaborator Author

Apologies for missing this -- will send you a follow-up PR.


@harupy harupy left a comment


@alkispoly-db alkispoly-db added this pull request to the merge queue Dec 13, 2025
Merged via the queue into mlflow:master with commit 6eb876c Dec 13, 2025
52 of 54 checks passed
@alkispoly-db alkispoly-db deleted the mlflow-json-output-bug branch December 13, 2025 09:48

Labels

area/evaluation (MLflow Evaluation), area/tracing (MLflow Tracing and its integrations), rn/bug-fix (Mention under Bug Fixes in Changelogs), v3.7.1

