Fix Claude Agent SDK tracing by capturing messages from receive_messages #20778

smoorjani merged 25 commits into mlflow:master
Conversation
🛠 DevTools 🛠
Install mlflow from this PR. For Databricks, use the following command:
Documentation preview for 36a732e is available at: More info
…query instead of transcript

The SDK transcript files only contain queue-operation metadata, not actual conversation content, so process_transcript() could never find user messages. This wraps query() and receive_messages() on the client instance to accumulate messages into a buffer, then builds the trace from typed SDK message objects via a new process_sdk_messages() function. Also extracts shared trace finalization logic into _finalize_trace() to reduce duplication.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>
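A minimal sketch of the buffering approach this commit describes: patch query() and receive_messages() on a single client instance so every message also lands in a shared buffer. The real claude-agent-sdk types aren't available here, so FakeClient and instrument() are illustrative stand-ins, not the SDK's API.

```python
import asyncio

class FakeClient:
    # Stand-in for ClaudeSDKClient; the class and its behavior are
    # illustrative only.
    async def query(self, prompt):
        self._last_prompt = prompt

    async def receive_messages(self):
        yield {"role": "assistant", "content": f"re: {self._last_prompt}"}

def instrument(client, buffer):
    # Patch both methods on this one instance so every message that
    # flows through it is also accumulated into `buffer`.
    original_query = client.query
    original_receive = client.receive_messages

    async def wrapped_query(prompt):
        # query() is where the user prompt enters, so capture it here.
        buffer.append({"role": "user", "content": prompt})
        return await original_query(prompt)

    async def wrapped_receive():
        async for msg in original_receive():
            buffer.append(msg)
            yield msg  # pass through untouched

    client.query = wrapped_query
    client.receive_messages = wrapped_receive

async def main():
    buf = []
    client = FakeClient()
    instrument(client, buf)
    await client.query("hello")
    async for _ in client.receive_messages():
        pass
    return buf

buffer = asyncio.run(main())
print(buffer)
```

A trace builder can then read the buffer in order: the user prompt first, followed by the streamed assistant messages.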
Force-pushed from ab341c9 to 0b8ff74
…improve docstrings Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>
…sult_message Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>
…esultMessage Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>
- Wrap receive_response() to capture ResultMessage (which contains token usage and duration but is only yielded by receive_response, not receive_messages)
- Remove fake custom timestamps from the SDK path; spans now use real wall-clock timing instead of computed timestamps that all showed 1 second
- Include cache tokens (cache_creation + cache_read) in the input token count
- Verify trace-level token usage aggregation in tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>
The SDK fires the Stop hook BEFORE yielding ResultMessage (which carries token usage and duration). This caused both execution_duration and token_usage to be missing from traces.

Fix: build the trace when receive_response() is fully consumed instead of in the Stop hook. A receiving_response flag prevents the stop hook from building a partial trace mid-stream. The stop hook still serves as a fallback for code paths that only use receive_messages(). Also sets token usage directly on trace_metadata as belt-and-suspenders, and uses ResultMessage.duration_ms for custom span timestamps.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>
- Replace 3 wrappers + stop hook + 2 flags with a single receive_response() wrapper that builds the trace on exhaustion
- Use native Anthropic message format instead of converting to OpenAI
- Only include messages since the last LLM span (not the full history)
- Set MESSAGE_FORMAT: "anthropic" on LLM spans for Chat UI rendering
- Remove the _build_trace helper, the query/receive_messages wrappers, and all hook/flag machinery (-267 lines net)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>
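The "build the trace on exhaustion" design can be sketched as a wrapper that buffers every yielded message and only finalizes once the stream is fully consumed, which guarantees the trailing ResultMessage (token usage, duration) has been seen. The generator below is a fake stand-in for the SDK's receive_response(); the dict shapes and helper names are illustrative.

```python
import asyncio

async def fake_receive_response():
    # Stand-in for the SDK's receive_response(): a few messages
    # followed by a result-like dict, as the commit describes.
    yield {"type": "assistant", "content": "answer"}
    yield {"type": "result", "usage": {"input_tokens": 10, "output_tokens": 3}}

def wrap_receive_response(original, on_complete):
    # Return a wrapper that passes messages through unchanged and calls
    # on_complete(buffer) only once the stream is exhausted.
    async def wrapper():
        buffer = []
        async for msg in original():
            buffer.append(msg)
            yield msg
        # Reached only on normal exhaustion: the ResultMessage has been
        # seen, so the trace can be finalized with usage and duration.
        on_complete(buffer)
    return wrapper

traces = []

async def consume():
    wrapped = wrap_receive_response(fake_receive_response, traces.append)
    async for _ in wrapped():
        pass

asyncio.run(consume())
print(traces)
```

Note that if the caller abandons the stream early, on_complete never fires; that is why the earlier revisions kept a stop-hook fallback before the single-wrapper design landed.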
- query() doesn't echo through receive_response(), so wrap it to capture the user prompt in the message buffer
- Remove implementation-detail docstrings from internal methods
- Rename _sdk_msg_to_dict to _serialize_sdk_message

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>
Overall looks good once #20778 (comment) is addressed
if options is None:
    options = ClaudeAgentOptions()
# query() sends the user prompt but doesn't echo it through receive_response()
original_query = self.query
Can we use the @safe_patch mechanism like other autologging integrations?
I think this is already present here:
mlflow/mlflow/anthropic/__init__.py
Line 55 in 3453341
These are patches on instance methods, so I think the existing patch is sufficient, but LMK if not. They also have some limitations (e.g., async methods; they are not stateless) which make them hard to use with safe_patch.
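The distinction being discussed can be shown in plain Python: assigning a wrapper to an instance attribute shadows the class method for that one object only, whereas a class-level patch (what safe_patch does) would affect every instance. This is a generic illustration, not mlflow's actual patching code.

```python
class Client:
    def query(self, prompt):
        return f"echo:{prompt}"

calls = []
a, b = Client(), Client()

# Capture the bound method, then shadow it on `a` only. The wrapper
# closes over `calls`, which is the statefulness mentioned above.
original = a.query
def patched(prompt):
    calls.append(prompt)
    return original(prompt)
a.query = patched

a.query("hi")  # recorded in `calls`
b.query("hi")  # untouched: `b` still uses the class method
print(calls)
```

Because the wrapper lives on one instance and carries per-instance state, a stateless class-level safe_patch is a poor fit here.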
Ah my bad, that was a misread of the code.
Re: async, safe_patch should handle async now (we implement tracing for async LLM calls with it), so there may be some way to use it. But definitely not blocking.
return tool_result_map


def _serialize_sdk_message(msg) -> dict[str, Any] | None:
Does asdict on the dataclass work?
Gave it a shot. asdict on the full message doesn't help because it still requires significant postprocessing. We do use asdict for serialization in _serialize_content_block, where it replaces manual field extraction.
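The pattern described above — asdict for the leaf content blocks, with a small wrapper adding what asdict can't know — might look like this. The dataclasses here are hypothetical stand-ins for the SDK's block types, and serialize_content_block is illustrative, not the actual helper.

```python
from dataclasses import dataclass, asdict

# Hypothetical content-block dataclasses standing in for the SDK's
# TextBlock / ToolUseBlock types.
@dataclass
class TextBlock:
    text: str

@dataclass
class ToolUseBlock:
    id: str
    name: str
    input: dict

def serialize_content_block(block) -> dict:
    # asdict recursively converts nested dataclasses and dicts, so no
    # manual field extraction is needed; we only add the block's type
    # name, which asdict does not record.
    return {"type": type(block).__name__, **asdict(block)}

print(serialize_content_block(ToolUseBlock("t1", "search", {"q": "mlflow"})))
```

For the top-level message, this falls short because the trace needs restructuring (role assignment, usage extraction) beyond a field-for-field dump, which is the postprocessing mentioned above.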
- Only include cache_creation_input_tokens in the input count (not cache_read), since cache reads are significantly cheaper and would inflate cost estimates
- Skip the response key in outputs when final_response is None
- Remove session ID fallback generation; omit if unavailable
- Simplify async flush: call flush_trace_async_logging() directly (it already handles errors internally)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>
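The token-accounting rule above reduces to a one-liner. A sketch, assuming the usage dict uses Anthropic's field names (input_tokens, cache_creation_input_tokens, cache_read_input_tokens); the helper name is illustrative:

```python
def input_token_count(usage: dict) -> int:
    # Fresh input plus cache writes count toward input tokens.
    # Cache reads are excluded: they are far cheaper, so counting
    # them would inflate cost estimates.
    return usage.get("input_tokens", 0) + usage.get("cache_creation_input_tokens", 0)

usage = {
    "input_tokens": 100,
    "cache_creation_input_tokens": 40,
    "cache_read_input_tokens": 500,
    "output_tokens": 20,
}
print(input_token_count(usage))  # 140, not 640
```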
… fix docstring

- Move cache token detail from docstring to inline comment
- Extract _is_async_trace_logging_enabled() utility with unit tests (a reviewer flagged that the _async_queue field could change silently)
- Use dataclasses.asdict for SDK content block serialization
- Fix process_sdk_messages docstring (it no longer generates session IDs)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>
Encapsulates the async queue check + flush call so the fragile _async_queue field name is tested directly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>
query() can receive an async generator of message dicts (not just a string). Wrap the generator to capture user content for the trace while passing items through to the SDK.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>
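The pass-through wrapping of a streamed prompt can be sketched as a tee over an async generator: every item is copied into the trace buffer and still reaches the consumer unchanged. The generator contents and helper name below are illustrative.

```python
import asyncio

async def prompt_stream():
    # Stand-in for a caller-supplied async generator of message dicts.
    yield {"role": "user", "content": "first"}
    yield {"role": "user", "content": "second"}

def capture_passthrough(agen, buffer):
    # Wrap an async generator so each item is recorded in `buffer`
    # for the trace while still flowing through to the consumer
    # (here, the consumer stands in for the SDK's query()).
    async def wrapper():
        async for item in agen:
            buffer.append(item)
            yield item
    return wrapper()

captured = []

async def run():
    seen = []
    async for item in capture_passthrough(prompt_stream(), captured):
        seen.append(item)
    return seen

seen = asyncio.run(run())
print(captured == seen, len(captured))
```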
…st imports to top level Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>
…ges (#20778) Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com> Co-authored-by: Claude <noreply@anthropic.com>
Related Issues/PRs

#xxx

What changes are proposed in this pull request?
How is this PR tested?
Results:

Does this PR require documentation update?
Does this PR require updating the MLflow Skills repository?
Release Notes
Is this a user-facing change?
What component(s), interfaces, languages, and integrations does this PR affect?
Components
- area/tracking: Tracking Service, tracking client APIs, autologging
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
- area/gateway: MLflow AI Gateway client APIs, server, and third-party integrations
- area/prompts: MLflow prompt engineering features, prompt templates, and prompt management
- area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality
- area/projects: MLproject format, project running backends
- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/build: Build and test infrastructure for MLflow
- area/docs: MLflow documentation pages

How should the PR be classified in the release notes? Choose one:

- rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
- rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
- rn/feature - A new user-facing feature worth mentioning in the release notes
- rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
- rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?
Bug fixes, doc updates and new features usually go into minor releases.
Bug fixes and doc updates usually go into patch releases.