Fix Claude Agent SDK tracing by capturing messages from receive_messages by smoorjani · Pull Request #20778 · mlflow/mlflow

smoorjani · 2026-02-12T17:53:25Z

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

Fix mlflow.anthropic.autolog() not creating traces when using the Claude Agent SDK. The previous hook-based approach read transcript files that suddenly started to only contain queue-operation metadata, so no trace was ever created.
Wrap query() and receive_response() on the SDK client to capture messages directly and build the trace when the response stream is exhausted.
Use native Anthropic message format for LLM span inputs/outputs, include cache tokens in input totals.

How is this PR tested?

Existing unit/integration tests
New unit/integration tests
Manual tests

import asyncio

import mlflow

mlflow.anthropic.autolog()
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment(experiment_id="98459650931566")


async def main():
    from claude_agent_sdk import ClaudeSDKClient

    async with ClaudeSDKClient() as client:
        await client.query(
            "Read through the MLflow MemAlign implementation in this codebase "
            "(check mlflow/metrics/ and related files). Briefly explain what it does, "
            "then suggest 2-3 concrete performance optimizations. "
            "Use the Read and Grep tools to explore the code."
        )
        async for msg in client.receive_response():
            print(f"  [{type(msg).__name__}] {msg}")

    print("\nDone! Check experiment 98459650931566 for the trace.")


asyncio.run(main())

Results:

Does this PR require documentation update?

Does this PR require updating the MLflow Skills repository?

No. You can skip the rest of this section.
Yes. Please link the corresponding PR or explain how you plan to update it.

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

How should the PR be classified in the release notes? Choose one:

rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?

Minor release: a release that increments the second part of the version number (e.g., 1.2.0 -> 1.3.0).
Bug fixes, doc updates and new features usually go into minor releases.
Patch release: a release that increments the third part of the version number (e.g., 1.2.0 -> 1.2.1).
Bug fixes and doc updates usually go into patch releases.

Yes (this PR will be cherry-picked and included in the next patch release)
No (this PR will be included in the next minor release)

github-actions · 2026-02-12T17:54:06Z

🛠 DevTools 🛠

Install mlflow from this PR

# mlflow
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/20778/merge
# mlflow-skinny
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/20778/merge#subdirectory=libs/skinny

For Databricks, use the following command:

%sh curl -LsSf https://raw.githubusercontent.com/mlflow/mlflow/HEAD/dev/install-skinny.sh | sh -s pull/20778/merge

github-actions · 2026-02-12T18:02:48Z

Documentation preview for 36a732e is available at:

https://pr-20778--mlflow-docs-preview.netlify.app/docs/latest/

More info

Ignore this comment if this PR does not change the documentation.
The preview is updated when a new commit is pushed to this PR.
This comment was created by this workflow run.
The documentation was built by this workflow run.

…query instead of transcript The SDK transcript files only contain queue-operation metadata, not actual conversation content, so process_transcript() could never find user messages. This wraps query() and receive_messages() on the client instance to accumulate messages into a buffer, then builds the trace from typed SDK message objects via a new process_sdk_messages() function. Also extracts shared trace finalization logic into _finalize_trace() to reduce duplication. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

…improve docstrings Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

…sult_message Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

…esultMessage Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

- Wrap receive_response() to capture ResultMessage (which contains token usage and duration but is only yielded by receive_response, not receive_messages) - Remove fake custom timestamps from SDK path — spans now use real wall-clock timing instead of computed timestamps that showed 1 second - Include cache tokens (cache_creation + cache_read) in input token count - Verify trace-level token usage aggregation in tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

The SDK fires the Stop hook BEFORE yielding ResultMessage (which carries token usage and duration). This caused both execution_duration and token_usage to be missing from traces. Fix: build the trace when receive_response() is fully consumed instead of in the Stop hook. A receiving_response flag prevents the stop hook from building a partial trace mid-stream. The stop hook still serves as a fallback for code paths that only use receive_messages(). Also sets token usage directly on trace_metadata as belt-and-suspenders, and uses ResultMessage.duration_ms for custom span timestamps. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

- Replace 3 wrappers + stop hook + 2 flags with a single receive_response() wrapper that builds the trace on exhaustion - Use native Anthropic message format instead of converting to OpenAI - Only include messages since last LLM span (not full history) - Set MESSAGE_FORMAT: "anthropic" on LLM spans for Chat UI rendering - Remove _build_trace helper, query/receive_messages wrappers, and all hook/flag machinery (-267 lines net) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

- query() doesn't echo through receive_response(), so wrap it to capture the user prompt in the message buffer - Remove implementation-detail docstrings from internal methods - Rename _sdk_msg_to_dict to _serialize_sdk_message Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

B-Step62

Overall looks good once #20778 (comment) is addressed

mlflow/claude_code/tracing.py

B-Step62 · 2026-02-17T12:08:05Z

mlflow/anthropic/autolog.py

-        if options is None:
-            options = ClaudeAgentOptions()
+        # query() sends the user prompt but doesn't echo it through receive_response()
+        original_query = self.query


Can we use the @safe_patch mechanism like other autologging itegration?

I think this is already present here:

mlflow/mlflow/anthropic/__init__.py

Line 55 in 3453341

safe_patch(

These are patches on instance methods so I think the existing patch is sufficient, but LMK if not. These also have some limitations (e.g., async methods, they are not stateless) which make them hard to use with safe_patch

Ah my bad, that was misread of the code.

Re:async, safe_patch should handle async now (we implement tracing for async LLM calls with it) so there may be some way to use it. But definitely not blocking.

mlflow/claude_code/tracing.py

B-Step62 · 2026-02-17T14:11:30Z

mlflow/claude_code/tracing.py

+    return tool_result_map
+
+
+def _serialize_sdk_message(msg) -> dict[str, Any] | None:


Does asdict of dataclass work?

Gave it a shot. asdict on the full message doesn't help because it still requires significant postporcessing. We do use asdict for serialization in _serialize_content_block where it replaces manual field extraction.

mlflow/claude_code/tracing.py

- Only include cache_creation_input_tokens in input count (not cache_read) since cache reads are significantly cheaper and would inflate cost estimates - Skip response key in outputs when final_response is None - Remove session ID fallback generation; omit if unavailable - Simplify async flush: call flush_trace_async_logging() directly (it already handles errors internally) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

… fix docstring - Move cache token detail from docstring to inline comment - Extract _is_async_trace_logging_enabled() utility with unit tests (reviewer flagged that _async_queue field could change silently) - Use dataclasses.asdict for SDK content block serialization - Fix process_sdk_messages docstring (no longer generates session IDs) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

Encapsulates the async queue check + flush call so the fragile _async_queue field name is tested directly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

query() can receive an async generator of message dicts (not just a string). Wrap the generator to capture user content for the trace while passing items through to the SDK. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

…st imports to top level Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

…ges (mlflow#20778) Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com> Co-authored-by: Claude <noreply@anthropic.com>

…ges (#20778) Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com> Co-authored-by: Claude <noreply@anthropic.com>

smoorjani added the v3.10.0 label Feb 12, 2026

github-actions bot added area/tracking Tracking service, tracking client APIs, autologging rn/bug-fix Mention under Bug Fixes in Changelogs. labels Feb 12, 2026

smoorjani force-pushed the claude-agents-sdk-bug branch from ab341c9 to 0b8ff74 Compare February 13, 2026 18:00

smoorjani and others added 16 commits February 13, 2026 10:28

Clean up SDK tracing: remove redundant defaults, fix variable names, …

9f05324

…improve docstrings Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

Run ruff format on changed files

c4d86ad

Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

Refactor SDK tracing into focused helpers

1819479

Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

Inline SDK helpers into process_sdk_messages

a831d39

Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

Inline wrappers into patched_claude_sdk_init, remove trivial _find_re…

e46c4c1

…sult_message Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

Move ResultMessage import to top of function

18544f5

Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

Consolidate duplicate tests to reduce bloat

11f2f8d

Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

Reduce test bloat: extract shared helper, remove redundant tests

dd72dde

Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

Remove redundant single-tool test

f096e93

Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

Add conversation context to LLM span inputs, use real duration from R…

cda5348

…esultMessage Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

Apply ruff formatting to SDK tracing files

42598d9

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

Fix clint lint: walrus operator and redundant test docstring

225b3bc

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

smoorjani requested a review from B-Step62 February 17, 2026 03:13

B-Step62 approved these changes Feb 17, 2026

View reviewed changes

smoorjani and others added 5 commits February 17, 2026 09:06

Extract _flush_trace_async_logging() utility with tests

487da4e

Encapsulates the async queue check + flush call so the fragile _async_queue field name is tested directly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

Clean up: top-level imports, whitespace, readable variable names

3bcab37

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

Merge branch 'master' into claude-agents-sdk-bug

0e668d1

smoorjani and others added 3 commits February 17, 2026 15:49

Rename single-letter variables to full names in SDK serialization

ba93d5b

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

Only capture user-type messages from async generator prompts, move te…

36a732e

…st imports to top level Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com>

github-actions bot assigned B-Step62 Feb 18, 2026

smoorjani added this pull request to the merge queue Feb 18, 2026

Merged via the queue into mlflow:master with commit 74c2e60 Feb 18, 2026
54 checks passed

smoorjani deleted the claude-agents-sdk-bug branch February 18, 2026 06:17

github-actions bot added the size/XL Extra-large PR (500+ LoC) label Feb 18, 2026

daniellok-db pushed a commit that referenced this pull request Feb 20, 2026

Fix Claude Agent SDK tracing by capturing messages from receive_messa…

ae3a39c

…ges (#20778) Signed-off-by: Samraj Moorjani <samraj.moorjani@databricks.com> Co-authored-by: Claude <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix Claude Agent SDK tracing by capturing messages from receive_messages#20778

Fix Claude Agent SDK tracing by capturing messages from receive_messages#20778
smoorjani merged 25 commits intomlflow:masterfrom
smoorjani:claude-agents-sdk-bug

smoorjani commented Feb 12, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Feb 12, 2026

Install mlflow from this PR

Uh oh!

github-actions bot commented Feb 12, 2026 •

edited

Loading

Uh oh!

B-Step62 left a comment •

edited

Loading

Uh oh!

Uh oh!

B-Step62 Feb 17, 2026

Uh oh!

smoorjani Feb 17, 2026

Uh oh!

B-Step62 Feb 18, 2026

Uh oh!

Uh oh!

Uh oh!

B-Step62 Feb 17, 2026

Uh oh!

smoorjani Feb 17, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		return tool_result_map


		def _serialize_sdk_message(msg) -> dict[str, Any] \| None:

Conversation

smoorjani commented Feb 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Related Issues/PRs

What changes are proposed in this pull request?

How is this PR tested?

Does this PR require documentation update?

Does this PR require updating the MLflow Skills repository?

Release Notes

Is this a user-facing change?

What component(s), interfaces, languages, and integrations does this PR affect?

How should the PR be classified in the release notes? Choose one:

Should this PR be included in the next patch release?

Uh oh!

github-actions bot commented Feb 12, 2026

Install mlflow from this PR

Uh oh!

github-actions bot commented Feb 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

B-Step62 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

B-Step62 Feb 17, 2026

Choose a reason for hiding this comment

Uh oh!

smoorjani Feb 17, 2026

Choose a reason for hiding this comment

Uh oh!

B-Step62 Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

B-Step62 Feb 17, 2026

Choose a reason for hiding this comment

Uh oh!

smoorjani Feb 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

smoorjani commented Feb 12, 2026 •

edited

Loading

github-actions bot commented Feb 12, 2026 •

edited

Loading

B-Step62 left a comment •

edited

Loading

smoorjani Feb 17, 2026 •

edited

Loading