
Conversation

@dinmukhamedm (Collaborator) commented Aug 3, 2025

Fixes #3178

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

Important

Adds async and sync message stream manager support to Anthropic instrumentation with comprehensive tests.

  • Behavior:
    • Adds WrappedAsyncMessageStreamManager and WrappedMessageStreamManager in streaming.py to handle async and sync message streams.
    • Updates _wrap() and _awrap() in __init__.py to return wrapped stream managers when is_stream_manager() is true.
    • Introduces is_stream_manager() in __init__.py to check for MessageStreamManager or AsyncMessageStreamManager.
  • Tests:
    • Adds tests for legacy and event-based message streaming in test_messages.py.
    • Includes tests for both sync and async message stream managers with and without content.
  • Misc:
    • Removes AsyncMessages.stream from WRAPPED_AMETHODS in __init__.py.
    • Updates VCR cassettes for new test cases.

This description was created by Ellipsis for 0b4a7f5.

Summary by CodeRabbit

  • New Features

    • Added support for instrumenting both synchronous and asynchronous streaming responses from Anthropic, improving observability for streamed message handling.
  • Tests

    • Introduced comprehensive tests covering legacy and event-based streaming scenarios for both sync and async Anthropic clients.
    • Added new test cassettes to ensure reliable and repeatable testing of streaming message instrumentation.

coderabbitai bot commented Aug 3, 2025

Walkthrough

This change introduces support for instrumenting Anthropic's streaming responses when they are returned as MessageStreamManager or AsyncMessageStreamManager instances. New wrapper classes and detection logic are added to ensure both synchronous and asynchronous streaming are correctly handled. Comprehensive tests and cassettes validate the new instrumentation paths.
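
For context, the SDK's stream() helpers are meant to be used as context managers, so whatever the instrumentation returns has to keep with / async with working. A rough usage sketch (the model name and prompt below are placeholders, not taken from this PR):

    import anthropic
    import asyncio

    client = anthropic.Anthropic()

    # Sync: messages.stream(...) returns a MessageStreamManager, used with `with`.
    with client.messages.stream(
        model="claude-3-5-haiku-latest",  # placeholder model name
        max_tokens=128,
        messages=[{"role": "user", "content": "Tell me a joke about OpenTelemetry"}],
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)

    # Async: the same call on AsyncAnthropic returns an AsyncMessageStreamManager.
    async def consume() -> None:
        async_client = anthropic.AsyncAnthropic()
        async with async_client.messages.stream(
            model="claude-3-5-haiku-latest",
            max_tokens=128,
            messages=[{"role": "user", "content": "Tell me a joke about OpenTelemetry"}],
        ) as stream:
            async for text in stream.text_stream:
                print(text, end="", flush=True)

    asyncio.run(consume())

The instrumented call has to hand back an object that still supports these with / async with statements, which is what the new wrapper classes provide.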

Changes

Cohort / File(s) / Change Summary:

  • Instrumentation: Stream Manager Detection & Wrapping
    packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py
    Adds the is_stream_manager helper and updates method wrapping to detect and wrap stream manager responses using the new wrapper classes for both sync and async cases.
  • Streaming Wrappers
    packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py
    Adds the WrappedMessageStreamManager and WrappedAsyncMessageStreamManager classes to instrument streaming context managers for sync and async use.
  • Streaming Tests
    packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
    Adds new test functions for sync and async stream manager instrumentation, covering legacy and event-based modes, with and without content.
  • Test Cassettes: Sync Streaming
    packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_anthropic_message_stream_manager_legacy.yaml, .../test_anthropic_message_stream_manager_with_events_with_content.yaml, .../test_anthropic_message_stream_manager_with_events_with_no_content.yaml
    Adds cassette files recording HTTP streaming interactions for the sync streaming tests, covering legacy and event-based scenarios.
  • Test Cassettes: Async Streaming
    packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_async_anthropic_message_stream_manager_legacy.yaml, .../test_async_anthropic_message_stream_manager_with_events_with_content.yaml, .../test_async_anthropic_message_stream_manager_with_events_with_no_content.yaml
    Adds cassette files recording HTTP streaming interactions for the async streaming tests, covering legacy and event-based scenarios.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Client
    participant Instrumentation
    participant StreamManager
    participant WrappedStreamManager

    User->>Client: call messages.stream(...)
    Client->>Instrumentation: return StreamManager instance
    Instrumentation->>Instrumentation: is_stream_manager(response)
    alt is StreamManager
        Instrumentation->>WrappedStreamManager: wrap with telemetry
        User->>WrappedStreamManager: use as (async) context manager
        WrappedStreamManager->>StreamManager: __enter__/__aenter__
        WrappedStreamManager->>User: yield instrumented stream
    else not StreamManager
        Instrumentation->>User: return as usual
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~18 minutes

Assessment against linked issues

Objectives:

  • Instrumentation correctly wraps and returns the appropriate type for stream() and async stream() methods, preventing TypeError and supporting context manager usage (#3178)
  • Instrumentation works for both sync and async streaming, with correct telemetry and event logging (#3178)
  • No incompatible or incorrect type returned for instrumented streaming methods (#3178)

Assessment against linked issues: Out-of-scope changes

No out-of-scope changes detected.

Suggested reviewers

  • nirga

Poem

A stream of jokes, a river of code,
Now instrumented, in sync and async mode.
Wrapped with care, the spans all align,
No more TypeErrors—everything’s fine!
With cassettes and tests, our coverage grows,
🐇 Cheers from this rabbit, as telemetry flows!


ellipsis-dev bot (Contributor) left a comment


Important

Looks good to me! 👍

Reviewed everything up to 0b4a7f5 in 2 minutes and 0 seconds.
  • Reviewed 1576 lines of code in 9 files
  • Skipped 0 files when reviewing.
  • Skipped posting 17 draft comments. View those below.
1. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py:106
  • Draft comment:
    In the is_stream_manager() function, the fallback compares class __name__ strings. Consider using a more robust type check (e.g. isinstance, if the classes are available) to improve maintainability.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
2. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py:376
  • Draft comment:
    Good job implementing WrappedAsyncMessageStreamManager with proper async context management (__aenter__ and __aexit__). This should fix the runtime warning by properly awaiting the underlying async methods.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
3. packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py:86
  • Draft comment:
    Several tests use 'try/except Exception' blocks when calling messages.create with an unknown_parameter. Consider narrowing the exception (or asserting the expected error type) for improved test clarity.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py:2470
  • Draft comment:
    The async stream tests correctly use 'async with' and 'async for' constructs and check the wrapped streaming response. This confirms the fix for the async coroutine not being awaited issue.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
5. packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py:95
  • Draft comment:
    Several tests compare fixed response IDs (e.g. 'msg_01MCkQZZtEKF3nVbFaExwATe'). Since these are retrieved from cassette responses, please ensure these values remain consistent if the underlying API response format changes.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50%
6. packages/opentelemetry-instrumentation-anthropic/:0
  • Draft comment:
    Overall, the changes correctly wrap the async streaming manager to ensure that the coroutine is awaited, thus resolving the reported bug. The combination of synchronous and asynchronous stream handling appears well integrated with metrics and log events.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
7. packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_anthropic_message_stream_manager_legacy.yaml:60
  • Draft comment:
    Typo: 'helpe' appears to be a typo—should it be 'helped'?
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50% This appears to be a false positive. The text is intentionally split between two events - "helpe" in one event and "d" in the next event, which together form "helped". This is normal for streaming responses where text is chunked. The comment misunderstood how the streaming response format works. Could there be some other reason why fixing this apparent typo would be beneficial? Maybe for readability or testing purposes? No - this is a recorded API response that we want to keep exactly as-is for testing purposes. Modifying it would defeat the purpose of having a test cassette. Delete this comment as it incorrectly identifies a typo in what is actually an intentional splitting of the word "helped" across two streaming events in a test cassette file.
8. packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_anthropic_message_stream_manager_legacy.yaml:64
  • Draft comment:
    Typo: 'instea' appears to be missing a few letters. Perhaps it should be 'instead'?
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50% This is a test cassette file that records actual API responses. The word "instead" is intentionally split across multiple streaming events - "instea" and "d" are separate chunks. This is normal behavior for streaming APIs and is not a bug. The comment misunderstands how streaming responses work. Could this split word cause issues for consumers of the API? Could this be a bug in Anthropic's streaming implementation? No - this is expected behavior for streaming APIs. Consumers must handle partial words across chunks. The test cassette correctly captures the real API behavior which is what we want for testing. Delete the comment. The apparent "typo" is actually correct behavior showing how the streaming API chunks its response.
9. packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_anthropic_message_stream_manager_legacy.yaml:98
  • Draft comment:
    Typo: 'Woul' seems to be missing the 'd'. It likely should be 'Would'.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
10. packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_anthropic_message_stream_manager_with_events_with_content.yaml:59
  • Draft comment:
    Typo detected: "helpe" should likely be "helped" to form the phrase "Because it helped them trace…" (consider merging with the following delta that currently reads "d them").
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
11. packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_anthropic_message_stream_manager_with_events_with_content.yaml:77
  • Draft comment:
    Typographical issue: The text appears split between "distribute" and "d tracing?". Consider combining these into "distributed tracing?" to improve clarity.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50% This is a test fixture file that intentionally captures the exact API response. The splitting of words is part of the actual response data and should be preserved exactly as received. Changing it would make the test fixture inaccurate. The comment suggests modifying test data that should remain unchanged. Maybe the split word could cause issues for downstream processing? Maybe there's a valid reason to normalize the test data? No - test fixtures should capture the exact real-world behavior, including any quirks in how the API chunks its responses. Modifying the response would defeat the purpose of having recorded test data. Delete this comment. The word splitting is part of the actual API response data and should be preserved exactly as received in this test fixture.
12. packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_anthropic_message_stream_manager_with_events_with_content.yaml:82
  • Draft comment:
    The phrase is split into "observ" and later "ability framework". It likely should read "observability framework". Please correct the splitting.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50% This is a recorded API response that's being used as a test fixture. The splitting of words is part of the actual streaming response from the API - it's not a bug or issue that needs fixing. The splitting is just how the API chose to chunk the response. Changing it would make the test fixture inaccurate to the real API behavior. Maybe the splitting could cause issues for consumers of the API? Maybe there's a reason we want to normalize the chunks in our tests? No, the splitting is the actual behavior we want to test against. Our code needs to handle arbitrary chunk boundaries, and modifying the test fixture would hide real-world conditions we need to handle. The comment should be deleted. The word splitting is part of the actual API response behavior that we want to preserve in our test fixture.
13. packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_anthropic_message_stream_manager_with_events_with_content.yaml:88
  • Draft comment:
    Typo detected: "d understan" should be corrected to "and understand" (ensure the broken word fragments are merged correctly).
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50% This is a false positive - what appears to be a typo is actually the intended format of the streaming API response. The word being split across multiple deltas is by design. This is test data that correctly captures the actual API behavior. Making the suggested change would make the test data incorrect. Could this split word format actually be a bug in the API that should be fixed? Could there be value in making the test data more readable? No - this is the documented and intended behavior of streaming responses. The test cassette needs to accurately reflect the real API behavior, not an idealized version. The comment should be removed as it incorrectly identifies intended API behavior as a typo. Changing the test data would make it inaccurate.
14. packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_anthropic_message_stream_manager_with_events_with_content.yaml:93
  • Draft comment:
    Typographical error: The text is split into "of a ner" and "dy tech", which should be combined to read "of a nerdy tech".
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50% The splitting of words in streaming responses is normal and expected behavior. This is a test cassette that captures the actual API response, so the splitting is accurately representing how the API behaves. There's no bug or issue here that needs fixing. The comment seems to misunderstand that this is intentional streaming behavior. Could this splitting pattern indicate an issue with how the API handles word boundaries that should be investigated? Could it affect text processing downstream? No - this is standard behavior for streaming text responses. The client code that consumes these events is designed to handle partial words and will reconstruct the full text correctly. The splitting pattern doesn't matter as long as the pieces can be reassembled. The comment should be deleted because it points out an intentional and correct behavior of the streaming API response. No changes are needed.
15. packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_anthropic_message_stream_manager_with_events_with_no_content.yaml:60
  • Draft comment:
    Typo detected: "it helpe" should likely be "it helped" (or "it help") to form a complete sentence.
  • Reason this comment was not posted:
    Comment was on unchanged code.
16. packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_anthropic_message_stream_manager_with_events_with_no_content.yaml:96
  • Draft comment:
    Typo detected: "to fin" appears to be missing the 'd'. It should be "to find".
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
17. packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_async_anthropic_message_stream_manager_legacy.yaml:63
  • Draft comment:
    Typo: The word "helpe" appears in the response body. It likely should be "helped" (as in "Because it helped them").
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50% The comment misunderstands how streaming responses work. In streaming responses, words are often split across multiple events - this is normal and expected behavior. The word "helped" is actually complete when you combine "helpe" from one event with "d" from the next event. This is test data recording an actual API interaction, so we shouldn't modify it anyway. Could this actually be a real typo in the API response that we're recording? Should we verify the intended behavior with the API provider? No - this is clearly just normal streaming behavior where the word is split across events. The complete word "helped" is formed when combining the events, which is exactly how streaming responses work. Delete this comment as it incorrectly identifies normal streaming behavior as a typo. The test cassette correctly records the actual API interaction.

Workflow ID: wflow_jPz5PFIujZot3IQo


@dinmukhamedm (Collaborator, Author) commented:

@nirga could you please help me resolve the install issue in the Python 3.9 action? This looks new to me, and I think it shouldn't be blocking the checks on other PRs

coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (2)

304-326: Consider adding type hints for better maintainability.

While the implementation is functionally correct, adding type hints for the constructor parameters would improve code clarity and IDE support.

    def __init__(
        self,
-        stream_manager,
-        span,
-        instance,
-        start_time,
-        token_histogram,
-        choice_counter,
-        duration_histogram,
-        exception_counter,
-        event_logger,
-        kwargs,
+        stream_manager: Any,
+        span: Span,
+        instance: Any,
+        start_time: float,
+        token_histogram: Optional[Histogram],
+        choice_counter: Optional[Counter],
+        duration_histogram: Optional[Histogram],
+        exception_counter: Optional[Counter],
+        event_logger: Optional[EventLogger],
+        kwargs: dict,
    ):

352-374: Consider adding type hints for consistency.

Similar to the sync wrapper, adding type hints would improve maintainability and consistency across the codebase.

    def __init__(
        self,
-        stream_manager,
-        span,
-        instance,
-        start_time,
-        token_histogram,
-        choice_counter,
-        duration_histogram,
-        exception_counter,
-        event_logger,
-        kwargs,
+        stream_manager: Any,
+        span: Span,
+        instance: Any,
+        start_time: float,
+        token_histogram: Optional[Histogram],
+        choice_counter: Optional[Counter],
+        duration_histogram: Optional[Histogram],
+        exception_counter: Optional[Counter],
+        event_logger: Optional[EventLogger],
+        kwargs: dict,
    ):
packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (1)

2487-2492: Consider using contextlib.suppress(Exception) for cleaner exception handling.

The static analysis correctly identifies that the try-except-pass patterns could be replaced with contextlib.suppress(Exception) for better readability. However, since this pattern is used consistently throughout the entire test file, changing only the new tests would create inconsistency.

Consider addressing this as a separate refactoring effort for the entire test file:

+import contextlib
+
 # Replace try-except-pass blocks with:
-try:
-    anthropic_client.messages.create(
-        unknown_parameter="unknown",
-    )
-except Exception:
-    pass
+with contextlib.suppress(Exception):
+    anthropic_client.messages.create(
+        unknown_parameter="unknown",
+    )

Also applies to: 2553-2558, 2617-2622, 2677-2682, 2728-2733, 2791-2796

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4410eef and 0b4a7f5.

📒 Files selected for processing (9)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (5 hunks)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (1 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_anthropic_message_stream_manager_legacy.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_anthropic_message_stream_manager_with_events_with_content.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_anthropic_message_stream_manager_with_events_with_no_content.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_async_anthropic_message_stream_manager_legacy.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_async_anthropic_message_stream_manager_with_events_with_content.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_async_anthropic_message_stream_manager_with_events_with_no_content.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py

2487-2492: Use contextlib.suppress(Exception) instead of try-except-pass

Replace with contextlib.suppress(Exception)

(SIM105)


2553-2558: Use contextlib.suppress(Exception) instead of try-except-pass

Replace with contextlib.suppress(Exception)

(SIM105)


2617-2622: Use contextlib.suppress(Exception) instead of try-except-pass

Replace with contextlib.suppress(Exception)

(SIM105)


2677-2682: Use contextlib.suppress(Exception) instead of try-except-pass

Replace with contextlib.suppress(Exception)

(SIM105)


2728-2733: Use contextlib.suppress(Exception) instead of try-except-pass

Replace with contextlib.suppress(Exception)

(SIM105)


2791-2796: Use contextlib.suppress(Exception) instead of try-except-pass

Replace with contextlib.suppress(Exception)

(SIM105)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Lint
  • GitHub Check: Build Packages (3.11)
🔇 Additional comments (19)
packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_anthropic_message_stream_manager_with_events_with_content.yaml (1)

1-155: LGTM! Comprehensive test cassette for streaming scenarios.

This cassette provides excellent test coverage for the streaming message manager functionality, capturing a complete lifecycle of server-sent events including detailed content deltas, token usage metrics, and rate limiting headers. The recorded interaction will enable deterministic testing of the new streaming instrumentation features.

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_anthropic_message_stream_manager_with_events_with_no_content.yaml (1)

1-157: LGTM! Good coverage for edge case streaming scenario.

This cassette effectively captures the streaming scenario where content starts empty and is populated incrementally through content_block_delta events. This provides valuable test coverage for the streaming instrumentation's handling of different content delivery patterns.

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_anthropic_message_stream_manager_legacy.yaml (1)

1-167: LGTM! Ensures backward compatibility for legacy streaming.

This cassette provides essential test coverage for the legacy streaming instrumentation mode, ensuring that existing functionality continues to work correctly. The detailed event sequence captures the full streaming lifecycle for comprehensive validation.

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_async_anthropic_message_stream_manager_with_events_with_content.yaml (1)

1-149: LGTM! Essential coverage for async streaming scenarios.

This cassette provides crucial test coverage for asynchronous streaming scenarios, as evidenced by the AsyncAnthropic client headers. This ensures both synchronous and asynchronous streaming code paths are properly validated in the instrumentation.

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (5)

22-23: LGTM! Clean import additions for stream manager wrappers.

The new imports for WrappedAsyncMessageStreamManager and WrappedMessageStreamManager are properly organized and necessary for the stream manager instrumentation functionality.


75-83: LGTM! Correct handling of AsyncMessages.stream as sync wrapper.

The addition of AsyncMessages.stream to WRAPPED_METHODS with proper documentation is correct. The comment clearly explains that while this is an async resource method, it's designed to be used as an async context manager and therefore wrapped synchronously rather than awaited.


106-121: LGTM! Robust stream manager detection with fallback.

The is_stream_manager function implements a solid pattern:

  • Primary detection via isinstance check with proper imports
  • Graceful fallback to class name comparison if imports fail
  • Handles both MessageStreamManager and AsyncMessageStreamManager types

This approach ensures compatibility across different Anthropic library versions.
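
A minimal sketch of that detection pattern (not the exact code from the PR; the import path below is an assumption about where the SDK exposes these classes):

    def is_stream_manager(response) -> bool:
        """Return True if response is a (Async)MessageStreamManager."""
        try:
            # Preferred: isinstance check against the SDK classes.
            from anthropic.lib.streaming import (
                AsyncMessageStreamManager,
                MessageStreamManager,
            )

            return isinstance(
                response, (MessageStreamManager, AsyncMessageStreamManager)
            )
        except ImportError:
            # Fallback: compare class names, so detection still works if the
            # classes cannot be imported from the installed SDK version.
            return type(response).__name__ in (
                "MessageStreamManager",
                "AsyncMessageStreamManager",
            )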


460-486: LGTM! Proper stream manager wrapping in sync context.

The new conditional branch correctly:

  • Detects stream manager responses using the helper function
  • Differentiates between sync and async stream managers by class name
  • Passes all necessary telemetry parameters to the wrapper constructors
  • Maintains the same instrumentation approach as existing streaming responses

The implementation follows established patterns and ensures consistent telemetry collection.


581-607: LGTM! Consistent stream manager handling in async context.

The async version mirrors the sync implementation correctly:

  • Uses the same detection logic and class name differentiation
  • Passes identical telemetry parameters to wrapper constructors
  • Maintains consistency with the sync wrapper approach

This ensures both sync and async streaming scenarios are instrumented uniformly.

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_async_anthropic_message_stream_manager_legacy.yaml (1)

1-150: Test cassette structure looks good for async streaming validation.

The cassette properly captures an async streaming interaction with the Anthropic API, including correct SSE event formatting and comprehensive response data for testing the instrumentation of AsyncMessageStreamManager in legacy mode.

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_async_anthropic_message_stream_manager_with_events_with_no_content.yaml (1)

1-145: Test cassette appropriately structured for no-content scenario.

This cassette correctly captures the streaming interaction for testing edge cases with no initial content, providing necessary test data for comprehensive async streaming instrumentation validation.

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (2)

301-347: Synchronous wrapper implementation looks correct.

The WrappedMessageStreamManager properly implements the context manager protocol by delegating to the wrapped stream manager and applying instrumentation through build_from_streaming_response. The parameter storage and delegation logic are sound.


349-394: Async wrapper correctly addresses the original issue.

The WrappedAsyncMessageStreamManager properly implements the async context manager protocol with await calls in both __aenter__ and __aexit__ methods. This directly fixes the reported issue where coroutines weren't being awaited properly in the async stream method.
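
As a reference for what this review is describing, the async wrapper's shape is roughly the following. This is a sketch rather than the PR's exact code; abuild_from_streaming_response is assumed here to be the async counterpart of the build_from_streaming_response helper mentioned above.

    class WrappedAsyncMessageStreamManager:
        """Wraps AsyncMessageStreamManager so `async with` keeps working and
        the resulting event stream is instrumented."""

        def __init__(self, stream_manager, span, instance, start_time,
                     token_histogram, choice_counter, duration_histogram,
                     exception_counter, event_logger, kwargs):
            self._stream_manager = stream_manager
            self._span = span
            self._instance = instance
            self._start_time = start_time
            self._token_histogram = token_histogram
            self._choice_counter = choice_counter
            self._duration_histogram = duration_histogram
            self._exception_counter = exception_counter
            self._event_logger = event_logger
            self._kwargs = kwargs

        async def __aenter__(self):
            # Await the underlying manager's __aenter__ so its coroutine actually
            # runs, then hand the raw event stream to the instrumentation helper.
            stream = await self._stream_manager.__aenter__()
            return abuild_from_streaming_response(  # assumed async helper name
                self._span, stream, self._instance, self._start_time,
                self._token_histogram, self._choice_counter,
                self._duration_histogram, self._exception_counter,
                self._event_logger, self._kwargs,
            )

        async def __aexit__(self, exc_type, exc_val, exc_tb):
            return await self._stream_manager.__aexit__(exc_type, exc_val, exc_tb)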

packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (6)

2468-2531: LGTM! Comprehensive test for stream manager in legacy mode.

The test correctly validates the new stream manager functionality using context managers, following established patterns for span attributes, metrics, and log validation. The implementation properly tests the instrumentation of messages.stream() with legacy attributes.


2534-2595: LGTM! Excellent coverage for stream manager with event content.

The test properly validates stream manager functionality with event-based instrumentation, including correct content structure in logs. The validation of the choice event structure with nested content type and text is thorough.


2598-2654: LGTM! Complete coverage for stream manager with no-content events.

The test appropriately validates stream manager functionality when event logging is configured without content. The empty content validation ensures the instrumentation works correctly in privacy-preserving mode.


2657-2720: LGTM! Essential async stream manager test coverage.

The test correctly implements async stream manager testing using async with context managers. This addresses the core issue mentioned in the PR objectives regarding async stream() method support. The validation logic mirrors the sync version appropriately.


2723-2779: LGTM! Comprehensive async stream manager test with content events.

The test correctly validates async stream manager functionality with event-based content logging. The async context manager usage and content validation are properly implemented. Note that the error handling test is placed before the main test logic, which differs from other tests but is functionally equivalent.


2782-2842: LGTM! Complete async stream manager test suite.

The test appropriately concludes the async stream manager coverage by validating no-content event logging. The async context manager usage is correct and the empty content validation ensures proper privacy-preserving behavior.

@nirga (Member) commented Aug 3, 2025

Yes that's something weird that started this morning after I merged a completely unrelated PR @dinmukhamedm - I'm on it

@nirga nirga merged commit d82510e into traceloop:main Aug 3, 2025
10 checks passed
nina-kollman pushed a commit that referenced this pull request Aug 11, 2025
irvingpop added a commit to irvingpop/openllmetry that referenced this pull request Jan 1, 2026
The beta streaming API methods (beta.messages.AsyncMessages.stream)
were incorrectly placed in WRAPPED_AMETHODS, causing them to be
wrapped with an async wrapper that awaits the result. However,
stream() returns an async context manager, not a coroutine, so it
should use the sync wrapper like the non-beta version.

This fixes the error:
  RuntimeWarning: coroutine '_awrap' was never awaited
  TypeError: 'coroutine' object does not support the asynchronous
              context manager protocol

Applies the same fix from PR traceloop#3220 to the beta API endpoints that
were missed.

Fixes traceloop#3178

Signed-off-by: Irving Popovetsky <irving@honeycomb.io>
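
To make the distinction in that commit message concrete, here is a small, SDK-independent illustration (fake_stream is a stand-in defined only for this example, not an Anthropic API): awaiting a method that returns an async context manager raises a TypeError like the one quoted above, while returning the manager unchanged keeps async with working.

    import asyncio
    from contextlib import asynccontextmanager

    @asynccontextmanager
    async def fake_stream():
        # Stands in for AsyncMessages.stream(): calling it yields an async
        # context manager directly, not a coroutine to be awaited.
        yield ["message_start", "content_block_delta", "message_stop"]

    async def main():
        # Wrong wrapping strategy (the bug): treat the result like a coroutine.
        try:
            await fake_stream()
        except TypeError as err:
            print(f"awaiting the manager fails: {err}")

        # Correct strategy (the fix): hand the manager back for `async with`.
        async with fake_stream() as events:
            for event in events:
                print(event)

    asyncio.run(main())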
Development

Successfully merging this pull request may close these issues.

🐛 Bug Report: Anthropic stream() fails with a TypeError when instrumented
