fix(openai): support for openai non-consumed streams #3155
Conversation
Walkthrough
The changes introduce robust, thread-safe cleanup and error handling to the ChatStream wrapper, so that spans and metrics are finalized even when streaming responses are not fully consumed.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant ChatStream
    participant Span
    participant Metrics
    User->>ChatStream: Create streaming chat completion
    ChatStream->>Span: Start span
    alt User consumes stream fully
        loop For each chunk
            User->>ChatStream: __next__/__anext__
            ChatStream->>User: Return chunk
        end
        ChatStream->>Metrics: Record full metrics
        ChatStream->>Span: End span
    else User consumes stream partially or not at all
        alt Exception during consumption
            ChatStream->>Metrics: Record partial metrics
            ChatStream->>Span: Set error, end span
        else Stream object deleted/GC'd
            ChatStream->>Metrics: Record partial metrics
            ChatStream->>Span: End span
        end
    end
```
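For context, a minimal sketch of the scenario this PR addresses, using the standard openai Python client; the model name and prompt are illustrative, not taken from the PR:

```python
# Illustrative only: a stream that is created but never iterated.
# Before this PR, abandoning the stream left the instrumentation span
# open and dropped the duration metric; afterwards, cleanup runs when
# the wrapper is garbage collected.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Tell me a joke"}],
    stream=True,
)

# No iteration over `response`; the wrapper's destructor is now
# responsible for ending the span and recording
# gen_ai.client.operation.duration.
del response
```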
Important
Looks good to me! 👍
Reviewed everything up to a43ed5b in 2 minutes and 1 second.
- Reviewed 1590 lines of code in 6 files
- Skipped 1 file when reviewing
- Skipped posting 6 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py:1322
- Draft comment: In the helper function assert_message_in_logs, consider adding a docstring and a type check to ensure that log_record.body is a dict or is safely convertible via dict(), to guard against unexpected types.
- Reason this comment was not posted: Comment was on unchanged code.
2. packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py:1557
- Draft comment: The test 'test_chat_streaming_not_consumed' uses gc.collect() to force cleanup. Since GC behavior can be non-deterministic, consider adding retries or additional checks to ensure the stream object is cleaned up reliably in different environments.
- Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 20% vs. threshold = 50%). While the comment raises a valid concern about GC non-determinism, the test appears to be working as intended: it's a unit test that verifies basic cleanup behavior. Adding retries would make the test more complex without clear benefit, and the current approach of forcing GC and checking the result is a reasonable testing pattern. The comment does identify a real issue that could theoretically cause flaky tests, and there may be real-world scenarios where this matters, but the test is focused on verifying that basic cleanup works, not on production-grade reliability; the simple approach is appropriate for a unit test, and if GC issues occur in practice they can be addressed then. Delete the comment: while it raises a valid concern, the current simple testing approach is appropriate and adding complexity isn't clearly beneficial.
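For illustration, a hedged sketch of the retry idea from that draft comment; the helper name `collect_until` is hypothetical and not part of the test suite:

```python
import gc


def collect_until(predicate, attempts=5):
    """Retry garbage collection until predicate() holds.

    A single gc.collect() is not guaranteed to finalize an object on
    every interpreter; retrying a few times reduces test flakiness.
    """
    for _ in range(attempts):
        if predicate():
            return True
        gc.collect()
    return predicate()


# Hypothetical usage in the test:
#   assert collect_until(lambda: stream_ref() is None)
```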
3. packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py:56
- Draft comment: Several tests assert hard-coded response IDs (e.g. 'chatcmpl-908MD9ivBBLb6EaIjlqwFokntayQK'). Hard-coding such values might make tests brittle if the cassette recordings change. Consider parameterizing the expected value or validating its format instead.
- Reason this comment was not posted: Comment was on unchanged code.
4. packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py:1069
- Draft comment: In test_chat_context_propagation, the trace context verification on the HTTP request could be expanded to check multiple header fields. This would ensure more robust validation of context propagation.
- Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py:955
- Draft comment: Some tests (e.g., test_with_asyncio_run) use 'asyncio.run' while others are decorated with @pytest.mark.asyncio. For consistent async test behavior and to avoid potential event loop issues, consider using pytest-asyncio uniformly.
- Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py:536
- Draft comment: Typographical suggestion: In the comment, "that is used by most of the other model" might be intended to say "that is used by most of the other models".
- Reason this comment was not posted: Comment was on unchanged code.
Workflow ID: wflow_oyoJzgnQCDnw81IJ
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (1)
649-650: Add cleanup to async exit method for consistency. The async `__aexit__` method should also ensure cleanup, similar to the synchronous `__exit__` method.

```diff
 async def __aexit__(self, exc_type, exc_val, exc_tb):
+    self._ensure_cleanup()
     await self.__wrapped__.__aexit__(exc_type, exc_val, exc_tb)
```
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (1)
1711-1711: Rename unused loop variable. The loop variable `chunk` is not used within the loop body.

```diff
- for chunk in response:
+ for _ in response:
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
packages/opentelemetry-instrumentation-openai/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (15 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_streaming_exception_during_consumption.yaml (1 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_streaming_memory_leak_prevention.yaml (1 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_streaming_not_consumed.yaml (1 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_streaming_partial_consumption.yaml (1 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (31 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (3)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (2)
- SpanAttributes (36-229)
- EventAttributes (239-258)
packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py (1)
- assert_message_in_logs (1221-1232)
packages/opentelemetry-instrumentation-openai/tests/conftest.py (2)
- instrument_legacy (120-136)
- openai_client (42-43)
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
644-644: return inside finally blocks cause exceptions to be silenced
(B012)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
1711-1711: Loop control variable chunk not used within loop body
Rename unused chunk to _chunk
(B007)
🔇 Additional comments (14)
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_streaming_memory_leak_prevention.yaml (1)
1-189: LGTM! Test cassette for streaming response scenarios. This cassette properly captures a streaming chat completion interaction for testing memory leak prevention scenarios.
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_streaming_exception_during_consumption.yaml (1)
1-189: LGTM! Test cassette for exception handling scenarios. This cassette captures the same streaming interaction for testing exception handling during stream consumption.
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_streaming_not_consumed.yaml (1)
1-189: LGTM! Test cassette for non-consumed stream scenarios. This cassette supports testing the main PR objective: ensuring proper instrumentation when streams are not consumed.
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_streaming_partial_consumption.yaml (1)
1-189: LGTM! Test cassette for partial stream consumption scenarios. This cassette enables testing proper cleanup when streams are only partially consumed.
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (6)
4-4: LGTM! Good addition for thread safety. Adding the threading module is necessary for the thread-safe cleanup implementation.
628-636: Excellent cleanup mechanism implementation! The addition of cleanup state tracking with a thread lock and the `__del__` destructor ensures proper resource cleanup even when streams are garbage collected without being fully consumed. This directly addresses the PR's objective.
658-672: Great exception handling for stream iteration! The enhanced exception handling in both `__next__` and `__anext__` ensures proper cleanup and span status setting when exceptions occur during stream iteration. This is crucial for the PR's objective of handling non-consumed streams.
Also applies to: 674-688
760-797: Robust and well-designed cleanup method! The `_ensure_cleanup` method is excellently implemented with:
- Thread-safe execution using locks
- Idempotent behavior (safe to call multiple times)
- Comprehensive error handling that doesn't mask original exceptions
- Proper logging for debugging
- Fallback mechanisms to ensure cleanup completion
This is the core of the fix for non-consumed streams.
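To make the described design concrete, here is a minimal sketch of such a cleanup path. It reuses the names quoted in the review (`_cleanup_completed`, `_cleanup_lock`, `_ensure_cleanup`) but is a simplification under those assumptions, not the PR's actual code:

```python
import logging
import threading

logger = logging.getLogger(__name__)


class StreamCleanupSketch:
    """Minimal sketch of the cleanup design described above; the real
    ChatStream also wraps the OpenAI stream and accumulates chunks."""

    def __init__(self, span):
        self._span = span
        self._cleanup_completed = False
        self._cleanup_lock = threading.Lock()

    def __del__(self):
        # Safety net: runs when the stream is garbage collected unconsumed.
        if hasattr(self, "_cleanup_completed") and not self._cleanup_completed:
            self._ensure_cleanup()

    def _ensure_cleanup(self):
        with self._cleanup_lock:
            if self._cleanup_completed:  # idempotent: safe to call twice
                return
            try:
                self._record_partial_metrics()
                if self._span is not None and self._span.is_recording():
                    # Error paths set an ERROR status before calling this;
                    # here the span is simply ended.
                    self._span.end()
            except Exception:
                # Never raise from a cleanup path (it may run in __del__);
                # log and still fall through so the flag gets set.
                logger.debug("openai stream cleanup failed", exc_info=True)
            finally:
                self._cleanup_completed = True

    def _record_partial_metrics(self):
        pass  # sketched after the next review comment
```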
798-829: Smart partial metrics recording! The `_record_partial_metrics` method ensures that even when streams are not fully consumed, we still capture:
- Duration metrics
- Basic span attributes
- Partial token metrics
- Choice metrics from processed data
This aligns perfectly with the PR description about collecting available metrics for non-consumed streams.
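Continuing the sketch above, a hedged illustration of partial-metric recording; the `_duration_histogram` and `_start_time` names and the attribute key are assumptions for illustration, not the PR's exact names:

```python
import time


def _record_partial_metrics(self):
    """Fragment of StreamCleanupSketch above: record what is known at
    cleanup time without fabricating data for unconsumed chunks."""
    histogram = getattr(self, "_duration_histogram", None)
    start_time = getattr(self, "_start_time", None)
    if histogram is None or start_time is None:
        return
    # Duration is always derivable from the creation timestamp, even if
    # no chunk was ever consumed.
    histogram.record(
        time.time() - start_time,
        attributes={"gen_ai.system": "openai"},
    )
    # Token and choice metrics are recorded only from chunks that were
    # actually consumed; counts for unseen chunks cannot be known.
```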
273-274: Good formatting improvements for readability. The formatting changes improve code readability by properly breaking long lines and using consistent indentation.
Also applies to: 425-426, 482-483, 485-486, 538-539, 545-546, 574-575, 691-692, 753-754, 859-860, 930-931, 1049-1050, 1082-1083, 1130-1131
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (4)
27-28: Consistent formatting improvement. The multi-line formatting of the messages list improves readability.
1553-1625: Well-designed test for non-consumed streams. This test effectively validates the PR's main objective by ensuring that streaming responses are properly instrumented even when not consumed. The use of garbage collection and comprehensive assertions on spans and metrics provides good coverage.
1627-1697: Good coverage for partial stream consumption. This test appropriately covers the edge case where a stream is only partially consumed. The assertion that at least one event is recorded (line 1660) correctly validates that partial consumption is tracked.
1739-1776: Excellent memory leak prevention test. This test uses `weakref` effectively to verify that unconsumed stream objects are properly garbage collected, preventing memory leaks. The assertion on line 1764 provides strong evidence that the cleanup mechanism works correctly.
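A condensed sketch of that weakref technique (the `openai_client` fixture comes from the project's conftest per the code graph above; the model name is illustrative):

```python
import gc
import weakref


def test_stream_object_is_collectable(openai_client):
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Tell me a joke"}],
        stream=True,
    )
    ref = weakref.ref(response)
    del response
    gc.collect()
    # A dead weakref proves the wrapper holds no self-referencing cycle
    # (e.g., via the span) that would keep it alive.
    assert ref() is None
```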
Actionable comments posted: 0
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (1)
1713-1751: Solid test for exception handling during consumption. This test validates proper cleanup when exceptions occur during stream consumption. The logic correctly:
- Simulates realistic failure: interrupts consumption after processing some chunks
- Handles cleanup properly: forces cleanup via garbage collection
- Validates span status: correctly expects OK status since the exception was in user code
- Verifies event recording: confirms events were captured before the exception
Minor suggestion: address the static analysis hint by renaming `chunk` to `_chunk` since the variable isn't used within the loop body.

```diff
- for chunk in response:
+ for _chunk in response:
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (16 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (31 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (3)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (2)
- SpanAttributes (36-229)
- EventAttributes (239-258)
packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py (1)
- assert_message_in_logs (1221-1232)
packages/opentelemetry-instrumentation-openai/tests/conftest.py (2)
- instrument_legacy (120-136)
- openai_client (42-43)
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
1726-1726: Loop control variable chunk not used within loop body
Rename unused chunk to _chunk
(B007)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Test Packages (3.9)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Lint
🔇 Additional comments (14)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (10)
4-4: LGTM! Threading import properly added. The threading import is correctly positioned and necessary for the thread-safe cleanup mechanism introduced in the ChatStream class.
630-632: Excellent thread-safe cleanup state design. The cleanup state tracking with the `_cleanup_completed` flag and `_cleanup_lock` provides robust protection against duplicate operations and ensures thread safety. The implementation follows best practices for concurrent programming.
634-637: Well-implemented destructor for cleanup safety. The `__del__` method provides a crucial safety net for cleanup when objects are garbage collected. The `hasattr` check is essential to avoid AttributeError during interpreter shutdown when attributes may not be available.
643-657: Excellent fix for the exception handling issue. This implementation properly addresses the previous review comment about the `return` statement in the `finally` block. The current approach correctly:
- Captures cleanup exceptions without suppressing the original exception
- Always calls the wrapped `__exit__` method
- Logs cleanup errors for debugging without affecting the context manager behavior
- Returns the proper result from the wrapped exit
This follows Python best practices and ensures robust exception handling.
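As a sketch, the exit pattern this describes might look like the following; `self.__wrapped__` stands in for the underlying OpenAI stream, and `_ensure_cleanup` is as sketched earlier on this page:

```python
import logging

logger = logging.getLogger(__name__)


class ExitSketch:
    """Fragment: assumes _ensure_cleanup and a wrapped stream object
    (self.__wrapped__) as in the earlier sketches."""

    def __exit__(self, exc_type, exc_val, exc_tb):
        try:
            self._ensure_cleanup()
        except Exception:
            # A cleanup failure must not replace an exception already in
            # flight, so it is logged and swallowed here.
            logger.debug("cleanup during __exit__ failed", exc_info=True)
        # No `return` inside a `finally` block (ruff B012): delegating at
        # the end lets in-flight exceptions propagate normally while
        # preserving the wrapped stream's suppression semantics.
        return self.__wrapped__.__exit__(exc_type, exc_val, exc_tb)
```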
677-682: Robust exception handling in iteration methods. The enhanced `__next__` and `__anext__` methods properly handle cleanup for exceptions during stream iteration. The logic correctly distinguishes between normal iteration termination (StopIteration/StopAsyncIteration) and actual errors, ensuring spans are marked with error status only when appropriate.
Also applies to: 694-697
704-705: Good formatting improvement. The formatting change improves readability without altering functionality.
771-771: Essential cleanup completion marking. Setting `_cleanup_completed = True` after successful response processing prevents duplicate cleanup operations from the destructor or other cleanup mechanisms. This is a critical part of the idempotent cleanup design.
773-810: Excellently designed thread-safe cleanup method. The `_ensure_cleanup` method demonstrates robust design principles:
- Thread safety: proper use of a locking mechanism
- Idempotency: checks the completion flag to prevent duplicate operations
- Comprehensive error handling: logs errors without propagating them
- Graceful degradation: fallback logic ensures cleanup completion even if metric recording fails
- Proper resource management: sets span status and ends the span correctly
This implementation ensures spans and metrics are finalized regardless of how the stream lifecycle ends.
811-841: Well-implemented partial metrics recording. The `_record_partial_metrics` method effectively handles recording metrics with whatever data is available at cleanup time. It properly:
- Records duration metrics using the start time
- Sets span attributes with available response data
- Records token and choice metrics when data exists
- Uses existing metric recording functions for consistency
This ensures meaningful metrics are captured even for non-consumed streams.
273-274: Good formatting improvements throughout. The formatting changes improve code readability by properly breaking long lines and following Python style guidelines. These enhance maintainability without changing functionality.
Also applies to: 425-426, 482-486, 541-542, 552-553, 576-577, 766-767, 872-873, 943-944, 1062-1063, 1102-1103, 1150-1151
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (4)
27-28: Excellent formatting improvements for readability. The consistent formatting changes throughout the file improve code readability by properly structuring message arrays and function calls. These changes follow Python style guidelines and enhance maintainability.
Also applies to: 41-42, 53-54, 72-73, 93-94, 128-129, 148-149, 288-289, 362-363, 497-498, 571-572, 587-588, 606-607, 621-625, 644-645, 672-676, 714-715, 742-746, 772-773, 791-792, 806-810, 830-831, 858-862, 899-900, 927-931, 1323-1324, 1479-1480, 1497-1498, 1519-1520, 1537-1538
1553-1634: Comprehensive test for non-consumed streaming responses. This test excellently validates the core PR objective: ensuring proper instrumentation of streaming responses even when not consumed. The test:
- Correctly simulates the scenario: creates a stream without consuming it
- Triggers cleanup properly: uses garbage collection to invoke the destructor
- Validates span lifecycle: checks proper closure and OK status
- Verifies metrics recording: confirms duration metrics are captured
- Checks streaming indicators: validates proper attribute marking
The assertions are thorough and cover all the essential aspects of the functionality.
1636-1711: Good test coverage for partial stream consumption. This test validates the important scenario where streams are partially consumed then abandoned. The test design is solid:
- Realistic scenario: consumes the first chunk then stops (common in practice)
- Proper cleanup verification: uses garbage collection to trigger cleanup
- Event validation: confirms events are recorded for consumed chunks
- Metrics verification: ensures duration metrics are still captured
This covers an important edge case that could occur in real applications.
1753-1788: Excellent memory leak prevention test. This test demonstrates sophisticated testing practices using weak references to validate proper garbage collection. The test effectively:
- Validates resource cleanup: uses `weakref` to confirm object collection
- Forces garbage collection: explicitly triggers GC to test cleanup behavior
- Verifies span management: ensures spans are properly closed even after GC
- Prevents regression: guards against memory leaks in streaming scenarios
This is exemplary test design for validating resource management and cleanup behavior.
Actionable comments posted: 0
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (1)
1712-1750: Good exception handling test with a minor improvement needed. This test properly validates exception handling during stream consumption. The logic correctly simulates user-code exceptions and verifies proper cleanup. However, address the static analysis hint about the unused loop variable:

```diff
- for chunk in response:
+ for _chunk in response:
```

This makes it clear that the chunk content isn't used in the loop body.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (31 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (3)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (2)
- SpanAttributes (36-229)
- EventAttributes (239-258)
packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py (1)
- assert_message_in_logs (1221-1232)
packages/opentelemetry-instrumentation-openai/tests/conftest.py (1)
- openai_client (42-43)
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
1725-1725: Loop control variable chunk not used within loop body
Rename unused chunk to _chunk
(B007)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Lint
- GitHub Check: Test Packages (3.9)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Build Packages (3.11)
🔇 Additional comments (4)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (4)
27-28: LGTM - Improved code formatting. The formatting changes improve readability by breaking long lines into multiple lines while maintaining the same functionality.
Also applies to: 41-42, 53-54, 72-73, 93-94, 128-129, 148-149, 288-289, 362-363, 497-498, 571-572, 587-588, 606-607, 621-625, 644-645, 672-676, 714-715, 742-746, 772-773, 791-792, 806-810, 830-831, 858-862, 899-900, 927-931, 1323-1324, 1479-1480, 1497-1498, 1519-1520, 1537-1538
1553-1634: Excellent test coverage for non-consumed streams. This test comprehensively validates the new functionality described in the PR objectives. It properly verifies that:
- Spans are closed even when streams aren't consumed
- Duration metrics are recorded regardless of consumption
- Streaming attributes are properly set
- Cleanup occurs through garbage collection
The test includes thorough assertions for both span attributes and metric data points.
1636-1710: Good coverage for partial stream consumption. This test properly validates the scenario where a stream is partially consumed. The logic correctly:
- Consumes only the first chunk to simulate partial consumption
- Verifies cleanup still occurs properly
- Validates that events are recorded for consumed chunks
- Ensures metrics are still recorded correctly
This is an important edge case that complements the non-consumed stream test.
1752-1787: Excellent memory leak prevention test. This test demonstrates sophisticated testing practices by:
- Using weak references to validate object lifecycle
- Forcing garbage collection to ensure cleanup occurs
- Verifying span closure happens during object destruction
- Preventing potential memory leaks in streaming scenarios
This provides strong validation of the new cleanup functionality's robustness.
Actionable comments posted: 0
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (1)
1725-1725: Fix unused loop variable to follow Python conventions. The loop variable `chunk` is not used within the loop body; consider renaming it to indicate it's intentionally unused.

```diff
- for chunk in response:
+ for _chunk in response:
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (19 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (31 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
🧰 Additional context used
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
1725-1725: Loop control variable chunk not used within loop body
Rename unused chunk to _chunk
(B007)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Test Packages (3.9)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Lint
🔇 Additional comments (4)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (4)
27-28: LGTM! Formatting improvements enhance readability. The line breaks and formatting changes improve code readability while maintaining functionality.
Also applies to: 41-42, 53-54, 72-73, 93-94, 128-129, 148-149, 288-289, 362-363, 497-498, 571-572, 587-588, 606-607, 621-625, 644-645, 672-675, 714-715, 742-745, 772-773, 791-792, 806-809, 830-831, 858-861, 899-900, 927-930, 1323-1324, 1479-1480, 1497-1498, 1519-1520, 1537-1538
1553-1634: Excellent test coverage for non-consumed streaming responses. This test effectively validates the core PR objective: ensuring proper instrumentation of OpenAI streaming responses even when not consumed. The test design is thorough:
- Properly simulates the real-world scenario of creating but not consuming streams
- Uses garbage collection to trigger cleanup mechanisms
- Comprehensively validates both span attributes and duration metrics
- Includes detailed assertions with descriptive error messages
1636-1710: Well-designed test for partial stream consumption scenarios. This test covers an important edge case where streams are only partially consumed. The implementation correctly:
- Simulates realistic partial consumption using `next(iter(response))`
- Validates that cleanup occurs even with incomplete consumption
- Ensures both spans and metrics are properly recorded
- Verifies at least one event was captured from the consumed chunk
1752-1787: Excellent memory leak prevention test using advanced Python techniques. This test demonstrates sophisticated testing practices:
- Uses `weakref` to verify proper garbage collection
- Validates that cleanup mechanisms don't create memory leaks
- Ensures spans are properly closed even without stream consumption
- Critical for long-running applications that create many streams
The implementation shows deep understanding of Python memory management and is essential for validating the robustness of the cleanup logic.
```python
def __del__(self):
    """Cleanup when object is garbage collected"""
    if hasattr(self, '_cleanup_completed') and not self._cleanup_completed:
        self._ensure_cleanup()
```
Bug: Destructor Calls Risky Cleanup Methods
The `ChatStream.__del__` method attempts complex cleanup by calling `_ensure_cleanup()`. Due to the unpredictable order of object destruction during garbage collection and interpreter shutdown, this can lead to AttributeError or other exceptions when accessing instance attributes (e.g., `self._span`, `self._cleanup_lock`, `logger`) that may have already been deallocated. This can result in incomplete resource cleanup or warnings, even with the `@dont_throw` decorator.
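One defensive pattern that addresses this concern is to make the destructor assume nothing about surviving state; this is a sketch of that idea, not a claim about how the PR resolves the report:

```python
def __del__(self):
    # During interpreter shutdown, instance attributes and module
    # globals (like a logger) may already be gone; guard every access
    # and swallow everything, since a destructor must never raise.
    try:
        # Defaulting to True means: if the flag is unreadable, skip
        # cleanup rather than risk touching half-destroyed state.
        if not getattr(self, "_cleanup_completed", True):
            self._ensure_cleanup()
    except Exception:
        pass
```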
```python
# Handle cleanup for other exceptions during stream iteration
self._ensure_cleanup()
if self._span and self._span.is_recording():
    self._span.set_status(Status(StatusCode.ERROR, str(e)))
```
Bug: Exception Handling Fails to Set Span Status
When an exception occurs during streaming iteration in ChatStream's `__next__` and `__anext__` methods, the span status is incorrectly set. The `_ensure_cleanup()` method is called first, which ends the span with an OK status. Subsequent attempts to set the span status to ERROR are ineffective because the span is already ended and cannot be modified, leading to an OK status despite the exception. The error status must be set before cleanup.
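A sketch of the corrected ordering; `_process_item` is an illustrative hook, and the other names follow the sketches earlier on this page:

```python
from opentelemetry.trace import Status, StatusCode


class NextSketch:
    """Fragment: _span, _ensure_cleanup, and the wrapped iterator are as
    in the earlier sketches; _process_item is an illustrative hook."""

    def __next__(self):
        try:
            chunk = self.__wrapped__.__next__()
        except StopIteration:
            self._ensure_cleanup()  # normal termination
            raise
        except Exception as e:
            # Set ERROR *before* cleanup: once _ensure_cleanup() ends the
            # span, set_status() becomes a no-op.
            if self._span and self._span.is_recording():
                self._span.set_status(Status(StatusCode.ERROR, str(e)))
            self._ensure_cleanup()
            raise
        else:
            self._process_item(chunk)  # accumulate chunk data for metrics
            return chunk
```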
feat(instrumentation): ... or fix(instrumentation): ...
Fix #3151
After the fix, streaming responses are now properly instrumented even when not consumed. Both traces and metrics (gen_ai.client.operation.duration) are collected.
Unsupported Metrics
The following streaming-specific metrics are intentionally not collected for non-consumed streams. These metrics require actual stream consumption to generate accurate data.
- gen_ai.server.time_to_first_token: requires actual receipt of the first chunk to set the timestamp
- llm.chat_completions.streaming_time_to_generate: needs the time difference between first and last tokens
- gen_ai.client.generation.choices: depends on accumulated response data from consumed chunks
Important
Enhance OpenAI instrumentation to support non-consumed streaming responses with proper cleanup and metric recording.
- Intentionally skips streaming-specific metrics that require consumption: gen_ai.server.time_to_first_token, llm.chat_completions.streaming_time_to_generate, and gen_ai.client.generation.choices.
- Updates the ChatStream class in chat_wrappers.py to handle non-consumed streams.
- Adds _ensure_cleanup() and _record_partial_metrics() methods for safe cleanup and partial metric recording.
- Adds tests in test_chat.py for non-consumed streams, partial consumption, and exception handling during streaming.
- Adds cassettes test_chat_streaming_not_consumed, test_chat_streaming_partial_consumption, and test_chat_streaming_exception_during_consumption.
This description was created by Ellipsis for a43ed5b. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit
- New Features
- Bug Fixes
- Tests
- Style