fix(openai): support for openai non-consumed streams #3155
Conversation
Walkthrough
The changes introduce robust, thread-safe cleanup and error handling to the ChatStream wrapper, so that spans and metrics are finalized even when streaming responses are not fully consumed.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant ChatStream
    participant Span
    participant Metrics
    User->>ChatStream: Create streaming chat completion
    ChatStream->>Span: Start span
    alt User consumes stream fully
        loop For each chunk
            User->>ChatStream: __next__/__anext__
            ChatStream->>User: Return chunk
        end
        ChatStream->>Metrics: Record full metrics
        ChatStream->>Span: End span
    else User consumes stream partially or not at all
        alt Exception during consumption
            ChatStream->>Metrics: Record partial metrics
            ChatStream->>Span: Set error, end span
        else Stream object deleted/GC'd
            ChatStream->>Metrics: Record partial metrics
            ChatStream->>Span: End span
        end
    end
```
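For context, a minimal sketch of the scenario this PR addresses, using the standard openai Python client; the model name and prompt are illustrative, not taken from the PR:

```python
# Illustrative only: a stream that is created but never iterated.
# Before this PR, abandoning the stream left the instrumentation span
# open and dropped the duration metric; afterwards, cleanup runs when
# the wrapper is garbage collected.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Tell me a joke"}],
    stream=True,
)

# No iteration over `response`; the wrapper's destructor is now
# responsible for ending the span and recording
# gen_ai.client.operation.duration.
del response
```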
Important
Looks good to me! 👍
Reviewed everything up to a43ed5b in 2 minutes and 1 second.
- Reviewed 1590 lines of code in 6 files
- Skipped 1 file when reviewing
- Skipped posting 6 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py:1322
- Draft comment: In the helper function assert_message_in_logs, consider adding a docstring and a type check to ensure that log_record.body is a dict or is safely convertible via dict(), to guard against unexpected types.
- Reason this comment was not posted: Comment was on unchanged code.
2. packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py:1557
- Draft comment: The test 'test_chat_streaming_not_consumed' uses gc.collect() to force cleanup. Since GC behavior can be non-deterministic, consider adding retries or additional checks to ensure the stream object is cleaned up reliably in different environments.
- Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 20% vs. threshold = 50%). While the comment raises a valid concern about GC non-determinism, the test appears to be working as intended: it's a unit test that verifies basic cleanup behavior. Adding retries would make the test more complex without clear benefit, and the current approach of forcing GC and checking the result is a reasonable testing pattern. The comment does identify a real issue that could theoretically cause flaky tests, and there may be real-world scenarios where this matters, but the test is focused on verifying that basic cleanup works, not on production-grade reliability; the simple approach is appropriate for a unit test, and if GC issues occur in practice they can be addressed then. Delete the comment: while it raises a valid concern, the current simple testing approach is appropriate and adding complexity isn't clearly beneficial.
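For illustration, a hedged sketch of the retry idea from that draft comment; the helper name `collect_until` is hypothetical and not part of the test suite:

```python
import gc


def collect_until(predicate, attempts=5):
    """Retry garbage collection until predicate() holds.

    A single gc.collect() is not guaranteed to finalize an object on
    every interpreter; retrying a few times reduces test flakiness.
    """
    for _ in range(attempts):
        if predicate():
            return True
        gc.collect()
    return predicate()


# Hypothetical usage in the test:
#   assert collect_until(lambda: stream_ref() is None)
```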
3. packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py:56
- Draft comment: Several tests assert hard-coded response IDs (e.g. 'chatcmpl-908MD9ivBBLb6EaIjlqwFokntayQK'). Hard-coding such values might make tests brittle if the cassette recordings change. Consider parameterizing the expected value or validating its format instead.
- Reason this comment was not posted: Comment was on unchanged code.
4. packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py:1069
- Draft comment: In test_chat_context_propagation, the trace context verification on the HTTP request could be expanded to check multiple header fields. This would ensure more robust validation of context propagation.
- Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py:955
- Draft comment: Some tests (e.g., test_with_asyncio_run) use 'asyncio.run' while others are decorated with @pytest.mark.asyncio. For consistent async test behavior and to avoid potential event loop issues, consider using pytest-asyncio uniformly.
- Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py:536
- Draft comment: Typographical suggestion: In the comment, "that is used by most of the other model" might be intended to say "that is used by most of the other models".
- Reason this comment was not posted: Comment was on unchanged code.
Workflow ID: wflow_oyoJzgnQCDnw81IJ
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (1)
649-650: Add cleanup to async exit method for consistency. The async `__aexit__` method should also ensure cleanup, similar to the synchronous `__exit__` method.

```diff
 async def __aexit__(self, exc_type, exc_val, exc_tb):
+    self._ensure_cleanup()
     await self.__wrapped__.__aexit__(exc_type, exc_val, exc_tb)
```
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (1)
1711-1711: Rename unused loop variable. The loop variable `chunk` is not used within the loop body.

```diff
- for chunk in response:
+ for _ in response:
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
packages/opentelemetry-instrumentation-openai/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (15 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_streaming_exception_during_consumption.yaml (1 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_streaming_memory_leak_prevention.yaml (1 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_streaming_not_consumed.yaml (1 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_streaming_partial_consumption.yaml (1 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (31 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (3)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (2)
- SpanAttributes (36-229)
- EventAttributes (239-258)
packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py (1)
- assert_message_in_logs (1221-1232)
packages/opentelemetry-instrumentation-openai/tests/conftest.py (2)
- instrument_legacy (120-136)
- openai_client (42-43)
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
644-644: return inside finally blocks cause exceptions to be silenced
(B012)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
1711-1711: Loop control variable chunk not used within loop body
Rename unused chunk to _chunk
(B007)
🔇 Additional comments (14)
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_streaming_memory_leak_prevention.yaml (1)
1-189: LGTM! Test cassette for streaming response scenarios. This cassette properly captures a streaming chat completion interaction for testing memory leak prevention scenarios.
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_streaming_exception_during_consumption.yaml (1)
1-189: LGTM! Test cassette for exception handling scenarios. This cassette captures the same streaming interaction for testing exception handling during stream consumption.
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_streaming_not_consumed.yaml (1)
1-189: LGTM! Test cassette for non-consumed stream scenarios. This cassette supports testing the main PR objective: ensuring proper instrumentation when streams are not consumed.
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_streaming_partial_consumption.yaml (1)
1-189: LGTM! Test cassette for partial stream consumption scenarios. This cassette enables testing proper cleanup when streams are only partially consumed.
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (6)
4-4: LGTM! Good addition for thread safety. Adding the threading module is necessary for the thread-safe cleanup implementation.
628-636: Excellent cleanup mechanism implementation! The addition of cleanup state tracking with a thread lock and the `__del__` destructor ensures proper resource cleanup even when streams are garbage collected without being fully consumed. This directly addresses the PR's objective.
658-672: Great exception handling for stream iteration! The enhanced exception handling in both `__next__` and `__anext__` ensures proper cleanup and span status setting when exceptions occur during stream iteration. This is crucial for the PR's objective of handling non-consumed streams.
Also applies to: 674-688
760-797: Robust and well-designed cleanup method! The `_ensure_cleanup` method is excellently implemented with:
- Thread-safe execution using locks
- Idempotent behavior (safe to call multiple times)
- Comprehensive error handling that doesn't mask original exceptions
- Proper logging for debugging
- Fallback mechanisms to ensure cleanup completion
This is the core of the fix for non-consumed streams.
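To make the described design concrete, here is a minimal sketch of such a cleanup path. It reuses the names quoted in the review (`_cleanup_completed`, `_cleanup_lock`, `_ensure_cleanup`) but is a simplification under those assumptions, not the PR's actual code:

```python
import logging
import threading

logger = logging.getLogger(__name__)


class StreamCleanupSketch:
    """Minimal sketch of the cleanup design described above; the real
    ChatStream also wraps the OpenAI stream and accumulates chunks."""

    def __init__(self, span):
        self._span = span
        self._cleanup_completed = False
        self._cleanup_lock = threading.Lock()

    def __del__(self):
        # Safety net: runs when the stream is garbage collected unconsumed.
        if hasattr(self, "_cleanup_completed") and not self._cleanup_completed:
            self._ensure_cleanup()

    def _ensure_cleanup(self):
        with self._cleanup_lock:
            if self._cleanup_completed:  # idempotent: safe to call twice
                return
            try:
                self._record_partial_metrics()
                if self._span is not None and self._span.is_recording():
                    # Error paths set an ERROR status before calling this;
                    # here the span is simply ended.
                    self._span.end()
            except Exception:
                # Never raise from a cleanup path (it may run in __del__);
                # log and still fall through so the flag gets set.
                logger.debug("openai stream cleanup failed", exc_info=True)
            finally:
                self._cleanup_completed = True

    def _record_partial_metrics(self):
        pass  # sketched after the next review comment
```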
798-829: Smart partial metrics recording! The `_record_partial_metrics` method ensures that even when streams are not fully consumed, we still capture:
- Duration metrics
- Basic span attributes
- Partial token metrics
- Choice metrics from processed data
This aligns perfectly with the PR description about collecting available metrics for non-consumed streams.
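Continuing the sketch above, a hedged illustration of partial-metric recording; the `_duration_histogram` and `_start_time` names and the attribute key are assumptions for illustration, not the PR's exact names:

```python
import time


def _record_partial_metrics(self):
    """Fragment of StreamCleanupSketch above: record what is known at
    cleanup time without fabricating data for unconsumed chunks."""
    histogram = getattr(self, "_duration_histogram", None)
    start_time = getattr(self, "_start_time", None)
    if histogram is None or start_time is None:
        return
    # Duration is always derivable from the creation timestamp, even if
    # no chunk was ever consumed.
    histogram.record(
        time.time() - start_time,
        attributes={"gen_ai.system": "openai"},
    )
    # Token and choice metrics are recorded only from chunks that were
    # actually consumed; counts for unseen chunks cannot be known.
```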
273-274: Good formatting improvements for readability. The formatting changes improve code readability by properly breaking long lines and using consistent indentation.
Also applies to: 425-426, 482-483, 485-486, 538-539, 545-546, 574-575, 691-692, 753-754, 859-860, 930-931, 1049-1050, 1082-1083, 1130-1131
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (4)
27-28: Consistent formatting improvement. The multi-line formatting of the messages list improves readability.
1553-1625: Well-designed test for non-consumed streams. This test effectively validates the PR's main objective by ensuring that streaming responses are properly instrumented even when not consumed. The use of garbage collection and comprehensive assertions on spans and metrics provides good coverage.
1627-1697: Good coverage for partial stream consumption. This test appropriately covers the edge case where a stream is only partially consumed. The assertion that at least one event is recorded (line 1660) correctly validates that partial consumption is tracked.
1739-1776: Excellent memory leak prevention test. This test uses `weakref` effectively to verify that unconsumed stream objects are properly garbage collected, preventing memory leaks. The assertion on line 1764 provides strong evidence that the cleanup mechanism works correctly.
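A condensed sketch of that weakref technique (the `openai_client` fixture comes from the project's conftest per the code graph above; the model name is illustrative):

```python
import gc
import weakref


def test_stream_object_is_collectable(openai_client):
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Tell me a joke"}],
        stream=True,
    )
    ref = weakref.ref(response)
    del response
    gc.collect()
    # A dead weakref proves the wrapper holds no self-referencing cycle
    # (e.g., via the span) that would keep it alive.
    assert ref() is None
```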
Actionable comments posted: 0
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (1)
1713-1751: Solid test for exception handling during consumption. This test validates proper cleanup when exceptions occur during stream consumption. The logic correctly:
- Simulates realistic failure: interrupts consumption after processing some chunks
- Handles cleanup properly: forces cleanup via garbage collection
- Validates span status: correctly expects OK status since the exception was in user code
- Verifies event recording: confirms events were captured before the exception
Minor suggestion: address the static analysis hint by renaming `chunk` to `_chunk` since the variable isn't used within the loop body.

```diff
- for chunk in response:
+ for _chunk in response:
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (16 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (31 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (3)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (2)
- SpanAttributes (36-229)
- EventAttributes (239-258)
packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py (1)
- assert_message_in_logs (1221-1232)
packages/opentelemetry-instrumentation-openai/tests/conftest.py (2)
- instrument_legacy (120-136)
- openai_client (42-43)
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
1726-1726: Loop control variable chunk not used within loop body
Rename unused chunk to _chunk
(B007)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Test Packages (3.9)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Lint
🔇 Additional comments (14)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (10)
4-4: LGTM! Threading import properly added. The threading import is correctly positioned and necessary for the thread-safe cleanup mechanism introduced in the ChatStream class.
630-632: Excellent thread-safe cleanup state design. The cleanup state tracking with the `_cleanup_completed` flag and `_cleanup_lock` provides robust protection against duplicate operations and ensures thread safety. The implementation follows best practices for concurrent programming.
634-637: Well-implemented destructor for cleanup safety. The `__del__` method provides a crucial safety net for cleanup when objects are garbage collected. The `hasattr` check is essential to avoid AttributeError during interpreter shutdown when attributes may not be available.
643-657: Excellent fix for the exception handling issue. This implementation properly addresses the previous review comment about the `return` statement in the `finally` block. The current approach correctly:
- Captures cleanup exceptions without suppressing the original exception
- Always calls the wrapped `__exit__` method
- Logs cleanup errors for debugging without affecting the context manager behavior
- Returns the proper result from the wrapped exit
This follows Python best practices and ensures robust exception handling.
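As a sketch, the exit pattern this describes might look like the following; `self.__wrapped__` stands in for the underlying OpenAI stream, and `_ensure_cleanup` is as sketched earlier on this page:

```python
import logging

logger = logging.getLogger(__name__)


class ExitSketch:
    """Fragment: assumes _ensure_cleanup and a wrapped stream object
    (self.__wrapped__) as in the earlier sketches."""

    def __exit__(self, exc_type, exc_val, exc_tb):
        try:
            self._ensure_cleanup()
        except Exception:
            # A cleanup failure must not replace an exception already in
            # flight, so it is logged and swallowed here.
            logger.debug("cleanup during __exit__ failed", exc_info=True)
        # No `return` inside a `finally` block (ruff B012): delegating at
        # the end lets in-flight exceptions propagate normally while
        # preserving the wrapped stream's suppression semantics.
        return self.__wrapped__.__exit__(exc_type, exc_val, exc_tb)
```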
677-682: Robust exception handling in iteration methods. The enhanced `__next__` and `__anext__` methods properly handle cleanup for exceptions during stream iteration. The logic correctly distinguishes between normal iteration termination (StopIteration/StopAsyncIteration) and actual errors, ensuring spans are marked with error status only when appropriate.
Also applies to: 694-697
704-705: Good formatting improvement. The formatting change improves readability without altering functionality.
771-771: Essential cleanup completion marking. Setting `_cleanup_completed = True` after successful response processing prevents duplicate cleanup operations from the destructor or other cleanup mechanisms. This is a critical part of the idempotent cleanup design.
773-810: Excellently designed thread-safe cleanup method. The `_ensure_cleanup` method demonstrates robust design principles:
- Thread safety: proper use of a locking mechanism
- Idempotency: checks the completion flag to prevent duplicate operations
- Comprehensive error handling: logs errors without propagating them
- Graceful degradation: fallback logic ensures cleanup completion even if metric recording fails
- Proper resource management: sets span status and ends the span correctly
This implementation ensures spans and metrics are finalized regardless of how the stream lifecycle ends.
811-841: Well-implemented partial metrics recording. The `_record_partial_metrics` method effectively handles recording metrics with whatever data is available at cleanup time. It properly:
- Records duration metrics using the start time
- Sets span attributes with available response data
- Records token and choice metrics when data exists
- Uses existing metric recording functions for consistency
This ensures meaningful metrics are captured even for non-consumed streams.
273-274: Good formatting improvements throughout. The formatting changes improve code readability by properly breaking long lines and following Python style guidelines. These enhance maintainability without changing functionality.
Also applies to: 425-426, 482-486, 541-542, 552-553, 576-577, 766-767, 872-873, 943-944, 1062-1063, 1102-1103, 1150-1151
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (4)
27-28: Excellent formatting improvements for readability. The consistent formatting changes throughout the file improve code readability by properly structuring message arrays and function calls. These changes follow Python style guidelines and enhance maintainability.
Also applies to: 41-42, 53-54, 72-73, 93-94, 128-129, 148-149, 288-289, 362-363, 497-498, 571-572, 587-588, 606-607, 621-625, 644-645, 672-676, 714-715, 742-746, 772-773, 791-792, 806-810, 830-831, 858-862, 899-900, 927-931, 1323-1324, 1479-1480, 1497-1498, 1519-1520, 1537-1538
1553-1634: Comprehensive test for non-consumed streaming responses. This test excellently validates the core PR objective: ensuring proper instrumentation of streaming responses even when not consumed. The test:
- Correctly simulates the scenario: creates a stream without consuming it
- Triggers cleanup properly: uses garbage collection to invoke the destructor
- Validates span lifecycle: checks proper closure and OK status
- Verifies metrics recording: confirms duration metrics are captured
- Checks streaming indicators: validates proper attribute marking
The assertions are thorough and cover all the essential aspects of the functionality.
1636-1711: Good test coverage for partial stream consumption. This test validates the important scenario where streams are partially consumed then abandoned. The test design is solid:
- Realistic scenario: consumes the first chunk then stops (common in practice)
- Proper cleanup verification: uses garbage collection to trigger cleanup
- Event validation: confirms events are recorded for consumed chunks
- Metrics verification: ensures duration metrics are still captured
This covers an important edge case that could occur in real applications.
1753-1788: Excellent memory leak prevention test. This test demonstrates sophisticated testing practices using weak references to validate proper garbage collection. The test effectively:
- Validates resource cleanup: uses `weakref` to confirm object collection
- Forces garbage collection: explicitly triggers GC to test cleanup behavior
- Verifies span management: ensures spans are properly closed even after GC
- Prevents regression: guards against memory leaks in streaming scenarios
This is exemplary test design for validating resource management and cleanup behavior.
Actionable comments posted: 0
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (1)
1712-1750: Good exception handling test with a minor improvement needed. This test properly validates exception handling during stream consumption. The logic correctly simulates user-code exceptions and verifies proper cleanup. However, address the static analysis hint about the unused loop variable:

```diff
- for chunk in response:
+ for _chunk in response:
```

This makes it clear that the chunk content isn't used in the loop body.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (31 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (3)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (2)
- SpanAttributes (36-229)
- EventAttributes (239-258)
packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py (1)
- assert_message_in_logs (1221-1232)
packages/opentelemetry-instrumentation-openai/tests/conftest.py (1)
- openai_client (42-43)
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
1725-1725: Loop control variable chunk not used within loop body
Rename unused chunk to _chunk
(B007)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Lint
- GitHub Check: Test Packages (3.9)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Build Packages (3.11)
🔇 Additional comments (4)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (4)
27-28: LGTM - Improved code formatting. The formatting changes improve readability by breaking long lines into multiple lines while maintaining the same functionality.
Also applies to: 41-42, 53-54, 72-73, 93-94, 128-129, 148-149, 288-289, 362-363, 497-498, 571-572, 587-588, 606-607, 621-625, 644-645, 672-676, 714-715, 742-746, 772-773, 791-792, 806-810, 830-831, 858-862, 899-900, 927-931, 1323-1324, 1479-1480, 1497-1498, 1519-1520, 1537-1538
1553-1634: Excellent test coverage for non-consumed streams. This test comprehensively validates the new functionality described in the PR objectives. It properly verifies that:
- Spans are closed even when streams aren't consumed
- Duration metrics are recorded regardless of consumption
- Streaming attributes are properly set
- Cleanup occurs through garbage collection
The test includes thorough assertions for both span attributes and metric data points.
1636-1710: Good coverage for partial stream consumption. This test properly validates the scenario where a stream is partially consumed. The logic correctly:
- Consumes only the first chunk to simulate partial consumption
- Verifies cleanup still occurs properly
- Validates that events are recorded for consumed chunks
- Ensures metrics are still recorded correctly
This is an important edge case that complements the non-consumed stream test.
1752-1787: Excellent memory leak prevention test. This test demonstrates sophisticated testing practices by:
- Using weak references to validate object lifecycle
- Forcing garbage collection to ensure cleanup occurs
- Verifying span closure happens during object destruction
- Preventing potential memory leaks in streaming scenarios
This provides strong validation of the new cleanup functionality's robustness.
Actionable comments posted: 0
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (1)
1725-1725: Fix unused loop variable to follow Python conventions. The loop variable `chunk` is not used within the loop body; consider renaming it to indicate it's intentionally unused.

```diff
- for chunk in response:
+ for _chunk in response:
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (19 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (31 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
🧰 Additional context used
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
1725-1725: Loop control variable chunk not used within loop body
Rename unused chunk to _chunk
(B007)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Test Packages (3.9)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Lint
🔇 Additional comments (4)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (4)
27-28: LGTM! Formatting improvements enhance readability. The line breaks and formatting changes improve code readability while maintaining functionality.
Also applies to: 41-42, 53-54, 72-73, 93-94, 128-129, 148-149, 288-289, 362-363, 497-498, 571-572, 587-588, 606-607, 621-625, 644-645, 672-675, 714-715, 742-745, 772-773, 791-792, 806-809, 830-831, 858-861, 899-900, 927-930, 1323-1324, 1479-1480, 1497-1498, 1519-1520, 1537-1538
1553-1634: Excellent test coverage for non-consumed streaming responses. This test effectively validates the core PR objective: ensuring proper instrumentation of OpenAI streaming responses even when not consumed. The test design is thorough:
- Properly simulates the real-world scenario of creating but not consuming streams
- Uses garbage collection to trigger cleanup mechanisms
- Comprehensively validates both span attributes and duration metrics
- Includes detailed assertions with descriptive error messages
1636-1710: Well-designed test for partial stream consumption scenarios. This test covers an important edge case where streams are only partially consumed. The implementation correctly:
- Simulates realistic partial consumption using `next(iter(response))`
- Validates that cleanup occurs even with incomplete consumption
- Ensures both spans and metrics are properly recorded
- Verifies at least one event was captured from the consumed chunk
1752-1787: Excellent memory leak prevention test using advanced Python techniques. This test demonstrates sophisticated testing practices:
- Uses `weakref` to verify proper garbage collection
- Validates that cleanup mechanisms don't create memory leaks
- Ensures spans are properly closed even without stream consumption
- Critical for long-running applications that create many streams
The implementation shows deep understanding of Python memory management and is essential for validating the robustness of the cleanup logic.
```python
def __del__(self):
    """Cleanup when object is garbage collected"""
    if hasattr(self, '_cleanup_completed') and not self._cleanup_completed:
        self._ensure_cleanup()
```
Bug: Destructor Calls Risky Cleanup Methods
The `ChatStream.__del__` method attempts complex cleanup by calling `_ensure_cleanup()`. Due to the unpredictable order of object destruction during garbage collection and interpreter shutdown, this can lead to AttributeError or other exceptions when accessing instance attributes (e.g., `self._span`, `self._cleanup_lock`, `logger`) that may have already been deallocated. This can result in incomplete resource cleanup or warnings, even with the `@dont_throw` decorator.
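One defensive pattern that addresses this concern is to make the destructor assume nothing about surviving state; this is a sketch of that idea, not a claim about how the PR resolves the report:

```python
def __del__(self):
    # During interpreter shutdown, instance attributes and module
    # globals (like a logger) may already be gone; guard every access
    # and swallow everything, since a destructor must never raise.
    try:
        # Defaulting to True means: if the flag is unreadable, skip
        # cleanup rather than risk touching half-destroyed state.
        if not getattr(self, "_cleanup_completed", True):
            self._ensure_cleanup()
    except Exception:
        pass
```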
```python
# Handle cleanup for other exceptions during stream iteration
self._ensure_cleanup()
if self._span and self._span.is_recording():
    self._span.set_status(Status(StatusCode.ERROR, str(e)))
```
Bug: Exception Handling Fails to Set Span Status
When an exception occurs during streaming iteration in ChatStream's `__next__` and `__anext__` methods, the span status is incorrectly set. The `_ensure_cleanup()` method is called first, which ends the span with an OK status. Subsequent attempts to set the span status to ERROR are ineffective because the span is already ended and cannot be modified, leading to an OK status despite the exception. The error status must be set before cleanup.
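A sketch of the corrected ordering; `_process_item` is an illustrative hook, and the other names follow the sketches earlier on this page:

```python
from opentelemetry.trace import Status, StatusCode


class NextSketch:
    """Fragment: _span, _ensure_cleanup, and the wrapped iterator are as
    in the earlier sketches; _process_item is an illustrative hook."""

    def __next__(self):
        try:
            chunk = self.__wrapped__.__next__()
        except StopIteration:
            self._ensure_cleanup()  # normal termination
            raise
        except Exception as e:
            # Set ERROR *before* cleanup: once _ensure_cleanup() ends the
            # span, set_status() becomes a no-op.
            if self._span and self._span.is_recording():
                self._span.set_status(Status(StatusCode.ERROR, str(e)))
            self._ensure_cleanup()
            raise
        else:
            self._process_item(chunk)  # accumulate chunk data for metrics
            return chunk
```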
feat(instrumentation): ... or fix(instrumentation): ...
Fix #3151
After the fix, streaming responses are now properly instrumented even when not consumed. Both traces and metrics (gen_ai.client.operation.duration) are collected.
Unsupported Metrics
The following streaming-specific metrics are intentionally not collected for non-consumed streams. These metrics require actual stream consumption to generate accurate data.
- gen_ai.server.time_to_first_token: requires actual receipt of the first chunk to set the timestamp
- llm.chat_completions.streaming_time_to_generate: needs the time difference between first and last tokens
- gen_ai.client.generation.choices: depends on accumulated response data from consumed chunks
Important
Enhance OpenAI instrumentation to support non-consumed streaming responses with proper cleanup and metric recording.
- Intentionally skips streaming-specific metrics that require consumption: gen_ai.server.time_to_first_token, llm.chat_completions.streaming_time_to_generate, and gen_ai.client.generation.choices.
- Updates the ChatStream class in chat_wrappers.py to handle non-consumed streams.
- Adds _ensure_cleanup() and _record_partial_metrics() methods for safe cleanup and partial metric recording.
- Adds tests in test_chat.py for non-consumed streams, partial consumption, and exception handling during streaming.
- Adds cassettes test_chat_streaming_not_consumed, test_chat_streaming_partial_consumption, and test_chat_streaming_exception_during_consumption.
This description was created by Ellipsis for a43ed5b. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit
- New Features
- Bug Fixes
- Tests
- Style