Conversation

Contributor

@galkleinman galkleinman commented Dec 10, 2025

Important

Adds support for the OpenAI Realtime API with WebSocket instrumentation, comprehensive tests, and example usage.

  • Behavior:
    • Adds support for the OpenAI Realtime API over WebSockets in realtime_wrappers.py.
    • Instruments session lifecycle, response cycles, and tool calls.
    • Handles session creation, updates, response creation, text deltas, function calls, and errors.
  • Tests:
    • Adds tests in test_realtime.py for session and response handling, error processing, and span attributes.
    • Simulates WebSocket flows with mock events for comprehensive coverage.
  • Examples:
    • Adds openai_realtime_example.py demonstrating single/multi-turn conversations and tool-enabled workflows.

This description was created by Ellipsis for 27df93c. You can customize this summary. It will automatically update as commits are pushed.


Summary by CodeRabbit

  • New Features

    • Added OpenAI Realtime instrumentation to trace session lifecycle, per-response spans, streaming content, usage metrics, and tool-call tracking for WebSocket realtime flows.
  • Tests

    • Added comprehensive tests simulating realtime WebSocket flows: session events, streaming deltas, tool calls, multi-turn conversations, error handling, and span hierarchies.
  • Documentation

    • Added a realtime example demonstrating single/multi-turn conversations, streaming handling, and tool-enabled workflows with tracing.



coderabbitai bot commented Dec 10, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds OpenAI Realtime instrumentation: a new realtime_wrappers module implements session/response/span lifecycle over WebSocket events, registers a connect wrapper during instrumentation, and includes tests plus a sample realtime example script.

Changes

Cohort / File(s) Change Summary
Instrumentation registration
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py
Import and register realtime_connect_wrapper to wrap Realtime.connect and AsyncRealtime.connect during instrumentation; unwrap on uninstrument.
Realtime instrumentation module
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py
New module adding RealtimeSessionState, RealtimeEventProcessor, multiple wrapper classes (connection, session, conversation, item, response, connection manager), async event processing, span lifecycle and attribute management, error handling, and realtime_connect_wrapper.
Tests
packages/opentelemetry-instrumentation-openai/tests/traces/test_realtime.py
New comprehensive async unit tests with mocks to simulate realtime event streams; validate span creation, hierarchy, attributes, multi-turn flows, tool-call handling, error paths, and wrapper behavior.
Example app
packages/sample-app/sample_app/openai_realtime_example.py
New sample script demonstrating three realtime workflows (text, multi-turn, tools) using an async OpenAI Realtime client and streamed handling with tracing enabled.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as User Client
    participant Wrapper as realtime_connect_wrapper
    participant ConnMgr as RealtimeConnectionManagerWrapper
    participant Session as RealtimeSessionWrapper
    participant Events as Realtime Event Stream
    participant Tracer as OpenTelemetry Tracer

    Client->>Wrapper: connect()
    Wrapper->>ConnMgr: return wrapped connection manager
    ConnMgr->>Tracer: start openai.session span
    ConnMgr->>Session: provide wrapped session

    Client->>Session: send conversation.item / response.create (outgoing)
    Session->>Tracer: start openai.realtime (response) span
    Events->>Session: response.* events (text.delta, audio.delta, function_call, done, error)
    Session->>Tracer: accumulate content, annotate spans, record tool calls
    Events->>Session: response.done
    Session->>Tracer: end response span (status, usage, content)

    Client->>ConnMgr: close()
    ConnMgr->>Tracer: end openai.session span
```
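The per-response portion of the flow above can be sketched as an event dispatcher. This is a toy model, not the PR's code: `FakeSpan` stands in for an OpenTelemetry span, and the attribute keys and event names (`response.created`, `response.text.delta`, `response.done`) follow the OpenAI Realtime event naming used in the diagram.

```python
class FakeSpan:
    """Stand-in for an OpenTelemetry span (illustrative only)."""
    def __init__(self, name):
        self.name, self.attrs, self.ended = name, {}, False
    def set_attribute(self, key, value):
        self.attrs[key] = value
    def end(self):
        self.ended = True

class RealtimeEventProcessor:
    """Toy dispatcher: map realtime event types to span lifecycle actions."""
    def __init__(self):
        self.response_span = None
        self.content = []

    def process_event(self, event):
        etype = event["type"]
        if etype == "response.created":
            # Response cycle begins: open the per-response span.
            self.response_span = FakeSpan("openai.realtime.response")
        elif etype == "response.text.delta":
            # Streaming: accumulate text until the response completes.
            self.content.append(event["delta"])
        elif etype == "response.done":
            # Completion: annotate accumulated content and usage, end the span.
            span = self.response_span
            span.set_attribute("gen_ai.completion", "".join(self.content))
            usage = event.get("usage", {})
            span.set_attribute("gen_ai.usage.total_tokens", usage.get("total_tokens"))
            span.end()
```

The real module additionally propagates the session span's trace context so each response span parents under `openai.session`, per the diagram.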

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

  • Areas needing extra attention:
    • realtime_wrappers.py — async event processing, span lifecycle, context propagation, cleanup on errors/close.
    • test_realtime.py — validity of async mocks and span relationship assertions.
    • __init__.py — wrapping/unwrapping safety to prevent double-wraps or leaks.
    • openai_realtime_example.py — correctness of example flows and environment usage.

Poem

🐰 I hopped into the realtime stream tonight,
Spans like carrots glowing in the light.
Sessions stitched and responses softly spun,
Tool calls counted, every delta won—
Trace the hop, the story’s bright and light! 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 71.91%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed: the title clearly describes the main change, adding support for OpenAI's Realtime API with WebSockets, and accurately reflects the primary objective of the pull request.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch gk/openai-realtime-support

Comment @coderabbitai help to get the list of available commands and usage tips.

@galkleinman galkleinman marked this pull request as ready for review December 10, 2025 21:11
Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Caution

Changes requested ❌

Reviewed everything up to b80fdbb in 2 minutes and 32 seconds. Click for details.
  • Reviewed 2099 lines of code in 4 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py:345
  • Draft comment:
    Consider adding logging in error handlers (e.g. _handle_error_event and _handle_error) for better traceability of failures.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%.

    This comment suggests adding logging as a "nice to have" for better traceability. However, this is instrumentation code that already captures errors through OpenTelemetry spans (via record_exception() and error status). The comment doesn't point to a bug or missing functionality - it's a speculative suggestion about adding additional logging on top of the existing telemetry. The rules state we should not make comments that are purely informative or suggest optional improvements without clear necessity. This falls into the category of "consider doing X" which is not actionable enough - it doesn't say logging is required or that there's a problem, just that it might be nice to have.

    The comment could be valid if there's a specific debugging scenario where logging would help beyond what spans provide. Perhaps during development or when spans aren't being exported, logging could be useful. However, without evidence that this is actually needed, it's speculative. While logging might be helpful in some scenarios, this is an instrumentation library where the primary observability mechanism is OpenTelemetry spans, which already capture the errors. The comment uses "Consider" which makes it optional/speculative rather than identifying a clear issue. Without strong evidence that logging is needed here, this falls under "nice to have" suggestions that should be removed per the rules.

    This comment should be deleted. It's a speculative suggestion ("Consider adding...") for an optional improvement rather than identifying a clear issue. The code already captures errors through OpenTelemetry spans, which is the appropriate mechanism for instrumentation code.
2. packages/sample-app/sample_app/openai_realtime_example.py:41
  • Draft comment:
    The sample app code cleanly demonstrates realtime conversation flows; ensure production implementations add appropriate error handling and retries.
  • Reason this comment was not posted:
    Confidence changes required: 30% <= threshold 50% None
3. packages/opentelemetry-instrumentation-openai/tests/traces/test_realtime.py:60
  • Draft comment:
    The tests cover various realtime flows comprehensively. Consider adding tests for edge cases (e.g. partial events or unexpected event types) if not already covered.
  • Reason this comment was not posted:
    Confidence changes required: 40% <= threshold 50% None

Workflow ID: wflow_sHzA1OILQvYikK3j

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Inline review comment (Contributor) on the following snippet in `realtime_wrappers.py`:

```python
self._state.response_span.end()
self._reset_response_state()

def _start_response_span(self):
```

Consider refactoring the duplicated logic for starting response spans (`_start_response_span`) across the different wrappers (e.g., `RealtimeConnectionWrapper` and `RealtimeResponseWrapper`) to reduce maintenance duplication.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-openai/tests/traces/test_realtime.py (1)

402-413: Consider using a direct import for better readability.

The dynamic __import__ can be replaced with a regular import.

```diff
     @pytest.fixture
     def tracer_provider_and_exporter(self):
         """Create a tracer provider with in-memory exporter."""
+        from opentelemetry.sdk.trace.export import SimpleSpanProcessor
+
         exporter = InMemorySpanExporter()
         provider = TracerProvider()
         provider.add_span_processor(
-            __import__(
-                "opentelemetry.sdk.trace.export",
-                fromlist=["SimpleSpanProcessor"]
-            ).SimpleSpanProcessor(exporter)
+            SimpleSpanProcessor(exporter)
         )
         return provider, exporter
```
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py (1)

241-342: Consider extracting shared response handling logic.

The _handle_response_done method is largely duplicated between RealtimeConnectionWrapper and RealtimeEventIterator. While code duplication is acceptable in this package based on learnings, extracting the common logic into a helper function in RealtimeSessionState could improve maintainability.

Also applies to: 568-666

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 5da12d8 and b80fdbb.

📒 Files selected for processing (4)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py (3 hunks)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_realtime.py (1 hunks)
  • packages/sample-app/sample_app/openai_realtime_example.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_realtime.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py
  • packages/sample-app/sample_app/openai_realtime_example.py
🧠 Learnings (1)
📚 Learning: 2025-12-02T21:09:48.690Z
Learnt from: duanyutong
Repo: traceloop/openllmetry PR: 3487
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:177-178
Timestamp: 2025-12-02T21:09:48.690Z
Learning: The opentelemetry-instrumentation-openai and opentelemetry-instrumentation-openai-agents packages must remain independent and not share code, so code duplication between them is acceptable.

Applied to files:

  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py
🧬 Code graph analysis (3)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-245)
packages/opentelemetry-instrumentation-openai/tests/traces/test_realtime.py (3)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-245)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py (19)
  • RealtimeSessionState (44-59)
  • RealtimeConnectionManagerWrapper (861-944)
  • RealtimeEventIterator (455-686)
  • session (80-82)
  • _process_event (137-158)
  • _process_event (472-499)
  • response (90-92)
  • realtime_connect_wrapper (948-959)
  • RealtimeResponseWrapper (770-858)
  • create (753-767)
  • create (780-785)
  • RealtimeConnectionWrapper (62-452)
  • RealtimeSessionWrapper (689-724)
  • update (699-724)
  • conversation (85-87)
  • item (738-740)
  • close (113-118)
  • RealtimeConversationItemWrapper (743-767)
  • cancel (787-800)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py (1)
  • OpenAIV1Instrumentor (52-374)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py (1)
  • realtime_connect_wrapper (948-959)
🪛 Flake8 (7.3.0)
packages/sample-app/sample_app/openai_realtime_example.py

[error] 85-85: f-string is missing placeholders

(F541)

🪛 Ruff (0.14.8)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py

944-944: return inside finally blocks cause exceptions to be silenced

(B012)


948-948: Unused function argument: instance

(ARG001)

packages/opentelemetry-instrumentation-openai/tests/traces/test_realtime.py

672-672: Abstract raise to an inner function

(TRY301)


672-672: Avoid specifying long messages outside the exception class

(TRY003)

packages/sample-app/sample_app/openai_realtime_example.py

85-85: f-string without any placeholders

Remove extraneous f prefix

(F541)
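The two lint findings above can be reproduced with minimal examples. These are assumed repros, not the flagged PR code: F541 means an f-string has no placeholders and should be a plain string; B012 means a `return` inside a `finally` block silently swallows any in-flight exception.

```python
# F541: an f-string with no placeholders is just a plain string.
greeting_flagged = f"Session started"  # flagged by F541
greeting_clean = "Session started"     # equivalent, lint-clean

# B012: a `return` inside `finally` runs after the exception is raised
# and replaces it with a normal return, so the error never propagates.
def close_silencing():
    try:
        raise ConnectionError("socket dropped")
    finally:
        return "closed"  # B012: the ConnectionError is silently discarded

def close_propagating():
    try:
        raise ConnectionError("socket dropped")
    finally:
        cleanup_ran = True  # cleanup without `return`: exception still propagates
```

This is why the later review iterations removed the `return` from the connection manager's `__aexit__` `finally` block: cleanup still runs, but connection errors reach the caller.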

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Build Packages (3.11)
  • GitHub Check: Lint
🔇 Additional comments (12)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py (3)

36-38: LGTM!

The import follows the existing pattern in the file and is appropriately placed with other wrapper imports.


371-372: LGTM!

The unwrap calls are correctly placed inside the try-except block alongside other beta API unwraps, ensuring graceful handling when the module is not available.


335-345: LGTM!

Using _try_wrap is the correct approach for beta APIs that may not be available in all OpenAI SDK versions. The module path openai.resources.beta.realtime.realtime matches the OpenAI SDK structure, and the pattern is consistent with other beta API wrappers in this file. The wrapper gracefully handles cases where the Realtime API classes are unavailable by catching AttributeError and ModuleNotFoundError.

packages/sample-app/sample_app/openai_realtime_example.py (2)

27-27: LGTM!

API key is correctly loaded from an environment variable, following security best practices as per coding guidelines.


30-94: Well-structured demonstration of the Realtime API instrumentation.

The function demonstrates session configuration, message sending, response streaming, and usage tracking. The event loop pattern is clear and educational.

packages/opentelemetry-instrumentation-openai/tests/traces/test_realtime.py (3)

20-55: LGTM!

The mock classes are well-designed and flexible, using **kwargs to allow setting arbitrary attributes needed for different test scenarios.


415-474: Comprehensive test for the full conversation flow.

The test validates span creation, attributes, parent-child relationships, and finish reasons. Good coverage of the happy path scenario.


670-674: Acceptable test pattern for simulating connection errors.

The static analysis hints (TRY301, TRY003) are false positives for test code where the goal is specifically to test error handling behavior.

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py (4)

1-14: Excellent module-level documentation.

The docstring clearly explains the key concepts and the span hierarchy, which helps maintainers understand the instrumentation design.


44-59: LGTM!

Clean state management class with appropriate defaults. The trace_context field is key for establishing proper span parent-child relationships.


344-353: LGTM!

Error handling properly distinguishes between API error events (ending only the response span) and connection errors (affecting both response and session spans). Exceptions are recorded appropriately.

Also applies to: 432-446


947-958: LGTM!

The instance parameter is part of the wrapt wrapper signature and required by the @_with_tracer_wrapper decorator, so the Ruff ARG001 warning is a false positive. The implementation correctly extracts the model and wraps the connection manager.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed 9fb7b2b in 38 seconds. Click for details.
  • Reviewed 41 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
1. packages/sample-app/sample_app/openai_realtime_example.py:84
  • Draft comment:
    Good refactor: removed unnecessary f-string. Using a normal string is cleaner as there's no interpolation.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
2. packages/sample-app/sample_app/openai_realtime_example.py:140
  • Draft comment:
    Formatting improvement: converting inline array to multi-line improves readability.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
3. packages/sample-app/sample_app/openai_realtime_example.py:201
  • Draft comment:
    Consistent formatting: multi-line formatting for 'content' in tool message enhances readability.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_fEgG7Cv2cTOAtvXr


Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed 0c44434 in 1 minute and 6 seconds. Click for details.
  • Reviewed 34 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py:358
  • Draft comment:
    Explicitly setting the span status to OK before ending an existing response span clarifies completion. Ensure this behavior is intended even if the span might have had an error status.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py:803
  • Draft comment:
    Similarly, in the cancel method, marking the response span as OK before ending it ensures consistency. Confirm that cancellation events should be marked with an OK status.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
3. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py:943
  • Draft comment:
    Removing the return in aexit allows exceptions from the underlying connection manager to propagate naturally, which aligns with typical async context manager behavior.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None

Workflow ID: wflow_ridHlMSu5uLvH6Oo



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py (1)

344-404: Past review feedback addressed, but code duplication remains.

The past review concern about missing status on span end has been resolved (line 359 now sets OK status). However, _start_response_span logic is duplicated across RealtimeConnectionWrapper (lines 355-404), RealtimeEventIterator (lines 539-568), and RealtimeResponseWrapper (lines 803-851).

Consider extracting the shared span creation logic into a helper method on RealtimeSessionState:

```python
# In RealtimeSessionState class
def start_response_span(self, input_text: str = None):
    """Start a new response span with common initialization."""
    if self.response_span is not None:
        self.response_span.set_status(Status(StatusCode.OK))
        self.response_span.end()
        self.reset_response_state()

    self.response_start_time = time.time_ns()
    ctx = self.trace_context or context_api.get_current()

    self.response_span = self.tracer.start_span(
        SPAN_NAME_RESPONSE,
        kind=SpanKind.CLIENT,
        start_time=self.response_start_time,
        context=ctx,
    )

    if self.response_span.is_recording():
        _set_span_attribute(self.response_span, GenAIAttributes.GEN_AI_SYSTEM, "openai")
        _set_span_attribute(self.response_span, GenAIAttributes.GEN_AI_REQUEST_MODEL, self.model)
        _set_span_attribute(self.response_span, SpanAttributes.LLM_REQUEST_TYPE, "realtime")

        if should_send_prompts() and (input_text or self.input_text):
            text = input_text or self.input_text
            _set_span_attribute(self.response_span, f"{GenAIAttributes.GEN_AI_PROMPT}.0.content", text)
            _set_span_attribute(self.response_span, f"{GenAIAttributes.GEN_AI_PROMPT}.0.role", "user")
```
Then replace all three implementations with calls to self._state.start_response_span().

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9fb7b2b and 0c44434.

📒 Files selected for processing (1)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py (2)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-245)
packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/dispatcher_wrapper.py (1)
  • end (103-111)
🪛 Ruff (0.14.8)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py

951-951: Unused function argument: instance

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Lint
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Build Packages (3.11)
🔇 Additional comments (9)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py (9)

1-42: LGTM!

Module structure, imports, and constants are well-organized and appropriate for the instrumentation purpose.


44-60: LGTM!

State tracking class is well-structured with clear attribute initialization for managing session and response lifecycle.


62-93: LGTM!

Connection wrapper properly delegates attribute access and wraps nested objects to ensure complete instrumentation coverage.


94-135: LGTM!

Event handling, async iteration, and context manager implementation are correct. Proper error propagation in __aexit__ by returning False.


136-240: LGTM!

Event processing logic comprehensively handles incoming and outgoing events with proper error isolation via @dont_throw decorator.


241-343: LGTM!

Response completion handling thoroughly captures all relevant telemetry data including usage metrics, content, and tool calls with proper semantic convention attributes.


690-861: LGTM on session, conversation, and response wrappers!

The wrapper classes properly intercept method calls to track state:

  • RealtimeSessionWrapper.update captures session config changes
  • RealtimeConversationItemWrapper.create extracts input text
  • RealtimeResponseWrapper.create starts response span
  • RealtimeResponseWrapper.cancel properly ends span with cancellation status and attribute

863-948: Past review feedback addressed correctly.

The previous issue with return inside finally block has been resolved. Line 947 now correctly awaits the underlying __aexit__ without returning its result, allowing exceptions to propagate naturally.

The connection manager wrapper properly:

  • Creates session span on entry with trace context propagation (lines 886-898)
  • Handles errors and sets appropriate span status (lines 931-938)
  • Ensures session span ends in finally block (lines 941-942)
  • Delegates to underlying manager without suppressing exceptions (line 947)
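The `__aexit__` pattern praised here can be sketched in isolation. The class names below are illustrative stand-ins (not the PR's actual wrappers): the wrapper ends its own bookkeeping in a `finally` block, awaits the inner manager's `__aexit__`, and does not return its result, so the implicit `None` (falsy) lets exceptions propagate as normal async context-manager semantics require.

```python
import asyncio

class InnerManager:
    """Stand-in for the underlying connection manager (assumption)."""
    async def __aenter__(self):
        return self
    async def __aexit__(self, exc_type, exc, tb):
        return False  # inner manager does not suppress exceptions

class InstrumentedManager:
    """Wrapper mirroring the pattern: end the span, delegate, don't suppress."""
    def __init__(self, inner):
        self._inner = inner
        self.span_ended = False

    async def __aenter__(self):
        await self._inner.__aenter__()
        return self

    async def __aexit__(self, exc_type, exc, tb):
        try:
            await self._inner.__aexit__(exc_type, exc, tb)
        finally:
            self.span_ended = True  # session span always ends, even on error
        # No `return` here: implicit None is falsy, so exceptions propagate.

async def demo():
    mgr = InstrumentedManager(InnerManager())
    try:
        async with mgr:
            raise RuntimeError("boom")
    except RuntimeError:
        return mgr.span_ended  # True: exception propagated AND span ended
    return None
```

Contrast with the earlier B012 finding: putting `return` inside that `finally` block (or returning a truthy value from `__aexit__`) would have swallowed the `RuntimeError`.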

950-962: LGTM!

Entry point wrapper correctly:

  • Respects instrumentation suppression flag
  • Extracts model parameter with sensible default
  • Returns wrapped connection manager for full instrumentation

The static analysis warning about unused instance parameter is a false positive—this parameter is required by the wrapt wrapper signature used by @_with_tracer_wrapper, even when not directly referenced.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed 27df93c in 2 minutes and 19 seconds. Click for details.
  • Reviewed 888 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 5 draft comments. View those below.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py:69
  • Draft comment:
    Consider adding a type annotation for the 'event' parameter in process_event for improved clarity.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%.

    Looking at the diff, the process_event method was renamed from _process_event and moved from RealtimeConnectionWrapper to the new RealtimeEventProcessor class. The method signature itself (def process_event(self, event):) appears in the new code at line 69. This is technically a change since the method was refactored and moved. The comment is suggesting a code quality improvement (adding type annotations) which could be considered actionable. However, the rules state that comments should only be made if there is "clearly a code change required" and should not be "obvious or unimportant." Adding type hints is more of a style preference and not a required code change. The code works fine without the type annotation. This seems like a minor code quality suggestion rather than something that must be addressed.

    While the method was moved/refactored in this diff, the lack of type annotation existed in the original code as well. The comment is about improving code style rather than fixing a bug or addressing a functional issue introduced by the changes. This might be considered too minor or stylistic to warrant a comment. Even though this is a refactoring where the method was moved to a new class, the comment is suggesting a non-critical style improvement. The rules explicitly state not to make comments that are "obvious or unimportant" and to only comment when "there is clearly a code change required." Type annotations, while helpful, are not required for the code to function correctly. This falls into the category of a minor style suggestion rather than a necessary change.

    This comment should be deleted. It's a minor style suggestion about adding type hints, which is not a required code change. The code functions correctly without the type annotation, and the comment doesn't identify a bug or functional issue with the refactoring.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py:280
  • Draft comment:
    When an active response span exists and end_existing is false, the method returns silently. Ensure this behavior is documented and intended.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the author to ensure that the behavior is documented and intended, which violates the rule against asking the author to confirm their intention or ensure behavior is intended. The comment does not provide a specific code suggestion or ask for a test to be written, which would have been acceptable.
3. packages/opentelemetry-instrumentation-openai/tests/traces/test_realtime.py:117
  • Draft comment:
    Tests directly invoke the protected _process_event method. Consider using the public API to reduce coupling to internal implementation details.
  • Reason this comment was not posted:
    Comment was on unchanged code.
4. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py:452
  • Draft comment:
    Use of dependency injection for passing the RealtimeEventProcessor (e.g., into RealtimeResponseWrapper and RealtimeEventIterator) improves testability. Ensure all error paths consistently call reset_response_state.
  • Reason this comment was not posted:
    Comment was on unchanged code.
5. packages/opentelemetry-instrumentation-openai/tests/traces/test_realtime.py:293
  • Draft comment:
    Consider adding type annotations to test mocks and helper methods for improved maintainability and clarity in tests.
  • Reason this comment was not posted:
    Comment was on unchanged code.

Workflow ID: wflow_iRcc66LVo8oiJTBw

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py (1)

718-730: Silence ARG001 by marking the unused instance parameter as intentional.

Ruff flags instance as unused in realtime_connect_wrapper, but the parameter is required by the _with_tracer_wrapper protocol. You can satisfy the linter without changing behavior by renaming it to _instance:

-@_with_tracer_wrapper
-def realtime_connect_wrapper(tracer: Tracer, wrapped, instance, args, kwargs):
+@_with_tracer_wrapper
+def realtime_connect_wrapper(tracer: Tracer, wrapped, _instance, args, kwargs):

This makes the intent explicit and keeps the wrapper signature compatible with the instrumentation helper.

packages/opentelemetry-instrumentation-openai/tests/traces/test_realtime.py (1)

664-686: Ruff TRY003/TRY301 warnings in tests are safe to ignore or locally silence.

The explicit raise ConnectionError("WebSocket disconnected") is clear and idiomatic for this test. If you want a clean lint run, you can either:

  • Extract the raise into a small helper used only in the test, or
  • Add a local # noqa: TRY003,TRY301 comment on that line.

Given this is test code and the message is short, keeping it as-is is also reasonable.
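
As a concrete sketch of the first option, the message can move into a small exception subclass raised from a helper, which addresses both rules at once (the class and helper names here are illustrative, not from the PR):

```python
class WebSocketDisconnected(ConnectionError):
    """Keeps the message inside the exception class, satisfying TRY003."""

    def __init__(self):
        super().__init__("WebSocket disconnected")


def simulate_disconnect():
    # The raise now lives behind a helper, which also satisfies TRY301.
    raise WebSocketDisconnected()
```

Since the subclass inherits from `ConnectionError`, an existing `pytest.raises(ConnectionError)` in the test still matches unchanged.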

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0c44434 and 27df93c.

📒 Files selected for processing (2)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_realtime.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-openai/tests/traces/test_realtime.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py
🧠 Learnings (1)
📚 Learning: 2025-12-02T21:09:48.690Z
Learnt from: duanyutong
Repo: traceloop/openllmetry PR: 3487
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:177-178
Timestamp: 2025-12-02T21:09:48.690Z
Learning: The opentelemetry-instrumentation-openai and opentelemetry-instrumentation-openai-agents packages must remain independent and not share code, so code duplication between them is acceptable.

Applied to files:

  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-245)
🪛 Ruff (0.14.8)
packages/opentelemetry-instrumentation-openai/tests/traces/test_realtime.py

684-684: Abstract raise to an inner function

(TRY301)


684-684: Avoid specifying long messages outside the exception class

(TRY003)

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py

719-719: Unused function argument: instance

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Lint
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Build Packages (3.11)
🔇 Additional comments (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py (1)

44-171: Good separation of session state and shared event processing.

The RealtimeSessionState + RealtimeEventProcessor split gives a clear single source of truth for session/response state, and both RealtimeConnectionWrapper.recv and RealtimeEventIterator.__anext__ correctly delegate through the same processor, avoiding the duplication noted in earlier iterations. The response lifecycle (response.created → deltas → response.done/error) is coherently handled here and matches the span behavior exercised in the tests.
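
The shape of that delegation can be sketched roughly as follows; the class names follow the review, but the bodies are simplified stand-ins rather than the PR's actual code:

```python
class RealtimeSessionState:
    """Single source of truth for session/response state."""

    def __init__(self):
        self.session_span = None
        self.response_span = None
        self.events_seen = []  # illustrative bookkeeping


class RealtimeEventProcessor:
    """One processor shared by every receive path."""

    def __init__(self, state):
        self._state = state

    def process_event(self, event):
        # response.created -> deltas -> response.done/error all funnel
        # through here, so recv() and async iteration cannot diverge.
        self._state.events_seen.append(event.get("type"))
        return event


class RealtimeConnectionWrapper:
    def __init__(self, processor):
        self._processor = processor

    def recv(self, event):
        return self._processor.process_event(event)


class RealtimeEventIterator:
    def __init__(self, processor):
        self._processor = processor

    def next_event(self, event):  # stand-in for __anext__
        return self._processor.process_event(event)
```

Because both entry points call the same `process_event`, any fix to the response lifecycle applies to `recv` and iteration alike.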

packages/opentelemetry-instrumentation-openai/tests/traces/test_realtime.py (1)

411-577: End-to-end realtime flow tests provide strong coverage.

The mocked full-flow tests (single-turn, multi-turn, function-call, error paths) do a good job asserting span naming, parent/child relationships, and key attributes, which should catch most regressions in the realtime wrappers’ lifecycle logic.

Comment on lines +451 to +472
def _handle_error(self, error):
"""Handle connection errors."""
if self._state.response_span and self._state.response_span.is_recording():
self._state.response_span.set_status(
Status(StatusCode.ERROR, str(error))
)
self._state.response_span.record_exception(error)
self._state.response_span.end()
self._processor.reset_response_state()

if self._state.session_span and self._state.session_span.is_recording():
self._state.session_span.set_status(
Status(StatusCode.ERROR, str(error))
)
self._state.session_span.record_exception(error)

def _end_session_span(self):
"""End the session span."""
if self._state.session_span and self._state.session_span.is_recording():
self._state.session_span.set_status(Status(StatusCode.OK))
self._state.session_span.end()

⚠️ Potential issue | 🟠 Major

Preserve error status when ending the session span.

_handle_error sets an ERROR status on self._state.session_span, but _end_session_span unconditionally overwrites it with OK before ending the span. When a connection error occurs followed by connection close (the normal lifecycle via on_error() → _handle_error() then on_close() → _end_session_span()), the error status is masked as success on the session span.

Modify _end_session_span to only set OK status when the current status is UNSET, preserving any earlier error status:

 def _end_session_span(self):
     """End the session span."""
     if self._state.session_span and self._state.session_span.is_recording():
-        self._state.session_span.set_status(Status(StatusCode.OK))
+        # Preserve any existing non-UNSET status (e.g., ERROR set in _handle_error)
+        span = self._state.session_span
+        if not hasattr(span, 'status') or span.status is None or span.status.status_code is StatusCode.UNSET:
+            span.set_status(Status(StatusCode.OK))
         self._state.session_span.end()

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py
around lines 451 to 472, _end_session_span currently unconditionally sets the
session span status to OK, which overwrites any prior ERROR set in
_handle_error; change _end_session_span so that it only sets
Status(StatusCode.OK) if the span's current status is UNSET (preserving ERROR or
other statuses), then end the span as before while keeping the is_recording
check.

Comment on lines +484 to +487
# Verify finish_reason is "stop" for text response without tool calls
attrs = dict(response_span.attributes)
assert attrs.get("gen_ai.completion.0.finish_reason") == "stop"

⚠️ Potential issue | 🟡 Minor

Tests assume prompts/tool metadata are always captured; consider decoupling from should_send_prompts() config.

Assertions on gen_ai.completion.* attributes (finish_reason, role, tool_calls, etc.) rely on should_send_prompts() returning True at runtime. If prompt capture is disabled via configuration or environment, these tests will start failing even though the instrumentation is behaving as configured.

To make the tests robust, consider explicitly forcing prompt capture just for these cases (e.g., monkeypatching should_send_prompts to return True or temporarily adjusting the relevant config), or relaxing the assertions when prompt capture is disabled.

Also applies to: 620-627

🤖 Prompt for AI Agents
In packages/opentelemetry-instrumentation-openai/tests/traces/test_realtime.py
around lines 484-487 (and similarly 620-627), the test unconditionally asserts
presence of gen_ai.completion.* attributes which breaks when prompt capture is
disabled; update the tests to either force prompt capture for these cases (e.g.,
monkeypatch should_send_prompts() to return True or set the config/env variable
enabling prompt capture for the duration of the test) before creating the
response_span, or make the assertions conditional by checking
should_send_prompts() (or the same config flag) and only asserting
gen_ai.completion.* attributes when prompt capture is enabled; ensure you
restore the original behavior after the test.
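
One way to force capture for a single assertion is an `unittest.mock.patch` around the config helper. The helper below is a local stand-in: in the real suite, `should_send_prompts` would be imported from the instrumentation package, and the patch target would be that module's path:

```python
from unittest.mock import patch


def should_send_prompts():
    # Stand-in for the instrumentation's config-driven helper; the real one
    # lives inside the openai instrumentation package.
    return False


def check_completion_attrs(attrs):
    """Assert gen_ai.completion.* content only when prompt capture is on."""
    if not should_send_prompts():
        return "skipped"  # capture disabled: don't assert content attributes
    assert attrs.get("gen_ai.completion.0.finish_reason") == "stop"
    return "asserted"


# Force capture on for the duration of the check; patch restores it after.
with patch(f"{__name__}.should_send_prompts", return_value=True):
    result = check_completion_attrs(
        {"gen_ai.completion.0.finish_reason": "stop"}
    )
```

The same pattern works as a pytest `monkeypatch.setattr(...)` call; either way the original behavior is restored automatically when the test exits.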

@galkleinman galkleinman merged commit 9aba92d into main Dec 11, 2025
12 checks passed