feat(openai): add reasoning attributes #3336
Conversation
Walkthrough
Adds reasoning metadata capture to OpenAI instrumentation: records request reasoning effort/summary and response reasoning effort, derives reasoning token counts from varied usage shapes, extends TracedData with reasoning fields, adds an is_reasoning_supported() version check, and adds tests and VCR cassettes for the chat, Azure, and responses APIs.
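For context, the request shapes exercised by the new tests look roughly like this (a sketch inferred from the cassettes and attribute names in this PR; the model name, prompt, and exact parameters are assumptions, not the PR's literal test code):

```python
from openai import OpenAI

client = OpenAI()

# Chat Completions: reasoning effort is a top-level request parameter.
chat = client.chat.completions.create(
    model="gpt-5-nano",
    messages=[{"role": "user", "content": "Count r's in strawberry"}],
    reasoning_effort="low",
)

# Responses API: reasoning is a nested object with effort (and optionally summary).
resp = client.responses.create(
    model="gpt-5-nano",
    input="Count r's in strawberry",
    reasoning={"effort": "low"},
)

# The instrumentation should then record gen_ai.request.reasoning_effort,
# gen_ai.request.reasoning_summary, gen_ai.response.reasoning_effort,
# and gen_ai.usage.reasoning_tokens on the emitted spans.
```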
Sequence Diagram(s)
```mermaid
sequenceDiagram
    autonumber
    actor App
    participant Wrapper as Chat/Responses Wrapper
    participant Tracer as OTel Tracer
    participant OpenAI as OpenAI API
    App->>Wrapper: call create(..., reasoning={effort, summary})
    activate Wrapper
    Wrapper->>Tracer: start_span()
    Note right of Tracer #E6F0FF: Span started (new reasoning attrs)
    Wrapper->>OpenAI: HTTP request (body includes reasoning_effort/summary)
    OpenAI-->>Wrapper: HTTP response (usage / output_tokens_details)
    Wrapper->>Wrapper: parse usage (dict- or attr-style tolerant)
    Wrapper->>Tracer: set attrs\n- gen_ai.request.reasoning_effort\n- gen_ai.request.reasoning_summary\n- gen_ai.response.reasoning_effort\n- gen_ai.usage.reasoning_tokens
    Wrapper->>Tracer: end_span()
    deactivate Wrapper
    Tracer-->>App: span recorded
```
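The "parse usage (dict- or attr-style tolerant)" step in the diagram corresponds roughly to logic like the following (a simplified sketch, not the exact helper from the PR; the function name is invented for illustration):

```python
def extract_reasoning_tokens(usage):
    """Tolerantly read reasoning token counts from dict-shaped or object-shaped usage."""
    if not usage:
        return None
    # Chat API reports completion_tokens_details; the Responses API uses output_tokens_details.
    for details_key in ("completion_tokens_details", "output_tokens_details"):
        details = (
            usage.get(details_key)
            if isinstance(usage, dict)
            else getattr(usage, details_key, None)
        )
        if details:
            return (
                details.get("reasoning_tokens")
                if isinstance(details, dict)
                else getattr(details, "reasoning_tokens", None)
            )
    return None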
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Important
Looks good to me! 👍
Reviewed everything up to 86be4cf in 2 minutes and 4 seconds. Click for details.
- Reviewed 640 lines of code in 9 files
- Skipped 0 files when reviewing.
- Skipped posting 7 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:161
- Draft comment:
In the test_responses_reasoning function, the assertions expect the 'gen_ai.request.reasoning_summary' and 'gen_ai.completion.0.reasoning' attributes to equal an empty tuple when the provided summary is null. Please confirm that an empty tuple is the intended default rather than, for example, an empty string. Adding a brief comment on this choice might help clarify the expected type. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%.
This comment violates several rules. It's asking for confirmation of intended behavior and suggesting documentation, rather than pointing out a clear issue. The test is explicitly verifying the current behavior, so this appears to be the intended behavior. The choice between empty tuple vs empty string is an implementation detail that the test is correctly asserting. Maybe there's a legitimate reason why empty strings would be better than empty tuples for these fields? Maybe this is documenting an important design decision? The test's role is to verify the actual behavior, not question implementation choices. If there was a problem with using tuples, it would be raised in the implementation code review, not in a test that verifies the current behavior. Delete the comment. It's asking for confirmation of intended behavior and suggesting documentation, rather than identifying a clear issue that needs fixing.
2. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:172
- Draft comment:
The test asserts that 'gen_ai.usage.reasoning_tokens' is greater than 0. This works with the current fixture but might be fragile if the underlying response ever omits reasoning tokens. It may be useful to document that the cassette is expected to include reasoning token details, or to parameterize this expectation to match the fixture. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%.
The comment raises a valid point about test fragility, but it's more of a speculative "what if" scenario. The test is specifically for reasoning functionality (note the @pytest.mark.skipif with is_reasoning_supported()), so if reasoning tokens are missing, that would indicate a real bug. The suggestion to document or parameterize is not clearly actionable without more specific guidance. The comment identifies a real potential issue with test maintenance. Maybe the test should be more flexible in how it verifies reasoning functionality. Since this is a test specifically for reasoning functionality, asserting the presence of reasoning tokens is a valid test case. If tokens are missing, that's a legitimate test failure, not test fragility. Delete the comment. It raises speculative concerns about a valid assertion in a test that's specifically checking reasoning functionality.
3. packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_azure/test_chat_reasoning.yaml:3
- Draft comment:
Typographical note: The string "Count r''s in strawberry" uses double apostrophes (r''s). Confirm if this is intentional or if it should be "Count r's in strawberry". - Reason this comment was not posted:
Comment looked like it was already resolved.
4. packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_azure/test_chat_reasoning.yaml:44
- Draft comment:
Typographical error: There appears to be a stray trailing single quote on this line in the JSON string body. Consider removing it to ensure correct formatting. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50%.
The comment is incorrect. What appears to be a "stray" quote is actually the proper closing quote for a multi-line YAML string. The formatting is valid YAML. The indentation and quote placement is intentional to properly format the response body. This is a common pattern in YAML for handling long string values. Could this quote style be causing issues in some YAML parsers? Could there be a reason why the commenter thought this was incorrect? No, this is standard YAML syntax for multi-line strings, and the file appears to be a VCR cassette which is a common testing pattern. The formatting is correct and intentional. The comment should be deleted as it incorrectly identifies valid YAML syntax as an error.
5. packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_reasoning.yaml:3
- Draft comment:
Typographical note: The text "Count r''s in strawberry" (line 3) may contain an extra apostrophe. If the intended text is "Count r's in strawberry", please update the quotation. - Reason this comment was not posted:
Comment looked like it was already resolved.
6. packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_reasoning.yaml:3
- Draft comment:
Typographical error: The string "Count r''s in strawberry" appears to have an extra apostrophe. Consider changing it to "Count r's in strawberry". - Reason this comment was not posted:
Comment looked like it was already resolved.
7. packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py:15
- Draft comment:
There's a stray closing parenthesis on line 15. Please verify if it's intentional or if it should be removed. - Reason this comment was not posted:
Comment was on unchanged code.
Workflow ID: wflow_Mukm0zhsDRx5VFqV
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (1)
769-781: Parity: set reasoning token attribute for streaming responses as well.
For streaming, we accumulate `usage` into `_complete_response`, but never set `LLM_USAGE_REASONING_TOKENS` on the span. Add the same extraction logic used in the non-streaming path to avoid missing data for streams.
```diff
         _set_response_attributes(self._span, self._complete_response)
+        # Reasoning usage attributes (streaming): set only if present
+        usage = self._complete_response.get("usage")
+        if usage:
+            tokens_details = (
+                usage.get("completion_tokens_details")
+                if isinstance(usage, dict)
+                else getattr(usage, "completion_tokens_details", None)
+            )
+            if tokens_details:
+                rt = (
+                    tokens_details.get("reasoning_tokens")
+                    if isinstance(tokens_details, dict)
+                    else getattr(tokens_details, "reasoning_tokens", None)
+                )
+                if rt is not None:
+                    _set_span_attribute(
+                        self._span,
+                        SpanAttributes.LLM_USAGE_REASONING_TOKENS,
+                        rt,
+                    )
```
🧹 Nitpick comments (7)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (1)
24-29: Confirm and document version guard for reasoning support
Verified that OpenAI Python SDK v1.58.0 (released 2024-12-17) introduced the `reasoning_effort` parameter (github.com) and that the OpenAI API preview `2024-12-01-preview` added the `usage.completion_tokens_details.reasoning_tokens` field (autotaker.github.io). The existing `>= "1.58.0"` check is accurate:
- SDK threshold: v1.58.0 (2024-12-17) – adds `reasoning_effort` to all chat methods
- API preview: 2024-12-01-preview – introduces the `reasoning_effort` request param and `reasoning_tokens` in responses
If possible, prefer runtime feature detection (e.g., testing for `response.usage.completion_tokens_details.reasoning_tokens` or presence of the `ChatCompletionReasoningEffort` enum) for production behavior, and reserve the hardcoded version check for unit tests.
Please update the docstring on `is_reasoning_supported()` to reference:
- the GitHub release notes for v1.58.0
- the OpenAI Platform changelog entry for 2024-12-01-preview
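A minimal sketch of such a guard, assuming the module imports `openai` and uses `packaging` for comparison (the constant name and fallback behavior here are illustrative, not the PR's exact implementation):

```python
from packaging import version

import openai

MIN_REASONING_VERSION = "1.58.0"  # SDK release that introduced reasoning_effort


def is_reasoning_supported() -> bool:
    """Return True if the installed OpenAI SDK is new enough for reasoning params."""
    try:
        return version.parse(openai.__version__) >= version.parse(MIN_REASONING_VERSION)
    except Exception:
        # Unparsable or missing version string: assume reasoning is not supported.
        return False
```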
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_reasoning.yaml (1)
83-90: Consider redacting organization/project identifiers.
The headers `openai-organization` and `openai-project` can be considered sensitive identifiers. Redact them to minimize metadata leakage.
```diff
-      openai-organization:
-      - user-mktczbuqo14ok5zq3zvvus0l
+      # openai-organization:
+      # - REDACTED
-      openai-project:
-      - proj_HqO8HnKp7rJsjrDN6n3Y0TPc
+      # openai-project:
+      # - REDACTED
```
Optionally also mask `x-request-id`.
```diff
-      x-request-id:
-      - req_fb455d524b7f4775956fba99734cc8d9
+      # x-request-id:
+      # - REDACTED
```
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (1)
1498-1520: Use semconv constants in assertions and guard absent attributes.
Relying on string keys is brittle. Prefer `SpanAttributes` constants and `.get()` to avoid a KeyError when a field is legitimately absent.
```diff
-    assert span.attributes["gen_ai.request.reasoning_effort"] == "low"
-    assert span.attributes["gen_ai.usage.reasoning_tokens"] > 0
+    assert span.attributes.get(SpanAttributes.LLM_REQUEST_REASONING_EFFORT) == "low"
+    assert span.attributes.get(SpanAttributes.LLM_USAGE_REASONING_TOKENS, 0) > 0
```
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_azure/test_chat_reasoning.yaml (1)
55-77: Optional: redact correlation IDs and internal deployment metadata.
Although not secrets, headers like `apim-request-id`, `x-request-id`, `azureml-model-session`, and `x-ms-deployment-name` create churn and leak internal topology. Redact them to reduce noise and leakage surface.
```diff
-      apim-request-id:
-      - aebd8320-f701-4e7d-801f-2955f84e3811
-      azureml-model-session:
-      - d004-20250815200304
-      x-ms-deployment-name:
-      - gpt-5-nano
-      x-request-id:
-      - 7acf5821-70fa-4fab-b202-ba3700578d08
+      # apim-request-id: REDACTED
+      # azureml-model-session: REDACTED
+      # x-ms-deployment-name: REDACTED
+      # x-request-id: REDACTED
```
Also add cassette filtering (`filter_headers=[...]`) to your test config, as sketched below.
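With pytest-recording, header filtering is configured via a `vcr_config` fixture; a sketch follows (the fixture scope, hook name placement, and exact header list are assumptions — note that `filter_headers` only scrubs request headers, so response headers need a `before_record_response` hook):

```python
# conftest.py (sketch)
import pytest


def _scrub_response_headers(response):
    # VCR's filter_headers covers request headers only; drop sensitive
    # response headers here. Header capitalization may vary per cassette.
    for header in (
        "openai-organization",
        "openai-project",
        "x-request-id",
        "apim-request-id",
        "azureml-model-session",
        "x-ms-deployment-name",
    ):
        response["headers"].pop(header, None)
    return response


@pytest.fixture(scope="module")
def vcr_config():
    # pytest-recording passes this dict straight to VCR.
    return {
        "filter_headers": ["authorization", "api-key"],
        "before_record_response": _scrub_response_headers,
    }
```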
packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py (1)
767-789: Solid Azure reasoning coverage; consider aligning model naming for Azure deployments.
The assertions look good and validate the new attributes. Minor nit: all other Azure chat tests use the Azure deployment name ("openllmetry-testing") instead of a model name. For consistency and to reduce the chance of cassette mismatches in future re-recordings, consider using the same deployment alias here.
Apply if you want consistency with the rest of this file:
```diff
-        model="gpt-5-nano",
+        model="openllmetry-testing",
```
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)
227-245: Avoid redundant f-strings for constant attribute keys.
Passing the constant directly is cleaner. Keeping the empty-tuple default since tests assert `()`. If you’re open to it later, consider `None` to avoid type ambiguity for string-typed attributes.
```diff
-    _set_span_attribute(
-        span,
-        f"{SpanAttributes.LLM_REQUEST_REASONING_SUMMARY}",
-        traced_response.request_reasoning_summary or (),
-    )
+    _set_span_attribute(
+        span,
+        SpanAttributes.LLM_REQUEST_REASONING_SUMMARY,
+        traced_response.request_reasoning_summary or (),
+    )
@@
-    _set_span_attribute(
-        span,
-        f"{SpanAttributes.LLM_REQUEST_REASONING_EFFORT}",
-        traced_response.request_reasoning_effort or (),
-    )
+    _set_span_attribute(
+        span,
+        SpanAttributes.LLM_REQUEST_REASONING_EFFORT,
+        traced_response.request_reasoning_effort or (),
+    )
@@
-    _set_span_attribute(
-        span,
-        f"{SpanAttributes.LLM_RESPONSE_REASONING_EFFORT}",
-        traced_response.response_reasoning_effort or (),
-    )
+    _set_span_attribute(
+        span,
+        SpanAttributes.LLM_RESPONSE_REASONING_EFFORT,
+        traced_response.response_reasoning_effort or (),
+    )
```
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (1)
155-179: Great end-to-end assertions for reasoning attributes; optional extra checks.
Consider also asserting `gen_ai.response.id` presence to mirror other tests, but current coverage already validates the essential attributes.
For example:
```python
assert "gen_ai.response.id" in span.attributes
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (9)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (2 hunks)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (2 hunks)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (6 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_azure/test_chat_reasoning.yaml (1 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_reasoning.yaml (1 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_reasoning.yaml (1 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py (2 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (2 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/cassettes/**/*.{yaml,yml,json}
📄 CodeRabbit inference engine (CLAUDE.md)
Never commit secrets or PII in VCR cassettes; scrub sensitive data
Files:
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_reasoning.yaml
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_azure/test_chat_reasoning.yaml
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_reasoning.yaml
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
211-211: Local variable e is assigned to but never used
Remove assignment to unused variable e
(F841)
🪛 Flake8 (7.2.0)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
[error] 211-211: local variable 'e' is assigned to but never used
(F841)
[error] 466-466: continuation line unaligned for hanging indent
(E131)
[error] 523-523: continuation line unaligned for hanging indent
(E131)
[error] 525-525: continuation line unaligned for hanging indent
(E131)
[error] 580-580: continuation line unaligned for hanging indent
(E131)
[error] 638-638: continuation line unaligned for hanging indent
(E131)
[error] 640-640: continuation line unaligned for hanging indent
(E131)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Lint
🔇 Additional comments (5)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (1)
22-22: Good: robust version comparison using `packaging.version`.
Switching from string comparison to `packaging.version.parse` avoids subtle ordering bugs (e.g., 1.10 vs 1.9). LGTM.
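For illustration, the ordering bug that plain string comparison introduces (standalone snippet, not code from this PR):

```python
from packaging.version import parse

# Lexicographic string comparison misorders multi-digit components:
print("1.10.0" >= "1.9.0")                # False -- '1' < '9' character-wise
print(parse("1.10.0") >= parse("1.9.0"))  # True  -- proper semantic ordering
```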
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (1)
18-18: Import looks good.
Using `is_reasoning_supported` to gate reasoning tests avoids false failures on older client versions. Nice.
packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py (1)
4-4: Good guard: version-gated reasoning tests.
Importing and using `is_reasoning_supported()` to gate tests prevents false failures on older SDKs.
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)
135-139: TracedData: new reasoning fields look correct.
Optional strings with `None` defaults match how the rest of the model is structured. No concerns here.
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (1)
6-6: Good use of version gating for new reasoning tests.
Keeps the suite green across OpenAI client versions.
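The gating presumably follows the usual pytest pattern; a sketch with placeholder fixture names (the real tests' fixtures, decorators, and reason text may differ):

```python
import pytest

from opentelemetry.instrumentation.openai.utils import is_reasoning_supported


@pytest.mark.skipif(
    not is_reasoning_supported(),
    reason="reasoning attributes require a newer openai SDK",
)
@pytest.mark.vcr
def test_chat_reasoning(openai_client, span_exporter):  # placeholder fixture names
    ...
```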
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (3)
193-205: Handle dict vs object usage consistently for input/output/total tokens and cached input tokens (not just reasoning).
Current code assumes attribute access for usage.input_tokens, usage.output_tokens, etc., which breaks when usage is a dict (older SDK path). The try/except around the reasoning details was already fixed; mirror that approach for all usage fields to avoid silent drops under @dont_throw.
Apply this diff:
```diff
-    if usage := traced_response.usage:
-        _set_span_attribute(span, GEN_AI_USAGE_INPUT_TOKENS, usage.input_tokens)
-        _set_span_attribute(span, GEN_AI_USAGE_OUTPUT_TOKENS, usage.output_tokens)
-        _set_span_attribute(
-            span, SpanAttributes.LLM_USAGE_TOTAL_TOKENS, usage.total_tokens
-        )
-        if usage.input_tokens_details:
-            _set_span_attribute(
-                span,
-                SpanAttributes.LLM_USAGE_CACHE_READ_INPUT_TOKENS,
-                usage.input_tokens_details.cached_tokens,
-            )
+    if usage := traced_response.usage:
+        # Support both dict-style and object-style `usage`
+        input_tokens = (
+            usage.get("input_tokens") if isinstance(usage, dict)
+            else getattr(usage, "input_tokens", None)
+        )
+        output_tokens = (
+            usage.get("output_tokens") if isinstance(usage, dict)
+            else getattr(usage, "output_tokens", None)
+        )
+        total_tokens = (
+            usage.get("total_tokens") if isinstance(usage, dict)
+            else getattr(usage, "total_tokens", None)
+        )
+        input_tokens_details = (
+            usage.get("input_tokens_details") if isinstance(usage, dict)
+            else getattr(usage, "input_tokens_details", None)
+        )
+
+        if input_tokens is not None:
+            _set_span_attribute(span, GEN_AI_USAGE_INPUT_TOKENS, input_tokens)
+        if output_tokens is not None:
+            _set_span_attribute(span, GEN_AI_USAGE_OUTPUT_TOKENS, output_tokens)
+        if total_tokens is not None:
+            _set_span_attribute(span, SpanAttributes.LLM_USAGE_TOTAL_TOKENS, total_tokens)
+        if input_tokens_details:
+            cached_tokens = (
+                input_tokens_details.get("cached_tokens") if isinstance(input_tokens_details, dict)
+                else getattr(input_tokens_details, "cached_tokens", None)
+            )
+            if cached_tokens is not None:
+                _set_span_attribute(
+                    span,
+                    SpanAttributes.LLM_USAGE_CACHE_READ_INPUT_TOKENS,
+                    cached_tokens,
+                )

         # Usage - count of reasoning tokens
         reasoning_tokens = None
         # Support both dict-style and object-style `usage`
         tokens_details = (
             usage.get("output_tokens_details") if isinstance(usage, dict)
             else getattr(usage, "output_tokens_details", None)
         )
         if tokens_details:
             reasoning_tokens = (
                 tokens_details.get("reasoning_tokens", None)
                 if isinstance(tokens_details, dict)
                 else getattr(tokens_details, "reasoning_tokens", None)
             )
         _set_span_attribute(
             span,
             SpanAttributes.LLM_USAGE_REASONING_TOKENS,
             reasoning_tokens or 0,
         )
```
Also applies to: 206-225
501-504: Fix None+list concatenation when merging tools.
existing_data may carry "tools": None (because previous writes set tools=None when empty). Adding None + list raises TypeError and silently short-circuits under @dont_throw paths.
Apply this diff in both places:
```diff
-        merged_tools = existing_data.get("tools", []) + request_tools
+        merged_tools = (existing_data.get("tools") or []) + request_tools
```
Also applies to: 629-631
447-449: Harden process_input defaults to avoid None iteration downstream.
If existing_data["input"] exists but is None, the current .get(..., []) returns None and later iteration fails in set_data_attributes. Prefer an explicit fallback to [].
Apply these diffs:
Exception (sync):
```diff
-            input=process_input(
-                kwargs.get("input", existing_data.get("input", []))
-            ),
+            input=process_input(
+                kwargs.get("input", existing_data.get("input") or [])
+            ),
```
Success (sync):
```diff
-            input=process_input(existing_data.get("input", kwargs.get("input"))),
+            input=process_input(existing_data.get("input") or kwargs.get("input") or []),
```
Exception (async):
```diff
-            input=process_input(
-                kwargs.get("input", existing_data.get("input", []))
-            ),
+            input=process_input(
+                kwargs.get("input", existing_data.get("input") or [])
+            ),
```
Success (async):
```diff
-            input=process_input(existing_data.get("input", kwargs.get("input"))),
+            input=process_input(existing_data.get("input") or kwargs.get("input") or []),
```
Also applies to: 577-579, 517-518, 644-645
♻️ Duplicate comments (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (2)
206-214: LGTM on the F841 cleanup and explicit type handling for output_tokens_details.
Refactor matches earlier feedback and resolves the unused exception variable while making access explicit.
463-475: Nice E131 cleanup; suggest DRY’ing reasoning kwarg extraction.
The switch to implicit parentheses resolves Flake8 E131. To reduce repetition, bind `reasoning = kwargs.get("reasoning") or {}` once per block and reuse it.
Example refactor (apply similarly in all four blocks):
```diff
-        traced_data = TracedData(
+        reasoning = kwargs.get("reasoning") or {}
+        traced_data = TracedData(
             ...
-            request_reasoning_summary=(
-                kwargs.get("reasoning", {}).get(
-                    "summary", existing_data.get("request_reasoning_summary")
-                )
-            ),
-            request_reasoning_effort=(
-                kwargs.get("reasoning", {}).get(
-                    "effort", existing_data.get("request_reasoning_effort")
-                )
-            ),
-            response_reasoning_effort=kwargs.get("reasoning", {}).get("effort"),
+            request_reasoning_summary=reasoning.get(
+                "summary", existing_data.get("request_reasoning_summary")
+            ),
+            request_reasoning_effort=reasoning.get(
+                "effort", existing_data.get("request_reasoning_effort")
+            ),
+            response_reasoning_effort=reasoning.get("effort"),
         )
```
Also applies to: 526-538, 589-601, 653-665
🧹 Nitpick comments (3)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (3)
135-139: TracedData additions look good; consider stricter typing for “effort” and avoid sentinel defaults downstream.
Fields are appropriate. Optional: use `Literal["low","medium","high"]` for the two “effort” fields to catch typos at type-check time.
Example change (requires importing Literal):
```diff
-from typing import Any, Optional, Union
+from typing import Any, Optional, Union, Literal
```
```diff
-    request_reasoning_effort: Optional[str] = pydantic.Field(default=None)
-    response_reasoning_effort: Optional[str] = pydantic.Field(default=None)
+    request_reasoning_effort: Optional[Literal["low","medium","high"]] = pydantic.Field(default=None)
+    response_reasoning_effort: Optional[Literal["low","medium","high"]] = pydantic.Field(default=None)
```
357-360: Guard against None output_blocks in error paths.
In certain exception flows output_blocks can be None; iterating .values() will raise. Avoid relying on @dont_throw here.
Apply:
```diff
-    tool_call_index = 0
-    for block in traced_response.output_blocks.values():
+    tool_call_index = 0
+    if traced_response.output_blocks:
+        for block in traced_response.output_blocks.values():
```
226-244: Optional: align response_reasoning_effort source with the actual response (if available).
Currently mirrored from request kwargs. If OpenAI Responses ever returns a response-side effort, prefer that source for accuracy, falling back to the request setting.
I can add a guarded extraction from parsed_response (e.g., output blocks of type "reasoning" or a top-level field) if you confirm the shape to read from. Want me to open a follow-up PR?
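Such a guarded extraction might look roughly like this (the response-side reasoning field is hypothetical and would need to be confirmed against the actual Responses payload, as noted above):

```python
def _response_reasoning_effort(parsed_response, requested_effort):
    """Prefer a response-reported effort when present, else fall back to the request value."""
    reasoning = getattr(parsed_response, "reasoning", None)  # hypothetical response field
    effort = getattr(reasoning, "effort", None) if reasoning is not None else None
    return effort or requested_effort
```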
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (2 hunks)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (2 hunks)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (6 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_reasoning.yaml (1 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_reasoning.yaml (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_reasoning.yaml
🚧 Files skipped from review as they are similar to previous changes (3)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_reasoning.yaml
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
SpanAttributes(64-261)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Lint
nirga
left a comment
Thanks @prane-eth! Great work :)
Solves issue #3257
feat(instrumentation): ... or fix(instrumentation): ....
Tested using command: nx run opentelemetry-instrumentation-openai:test. All 162 test cases passed.
New SemConv variables were added using #3330.
Important
This PR adds reasoning attributes to OpenAI instrumentation, updates relevant functions to handle these attributes, and includes tests to verify the changes.
- Adds `reasoning_effort` and `reasoning_summary` to OpenAI instrumentation in `chat_wrappers.py` and `responses_wrappers.py`.
- Updates `_handle_request()` and `_handle_response()` in `chat_wrappers.py` to set reasoning attributes on spans.
- Updates `set_data_attributes()` in `responses_wrappers.py` to handle reasoning attributes.
- Adds `is_reasoning_supported()` in `utils.py` to check OpenAI version compatibility.
- Adds tests in `test_chat.py`, `test_azure.py`, and `test_responses.py`.
- Adds cassettes `test_chat_reasoning.yaml`, `test_azure_reasoning.yaml`, and `test_responses_reasoning.yaml`.
This description was created by Ellipsis for 86be4cf. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit
New Features
Tests