feat(openai): add reasoning attributes #3336
Conversation
Walkthrough
Adds reasoning metadata capture to OpenAI instrumentation: records request reasoning effort/summary and response reasoning effort, derives reasoning token counts from varied usage shapes, extends TracedData with reasoning fields, adds an is_reasoning_supported() version check, and adds tests and VCR cassettes for the chat, Azure, and responses APIs.
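For context, the request shapes exercised by the new tests look roughly like this (a sketch inferred from the cassettes and attribute names in this PR; the model name, prompt, and exact parameters are assumptions, not the PR's literal test code):

```python
from openai import OpenAI

client = OpenAI()

# Chat Completions: reasoning effort is a top-level request parameter.
chat = client.chat.completions.create(
    model="gpt-5-nano",
    messages=[{"role": "user", "content": "Count r's in strawberry"}],
    reasoning_effort="low",
)

# Responses API: reasoning is a nested object with effort (and optionally summary).
resp = client.responses.create(
    model="gpt-5-nano",
    input="Count r's in strawberry",
    reasoning={"effort": "low"},
)

# The instrumentation should then record gen_ai.request.reasoning_effort,
# gen_ai.request.reasoning_summary, gen_ai.response.reasoning_effort,
# and gen_ai.usage.reasoning_tokens on the emitted spans.
```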
Sequence Diagram(s)
```mermaid
sequenceDiagram
    autonumber
    actor App
    participant Wrapper as Chat/Responses Wrapper
    participant Tracer as OTel Tracer
    participant OpenAI as OpenAI API
    App->>Wrapper: call create(..., reasoning={effort, summary})
    activate Wrapper
    Wrapper->>Tracer: start_span()
    Note right of Tracer #E6F0FF: Span started (new reasoning attrs)
    Wrapper->>OpenAI: HTTP request (body includes reasoning_effort/summary)
    OpenAI-->>Wrapper: HTTP response (usage / output_tokens_details)
    Wrapper->>Wrapper: parse usage (dict- or attr-style tolerant)
    Wrapper->>Tracer: set attrs\n- gen_ai.request.reasoning_effort\n- gen_ai.request.reasoning_summary\n- gen_ai.response.reasoning_effort\n- gen_ai.usage.reasoning_tokens
    Wrapper->>Tracer: end_span()
    deactivate Wrapper
    Tracer-->>App: span recorded
```
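The "parse usage (dict- or attr-style tolerant)" step in the diagram corresponds roughly to logic like the following (a simplified sketch, not the exact helper from the PR; the function name is invented for illustration):

```python
def extract_reasoning_tokens(usage):
    """Tolerantly read reasoning token counts from dict-shaped or object-shaped usage."""
    if not usage:
        return None
    # Chat API reports completion_tokens_details; the Responses API uses output_tokens_details.
    for details_key in ("completion_tokens_details", "output_tokens_details"):
        details = (
            usage.get(details_key)
            if isinstance(usage, dict)
            else getattr(usage, details_key, None)
        )
        if details:
            return (
                details.get("reasoning_tokens")
                if isinstance(details, dict)
                else getattr(details, "reasoning_tokens", None)
            )
    return None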
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Important
Looks good to me! 👍
Reviewed everything up to 86be4cf in 2 minutes and 4 seconds. Click for details.
- Reviewed 640 lines of code in 9 files
- Skipped 0 files when reviewing.
- Skipped posting 7 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:161
- Draft comment:
In the test_responses_reasoning function, the assertions expect the 'gen_ai.request.reasoning_summary' and 'gen_ai.completion.0.reasoning' attributes to equal an empty tuple when the provided summary is null. Please confirm that an empty tuple is the intended default rather than, for example, an empty string. Adding a brief comment on this choice might help clarify the expected type. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%.
This comment violates several rules. It's asking for confirmation of intended behavior and suggesting documentation, rather than pointing out a clear issue. The test is explicitly verifying the current behavior, so this appears to be the intended behavior. The choice between empty tuple vs empty string is an implementation detail that the test is correctly asserting. Maybe there's a legitimate reason why empty strings would be better than empty tuples for these fields? Maybe this is documenting an important design decision? The test's role is to verify the actual behavior, not question implementation choices. If there was a problem with using tuples, it would be raised in the implementation code review, not in a test that verifies the current behavior. Delete the comment. It's asking for confirmation of intended behavior and suggesting documentation, rather than identifying a clear issue that needs fixing.
2. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:172
- Draft comment:
The test asserts that 'gen_ai.usage.reasoning_tokens' is greater than 0. This works with the current fixture but might be fragile if the underlying response ever omits reasoning tokens. It may be useful to document that the cassette is expected to include reasoning token details, or to parameterize this expectation to match the fixture. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%.
The comment raises a valid point about test fragility, but it's more of a speculative "what if" scenario. The test is specifically for reasoning functionality (note the @pytest.mark.skipif with is_reasoning_supported()), so if reasoning tokens are missing, that would indicate a real bug. The suggestion to document or parameterize is not clearly actionable without more specific guidance. The comment identifies a real potential issue with test maintenance. Maybe the test should be more flexible in how it verifies reasoning functionality. Since this is a test specifically for reasoning functionality, asserting the presence of reasoning tokens is a valid test case. If tokens are missing, that's a legitimate test failure, not test fragility. Delete the comment. It raises speculative concerns about a valid assertion in a test that's specifically checking reasoning functionality.
3. packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_azure/test_chat_reasoning.yaml:3
- Draft comment:
Typographical note: The string "Count r''s in strawberry" uses double apostrophes (r''s). Confirm if this is intentional or if it should be "Count r's in strawberry". - Reason this comment was not posted:
Comment looked like it was already resolved.
4. packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_azure/test_chat_reasoning.yaml:44
- Draft comment:
Typographical error: There appears to be a stray trailing single quote on this line in the JSON string body. Consider removing it to ensure correct formatting. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50%.
The comment is incorrect. What appears to be a "stray" quote is actually the proper closing quote for a multi-line YAML string. The formatting is valid YAML. The indentation and quote placement is intentional to properly format the response body. This is a common pattern in YAML for handling long string values. Could this quote style be causing issues in some YAML parsers? Could there be a reason why the commenter thought this was incorrect? No, this is standard YAML syntax for multi-line strings, and the file appears to be a VCR cassette which is a common testing pattern. The formatting is correct and intentional. The comment should be deleted as it incorrectly identifies valid YAML syntax as an error.
5. packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_reasoning.yaml:3
- Draft comment:
Typographical note: The text "Count r''s in strawberry" (line 3) may contain an extra apostrophe. If the intended text is "Count r's in strawberry", please update the quotation. - Reason this comment was not posted:
Comment looked like it was already resolved.
6. packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_reasoning.yaml:3
- Draft comment:
Typographical error: The string "Count r''s in strawberry" appears to have an extra apostrophe. Consider changing it to "Count r's in strawberry". - Reason this comment was not posted:
Comment looked like it was already resolved.
7. packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py:15
- Draft comment:
There's a stray closing parenthesis on line 15. Please verify if it's intentional or if it should be removed. - Reason this comment was not posted:
Comment was on unchanged code.
Workflow ID: wflow_Mukm0zhsDRx5VFqV
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (1)
769-781: Parity: set reasoning token attribute for streaming responses as well.
For streaming, we accumulate `usage` into `_complete_response`, but never set `LLM_USAGE_REASONING_TOKENS` on the span. Add the same extraction logic used in the non-streaming path to avoid missing data for streams.
```diff
         _set_response_attributes(self._span, self._complete_response)
+        # Reasoning usage attributes (streaming): set only if present
+        usage = self._complete_response.get("usage")
+        if usage:
+            tokens_details = (
+                usage.get("completion_tokens_details")
+                if isinstance(usage, dict)
+                else getattr(usage, "completion_tokens_details", None)
+            )
+            if tokens_details:
+                rt = (
+                    tokens_details.get("reasoning_tokens")
+                    if isinstance(tokens_details, dict)
+                    else getattr(tokens_details, "reasoning_tokens", None)
+                )
+                if rt is not None:
+                    _set_span_attribute(
+                        self._span,
+                        SpanAttributes.LLM_USAGE_REASONING_TOKENS,
+                        rt,
+                    )
```
🧹 Nitpick comments (7)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (1)
24-29: Confirm and document version guard for reasoning support
Verified that OpenAI Python SDK v1.58.0 (released 2024-12-17) introduced the `reasoning_effort` parameter (github.com) and that the OpenAI API preview `2024-12-01-preview` added the `usage.completion_tokens_details.reasoning_tokens` field (autotaker.github.io). The existing `>= "1.58.0"` check is accurate:
- SDK threshold: v1.58.0 (2024-12-17) – adds `reasoning_effort` to all chat methods
- API preview: 2024-12-01-preview – introduces the `reasoning_effort` request param and `reasoning_tokens` in responses
If possible, prefer runtime feature detection (e.g., testing for `response.usage.completion_tokens_details.reasoning_tokens` or presence of the `ChatCompletionReasoningEffort` enum) for production behavior, and reserve the hardcoded version check for unit tests.
Please update the docstring on `is_reasoning_supported()` to reference:
- the GitHub release notes for v1.58.0
- the OpenAI Platform changelog entry for 2024-12-01-preview
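A minimal sketch of such a guard, assuming the module imports `openai` and uses `packaging` for comparison (the constant name and fallback behavior here are illustrative, not the PR's exact implementation):

```python
from packaging import version

import openai

MIN_REASONING_VERSION = "1.58.0"  # SDK release that introduced reasoning_effort


def is_reasoning_supported() -> bool:
    """Return True if the installed OpenAI SDK is new enough for reasoning params."""
    try:
        return version.parse(openai.__version__) >= version.parse(MIN_REASONING_VERSION)
    except Exception:
        # Unparsable or missing version string: assume reasoning is not supported.
        return False
```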
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_reasoning.yaml (1)
83-90: Consider redacting organization/project identifiers.
The headers `openai-organization` and `openai-project` can be considered sensitive identifiers. Redact them to minimize metadata leakage.
```diff
-      openai-organization:
-      - user-mktczbuqo14ok5zq3zvvus0l
+      # openai-organization:
+      # - REDACTED
-      openai-project:
-      - proj_HqO8HnKp7rJsjrDN6n3Y0TPc
+      # openai-project:
+      # - REDACTED
```
Optionally also mask `x-request-id`.
```diff
-      x-request-id:
-      - req_fb455d524b7f4775956fba99734cc8d9
+      # x-request-id:
+      # - REDACTED
```
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (1)
1498-1520: Use semconv constants in assertions and guard absent attributes.
Relying on string keys is brittle. Prefer `SpanAttributes` constants and `.get()` to avoid a KeyError when a field is legitimately absent.
```diff
-    assert span.attributes["gen_ai.request.reasoning_effort"] == "low"
-    assert span.attributes["gen_ai.usage.reasoning_tokens"] > 0
+    assert span.attributes.get(SpanAttributes.LLM_REQUEST_REASONING_EFFORT) == "low"
+    assert span.attributes.get(SpanAttributes.LLM_USAGE_REASONING_TOKENS, 0) > 0
```
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_azure/test_chat_reasoning.yaml (1)
55-77: Optional: redact correlation IDs and internal deployment metadata.
Although not secrets, headers like `apim-request-id`, `x-request-id`, `azureml-model-session`, and `x-ms-deployment-name` create churn and leak internal topology. Redact them to reduce noise and leakage surface.
```diff
-      apim-request-id:
-      - aebd8320-f701-4e7d-801f-2955f84e3811
-      azureml-model-session:
-      - d004-20250815200304
-      x-ms-deployment-name:
-      - gpt-5-nano
-      x-request-id:
-      - 7acf5821-70fa-4fab-b202-ba3700578d08
+      # apim-request-id: REDACTED
+      # azureml-model-session: REDACTED
+      # x-ms-deployment-name: REDACTED
+      # x-request-id: REDACTED
```
Also add cassette filtering (`filter_headers=[...]`) to your test config, as sketched below.
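With pytest-recording, header filtering is configured via a `vcr_config` fixture; a sketch follows (the fixture scope, hook name placement, and exact header list are assumptions — note that `filter_headers` only scrubs request headers, so response headers need a `before_record_response` hook):

```python
# conftest.py (sketch)
import pytest


def _scrub_response_headers(response):
    # VCR's filter_headers covers request headers only; drop sensitive
    # response headers here. Header capitalization may vary per cassette.
    for header in (
        "openai-organization",
        "openai-project",
        "x-request-id",
        "apim-request-id",
        "azureml-model-session",
        "x-ms-deployment-name",
    ):
        response["headers"].pop(header, None)
    return response


@pytest.fixture(scope="module")
def vcr_config():
    # pytest-recording passes this dict straight to VCR.
    return {
        "filter_headers": ["authorization", "api-key"],
        "before_record_response": _scrub_response_headers,
    }
```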
packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py (1)
767-789: Solid Azure reasoning coverage; consider aligning model naming for Azure deployments.
The assertions look good and validate the new attributes. Minor nit: all other Azure chat tests use the Azure deployment name ("openllmetry-testing") instead of a model name. For consistency and to reduce the chance of cassette mismatches in future re-recordings, consider using the same deployment alias here.
Apply if you want consistency with the rest of this file:
```diff
-        model="gpt-5-nano",
+        model="openllmetry-testing",
```
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)
227-245: Avoid redundant f-strings for constant attribute keys.
Passing the constant directly is cleaner. Keeping the empty-tuple default since tests assert `()`. If you’re open to it later, consider `None` to avoid type ambiguity for string-typed attributes.
```diff
-    _set_span_attribute(
-        span,
-        f"{SpanAttributes.LLM_REQUEST_REASONING_SUMMARY}",
-        traced_response.request_reasoning_summary or (),
-    )
+    _set_span_attribute(
+        span,
+        SpanAttributes.LLM_REQUEST_REASONING_SUMMARY,
+        traced_response.request_reasoning_summary or (),
+    )
@@
-    _set_span_attribute(
-        span,
-        f"{SpanAttributes.LLM_REQUEST_REASONING_EFFORT}",
-        traced_response.request_reasoning_effort or (),
-    )
+    _set_span_attribute(
+        span,
+        SpanAttributes.LLM_REQUEST_REASONING_EFFORT,
+        traced_response.request_reasoning_effort or (),
+    )
@@
-    _set_span_attribute(
-        span,
-        f"{SpanAttributes.LLM_RESPONSE_REASONING_EFFORT}",
-        traced_response.response_reasoning_effort or (),
-    )
+    _set_span_attribute(
+        span,
+        SpanAttributes.LLM_RESPONSE_REASONING_EFFORT,
+        traced_response.response_reasoning_effort or (),
+    )
```
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (1)
155-179: Great end-to-end assertions for reasoning attributes; optional extra checks.
Consider also asserting `gen_ai.response.id` presence to mirror other tests, but current coverage already validates the essential attributes.
For example:
```python
assert "gen_ai.response.id" in span.attributes
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (9)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (2 hunks)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (2 hunks)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (6 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_azure/test_chat_reasoning.yaml (1 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_reasoning.yaml (1 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_reasoning.yaml (1 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py (2 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (2 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/cassettes/**/*.{yaml,yml,json}
📄 CodeRabbit inference engine (CLAUDE.md)
Never commit secrets or PII in VCR cassettes; scrub sensitive data
Files:
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_reasoning.yaml
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_azure/test_chat_reasoning.yaml
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_reasoning.yaml
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
211-211: Local variable e is assigned to but never used
Remove assignment to unused variable e
(F841)
🪛 Flake8 (7.2.0)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
[error] 211-211: local variable 'e' is assigned to but never used
(F841)
[error] 466-466: continuation line unaligned for hanging indent
(E131)
[error] 523-523: continuation line unaligned for hanging indent
(E131)
[error] 525-525: continuation line unaligned for hanging indent
(E131)
[error] 580-580: continuation line unaligned for hanging indent
(E131)
[error] 638-638: continuation line unaligned for hanging indent
(E131)
[error] 640-640: continuation line unaligned for hanging indent
(E131)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Lint
🔇 Additional comments (5)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (1)
22-22: Good: robust version comparison using `packaging.version`.
Switching from string comparison to `packaging.version.parse` avoids subtle ordering bugs (e.g., 1.10 vs 1.9). LGTM.
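For illustration, the ordering bug that plain string comparison introduces (standalone snippet, not code from this PR):

```python
from packaging.version import parse

# Lexicographic string comparison misorders multi-digit components:
print("1.10.0" >= "1.9.0")                # False -- '1' < '9' character-wise
print(parse("1.10.0") >= parse("1.9.0"))  # True  -- proper semantic ordering
```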
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (1)
18-18: Import looks good.
Using `is_reasoning_supported` to gate reasoning tests avoids false failures on older client versions. Nice.
packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py (1)
4-4: Good guard: version-gated reasoning tests.
Importing and using `is_reasoning_supported()` to gate tests prevents false failures on older SDKs.
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)
135-139: TracedData: new reasoning fields look correct.
Optional strings with `None` defaults match how the rest of the model is structured. No concerns here.
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (1)
6-6: Good use of version gating for new reasoning tests.
Keeps the suite green across OpenAI client versions.
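The gating presumably follows the usual pytest pattern; a sketch with placeholder fixture names (the real tests' fixtures, decorators, and reason text may differ):

```python
import pytest

from opentelemetry.instrumentation.openai.utils import is_reasoning_supported


@pytest.mark.skipif(
    not is_reasoning_supported(),
    reason="reasoning attributes require a newer openai SDK",
)
@pytest.mark.vcr
def test_chat_reasoning(openai_client, span_exporter):  # placeholder fixture names
    ...
```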
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (3)
193-205: Handle dict vs object usage consistently for input/output/total tokens and cached input tokens (not just reasoning).
Current code assumes attribute access for usage.input_tokens, usage.output_tokens, etc., which breaks when usage is a dict (older SDK path). The try/except around the reasoning details was already fixed; mirror that approach for all usage fields to avoid silent drops under @dont_throw.
Apply this diff:
```diff
-    if usage := traced_response.usage:
-        _set_span_attribute(span, GEN_AI_USAGE_INPUT_TOKENS, usage.input_tokens)
-        _set_span_attribute(span, GEN_AI_USAGE_OUTPUT_TOKENS, usage.output_tokens)
-        _set_span_attribute(
-            span, SpanAttributes.LLM_USAGE_TOTAL_TOKENS, usage.total_tokens
-        )
-        if usage.input_tokens_details:
-            _set_span_attribute(
-                span,
-                SpanAttributes.LLM_USAGE_CACHE_READ_INPUT_TOKENS,
-                usage.input_tokens_details.cached_tokens,
-            )
+    if usage := traced_response.usage:
+        # Support both dict-style and object-style `usage`
+        input_tokens = (
+            usage.get("input_tokens") if isinstance(usage, dict)
+            else getattr(usage, "input_tokens", None)
+        )
+        output_tokens = (
+            usage.get("output_tokens") if isinstance(usage, dict)
+            else getattr(usage, "output_tokens", None)
+        )
+        total_tokens = (
+            usage.get("total_tokens") if isinstance(usage, dict)
+            else getattr(usage, "total_tokens", None)
+        )
+        input_tokens_details = (
+            usage.get("input_tokens_details") if isinstance(usage, dict)
+            else getattr(usage, "input_tokens_details", None)
+        )
+
+        if input_tokens is not None:
+            _set_span_attribute(span, GEN_AI_USAGE_INPUT_TOKENS, input_tokens)
+        if output_tokens is not None:
+            _set_span_attribute(span, GEN_AI_USAGE_OUTPUT_TOKENS, output_tokens)
+        if total_tokens is not None:
+            _set_span_attribute(span, SpanAttributes.LLM_USAGE_TOTAL_TOKENS, total_tokens)
+        if input_tokens_details:
+            cached_tokens = (
+                input_tokens_details.get("cached_tokens") if isinstance(input_tokens_details, dict)
+                else getattr(input_tokens_details, "cached_tokens", None)
+            )
+            if cached_tokens is not None:
+                _set_span_attribute(
+                    span,
+                    SpanAttributes.LLM_USAGE_CACHE_READ_INPUT_TOKENS,
+                    cached_tokens,
+                )

         # Usage - count of reasoning tokens
         reasoning_tokens = None
         # Support both dict-style and object-style `usage`
         tokens_details = (
             usage.get("output_tokens_details") if isinstance(usage, dict)
             else getattr(usage, "output_tokens_details", None)
         )
         if tokens_details:
             reasoning_tokens = (
                 tokens_details.get("reasoning_tokens", None)
                 if isinstance(tokens_details, dict)
                 else getattr(tokens_details, "reasoning_tokens", None)
             )
         _set_span_attribute(
             span,
             SpanAttributes.LLM_USAGE_REASONING_TOKENS,
             reasoning_tokens or 0,
         )
```
Also applies to: 206-225
501-504: Fix None+list concatenation when merging tools.
existing_data may carry "tools": None (because previous writes set tools=None when empty). Adding None + list raises TypeError and silently short-circuits under @dont_throw paths.
Apply this diff in both places:
```diff
-        merged_tools = existing_data.get("tools", []) + request_tools
+        merged_tools = (existing_data.get("tools") or []) + request_tools
```
Also applies to: 629-631
447-449: Harden process_input defaults to avoid None iteration downstream.
If existing_data["input"] exists but is None, the current .get(..., []) returns None and later iteration fails in set_data_attributes. Prefer an explicit fallback to [].
Apply these diffs:
Exception (sync):
```diff
-            input=process_input(
-                kwargs.get("input", existing_data.get("input", []))
-            ),
+            input=process_input(
+                kwargs.get("input", existing_data.get("input") or [])
+            ),
```
Success (sync):
```diff
-            input=process_input(existing_data.get("input", kwargs.get("input"))),
+            input=process_input(existing_data.get("input") or kwargs.get("input") or []),
```
Exception (async):
```diff
-            input=process_input(
-                kwargs.get("input", existing_data.get("input", []))
-            ),
+            input=process_input(
+                kwargs.get("input", existing_data.get("input") or [])
+            ),
```
Success (async):
```diff
-            input=process_input(existing_data.get("input", kwargs.get("input"))),
+            input=process_input(existing_data.get("input") or kwargs.get("input") or []),
```
Also applies to: 577-579, 517-518, 644-645
♻️ Duplicate comments (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (2)
206-214: LGTM on the F841 cleanup and explicit type handling for output_tokens_details.
Refactor matches earlier feedback and resolves the unused exception variable while making access explicit.
463-475: Nice E131 cleanup; suggest DRY’ing reasoning kwarg extraction.
The switch to implicit parentheses resolves Flake8 E131. To reduce repetition, bind `reasoning = kwargs.get("reasoning") or {}` once per block and reuse it.
Example refactor (apply similarly in all four blocks):
```diff
-        traced_data = TracedData(
+        reasoning = kwargs.get("reasoning") or {}
+        traced_data = TracedData(
             ...
-            request_reasoning_summary=(
-                kwargs.get("reasoning", {}).get(
-                    "summary", existing_data.get("request_reasoning_summary")
-                )
-            ),
-            request_reasoning_effort=(
-                kwargs.get("reasoning", {}).get(
-                    "effort", existing_data.get("request_reasoning_effort")
-                )
-            ),
-            response_reasoning_effort=kwargs.get("reasoning", {}).get("effort"),
+            request_reasoning_summary=reasoning.get(
+                "summary", existing_data.get("request_reasoning_summary")
+            ),
+            request_reasoning_effort=reasoning.get(
+                "effort", existing_data.get("request_reasoning_effort")
+            ),
+            response_reasoning_effort=reasoning.get("effort"),
         )
```
Also applies to: 526-538, 589-601, 653-665
🧹 Nitpick comments (3)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (3)
135-139: TracedData additions look good; consider stricter typing for “effort” and avoid sentinel defaults downstream.
Fields are appropriate. Optional: use `Literal["low","medium","high"]` for the two “effort” fields to catch typos at type-check time.
Example change (requires importing Literal):
```diff
-from typing import Any, Optional, Union
+from typing import Any, Optional, Union, Literal
```
```diff
-    request_reasoning_effort: Optional[str] = pydantic.Field(default=None)
-    response_reasoning_effort: Optional[str] = pydantic.Field(default=None)
+    request_reasoning_effort: Optional[Literal["low","medium","high"]] = pydantic.Field(default=None)
+    response_reasoning_effort: Optional[Literal["low","medium","high"]] = pydantic.Field(default=None)
```
357-360: Guard against None output_blocks in error paths.
In certain exception flows output_blocks can be None; iterating .values() will raise. Avoid relying on @dont_throw here.
Apply:
```diff
-    tool_call_index = 0
-    for block in traced_response.output_blocks.values():
+    tool_call_index = 0
+    if traced_response.output_blocks:
+        for block in traced_response.output_blocks.values():
```
226-244: Optional: align response_reasoning_effort source with the actual response (if available).
Currently mirrored from request kwargs. If OpenAI Responses ever returns a response-side effort, prefer that source for accuracy, falling back to the request setting.
I can add a guarded extraction from parsed_response (e.g., output blocks of type "reasoning" or a top-level field) if you confirm the shape to read from. Want me to open a follow-up PR?
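Such a guarded extraction might look roughly like this (the response-side reasoning field is hypothetical and would need to be confirmed against the actual Responses payload, as noted above):

```python
def _response_reasoning_effort(parsed_response, requested_effort):
    """Prefer a response-reported effort when present, else fall back to the request value."""
    reasoning = getattr(parsed_response, "reasoning", None)  # hypothetical response field
    effort = getattr(reasoning, "effort", None) if reasoning is not None else None
    return effort or requested_effort
```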
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (2 hunks)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (2 hunks)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (6 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_reasoning.yaml (1 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_reasoning.yaml (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_reasoning.yaml
🚧 Files skipped from review as they are similar to previous changes (3)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_reasoning.yaml
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
SpanAttributes(64-261)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Lint
nirga
left a comment
Thanks @prane-eth! Great work :)
Solves issue #3257
feat(instrumentation): ... or fix(instrumentation): ....
Tested using command: nx run opentelemetry-instrumentation-openai:test. All 162 test cases passed.
New SemConv variables were added using #3330.
Important
This PR adds reasoning attributes to OpenAI instrumentation, updates relevant functions to handle these attributes, and includes tests to verify the changes.
- Adds `reasoning_effort` and `reasoning_summary` to OpenAI instrumentation in `chat_wrappers.py` and `responses_wrappers.py`.
- Updates `_handle_request()` and `_handle_response()` in `chat_wrappers.py` to set reasoning attributes on spans.
- Updates `set_data_attributes()` in `responses_wrappers.py` to handle reasoning attributes.
- Adds `is_reasoning_supported()` in `utils.py` to check OpenAI version compatibility.
- Adds tests in `test_chat.py`, `test_azure.py`, and `test_responses.py`.
- Adds cassettes `test_chat_reasoning.yaml`, `test_azure_reasoning.yaml`, and `test_responses_reasoning.yaml`.
This description was created by Ellipsis for 86be4cf. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit
New Features
Tests