fix(openai): responses instrumentation broken traces for async streaming #3475
Conversation
Walkthrough: Capture and persist OpenTelemetry trace context in response wrappers by adding a trace_context field to TracedData, so spans created later in the request lifecycle stay in the original trace.
Sequence Diagram(s)
sequenceDiagram
participant App as Application
participant Wrapper as Response Wrapper
participant ContextAPI as OpenTelemetry Context API
participant TracedData as TracedData Model
participant Span as OpenTelemetry Span
App->>Wrapper: invoke response wrapper (sync/async/stream)
Wrapper->>ContextAPI: capture current context (context_api.get_current())
ContextAPI-->>Wrapper: trace_context
Wrapper->>TracedData: create TracedData(trace_context=...)
TracedData-->>Wrapper: traced_data instance
Wrapper->>Span: start span with provided/restored context
Span-->>Wrapper: started span
alt streaming continuation
Wrapper->>TracedData: persist trace_context on traced_data
Wrapper->>ContextAPI: restore trace_context when creating subsequent spans
ContextAPI-->>Wrapper: restored context
Wrapper->>Span: start child span with restored context
end
alt cancellation / error cleanup
Wrapper->>ContextAPI: restore trace_context and start cleanup span
end
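The pattern in the diagram can be summarized in a short sketch. This is illustrative only: `TracedData` here is a stand-in dataclass rather than the real Pydantic model, and the span name is invented; it assumes the standard `opentelemetry-api` context and trace modules.

```python
from dataclasses import dataclass
from typing import Optional

from opentelemetry import context as context_api
from opentelemetry import trace

tracer = trace.get_tracer(__name__)


@dataclass
class TracedData:  # stand-in for the real model in responses_wrappers.py
    response_id: str
    trace_context: Optional[context_api.Context] = None  # captured OTel context


def start_streaming(response_id: str) -> TracedData:
    # Capture the caller's context so later chunks/cleanup can re-parent correctly.
    return TracedData(response_id=response_id, trace_context=context_api.get_current())


def on_stream_complete(traced: TracedData) -> None:
    # Restore the stored context (falling back to the current one) when creating
    # the follow-up span, so it lands in the original trace.
    ctx = traced.trace_context if traced.trace_context else context_api.get_current()
    with tracer.start_as_current_span("openai.response", context=ctx):
        pass  # set attributes / record events here
```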
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks: ❌ Failed checks (1 warning) | ✅ Passed checks (2 passed)
📜 Recent review details
Configuration used: CodeRabbit UI | Review profile: CHILL | Plan: Pro | Disabled knowledge base sources
📒 Files selected for processing (2)
🧰 Additional context used
📓 Path-based instructions (1): **/*.py | 📄 CodeRabbit inference engine (CLAUDE.md)
🧠 Learnings (1): 📚 Learning: 2025-08-17T15:06:48.109Z
🧬 Code graph analysis (1): packages/sample-app/sample_app/openai_guardrails_example.py (3)
🪛 Ruff (0.14.6)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
593-593: Unused function argument (ARG001)
665-665: Unused function argument (ARG001)
packages/sample-app/sample_app/openai_guardrails_example.py
142-142: Do not catch blind exception (BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
🔇 Additional comments (6)
Important
Looks good to me! 👍
Reviewed everything up to aaf7fdf in 2 minutes and 17 seconds.
- Reviewed 920 lines of code in 7 files
- Skipped 1 file when reviewing
- Skipped posting 9 draft comments; view those below
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:142
- Draft comment: Good addition of the trace_context field with arbitrary_types_allowed. It ensures trace continuity across async calls.
- Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:520
- Draft comment: Trace context retrieval appears repeatedly (e.g. using existing_data.get('trace_context', context_api.get_current())). Consider refactoring to a helper to reduce duplication and potential inconsistencies.
- Reason this comment was not posted: Confidence changes required: 50% <= threshold 50%.
3. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:602
- Draft comment: The restoration of trace context for completed responses uses a ternary check on traced_data.trace_context. Verify that falsy values (e.g., empty objects) won't override an intentionally absent context.
- Reason this comment was not posted: Confidence changes required: 50% <= threshold 50%.
4. packages/sample-app/sample_app/openai_guardrails_example.py:20
- Draft comment: Initialization of Traceloop before importing guardrails is well-done. Consider providing more descriptive error handling or configuration hints if guardrails is not installed.
- Reason this comment was not posted: Confidence changes required: 20% <= threshold 50%.
5. packages/sample-app/pyproject.toml:25
- Draft comment: Python version update to 3.11 and dependency updates look appropriate; ensure compatibility with pydantic v2 if used.
- Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%.
6. packages/test_responses.py:510
- Draft comment: The new unit test for trace context propagation is comprehensive. It correctly simulates parent span behavior and verifies the child span's trace_id.
- Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%.
7. packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_async_with_parent_span.yaml:135
- Draft comment: Typo: The Set-Cookie header includes an expires date formatted as '26-Nov-25'. For consistency with standard HTTP-date formats (and with the Date header), it should likely be '26-Nov-2025'.
- Reason this comment was not posted: Comment was on unchanged code.
8. packages/sample-app/sample_app/openai_guardrails_example.py:5
- Draft comment:
Typo: 'OpenLLMetry' should be 'OpenTelemetry' in the docstring.
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% "OpenLLMetry" appears twice in the file (lines 5 and 64), suggesting it's intentional. The file is about using Traceloop SDK, and "OpenLLMetry" is likely the name of Traceloop's LLM observability product built on OpenTelemetry. The consistent usage across the file and the context of LLM tracing strongly suggests this is not a typo but an actual product name. The comment is likely incorrect - it's assuming this is a typo when it's actually the correct terminology for this specific library/product. I might be wrong if "OpenLLMetry" is indeed a typo and the author accidentally used it twice. However, given that this is a new file being added (not a modification), and the term is used consistently in both documentation and print statements, it seems very unlikely to be an accidental typo. Even if there's a small chance this could be a typo, the consistent usage across multiple contexts (docstring and print statement) and the fact that it makes sense as a product name (OpenTelemetry for LLMs = OpenLLMetry) strongly indicates this is intentional. The comment lacks strong evidence that this is actually wrong. This comment should be deleted. "OpenLLMetry" appears to be the correct product name (Traceloop's LLM observability product), not a typo. It's used consistently twice in the file and makes sense in context. There's no strong evidence this is actually incorrect.
9. packages/sample-app/sample_app/openai_guardrails_example.py:64
- Draft comment:
Typo: 'OpenLLMetry' should be 'OpenTelemetry' in the output message.
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The comment assumes "OpenLLMetry" is a typo, but this appears to be incorrect. The docstring on line 5 uses "OpenLLMetry tracing" consistently, and this seems to be the actual name of the Traceloop SDK's tracing functionality. The file is demonstrating integration with the Traceloop SDK (which provides OpenLLMetry), not directly with OpenTelemetry. While OpenTelemetry is used under the hood (as seen in the import), OpenLLMetry appears to be the branded name for the observability solution. The comment is making an incorrect assumption about what is a typo. Could "OpenLLMetry" actually be a typo that was repeated in multiple places? Maybe the author intended to use "OpenTelemetry" throughout but made a consistent mistake? While it's possible there's a consistent typo, the evidence strongly suggests "OpenLLMetry" is intentional: it's used in the docstring (line 5), the print statement (line 64), and aligns with the Traceloop SDK being used. The naming pattern "OpenLLMetry" (Open + LLM + Telemetry) makes sense as a branded name for LLM observability. This is not a typo. This comment should be deleted. "OpenLLMetry" is not a typo - it appears to be the correct name for the Traceloop SDK's tracing functionality. The comment incorrectly assumes this is a mistake when it's actually intentional branding/naming.
Workflow ID: wflow_XoaO6wM4EKXSfWOb
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 0
🧹 Nitpick comments (5)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (2)
524-530: Remove unused imports. Static analysis correctly identifies unused imports: Mock, Response, ResponseOutputItem, and ResponseUsage are imported but not used in this test function.
- from unittest.mock import MagicMock, Mock
- from opentelemetry import trace, context
- from opentelemetry.sdk.trace import TracerProvider
- from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter
- from opentelemetry.instrumentation.openai.v1.responses_wrappers import TracedData
- from openai.types.responses import Response, ResponseOutputItem, ResponseUsage
- import time
+ from opentelemetry import trace, context
+ from opentelemetry.sdk.trace import TracerProvider
+ from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter
+ from opentelemetry.instrumentation.openai.v1.responses_wrappers import TracedData
+ import time
637-638: Remove unused variable assignment. The parent variable is assigned but never used. This is flagged by both Ruff and Flake8.
- parent = parent_spans[0]
- openai_span = openai_spans[0]
+ openai_span = openai_spans[0]
Apply the same fix at lines 710-711.
packages/sample-app/pyproject.toml (1)
25-62: Python and dependency updates look coherent; document & verify downstream impact. The stricter Python constraint (>=3.11,<3.13), the openai-agents bump to ^0.3.3, and the new openai-guardrails dependency are all consistent with the new Guardrails tracing example and the .python-version change. To avoid surprises:
- Call out the Python 3.11+ requirement in any sample‑app docs/README.
- Ensure CI, containers, and sample‑app runners use Python 3.11.
- Quickly sanity‑check that your OpenAI/Guardrails/OpenAI‑agents instrumentation tests pass against these specific versions.
packages/sample-app/sample_app/openai_guardrails_example.py (2)
27-37: Avoid exiting the process at import time for missing Guardrails. Catching ImportError and printing install instructions is helpful, but calling exit(1) at module import time means any code that merely imports openai_guardrails_example (e.g., for tests or documentation builds) will terminate the whole process if Guardrails isn't installed. Consider refactoring so that:
- The import is wrapped in a feature flag at module scope, e.g. _GUARDRAILS_AVAILABLE = True/False, and
- The user-facing message and sys.exit(1) happen in main() (or right before running reproduce_issue()), not during import.
This keeps imports side-effect-free while preserving the current CLI behavior; a sketch of this pattern follows.
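A minimal sketch of that refactor; the guardrails import path and message text are assumptions for illustration, not the sample's exact code.

```python
import sys

# Module scope: detect availability without side effects.
try:
    from guardrails import GuardrailsAsyncOpenAI  # noqa: F401  (import path assumed)
    _GUARDRAILS_AVAILABLE = True
except ImportError:
    _GUARDRAILS_AVAILABLE = False


def main() -> None:
    if not _GUARDRAILS_AVAILABLE:
        print("openai-guardrails is not installed; install it before running this demo.")
        sys.exit(1)
    # ... run reproduce_issue() here ...


if __name__ == "__main__":
    main()
```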
96-101: Tighten or explicitly justify the broad except Exception (Ruff BLE001). The generic except Exception as e: block is fine as a last-resort handler for a demo script, but Ruff flags it (BLE001). To keep linters and intent aligned, consider either:
- Narrowing this to the specific error types you expect from Guardrails/OpenAI/network failures, or
- Keeping the broad handler but marking it as intentional (e.g. except Exception as e:  # noqa: BLE001) and optionally re-raising after logging so unexpected failures still produce a non-zero exit.
Also ensure flake8/Ruff are run over packages/sample-app so this file doesn't introduce new lint failures.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
packages/sample-app/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (9 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_async_with_parent_span.yaml (1 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_with_parent_span.yaml (1 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (1 hunks)
- packages/sample-app/.python-version (1 hunks)
- packages/sample-app/pyproject.toml (2 hunks)
- packages/sample-app/sample_app/openai_guardrails_example.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/cassettes/**/*.{yaml,yml,json}
📄 CodeRabbit inference engine (CLAUDE.md)
Never commit secrets or PII in VCR cassettes; scrub sensitive data
Files:
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_async_with_parent_span.yaml
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_with_parent_span.yaml
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
packages/sample-app/sample_app/openai_guardrails_example.py
🧠 Learnings (1)
📚 Learning: 2025-08-22T14:41:26.962Z
Learnt from: prane-eth
Repo: traceloop/openllmetry PR: 3336
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:8-8
Timestamp: 2025-08-22T14:41:26.962Z
Learning: In the openllmetry project, the `packaging` library is available as a transitive dependency through other packages (visible in poetry.lock) and doesn't need to be explicitly declared in pyproject.toml dependencies.
Applied to files:
packages/sample-app/pyproject.toml
🧬 Code graph analysis (2)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (2)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (3)
- export (45-51), InMemorySpanExporter (22-61), get_finished_spans (40-43)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)
- TracedData (115-146)
packages/sample-app/sample_app/openai_guardrails_example.py (3)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (1)
- export (45-51)
packages/traceloop-sdk/traceloop/sdk/__init__.py (2)
- Traceloop (36-267), init (48-198)
packages/traceloop-sdk/traceloop/sdk/tracing/tracing.py (1)
- flush (218-223)
🪛 Flake8 (7.3.0)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
[error] 524-524: 'unittest.mock.MagicMock' imported but unused
(F401)
[error] 524-524: 'unittest.mock.Mock' imported but unused
(F401)
[error] 529-529: 'openai.types.responses.Response' imported but unused
(F401)
[error] 529-529: 'openai.types.responses.ResponseOutputItem' imported but unused
(F401)
[error] 529-529: 'openai.types.responses.ResponseUsage' imported but unused
(F401)
[error] 637-637: local variable 'parent' is assigned to but never used
(F841)
[error] 710-710: local variable 'parent' is assigned to but never used
(F841)
🪛 Ruff (0.14.5)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
524-524: Do not catch blind exception: Exception
(BLE001)
597-597: Do not catch blind exception: Exception
(BLE001)
683-683: Do not catch blind exception: Exception
(BLE001)
757-757: Do not catch blind exception: Exception
(BLE001)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
592-592: Unused function argument: instrument_legacy
(ARG001)
637-637: Local variable parent is assigned to but never used
Remove assignment to unused variable parent
(F841)
665-665: Unused function argument: instrument_legacy
(ARG001)
710-710: Local variable parent is assigned to but never used
Remove assignment to unused variable parent
(F841)
packages/sample-app/sample_app/openai_guardrails_example.py
98-98: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (14)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (7)
142-146: LGTM - trace context field and config addition. The trace_context field with arbitrary_types_allowed = True in the Config is correctly implemented to store OpenTelemetry Context objects, which are not JSON-serializable by default. This enables trace continuity across async boundaries.
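For readers unfamiliar with the Pydantic detail: `Context` is an arbitrary (non-validatable) type, so the model needs `arbitrary_types_allowed`. A minimal sketch, not the actual `TracedData` definition:

```python
from typing import Optional

from opentelemetry.context import Context
from pydantic import BaseModel


class TracedDataSketch(BaseModel):
    response_id: str
    # OTel Context is neither JSON-serializable nor a Pydantic-native type,
    # hence arbitrary_types_allowed below.
    trace_context: Optional[Context] = None

    class Config:
        arbitrary_types_allowed = True
```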
465-471: LGTM - context capture for streaming path. The trace context is correctly captured at span creation time and passed to start_span, ensuring the span inherits the correct parent context.
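As a point of reference, this is roughly what "passed to start_span" means with the OpenTelemetry API; the span name and attribute are illustrative, not the wrapper's exact code.

```python
from opentelemetry import context as context_api, trace

tracer = trace.get_tracer(__name__)

# Context captured when the original request was made (e.g. stored on TracedData).
captured_ctx = context_api.get_current()

# Passing the context explicitly parents the span under the captured trace,
# even if the ambient context has changed across async/streaming boundaries.
span = tracer.start_span("openai.response", context=captured_ctx)
try:
    span.set_attribute("gen_ai.system", "openai")
finally:
    span.end()
```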
521-536: LGTM - error path context propagation. The error handling path correctly captures and restores trace context, ensuring spans created during error handling maintain proper parent-child relationships.
593-607: LGTM - completed response context propagation. The pattern for capturing and restoring trace context in the success path is consistent with other paths and correctly ensures trace continuity.
628-634: LGTM - async streaming context propagation. The async streaming path correctly captures and propagates trace context, mirroring the synchronous implementation.
788-795: LGTM - cancel wrapper context propagation. The cancel wrapper correctly restores trace context from existing data, ensuring cancellation spans are properly linked to their parent traces.
524-525: Broad exception catch is acceptable for instrumentation code. The bare except Exception clause here is intentional defensive programming: instrumentation code should never break the underlying API calls. If TracedData creation fails, setting traced_data = None and continuing is the correct behavior to prevent instrumentation issues from affecting the user's application. The static analysis hint (BLE001) can be safely ignored in this context.
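A generic sketch of this defensive style, under the assumption that failures should be logged and swallowed; the class, function, and field names are placeholders, not the wrapper's actual code.

```python
import logging
from dataclasses import dataclass
from typing import Any, Optional

logger = logging.getLogger(__name__)


@dataclass
class _TracedDataStandIn:  # stand-in for the real TracedData model
    response_id: str
    trace_context: Optional[Any] = None


def build_traced_data_safely(kwargs: dict, existing_data: dict) -> Optional[_TracedDataStandIn]:
    """Best-effort construction: instrumentation failures must never break the API call."""
    try:
        return _TracedDataStandIn(
            response_id=kwargs.get("response_id", ""),
            trace_context=existing_data.get("trace_context"),
        )
    except Exception:  # noqa: BLE001 - intentional catch-all in instrumentation code
        logger.debug("Failed to build traced data; continuing uninstrumented", exc_info=True)
        return None
```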
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_async_with_parent_span.yaml (1)
1-165: LGTM - cassette appears properly sanitized. The cassette file contains no API keys or sensitive credentials. The organization and project identifiers are non-secret metadata. Cloudflare cookies present are session management tokens, not user credentials. This follows the coding guidelines for not committing secrets or PII in VCR cassettes.
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_with_parent_span.yaml (1)
1-186: LGTM - cassette appears properly sanitized. The sync streaming cassette file follows the same sanitization pattern as the async variant. No API keys or sensitive credentials are present.
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (3)
590-660: Well-designed integration test for trace context propagation. This test effectively validates the core fix by:
- Simulating the guardrails wrapper pattern with a parent span
- Making a streaming call within the parent context
- Asserting both trace_id matching and proper parent-child relationships
- Verifying streaming functionality still works
The test will catch regressions where trace context is lost during streaming operations.
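For illustration, the shape of such an assertion with an in-memory exporter looks roughly like this; the nested spans stand in for the real wrapper span plus the instrumented streaming call, and the span names are invented.

```python
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter


def test_child_span_stays_in_parent_trace():
    exporter = InMemorySpanExporter()
    provider = TracerProvider()
    provider.add_span_processor(SimpleSpanProcessor(exporter))
    tracer = provider.get_tracer(__name__)

    # Stand-in for "wrapper span + instrumented streaming call".
    with tracer.start_as_current_span("guardrails.parent"):
        with tracer.start_as_current_span("openai.response"):
            pass

    spans = {s.name: s for s in exporter.get_finished_spans()}
    parent, child = spans["guardrails.parent"], spans["openai.response"]

    # Same trace, and the child is parented under the wrapper span.
    assert child.context.trace_id == parent.context.trace_id
    assert child.parent.span_id == parent.context.span_id
```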
591-596: Fixture parameter is used for side effects - static analysis false positive. The instrument_legacy fixture is likely used for its side effect of setting up OpenAI instrumentation. This is a common pytest pattern where fixtures are included in the signature for their setup/teardown behavior, not for direct usage. The Ruff warning (ARG001) can be safely ignored.
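A hedged illustration of that pattern (a fixture requested only for its side effect), with invented fixture names and an illustrative setup step; `pytest.mark.usefixtures` is the usual way to silence ARG001 if desired.

```python
import pytest


@pytest.fixture
def instrumented(monkeypatch):
    # Side-effect-only fixture: installs instrumentation for the test's duration.
    monkeypatch.setenv("EXAMPLE_INSTRUMENTATION_ENABLED", "true")  # illustrative setup
    yield
    # teardown runs automatically when the fixture goes out of scope


def test_something(instrumented):  # noqa: ARG001 - requested for its side effect
    ...


# Alternatively, avoid the unused-argument warning entirely:
@pytest.mark.usefixtures("instrumented")
def test_something_else():
    ...
```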
510-538: Verification in progress - the reviewer began checking this comment by examining the test file and its isolation setup, particularly TracerProvider management: measuring the size of packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py, reading lines 500-600, searching for fixture/setup/teardown definitions and conftest.py files, and grepping for set_tracer_provider, detach_context, Token, and _TRACER_PROVIDER usage under tests/traces. The script output was not included in the comment.
packages/sample-app/.python-version (1)
1-1: Python version bump is consistent; ensure environments are updated.
3.11 here matches the python = ">=3.11,<3.13" constraint in pyproject.toml, so tooling/runtime are aligned. Please double-check CI images, devcontainers, and any Dockerfiles to ensure they also run Python 3.11 to avoid version skew.
packages/sample-app/sample_app/openai_guardrails_example.py (1)
19-25: Traceloop initialization order is correct for instrumentation.Initializing
Initializing Traceloop (with disable_batch=True and ConsoleSpanExporter) before importing Guardrails ensures the OpenAI client created inside GuardrailsAsyncOpenAI participates in tracing, which is what you want for this repro. No changes needed here; just confirm opentelemetry-sdk (or the package providing ConsoleSpanExporter) is present in the sample-app environment so imports don't fail at runtime.
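A sketch of that ordering; the guardrails import path and app name are assumptions for illustration, while Traceloop.init parameters mirror the ones mentioned above.

```python
from opentelemetry.sdk.trace.export import ConsoleSpanExporter
from traceloop.sdk import Traceloop

# Initialize tracing first so clients created by later imports are instrumented.
Traceloop.init(
    app_name="guardrails-repro",
    disable_batch=True,
    exporter=ConsoleSpanExporter(),
)

# Imported only after instrumentation is in place.
from guardrails import GuardrailsAsyncOpenAI  # noqa: E402  (import path assumed)
```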
Important
Looks good to me! 👍
Reviewed 0a42cff in 2 minutes and 45 seconds.
- Reviewed 183 lines of code in 2 files
- Skipped 0 files when reviewing
- Skipped posting 7 draft comments; view those below
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/sample-app/pyproject.toml:24
- Draft comment:
The Python version requirement was changed from '>=3.11,<3.13' to '>=3.10,<3.13', which contradicts the PR summary that mentioned an update to 3.11+. Please confirm if lowering the minimum version is intentional. - Reason this comment was not posted:
Comment looked like it was already resolved.
2. packages/sample-app/pyproject.toml:59
- Draft comment:
The openai-agents version was downgraded from '^0.3.3' to '^0.2.7', which does not match the PR summary stating an upgrade. Please verify if this version change is intentional. - Reason this comment was not posted:
Comment looked like it was already resolved.
3. packages/sample-app/pyproject.toml:61
- Draft comment:
The dependency for openai-guardrails has been removed, but the sample app code imports and relies on it. Consider re-adding the dependency to avoid runtime import errors. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The comment is about a dependency removal, which is a change in the diff. However, the comment makes a specific claim that "the sample app code imports and relies on it" - this is information I cannot verify from the diff alone. According to the rules, I should only keep comments where I see STRONG EVIDENCE that the comment is correct. The rule also states "If you are not sure, or if understanding the comment requires more context (such as seeing other files or parts of the codebase), you should delete the comment." I cannot see the sample app code files to verify if openai-guardrails is actually imported or used. This requires cross-file context that I don't have. The reviewer provided evidence showing the context around the change, but this doesn't actually prove whether openai-guardrails is used in the code. Maybe the automated tool scanned the codebase and found actual imports, which would make this a valid comment. The removal of a dependency that's still in use would cause runtime errors, which is a serious issue. While it's true that removing a used dependency would be serious, I cannot verify this claim from the diff alone. The rules explicitly state to ignore cross-file issues and only think about the file being reviewed. Without seeing the actual import statements in the sample app code, I cannot confirm this is a real issue. The automated tool could be wrong or outdated. This comment requires cross-file context to verify - I would need to see the actual sample app code to confirm if openai-guardrails is imported. Since I cannot see strong evidence in this diff alone, and the rules state to delete comments that require more context, I should delete this comment.
4. packages/sample-app/sample_app/openai_guardrails_example.py:20
- Draft comment:
The Traceloop.init() call was modified by removing the 'api_endpoint' and 'exporter' parameters. Confirm that this change is intentional, as these settings may affect trace export behavior. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The comment is asking the PR author to "confirm that this change is intentional" - this is explicitly against the rules which state "Do NOT ask the PR author to confirm their intention, to explain, to double-check things, to ensure the behavior is intended." The comment doesn't identify a clear bug or issue that needs fixing; it's just asking for confirmation. The fact that ConsoleSpanExporter is still imported but unused might be worth noting, but the comment doesn't focus on that - it focuses on asking for confirmation about the intentionality of the change. This is a sample/example app, and simplifying the initialization by removing unnecessary parameters seems like a reasonable change for demonstration purposes. Perhaps the removal of the exporter parameter could actually break functionality if the example was specifically designed to show console output of traces. The unused import of ConsoleSpanExporter suggests this might be an oversight rather than intentional simplification. While the unused import is suspicious, the comment doesn't actually point that out or suggest removing it. Instead, it asks the author to "confirm" the change is intentional, which is explicitly prohibited. If the comment had said "Remove the unused ConsoleSpanExporter import" or "The exporter parameter was removed but ConsoleSpanExporter is still imported", that would be actionable. As written, it's just asking for confirmation. This comment should be deleted because it asks the PR author to confirm their intention, which violates the rules. The comment doesn't provide an actionable code change suggestion - it just asks for verification that the change was intentional.
5. packages/sample-app/sample_app/openai_guardrails_example.py:170
- Draft comment:
Consider using a try/finally block to ensure that the guardrails configuration file is always cleaned up, even if an exception occurs. - Reason this comment was not posted:
Comment was on unchanged code.
6. packages/sample-app/sample_app/openai_guardrails_example.py:63
- Draft comment:
Typo: The printed text "OpenAI Guardrails with OpenLLMetry Tracing - Multiple Calls Test" may be intended to read "OpenTelemetry" instead of "OpenLLMetry" to match the import from opentelemetry. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This comment assumes "OpenLLMetry" is a typo for "OpenTelemetry", but looking at the file, "OpenLLMetry" appears multiple times (lines 5, 63, 158) and seems to be a deliberate branding choice. The Traceloop SDK appears to use "OpenLLMetry" as its product name for LLM observability built on top of OpenTelemetry. The comment is making an incorrect assumption that this is a typo when it's actually intentional terminology. This is not a useful comment because it would change correct branding to incorrect branding. Could "OpenLLMetry" actually be a typo that was repeated multiple times? Maybe the author did intend to write "OpenTelemetry" and made the same mistake in several places? While it's theoretically possible the typo was repeated, the consistency of "OpenLLMetry" across the docstring, print statements, and explanatory text strongly suggests it's intentional branding. Additionally, a quick search would reveal that OpenLLMetry is indeed the name of Traceloop's observability product. The comment lacks strong evidence that this is actually a typo. This comment should be deleted. "OpenLLMetry" appears to be intentional branding for the Traceloop SDK's observability product, not a typo for "OpenTelemetry". The term is used consistently throughout the file in multiple contexts, indicating deliberate usage.
7. packages/sample-app/sample_app/openai_guardrails_example.py:158
- Draft comment:
Typo: "OpenLLMetry" might be a mistake. Should this be "OpenTelemetry"? - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50% The comment assumes "OpenLLMetry" is a typo for "OpenTelemetry", but this appears to be incorrect. OpenLLMetry is a real product - it's a tracing library specifically designed for LLM applications that builds on OpenTelemetry. The file uses "OpenLLMetry" consistently in multiple places (docstring, print statements), and the imports show usage of Traceloop SDK which is part of the OpenLLMetry ecosystem. The comment is factually wrong - this is not a typo. The author intentionally used "OpenLLMetry" because that's the specific library being demonstrated in this example. Could "OpenLLMetry" be a typo that was consistently made throughout the file? Perhaps the author intended to refer to the underlying OpenTelemetry framework rather than the specific OpenLLMetry wrapper? While it's possible for a typo to be repeated consistently, the context strongly suggests this is intentional. The file is specifically about tracing with the Traceloop SDK (which is OpenLLMetry), and the docstring explicitly mentions "OpenLLMetry tracing" as a feature being demonstrated. The print statement on line 158 is explaining what was fixed in "OpenLLMetry (responses_wrappers.py)" - referring to a specific file in the OpenLLMetry codebase. This is clearly the correct, intended term. This comment should be deleted. "OpenLLMetry" is not a typo - it's the correct name of the tracing library being used in this code. The comment is factually incorrect and would mislead the PR author into making an incorrect change.
Workflow ID: wflow_pA2VLThE7IoLmplU
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 2
🧹 Nitpick comments (2)
packages/sample-app/sample_app/openai_guardrails_example.py (2)
71-73: Handle missing OPENAI_API_KEY explicitly
If OPENAI_API_KEY is unset, api_key=os.getenv("OPENAI_API_KEY") passes None to GuardrailsAsyncOpenAI, which will likely fail later with a less clear error. Consider adding an explicit check before constructing the client:
- client = GuardrailsAsyncOpenAI(
-     api_key=os.getenv("OPENAI_API_KEY"), config=config_path
- )
+ api_key = os.getenv("OPENAI_API_KEY")
+ if not api_key:
+     raise RuntimeError(
+         "OPENAI_API_KEY is not set in the environment; cannot create client"
+     )
+
+ client = GuardrailsAsyncOpenAI(api_key=api_key, config=config_path)
This keeps secrets in env vars (as required) while giving a clearer failure mode.
66-75: Make guardrails config cleanup robust with try/finally
guardrails_config.json is cleaned up at the end, but if an error occurs before the try block (e.g., during client construction), the file can be left behind. You can wrap everything after create_sample_config() in a try/finally to guarantee cleanup:
- # Create guardrails configuration
- config_path = create_sample_config()
- print(f"\n[Setup] Created guardrails config: {config_path}")
-
- # Initialize GuardrailsAsyncOpenAI (wraps AsyncOpenAI)
- client = GuardrailsAsyncOpenAI(
-     api_key=os.getenv("OPENAI_API_KEY"), config=config_path
- )
- print("[Setup] Created GuardrailsAsyncOpenAI client\n")
-
- # Get current span context to display trace ID
- current_span = trace.get_current_span()
- parent_trace_id = current_span.get_span_context().trace_id
- print(f"Parent trace_id: {hex(parent_trace_id)}\n")
-
- try:
+ # Create guardrails configuration
+ config_path = create_sample_config()
+ print(f"\n[Setup] Created guardrails config: {config_path}")
+
+ try:
+     # Initialize GuardrailsAsyncOpenAI (wraps AsyncOpenAI)
+     client = GuardrailsAsyncOpenAI(
+         api_key=os.getenv("OPENAI_API_KEY"), config=config_path
+     )
+     print("[Setup] Created GuardrailsAsyncOpenAI client\n")
+
+     # Get current span context to display trace ID
+     current_span = trace.get_current_span()
+     parent_trace_id = current_span.get_span_context().trace_id
+     print(f"Parent trace_id: {hex(parent_trace_id)}\n")
+
+     # Main test calls
  ...
- print("\n" + "=" * 80)
- print("Trace Analysis")
- ...
-
- # Cleanup
- if config_path.exists():
-     config_path.unlink()
-     print(f"\n[Cleanup] Removed {config_path}")
+ finally:
+     # Cleanup
+     if config_path.exists():
+         config_path.unlink()
+         print(f"\n[Cleanup] Removed {config_path}")
This keeps the sample tidy even when early failures occur.
Also applies to: 171-174
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/sample-app/sample_app/openai_guardrails_example.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/sample-app/sample_app/openai_guardrails_example.py
🪛 Flake8 (7.3.0)
packages/sample-app/sample_app/openai_guardrails_example.py
[error] 16-16: 'opentelemetry.sdk.trace.export.ConsoleSpanExporter' imported but unused
(F401)
🪛 Ruff (0.14.5)
packages/sample-app/sample_app/openai_guardrails_example.py
143-143: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (1)
packages/sample-app/sample_app/openai_guardrails_example.py (1)
57-170: Nice end-to-end reproduction workflow
The reproduce_issue workflow clearly exercises non-streaming, streaming, and chat APIs under the Traceloop @workflow span, which is exactly what you need to validate trace-ID continuity. The conditional handling of llm_response vs raw SDK responses keeps the example usable across wrapper versions.
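For orientation, a rough sketch of the @workflow pattern being exercised; the client construction, model name, and prompt are placeholders, not the sample's exact code.

```python
from traceloop.sdk.decorators import workflow


@workflow(name="guardrails_repro")
async def reproduce_issue(client) -> None:
    # Non-streaming call
    await client.responses.create(model="gpt-4o-mini", input="Say hello")

    # Streaming call - every chunk's span should stay inside this workflow's trace
    stream = await client.responses.create(model="gpt-4o-mini", input="Say hello", stream=True)
    async for _event in stream:
        pass

# Run with: asyncio.run(reproduce_issue(make_client()))  # make_client() is a placeholder
```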
except GuardrailTripwireTriggered as e:
    print(f"[Guardrails] Safety check triggered: {e}")
except Exception as e:
    print(f"[Error] {type(e).__name__}: {e}")
    import traceback

    traceback.print_exc()
Avoid blind except Exception (Ruff BLE001)
Catching a broad Exception here trips Ruff BLE001 and can mask unexpected failures.
Given this is a sample script, two options:
- Narrow the exception(s) you actually expect (e.g., HTTP/client errors), or
- If you intentionally want a catch‑all with traceback printing, annotate it to silence the lint rule:
- except Exception as e:
- print(f"[Error] {type(e).__name__}: {e}")
- import traceback
-
- traceback.print_exc()
+ except Exception as e: # noqa: BLE001 # Intentional catch‑all for demo logging
+ print(f"[Error] {type(e).__name__}: {e}")
+ import traceback
+
+ traceback.print_exc()This makes the intent explicit and keeps CI noise down.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Before:
except GuardrailTripwireTriggered as e:
    print(f"[Guardrails] Safety check triggered: {e}")
except Exception as e:
    print(f"[Error] {type(e).__name__}: {e}")
    import traceback
    traceback.print_exc()
After:
except GuardrailTripwireTriggered as e:
    print(f"[Guardrails] Safety check triggered: {e}")
except Exception as e:  # noqa: BLE001  # Intentional catch-all for demo logging
    print(f"[Error] {type(e).__name__}: {e}")
    import traceback
    traceback.print_exc()
🧰 Tools
🪛 Ruff (0.14.5)
143-143: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In packages/sample-app/sample_app/openai_guardrails_example.py around lines 141
to 147, the code uses a blind "except Exception" which triggers Ruff BLE001;
either narrow the catch to the specific exceptions you expect (e.g.,
HTTP/client-related exceptions or other known runtime errors) and handle them
explicitly, or if you truly want a catch-all for this sample script, add an
inline lint exemption comment to make the intent explicit (e.g., annotate the
except line to silence BLE001) and keep the traceback printing as-is.
Important
Looks good to me! 👍
Reviewed 34a9f6f in 2 minutes and 54 seconds.
- Reviewed 4461 lines of code in 84 files
- Skipped 0 files when reviewing
- Skipped posting 31 draft comments; view those below
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/version.py:1
- Draft comment:
Version update looks consistent. Ensure that the version bump aligns with the underlying fix for async streaming trace propagation. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is related to a version bump, which falls under the category of dependency changes. The comment is asking the author to ensure alignment with an underlying fix, which is similar to asking them to double-check or verify something. This violates the rules against asking for confirmation or verification of intentions.
2. packages/opentelemetry-instrumentation-openai/pyproject.toml:10
- Draft comment:
Version and metadata updated consistently; dependency bounds (e.g., openai "1.99.7") seem correct. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative, as it only states that the version and metadata are updated consistently and that the dependency bounds seem correct. It does not provide any actionable feedback or suggestions for improvement.
3. packages/opentelemetry-instrumentation-pinecone/pyproject.toml:10
- Draft comment:
Pinecone instrumentation metadata is consistent; no issues detected. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply states that there are no issues detected, which is not useful for the PR author.
4. packages/opentelemetry-instrumentation-qdrant/pyproject.toml:10
- Draft comment:
Qdrant instrumentation configuration looks proper; version bump is consistent. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative, stating that the configuration looks proper and the version bump is consistent. It doesn't provide any actionable feedback or suggestions for improvement.
5. packages/opentelemetry-instrumentation-replicate/pyproject.toml:10
- Draft comment:
Replicate instrumentation dependencies and version metadata updated appropriately. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment seems to be purely informative and does not provide any actionable feedback or suggestions. It does not ask for confirmation or suggest any changes. It simply states that something was updated, which is not useful for the PR author.
6. packages/opentelemetry-instrumentation-sagemaker/pyproject.toml:10
- Draft comment:
SageMaker instrumentation metadata and dependency configuration are consistent. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply states that the configuration is consistent, which does not help the PR author in any way.
7. packages/opentelemetry-instrumentation-together/pyproject.toml:10
- Draft comment:
Together AI instrumentation configuration is updated properly; note the inclusion of a new author. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative and does not provide any actionable feedback or suggestions. It simply notes a change without offering any insight or guidance.
8. packages/opentelemetry-instrumentation-transformers/pyproject.toml:10
- Draft comment:
Transformers instrumentation metadata and dependency settings updated correctly. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply states that something was done correctly, which is not necessary in a code review context.
9. packages/opentelemetry-instrumentation-vertexai/pyproject.toml:10
- Draft comment:
Vertex AI instrumentation configuration is consistent; proper dependency and version metadata. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply states that the configuration is consistent, which does not align with the rules for useful comments.
10. packages/opentelemetry-instrumentation-watsonx/pyproject.toml:1
- Draft comment:
Watsonx instrumentation metadata has been updated correctly; dependency on ibm-watson-machine-learning looks valid. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative, as it states that the metadata has been updated correctly and the dependency looks valid. It doesn't provide any actionable feedback or suggestions for improvement.
11. packages/opentelemetry-instrumentation-weaviate/pyproject.toml:10
- Draft comment:
Weaviate instrumentation configuration and dependency on weaviate-client appear correct. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply states that the configuration and dependency appear correct, which is not useful for the PR author.
12. packages/opentelemetry-instrumentation-writer/pyproject.toml:10
- Draft comment:
Writer instrumentation configuration is updated; note the Python requirement now is '>=3.10,<4'. Ensure that this is intentional. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%The comment is about a change in the Python version requirement. It asks to ensure that this change is intentional, which is not allowed by the rules. The comment does not provide a specific suggestion or point out a specific issue with the change.
13. packages/sample-app/sample_app/dataset_attachments_example.py:1
- Draft comment:
Sample app for dataset attachments demonstrates various attachment types. Code is clear and examples are well-documented. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply praises the code without offering any specific insights or recommendations.
14. packages/sample-app/sample_app/dataset_example.py:1
- Draft comment:
Dataset example script looks good; usage of from_csv and from_dataframe are clear. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
15. packages/sample-app/sample_app/experiment/run_research_experiment.py:1
- Draft comment:
Experiment script for CI/CD is well-organized. Integration with GitHub context is properly handled. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply praises the organization of the script and the integration with GitHub context, which does not align with the rules provided.
16. packages/traceloop-sdk/pyproject.toml:10
- Draft comment:
Traceloop SDK package metadata is updated consistently. Dependencies and repository info are properly set. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative, as it only states that the package metadata is updated consistently and that dependencies and repository info are properly set. It does not provide any actionable feedback or suggestions for improvement.
17. packages/traceloop-sdk/tests/datasets/test_create_dataset.py:16
- Draft comment:
Dataset creation tests from CSV and DataFrame are straightforward and cover expected behaviors. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
18. packages/traceloop-sdk/tests/datasets/test_datasets_operations.py:1
- Draft comment:
Tests for dataset operations (fetch, delete, get_version_csv) are comprehensive. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative, as it only praises the comprehensiveness of the tests without providing any actionable feedback or suggestions for improvement. It does not align with the rules for useful comments.
19. packages/traceloop-sdk/traceloop/sdk/datasets/__init__.py:17
- Draft comment:
The all block in the datasets package cleanly exposes the public API. Good practice. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply praises the existing code without offering any constructive criticism or questions.
20. packages/traceloop-sdk/traceloop/sdk/datasets/attachment.py:24
- Draft comment:
Attachment classes are well-structured. The input validation is robust. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply praises the existing code structure and validation, which is not aligned with the guidelines for useful comments.
21. packages/traceloop-sdk/traceloop/sdk/datasets/model.py:1
- Draft comment:
Data models in the datasets module are clearly defined and use Pydantic effectively. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply praises the use of Pydantic in the data models, which does not align with the rules for useful comments.
22. packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py:1
- Draft comment:
Experiment class implementation is well modularized with clear separation between local and GitHub execution flows. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply praises the existing implementation without offering any constructive criticism or questions.
23. packages/traceloop-sdk/traceloop/sdk/experiment/model.py:1
- Draft comment:
Experiment models are detailed and capture essential metadata. Looks good. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
24. packages/traceloop-sdk/traceloop/sdk/version.py:1
- Draft comment:
Version in the SDK is updated consistently. No issues. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply states that there are no issues with the version update, which is not necessary for the PR author to know.
25. CHANGELOG.md:5
- Draft comment:
Typo: In '- exp: Add run in github experiment (#3459)', consider capitalizing 'GitHub' to maintain proper noun formatting. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This is a comment about capitalizing "GitHub" as a proper noun. While technically correct that GitHub should be capitalized, I need to consider the rules: 1. This is about a CHANGELOG file, which is documentation 2. The rules say "Do NOT comment unless there is clearly a code change required" 3. This is a style/formatting issue, not a functional code issue 4. The rules emphasize not making "obvious or unimportant" comments 5. Changelog entries are typically written quickly and minor typos are generally acceptable 6. This doesn't affect functionality at all The comment is technically correct but falls into the category of being overly pedantic about a minor style issue in a changelog. It's not a code change, it's documentation, and it's a very minor issue that doesn't impact understanding or functionality. Could this be considered important for maintaining professional documentation standards? Brand names should typically be capitalized correctly, and maintaining consistency in a changelog could be valuable for the project's professional appearance. While professional documentation is important, the rules explicitly state not to make "obvious or unimportant" comments. A single lowercase letter in a changelog entry is extremely minor and doesn't warrant a review comment. The rules prioritize actionable code changes over style nitpicks in documentation. This comment should be deleted. It's a minor style issue in a changelog file that doesn't affect code functionality. The rules explicitly discourage obvious or unimportant comments, and this falls into that category.
26. packages/opentelemetry-instrumentation-groq/pyproject.toml:14
- Draft comment:
Typo in the repository URL: "openllmetry" should likely be spelled "opentelemetry". - Reason this comment was not posted:
Comment was on unchanged code.
27. packages/opentelemetry-instrumentation-ollama/pyproject.toml:14
- Draft comment:
Typo: The repository URL contains "openllmetry". It looks like it should be "opentelemetry". Please verify and update if necessary. - Reason this comment was not posted:
Comment was on unchanged code.
28. packages/opentelemetry-instrumentation-openai-agents/pyproject.toml:14
- Draft comment:
There's a typo in the repository URL: "openllmetry" should be "opentelemetry". - Reason this comment was not posted:
Comment was on unchanged code.
29. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:734
- Draft comment:
Typo detected: The model string used here is "gpt-4o", which seems likely to be a typo for "gpt-4" as the tests below use "gpt-4". Please review and correct if necessary. - Reason this comment was not posted:
Comment was on unchanged code.
30. packages/opentelemetry-instrumentation-sagemaker/pyproject.toml:14
- Draft comment:
Typo found in the repository URL. It currently reads "openllmetry" but given the context (the package name is "opentelemetry-instrumentation-sagemaker"), it seems it should be "opentelemetry". Please verify and correct if necessary. - Reason this comment was not posted:
Comment was on unchanged code.
31. packages/traceloop-sdk/traceloop/sdk/experiment/model.py:79
- Draft comment:
Minor consistency issue: The class name 'GithubContext' uses a lowercase 'h' while its docstring and other references use 'GitHub' (with an uppercase H). Consider renaming it (and the related classes 'RunInGithubRequest' and 'RunInGithubResponse') to maintain consistent capitalization throughout the code. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 50% This comment is about a naming/style consistency issue. It's suggesting a refactor for consistency. The rules say "Comments that suggest code quality refactors are good! But only if they are actionable and clear." This comment is actionable and clear - it suggests renaming the classes. However, I need to consider: 1) Is this an "obvious or unimportant" issue? 2) Is this clearly a code change that's required? The capitalization of "GitHub" is a minor style issue. While technically inconsistent, it's not a bug or functional issue. The comment is more of a "nice to have" rather than something that "clearly requires a code change". It's a minor style preference. Given the rule "Do NOT comment unless there is clearly a code change required", this seems like it falls into the category of optional/minor improvements rather than required changes. This could be considered an actionable code quality refactor that improves consistency. The proper capitalization of "GitHub" is actually the official brand name, so this might be more important than just a minor style issue. The comment is clear and actionable. While the comment is actionable, the rules emphasize not commenting unless there is "clearly a code change required." This is a minor naming consistency issue that doesn't affect functionality. The code works fine as-is, and this is more of a stylistic preference. The bar for keeping comments should be high - "STRONG EVIDENCE that the comment is correct" and clearly required changes. This comment should be deleted. While it points out a valid consistency issue, it's a minor stylistic concern rather than a clearly required code change. The code functions correctly as-is, and the comment doesn't meet the threshold of being a necessary change.
Workflow ID: wflow_1xDS7okIDHCPEa1x
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (9 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
- packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
🪛 Flake8 (7.3.0)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
[error] 524-524: 'unittest.mock.MagicMock' imported but unused
(F401)
[error] 524-524: 'unittest.mock.Mock' imported but unused
(F401)
[error] 531-531: 'openai.types.responses.Response' imported but unused
(F401)
[error] 531-531: 'openai.types.responses.ResponseOutputItem' imported but unused
(F401)
[error] 531-531: 'openai.types.responses.ResponseUsage' imported but unused
(F401)
[error] 640-640: local variable 'parent' is assigned to but never used
(F841)
[error] 713-713: local variable 'parent' is assigned to but never used
(F841)
🪛 Ruff (0.14.6)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
595-595: Unused function argument: instrument_legacy
(ARG001)
640-640: Local variable parent is assigned to but never used
Remove assignment to unused variable parent
(F841)
668-668: Unused function argument: instrument_legacy
(ARG001)
713-713: Local variable parent is assigned to but never used
Remove assignment to unused variable parent
(F841)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
567-567: Do not catch blind exception: Exception
(BLE001)
640-640: Do not catch blind exception: Exception
(BLE001)
729-729: Do not catch blind exception: Exception
(BLE001)
803-803: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Lint
🔇 Additional comments (9)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (6)
182-186: LGTM! Trace context field properly configured. The addition of the `trace_context` field with `arbitrary_types_allowed = True` correctly handles OpenTelemetry context objects, which aren't Pydantic-compatible by default. This is the appropriate approach for storing trace context.
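For readers unfamiliar with the Pydantic constraint being discussed, a minimal sketch of such a model might look like this (illustrative names, not the actual TracedData definition):

```python
# Minimal sketch (not the actual TracedData definition) of a Pydantic model
# that stores an OpenTelemetry Context; arbitrary_types_allowed is required
# because Context is not a Pydantic-native type.
from typing import Optional

from opentelemetry.context import Context
from pydantic import BaseModel


class TracedDataSketch(BaseModel):
    response_id: str
    trace_context: Optional[Context] = None  # captured at request time

    class Config:
        arbitrary_types_allowed = True
```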
508-514: LGTM! Context capture for streaming responses. The trace context is correctly captured before creating the span and passed to `start_span`, ensuring trace continuity for streaming responses.
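As a rough illustration of that capture-then-parent pattern (illustrative names, not the instrumentation's actual helpers):

```python
# Capture the caller's OTel context and use it to parent a new span, so the
# span stays in the application's trace even if created from another task.
from opentelemetry import context as context_api, trace

tracer = trace.get_tracer(__name__)


def start_response_span(name: str):
    captured_ctx = context_api.get_current()
    span = tracer.start_span(name, context=captured_ctx)
    return span, captured_ctx
```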
564-579: LGTM! Exception path correctly handles trace context. The trace context is properly captured when creating `TracedData` (with fallback to current context) and correctly restored when creating the error span. The pattern ensures trace continuity even in error scenarios.
636-650: LGTM! Success path trace context propagation. The trace context is correctly captured in `TracedData` and restored when creating the completion span. This ensures that completed responses maintain trace continuity with their originating context.
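A condensed sketch of the restore step described above (traced_data and the span name are illustrative placeholders):

```python
# Prefer the context captured when the request started; fall back to the
# current context if none was stored, then parent the completion span on it.
from opentelemetry import context as context_api, trace

tracer = trace.get_tracer(__name__)


def emit_completion_span(traced_data):
    ctx = getattr(traced_data, "trace_context", None) or context_api.get_current()
    span = tracer.start_span("openai.response", context=ctx)
    try:
        pass  # set attributes from traced_data here
    finally:
        span.end()
```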
674-680: LGTM! Async wrapper consistently implements trace context propagation. The async wrapper functions follow the same trace context capture/restore pattern as the sync versions, ensuring trace continuity across asynchronous operations. The implementation is consistent and correct.
Also applies to: 726-741, 799-813
836-843: LGTM! Cancellation paths preserve trace context. Both sync and async cancellation wrappers correctly restore the trace context from existing data when recording cancellation spans, maintaining trace continuity even when responses are cancelled.
Also applies to: 868-875
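A sketch of that cancellation path, assuming the same stored-context fallback as above (function and span names are illustrative):

```python
# If traced data exists for a cancelled response, its stored context parents
# the cleanup span; otherwise the current context is used.
from opentelemetry import context as context_api, trace

tracer = trace.get_tracer(__name__)


def record_cancellation(traced_data):
    ctx = getattr(traced_data, "trace_context", None) or context_api.get_current()
    span = tracer.start_span("openai.response.cancel", context=ctx)
    span.end()
```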
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (3)
510-591: Excellent unit test for trace context propagation! This test comprehensively validates that trace context is correctly captured and restored, ensuring spans maintain the proper parent-child relationship. The detailed assertions and comments make the test's purpose clear.
593-663: Excellent integration test for sync streaming! This test thoroughly validates trace context propagation for synchronous streaming responses, with comprehensive assertions and clear documentation. It effectively prevents regressions of the trace continuity issue.
665-736: Excellent integration test for async streaming! This test provides comprehensive coverage for asynchronous streaming responses, ensuring trace context is properly propagated in async scenarios. The assertions and documentation are thorough and clear.
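For context, trace-continuity tests of this kind usually boil down to assertions like the following sketch (illustrative, not the repository's actual test code):

```python
# Verify that the OpenAI span shares a trace with, and is a child of, the
# application's parent span. `spans` are in-memory exported ReadableSpans.
def assert_parentage(spans, parent_name="parent"):
    parent = next(s for s in spans if s.name == parent_name)
    child = next(s for s in spans if s.name != parent_name)
    assert child.parent is not None
    assert child.parent.span_id == parent.context.span_id
    assert child.context.trace_id == parent.context.trace_id
```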
Important
Looks good to me! 👍
Reviewed 9c7804a in 56 seconds. Click for details.
- Reviewed 47 lines of code in 2 files
- Skipped 0 files when reviewing.
- Skipped posting 4 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:521
- Draft comment:
Removed unused import of MagicMock and Mock. This cleanup improves clarity. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. This comment is purely informative as it just states that an unused import was removed. It doesn't provide any actionable feedback or suggestions for improvement.
2. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:637
- Draft comment:
Removed assignment to 'parent' in streaming test as it was unused. Good clean-up. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. This comment is purely informative, stating that an assignment was removed because it was unused. It does not provide a suggestion, ask for confirmation, or highlight a potential issue. It violates the rule against making purely informative comments.
3. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:710
- Draft comment:
Removed redundant assignment to 'parent' in async streaming test; this variable was not needed. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. This comment is purely informative, stating that a redundant assignment was removed. It does not provide a suggestion, ask for confirmation, or highlight a potential issue. According to the rules, purely informative comments should be removed.
4. packages/sample-app/sample_app/openai_guardrails_example.py:13
- Draft comment:
Removed unused import of ConsoleSpanExporter, which simplifies the code. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. This comment is purely informative, stating that an unused import was removed. It doesn't provide any actionable feedback or suggestions for improvement. According to the rules, purely informative comments should be removed.
Workflow ID: wflow_qlGYexzyzhBWKi7M
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Fixes trace context propagation in OpenAI async streaming responses to maintain trace continuity, adds tests, and updates documentation with examples.
- Updates `responses_get_or_create_wrapper` and `async_responses_get_or_create_wrapper` in `responses_wrappers.py` to maintain trace continuity across async operations.
- Adds `trace_context` to `TracedData` to store and restore trace context.
- Adds `test_responses_streaming_with_parent_span` and `test_responses_streaming_async_with_parent_span` in `test_responses.py` to validate trace context propagation.
- Adds `test_responses_streaming_async_with_parent_span.yaml` and `test_responses_streaming_with_parent_span.yaml` for VCR testing.
- Adds `openai_guardrails_example.py` to demonstrate OpenAI Guardrails integration with tracing.
- Updates `.python-version`.

This description was created by Ellipsis for 9c7804a. You can customize this summary. It will automatically update as commits are pushed.
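As a hypothetical end-to-end illustration of the scenario this PR targets, an async streaming Responses call made inside an application span should now produce OpenAI spans in the same trace (model and span names are placeholders):

```python
# Application-side reproduction sketch: instrumented AsyncOpenAI streaming
# call inside a parent span; with the fix, the OpenAI spans share its trace.
import asyncio

from openai import AsyncOpenAI
from opentelemetry import trace

tracer = trace.get_tracer(__name__)


async def main():
    client = AsyncOpenAI()
    with tracer.start_as_current_span("app.handle_request"):
        stream = await client.responses.create(
            model="gpt-4o", input="Hello", stream=True
        )
        async for _event in stream:
            pass


asyncio.run(main())
```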
Summary by CodeRabbit
New Features
Tests
Documentation
Chores
✏️ Tip: You can customize this high-level summary in your review settings.