feat(instrumentation): updated GenAI attributes to use OTel's #3138
Conversation
Important
Looks good to me! 👍
Reviewed everything up to 3dad6f9 in 1 minute and 35 seconds.
- Reviewed 591 lines of code in 7 files
- Skipped 0 files when reviewing
- Skipped posting 13 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py:50
- Draft comment: Updated constants: 'LLM_USAGE_COMPLETION_TOKENS' now uses 'gen_ai.usage.output_tokens' and 'LLM_USAGE_PROMPT_TOKENS' uses 'gen_ai.usage.input_tokens'. This aligns with the new naming conventions.
- Reason this comment was not posted: Comment did not seem useful. Confidence it is useful: 0%, below the 50% threshold. This comment is purely informative, explaining the changes made to constants and their alignment with new naming conventions. It does not provide any actionable feedback or suggestions for improvement.
2. packages/opentelemetry-instrumentation-together/tests/test_chat.py:37
- Draft comment: Test assertions now use the updated attribute names (input_tokens/output_tokens) for token usage. Verify consistency with new semantic conventions.
- Reason this comment was not posted: Comment did not seem useful. Confidence it is useful: 0%, below the 50% threshold. The comment is asking the PR author to verify consistency with new semantic conventions, which falls under the rule of not asking the author to double-check things. It doesn't provide a specific suggestion or point out a specific issue with the code.
3. packages/opentelemetry-instrumentation-together/tests/test_completion.py:37
- Draft comment: Ensure that the test for Together completions correctly sums 'gen_ai.usage.input_tokens' and 'gen_ai.usage.output_tokens' to match total tokens.
- Reason this comment was not posted: Comment looked like it was already resolved.
4. packages/opentelemetry-instrumentation-openai/tests/traces/test_prompt_caching.py:70
- Draft comment: The OpenAI prompt caching tests have been updated to use new input/output token attribute names. Confirm that these changes are in sync with semantic conventions.
- Reason this comment was not posted: Comment did not seem useful. Confidence it is useful: 0%, below the 50% threshold. The comment is asking the PR author to confirm that changes are in sync with semantic conventions, which is against the rules. It does not provide a specific suggestion or ask for a specific test to be written. Therefore, it should be removed.
5. packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py:54
- Draft comment: Consider updating the accompanying documentation or changelog to mention that the legacy token-related attribute names have been replaced with 'input_tokens' and 'output_tokens'.
- Reason this comment was not posted: Comment did not seem useful. Confidence it is useful: 0%, below the 50% threshold. The comment suggests updating documentation or changelog, which is not allowed as per the rules. It doesn't provide a specific code suggestion or ask for a test to be written.
6. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:109
- Draft comment: Typographical error: "cassete" should be "cassette" in the comment.
- Reason this comment was not posted: Comment was on unchanged code.
7. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:195
- Draft comment: Typo/grammatical note: The comment currently says "but not wrote to it," which could be updated to "but not write to it," for clarity.
- Reason this comment was not posted: Comment was on unchanged code.
8. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:604
- Draft comment: Typographical suggestion: The comment reads "but not wrote to it," which could be clearer. Consider revising it to something like "but did not write to it," to improve clarity.
- Reason this comment was not posted: Comment was on unchanged code.
9. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:798
- Draft comment: Typo in comment: "but not wrote to it" is ungrammatical. Consider using "but not writing to it" or "but not writing into it" for clarity.
- Reason this comment was not posted: Comment was on unchanged code.
10. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:931
- Draft comment: Typo in comment: Consider updating "but not wrote to it" to "but did not write to it" for clearer grammar.
- Reason this comment was not posted: Comment was on unchanged code.
11. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:1025
- Draft comment: Typo in comment: Consider changing 'wrote to it' to 'written to it' for clarity.
- Reason this comment was not posted: Comment was on unchanged code.
12. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:1029
- Draft comment: Typographical error: 'cassete' appears to be misspelled. Consider updating it to 'cassette'.
- Reason this comment was not posted: Comment was on unchanged code.
13. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:1362
- Draft comment: Typo: 'cassete' should be spelled as 'cassette' in the comment.
- Reason this comment was not posted: Comment was on unchanged code.
Workflow ID: wflow_oahtGC07iVEgOTYG
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
nirga left a comment
Thanks @martimfasantos! In this case I'd remove them from the semantic conventions and just use them directly from the OTel package - that way they will always be up to date (when we originally built this there weren't any semconvs for it).
Walkthrough
Mass rename of span and metric attribute keys: legacy SpanAttributes LLM_* constants replaced by GenAIAttributes GEN_AI_* equivalents across instrumentations and tests (prompts → GEN_AI_PROMPT, completions → GEN_AI_COMPLETION, prompt_tokens → input_tokens, completion_tokens → output_tokens). Mostly import/constant substitutions; one event handler now calls span.end().
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant Instr as Instrumentation
    participant GenAI as gen_ai_attributes (GenAIAttributes)
    participant Span as OpenTelemetry Span
    Note over Instr,GenAI: import gen_ai_attributes as GenAIAttributes
    Instr->>GenAI: read constants (GEN_AI_SYSTEM, GEN_AI_PROMPT, GEN_AI_COMPLETION, GEN_AI_USAGE_*)
    Instr->>Span: start span with attributes (GenAIAttributes.GEN_AI_SYSTEM, GenAIAttributes.GEN_AI_REQUEST_MODEL, ...)
    activate Span
    Instr->>Span: set prompt attributes → GenAIAttributes.GEN_AI_PROMPT.*
    Instr->>Span: set completion attributes → GenAIAttributes.GEN_AI_COMPLETION.*
    Instr->>Span: set usage attributes → GEN_AI_USAGE_INPUT_TOKENS / GEN_AI_USAGE_OUTPUT_TOKENS
    Instr->>Span: record token histograms (GEN_AI_TOKEN_TYPE, GEN_AI_RESPONSE_MODEL)
    alt event-handler path
        Instr->>Span: event handler may call self._span.end()
    end
    Instr->>Span: end span
    deactivate Span
```
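To make the rename concrete, here is a minimal sketch of the new attribute usage. It is not code from the PR: `span` stands in for a started OpenTelemetry span, the message and token values are illustrative, and only constants the PR itself references are used.

```python
# Sketch of the renamed span attributes (assumption: `span` is a started span).
from opentelemetry.semconv._incubating.attributes import (
    gen_ai_attributes as GenAIAttributes,
)

def set_gen_ai_attributes(span):
    # prompts → gen_ai.prompt.<i>.*, completions → gen_ai.completion.<i>.*
    span.set_attribute(f"{GenAIAttributes.GEN_AI_PROMPT}.0.role", "user")
    span.set_attribute(f"{GenAIAttributes.GEN_AI_PROMPT}.0.content", "Tell me a joke")
    span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content", "...")
    # prompt_tokens → input_tokens, completion_tokens → output_tokens
    span.set_attribute(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, 10)
    span.set_attribute(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, 12)
```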
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks: ❌ Failed checks (1 warning), ✅ Passed checks (2 passed).
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 3
🔭 Outside diff range comments (12)
packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py (1)
210-225: Replace LLM token attributes with GEN_AI equivalents
The semantic-conventions package exports `GEN_AI_USAGE_INPUT_TOKENS` and `GEN_AI_USAGE_OUTPUT_TOKENS`, so we should stop using the old `LLM_USAGE_PROMPT_TOKENS` and `LLM_USAGE_COMPLETION_TOKENS`. The total-tokens attribute remains `LLM_USAGE_TOTAL_TOKENS` (no `GEN_AI_USAGE_TOTAL_TOKENS` is defined).
Locations to update:
- packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py lines ~210–225
- packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py lines ~324–336
Suggested diff:
```diff
-        _set_span_attribute(
-            span,
-            SpanAttributes.LLM_USAGE_PROMPT_TOKENS,
-            stream_response.get("input_token_count"),
-        )
+        _set_span_attribute(
+            span,
+            SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS,
+            stream_response.get("input_token_count"),
+        )
-        _set_span_attribute(
-            span,
-            SpanAttributes.LLM_USAGE_COMPLETION_TOKENS,
-            stream_response.get("generated_token_count"),
-        )
+        _set_span_attribute(
+            span,
+            SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS,
+            stream_response.get("generated_token_count"),
+        )
         total_token = stream_response.get("input_token_count") + stream_response.get(
             "generated_token_count"
         )
         _set_span_attribute(
             span,
             SpanAttributes.LLM_USAGE_TOTAL_TOKENS,
             total_token,
         )
```
Please apply the same replacements in the second block (lines 324–336) and update any related tests to expect the new `GEN_AI_USAGE_*` constants.
packages/traceloop-sdk/traceloop/sdk/tracing/manual.py (1)
49-65: Align usage attributes with Gen AI semantic conventions
The Gen AI spec defines `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, and the cache-related usage attributes, but does not yet provide `gen_ai.usage.total_tokens` or a `gen_ai.request.type`. To stay consistent:
• Replace LLM prompt/completion and cache-input tokens with the GEN_AI equivalents
• Keep `LLM_USAGE_TOTAL_TOKENS` and `LLM_REQUEST_TYPE` as they have no Gen AI counterparts
Locations to update (packages/traceloop-sdk/traceloop/sdk/tracing/manual.py, lines 49–65):
```diff
-        self._span.set_attribute(
-            SpanAttributes.LLM_USAGE_PROMPT_TOKENS, usage.prompt_tokens
-        )
-        self._span.set_attribute(
-            SpanAttributes.LLM_USAGE_COMPLETION_TOKENS, usage.completion_tokens
-        )
+        self._span.set_attribute(
+            SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS, usage.prompt_tokens
+        )
+        self._span.set_attribute(
+            SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, usage.completion_tokens
+        )
         self._span.set_attribute(
             SpanAttributes.LLM_USAGE_TOTAL_TOKENS, usage.total_tokens
         )
-        self._span.set_attribute(
-            SpanAttributes.LLM_USAGE_CACHE_CREATION_INPUT_TOKENS,
-            usage.cache_creation_input_tokens,
-        )
-        self._span.set_attribute(
-            SpanAttributes.LLM_USAGE_CACHE_READ_INPUT_TOKENS,
-            usage.cache_read_input_tokens,
-        )
+        self._span.set_attribute(
+            SpanAttributes.GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS,
+            usage.cache_creation_input_tokens,
+        )
+        self._span.set_attribute(
+            SpanAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS,
+            usage.cache_read_input_tokens,
+        )
```
No change needed for `LLM_REQUEST_TYPE`: the semantic conventions only define "llm.request.type" at this time.
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (1)
479-497: `None > 0` guard needed before metric recording
`prompt_tokens` or `completion_tokens` can be `None` when the provider omits usage stats. Comparing `None > 0` raises `TypeError`.
```diff
-        if prompt_tokens > 0:
+        if isinstance(prompt_tokens, (int, float)) and prompt_tokens > 0:
@@
-        if completion_tokens > 0:
+        if isinstance(completion_tokens, (int, float)) and completion_tokens > 0:
```
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py (1)
226-233: Inconsistent token usage attribute naming.
The token usage attributes still use the legacy `LLM_USAGE_*` naming instead of the newer `GEN_AI_USAGE_*` conventions. Based on the PR objectives to update token terminology, these should be updated to align with the new naming conventions.
Apply this diff to align with the new token usage attribute naming:
```diff
 _set_span_attribute(
     span,
-    SpanAttributes.LLM_USAGE_COMPLETION_TOKENS,
+    SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS,
     usage_dict.get("completion_tokens"),
 )
 _set_span_attribute(
     span,
-    SpanAttributes.LLM_USAGE_PROMPT_TOKENS,
+    SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS,
     usage_dict.get("prompt_tokens"),
 )
```
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (1)
402-416: Inconsistent token usage attribute naming.
The token usage span attributes still use the legacy `LLM_USAGE_*` naming instead of the newer `GEN_AI_USAGE_*` conventions, which is inconsistent with the overall migration to GenAI semantic conventions.
Consider updating these attributes to align with the new naming conventions:
```diff
 set_span_attribute(
     span,
-    SpanAttributes.LLM_USAGE_PROMPT_TOKENS,
+    SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS,
     input_tokens,
 )
 set_span_attribute(
     span,
-    SpanAttributes.LLM_USAGE_COMPLETION_TOKENS,
+    SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS,
     output_tokens,
 )
```
packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/span_utils.py (1)
80-89: Inconsistent attribute naming - some `LLM_*` attributes remain.
While the response model was updated, the token usage attributes and finish reason attribute still use the legacy `LLM_*` naming instead of the newer `GEN_AI_*` conventions.
Consider updating these remaining attributes for consistency:
```diff
 span.set_attribute(
-    SpanAttributes.LLM_USAGE_COMPLETION_TOKENS, usage.completion_tokens
+    SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, usage.completion_tokens
 )
-span.set_attribute(SpanAttributes.LLM_USAGE_PROMPT_TOKENS, usage.prompt_tokens)
+span.set_attribute(SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS, usage.prompt_tokens)
```
And for the finish reason attribute:
```diff
 span.set_attribute(
-    SpanAttributes.LLM_RESPONSE_FINISH_REASON, choices[0].finish_reason
+    SpanAttributes.GEN_AI_RESPONSE_FINISH_REASONS, choices[0].finish_reason
 )
```
packages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/span_utils.py (1)
88-109: Incomplete semantic convention migration for token usage attributes
While `GEN_AI_RESPONSE_MODEL` has been correctly updated, the token usage attributes on lines 97, 102, and 107 still use the legacy `LLM_USAGE_*` naming convention. Based on the PR objectives to update token usage terminology, these should be updated to:
- `LLM_USAGE_PROMPT_TOKENS` → `GEN_AI_USAGE_INPUT_TOKENS`
- `LLM_USAGE_COMPLETION_TOKENS` → `GEN_AI_USAGE_OUTPUT_TOKENS`
- `LLM_USAGE_TOTAL_TOKENS` → `GEN_AI_USAGE_TOTAL_TOKENS`
```diff
 _set_span_attribute(
     span,
-    SpanAttributes.LLM_USAGE_TOTAL_TOKENS,
+    SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS,
     input_tokens + output_tokens,
 )
 _set_span_attribute(
     span,
-    SpanAttributes.LLM_USAGE_COMPLETION_TOKENS,
+    SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS,
     output_tokens,
 )
 _set_span_attribute(
     span,
-    SpanAttributes.LLM_USAGE_PROMPT_TOKENS,
+    SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS,
     input_tokens,
 )
```
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py (1)
902-911: Update token usage attribute names in streaming tests
The new semantic conventions define `GEN_AI_USAGE_INPUT_TOKENS` and `GEN_AI_USAGE_OUTPUT_TOKENS`, so the legacy `LLM_USAGE_*` constants should be replaced accordingly. Since there is no `GEN_AI_USAGE_TOTAL_TOKENS` constant, compute the total by summing input and output tokens.
• File: packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py (around lines 902–911)
- Replace `SpanAttributes.LLM_USAGE_PROMPT_TOKENS` → `SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS`
- Replace `SpanAttributes.LLM_USAGE_COMPLETION_TOKENS` → `SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS`
- Update total-tokens assertion to sum the two new attributes instead of using `LLM_USAGE_TOTAL_TOKENS`
Suggested diff:
```diff
-    assert (
-        bedrock_span.attributes[SpanAttributes.LLM_USAGE_PROMPT_TOKENS]
-        == inputTokens
-    )
-    assert (
-        bedrock_span.attributes[SpanAttributes.LLM_USAGE_COMPLETION_TOKENS]
-        == outputTokens
-    )
-    assert (
-        bedrock_span.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS]
-        == inputTokens + outputTokens
-    )
+    assert bedrock_span.attributes[SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS] == inputTokens
+    assert bedrock_span.attributes[SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == outputTokens
+    assert (
+        bedrock_span.attributes[SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS]
+        + bedrock_span.attributes[SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]
+        == inputTokens + outputTokens
+    )
```
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (2)
222-230: Token usage attributes not updated to new semantic conventions
Similar to other files in this PR, the token usage attributes still use legacy `LLM_USAGE_*` naming:
- `LLM_USAGE_PROMPT_TOKENS` (line 222)
- `LLM_USAGE_COMPLETION_TOKENS` (line 224)
- `LLM_USAGE_TOTAL_TOKENS` (line 228)
For consistency with the PR objectives, these should be updated to use `GEN_AI_USAGE_*` equivalents with the new "input/output" terminology instead of "prompt/completion" (a sketch follows below).
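A possible shape for that update, as a hedged sketch only: the `set_span_attribute` helper is re-declared here so the snippet is self-contained, and `LLM_USAGE_TOTAL_TOKENS` is kept because, per the comments above, no GenAI total-tokens constant is defined.

```python
from opentelemetry.semconv._incubating.attributes import (
    gen_ai_attributes as GenAIAttributes,
)
from opentelemetry.semconv_ai import SpanAttributes

def set_span_attribute(span, name, value):
    # mirrors the None-guarding helper these instrumentations use (assumption)
    if value is not None:
        span.set_attribute(name, value)

def set_token_usage(span, input_tokens, output_tokens):
    set_span_attribute(span, GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, input_tokens)
    set_span_attribute(span, GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, output_tokens)
    # no GEN_AI_USAGE_TOTAL_TOKENS exists yet, so the legacy constant stays
    set_span_attribute(span, SpanAttributes.LLM_USAGE_TOTAL_TOKENS, input_tokens + output_tokens)
```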
76-90: Update semantic convention prefixes for penalty attributes and confirm streaming attribute
The semantic conventions for frequency and presence penalties have `GEN_AI_REQUEST_*` equivalents and should replace the legacy `LLM_*` prefixes. The streaming attribute currently has no `GEN_AI_*` equivalent in the semantic conventions: continue using `LLM_IS_STREAMING` or propose adding `GEN_AI_REQUEST_STREAMING`.
• In packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
- Line 85: replace `SpanAttributes.LLM_FREQUENCY_PENALTY` with `SpanAttributes.GEN_AI_REQUEST_FREQUENCY_PENALTY`
- Line 88: replace `SpanAttributes.LLM_PRESENCE_PENALTY` with `SpanAttributes.GEN_AI_REQUEST_PRESENCE_PENALTY`
- Line 90: retain `SpanAttributes.LLM_IS_STREAMING` (no `GEN_AI_REQUEST_STREAMING` found in opentelemetry-semantic-conventions-ai)
Suggested diff:
```diff
-    set_span_attribute(
-        span, SpanAttributes.LLM_FREQUENCY_PENALTY, kwargs.get("frequency_penalty")
-    )
+    set_span_attribute(
+        span, SpanAttributes.GEN_AI_REQUEST_FREQUENCY_PENALTY, kwargs.get("frequency_penalty")
+    )
-    set_span_attribute(
-        span, SpanAttributes.LLM_PRESENCE_PENALTY, kwargs.get("presence_penalty")
-    )
+    set_span_attribute(
+        span, SpanAttributes.GEN_AI_REQUEST_PRESENCE_PENALTY, kwargs.get("presence_penalty")
+    )
```
packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/span_utils.py (1)
86-140: Migrate legacy `LLM_USAGE_*` attributes to the new naming convention
The old token-usage constants (`LLM_USAGE_COMPLETION_TOKENS`, `LLM_USAGE_PROMPT_TOKENS`, `LLM_USAGE_TOTAL_TOKENS`) are still in use across multiple instrumentation packages (e.g., groq/span_utils.py, together/span_utils.py, vertexai/span_utils.py, etc.). All of these must be updated to the new attribute names alongside the already-migrated `GEN_AI_TOKEN_TYPE` and `GEN_AI_RESPONSE_MODEL`.
Please update every occurrence in your span utilities, including tests, to use the correct, non-legacy constants. You can locate all remaining uses with:
```
rg -l 'LLM_USAGE_.*TOKEN' packages/opentelemetry-instrumentation-*/
```
• Replace `LLM_USAGE_PROMPT_TOKENS`
• Replace `LLM_USAGE_COMPLETION_TOKENS`
• Replace `LLM_USAGE_TOTAL_TOKENS`
with their new equivalents in each span_utils.py (and related test files).
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)
287-300: `_get_vendor_from_url` returns capital-cased values that violate the GenAI enum
`GenAiSystemValues` in the incubating semconv are lower-case ("openai", "aws", "azure", …). Here we return "AWS", "Azure", etc.; this will produce spans that fail semantic-convention validation and break downstream queries that expect the canonical values.
```diff
-    if "openai.azure.com" in base_url:
-        return "Azure"
-    elif "amazonaws.com" in base_url or "bedrock" in base_url:
-        return "AWS"
-    elif "googleapis.com" in base_url or "vertex" in base_url:
-        return "Google"
-    elif "openrouter.ai" in base_url:
-        return "OpenRouter"
+    if "openai.azure.com" in base_url:
+        return "azure"
+    elif "amazonaws.com" in base_url or "bedrock" in base_url:
+        return "aws"
+    elif "googleapis.com" in base_url or "vertex" in base_url:
+        return "google"
+    elif "openrouter.ai" in base_url:
+        return "openrouter"
```
A one-line `return vendor.lower()` after detection would also work.
♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (1)
269-276: Repeat comment: use enum for vendor
Same concern as above: replace the literal "AWS" with the enum value to stay future-proof.
🧹 Nitpick comments (13)
packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/custom_llm_instrumentor.py (1)
177-181: Simplify nested if statements.
The static analysis tool correctly identified nested if statements that can be combined for better readability.
Apply this diff to combine the nested if statements:
```diff
-        if should_send_prompts():
-            if llm_request_type == LLMRequestTypeValues.COMPLETION:
+        if should_send_prompts() and llm_request_type == LLMRequestTypeValues.COMPLETION:
             _set_span_attribute(
                 span, f"{SpanAttributes.GEN_AI_COMPLETION}.0.content", response.text
             )
```
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (1)
51-52: Prefer the official enum to avoid string drift
"AWS"is hard-coded while elsewhere the enumGenAIAttributes.GenAiSystemValues.AWS_BEDROCK.valueis used. Using the enum keeps tests aligned if the canonical value ever changes.-assert bedrock_span.attributes[SpanAttributes.GEN_AI_SYSTEM] == "AWS" +assert ( + bedrock_span.attributes[SpanAttributes.GEN_AI_SYSTEM] + == GenAIAttributes.GenAiSystemValues.AWS_BEDROCK.value +)packages/opentelemetry-instrumentation-cohere/tests/test_completion.py (1)
53-56: Minor consistency nit
Consider using `GenAIAttributes.GenAiSystemValues.COHERE.value` instead of the literal "Cohere" for the vendor check.
packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py (2)
145-153: Duplication across tests
The same literal prompt string & model ID are asserted in ~20 places. Parametrize via `pytest.mark.parametrize` or a fixture to reduce duplication and improve maintainability (see the sketch below).
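A hypothetical shape for that parametrisation; the fixture name `openai_span` and the expected values are illustrative, not taken from the test file.

```python
import pytest

@pytest.mark.parametrize(
    "attribute_key, expected",
    [
        ("gen_ai.request.model", "example-model"),
        ("gen_ai.prompt.0.content", "example prompt"),
    ],
)
def test_span_attribute(openai_span, attribute_key, expected):
    # one parametrised test instead of ~20 copies of the same assertion
    assert openai_span.attributes[attribute_key] == expected
```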
321-327: Enum vs literal vendor value
Throughout the file vendor strings ("openai", etc.) are hard-coded. Prefer the enum values from `GenAIAttributes` for future resiliency.
packages/opentelemetry-instrumentation-mistralai/tests/test_chat.py (2)
27-41: Repeated literal "MistralAI" vendor
Same comment as other files – consider centralising the vendor constant via the enum to avoid divergence.
Also applies to: 73-89, 126-143, 179-197
176-197: Large duplication – consider parametrised fixtures
The chat & streaming variants share extensive duplicated assertions differing only by the fixture used (`instrument_*`). A helper asserting common span attributes would shrink the test file and make future convention changes easier.
packages/opentelemetry-instrumentation-vertexai/opentelemetry/instrumentation/vertexai/span_utils.py (2)
25-29: Minor naming mismatch in prompt attribute key
Other instrumentations store the user prompt under `…prompt.0.content`, but here it is stored under `…prompt.0.user`. If this was accidental, switching to `content` keeps cross-provider parity.
```diff
-        f"{SpanAttributes.GEN_AI_PROMPT}.0.user",
+        f"{SpanAttributes.GEN_AI_PROMPT}.0.content",
```
57-66: Unused parameter (`llm_model`) in `set_response_attributes`
`llm_model` is accepted but never referenced. Either remove the argument or use it (e.g., set `GEN_AI_RESPONSE_MODEL`).
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (1)
273-276: Mixed attribute families in the same span
`GEN_AI_SYSTEM` replaces `LLM_SYSTEM`, but the next line still sets `LLM_REQUEST_TYPE`. Consider moving that one to a `GEN_AI_REQUEST_TYPE` constant (if defined) for consistency.
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)
391-402: Hard-coded strings for usage metrics – prefer the SpanAttributes constants
`metric_shared_attributes` builds the dict with string literals ("gen_ai.operation.name" etc.), but uses `SpanAttributes.GEN_AI_SYSTEM` and `GEN_AI_RESPONSE_MODEL`. For the two usage keys introduced in this PR we now have `SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS` / `GEN_AI_USAGE_OUTPUT_TOKENS`. Using the constants avoids typos and keeps refactors mechanical (a sketch follows below).
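A small sketch of the constant-based version; `usage` is a placeholder for the provider's usage object, and the constants are the ones this PR adds to `SpanAttributes`.

```python
from opentelemetry.semconv_ai import SpanAttributes

def usage_metric_attributes(usage) -> dict:
    # constants instead of "gen_ai.usage.*" string literals
    return {
        SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS: usage.prompt_tokens,
        SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS: usage.completion_tokens,
    }
```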
1291-1299: Raw "gen_ai.usage.*" literals – use the constants to avoid drift
These assertions use bare strings for the input / output token counters. Now that `SpanAttributes` exposes `GEN_AI_USAGE_INPUT_TOKENS` / `GEN_AI_USAGE_OUTPUT_TOKENS`, switch to:
```diff
-assert anthropic_span.attributes["gen_ai.usage.input_tokens"] == 514
+assert anthropic_span.attributes[SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 514
```
Besides eliminating magic strings, it guarantees the tests fail when the spec keys change again.
Also applies to: 1468-1474, 1604-1610
433-468: Verbose prompt/completion assertions – extractor helper would shrink test noise
The repeated pattern
```python
assert span.attributes[f"{SpanAttributes.GEN_AI_PROMPT}.{n}.role"] == ...
assert span.attributes[f"{SpanAttributes.GEN_AI_PROMPT}.{n}.content"] == ...
```
appears dozens of times. A small helper such as `assert_prompt(span, idx, role, content)` would make the intentions clearer and the diff noise smaller when the convention inevitably evolves again.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (47)
- packages/opentelemetry-instrumentation-alephalpha/opentelemetry/instrumentation/alephalpha/__init__.py (3 hunks)
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (8 hunks)
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (7 hunks)
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (3 hunks)
- packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (22 hunks)
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (23 hunks)
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py (3 hunks)
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py (15 hunks)
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py (12 hunks)
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py (17 hunks)
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (14 hunks)
- packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (6 hunks)
- packages/opentelemetry-instrumentation-cohere/tests/test_chat.py (3 hunks)
- packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py (5 hunks)
- packages/opentelemetry-instrumentation-google-generativeai/tests/test_generate_content.py (3 hunks)
- packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/span_utils.py (9 hunks)
- packages/opentelemetry-instrumentation-groq/tests/traces/test_chat_tracing.py (9 hunks)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (5 hunks)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (9 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/test_llms.py (16 hunks)
- packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/span_utils.py (5 hunks)
- packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py (6 hunks)
- packages/opentelemetry-instrumentation-llamaindex/tests/test_chroma_vector_store.py (1 hunks)
- packages/opentelemetry-instrumentation-mistralai/opentelemetry/instrumentation/mistralai/__init__.py (7 hunks)
- packages/opentelemetry-instrumentation-mistralai/tests/test_chat.py (12 hunks)
- packages/opentelemetry-instrumentation-ollama/opentelemetry/instrumentation/ollama/span_utils.py (8 hunks)
- packages/opentelemetry-instrumentation-ollama/tests/test_chat.py (16 hunks)
- packages/opentelemetry-instrumentation-ollama/tests/test_generation.py (12 hunks)
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (5 hunks)
- packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (2 hunks)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (6 hunks)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py (5 hunks)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/event_handler_wrapper.py (2 hunks)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py (15 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py (13 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (19 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_completions.py (10 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py (9 hunks)
- packages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/span_utils.py (5 hunks)
- packages/opentelemetry-instrumentation-vertexai/opentelemetry/instrumentation/vertexai/span_utils.py (5 hunks)
- packages/opentelemetry-instrumentation-vertexai/tests/disabled_test_bison.py (6 hunks)
- packages/opentelemetry-instrumentation-vertexai/tests/disabled_test_gemini.py (3 hunks)
- packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py (13 hunks)
- packages/traceloop-sdk/tests/test_manual.py (1 hunks)
- packages/traceloop-sdk/tests/test_privacy_no_prompts.py (1 hunks)
- packages/traceloop-sdk/traceloop/sdk/tracing/manual.py (2 hunks)
✅ Files skipped from review due to trivial changes (5)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
- packages/opentelemetry-instrumentation-google-generativeai/tests/test_generate_content.py
- packages/opentelemetry-instrumentation-alephalpha/opentelemetry/instrumentation/alephalpha/__init__.py
- packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py
- packages/opentelemetry-instrumentation-ollama/tests/test_generation.py
🚧 Files skipped from review as they are similar to previous changes (41)
- packages/opentelemetry-instrumentation-llamaindex/tests/test_chroma_vector_store.py
- packages/opentelemetry-instrumentation-vertexai/tests/disabled_test_bison.py
- packages/traceloop-sdk/tests/test_privacy_no_prompts.py
- packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/span_utils.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py
- packages/opentelemetry-instrumentation-ollama/opentelemetry/instrumentation/ollama/span_utils.py
- packages/traceloop-sdk/tests/test_manual.py
- packages/opentelemetry-instrumentation-groq/tests/traces/test_chat_tracing.py
- packages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/span_utils.py
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py
- packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/span_utils.py
- packages/opentelemetry-instrumentation-vertexai/tests/disabled_test_gemini.py
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/event_handler_wrapper.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_completions.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
- packages/opentelemetry-instrumentation-mistralai/tests/test_chat.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py
- packages/opentelemetry-instrumentation-mistralai/opentelemetry/instrumentation/mistralai/__init__.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py
- packages/opentelemetry-instrumentation-cohere/tests/test_chat.py
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
- packages/opentelemetry-instrumentation-vertexai/opentelemetry/instrumentation/vertexai/span_utils.py
- packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py
- packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py
- packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
- packages/traceloop-sdk/traceloop/sdk/tracing/manual.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py
- packages/opentelemetry-instrumentation-ollama/tests/test_chat.py
- packages/opentelemetry-instrumentation-langchain/tests/test_llms.py
- packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
SpanAttributes (37-249)
🔇 Additional comments (2)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py (2)
99-109: Token attribute migration consistently applied.
The migration pattern is consistently applied here, matching the changes in the first test function. The same consideration about `LLM_USAGE_TOTAL_TOKENS` applies here as well.
158-168: Token attribute migration consistently applied across all test functions.
The migration is consistently implemented across all three test functions, maintaining the same logical assertions while updating to the new semantic convention attribute names.
Hi @nirga! Gentle ping on this PR that updates gen_ai fields to current standards. Happy to adjust anything if needed - just let me know how I can help!
nirga left a comment
Hey @martimfasantos! Thanks :) but if you're doing this work I'd rather switch to the official semantic convention in place of ours. When we began this repo there weren't any, but now there are.
Here's an example of how to reference them - https://github.com/traceloop/openllmetry/blob/main/packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/utils.py#L19
```diff
@@ -1,4 +1,5 @@
 from enum import Enum
+import opentelemetry.semconv._incubating.attributes.gen_ai_attributes as otel_gen_ai_attributes
```
Hi again @nirga, I’m not sure I’m following. Isn’t that what’s already being done here?
Yes @martimfasantos - but ideally we should directly use the opentelemetry.semconv._incubating.attributes.gen_ai_attributes in the code - we want to remove our custom semconv AI long term.
> Yes @martimfasantos - but ideally we should directly use the `opentelemetry.semconv._incubating.attributes.gen_ai_attributes` in the code - we want to remove our custom semconv AI long term.
@nirga I've just refactored the code to use opentelemetry.semconv._incubating.attributes.gen_ai_attributes directly. This aligns with the long-term goal.
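For illustration, a minimal sketch of what using the incubating module directly looks like; this is not code from the PR, and only constants already referenced in this thread are shown.

```python
from opentelemetry.semconv._incubating.attributes import (
    gen_ai_attributes as GenAIAttributes,
)

# the constants are plain strings matching the OTel GenAI semantic conventions
print(GenAIAttributes.GEN_AI_SYSTEM)               # "gen_ai.system"
print(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS)   # "gen_ai.usage.input_tokens"
print(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS)  # "gen_ai.usage.output_tokens"
```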
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)
1-616: Remove double-qualification of `GenAIAttributes`
The only instance of `GenAIAttributes.GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS` was found in responses_wrappers.py. Update it to a single qualification to prevent attribute lookup errors.
- File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
• Replace the double-qualified constant:
```diff
- GenAIAttributes.GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS
+ GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS
```
No other occurrences detected.
packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py (2)
353-362: Bug: `_emit_response_events` signature mismatch and missing `event_logger` propagation
Definition takes only `response`, but callers pass `(responses, event_logger)` in one place and just `response` in another. Also, `emit_event` calls here don't pass `event_logger`. This will raise a TypeError and drop events.
Apply this diff to fix the signature and both call sites:
```diff
-def _emit_response_events(response: dict):
+def _emit_response_events(response: dict, event_logger):
     for i, message in enumerate(response.get("results", [])):
-        emit_event(
+        emit_event(
             ChoiceEvent(
                 index=i,
                 message={"content": message.get("generated_text"), "role": "assistant"},
                 finish_reason=message.get("stop_reason", "unknown"),
             )
-        )
+        , event_logger)

-    if should_emit_events() and event_logger:
-        _emit_response_events(responses, event_logger)
+    if should_emit_events() and event_logger:
+        _emit_response_events(responses, event_logger)

-    if should_emit_events() and event_logger:
-        _emit_response_events(
             {
                 "results": [
                     {
                         "stop_reason": stream_stop_reason,
                         "generated_text": stream_generated_text,
                     }
                 ]
-            },
+            },
+            event_logger
         )
```
Also applies to: 507-509, 526-537
481-494: Bug: `_handle_input` arity mismatch (missing `response_counter` at call site)
Definition expects `response_counter` but the call omits it. This will raise a TypeError at runtime.
Minimal fix: pass the available `response_counter` from `_wrap`:
```diff
- _handle_input(span, event_logger, name, instance, args, kwargs)
+ _handle_input(span, event_logger, name, instance, response_counter, args, kwargs)
```
Alternatively, remove the unused parameter from the function signature if not needed.
Also applies to: 572-574
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (1)
599-617: Guard metric recording against None values to prevent runtime errors
`Histogram.record` expects a numeric value. If `input_tokens` or `output_tokens` is `None`, this can raise at runtime.
Apply:
```diff
-    if token_histogram:
-        token_histogram.record(
-            input_tokens,
-            attributes={
-                GenAIAttributes.GEN_AI_SYSTEM: "openai",
-                GenAIAttributes.GEN_AI_TOKEN_TYPE: "input",
-                GenAIAttributes.GEN_AI_RESPONSE_MODEL: model_name,
-                "gen_ai.agent.name": agent_name,
-            },
-        )
-        token_histogram.record(
-            output_tokens,
-            attributes={
-                GenAIAttributes.GEN_AI_SYSTEM: "openai",
-                GenAIAttributes.GEN_AI_TOKEN_TYPE: "output",
-                GenAIAttributes.GEN_AI_RESPONSE_MODEL: model_name,
-                "gen_ai.agent.name": agent_name,
-            },
-        )
+    if token_histogram:
+        base_attrs = {
+            GenAIAttributes.GEN_AI_SYSTEM: "openai",
+            GenAIAttributes.GEN_AI_RESPONSE_MODEL: model_name,
+        }
+        if agent_name:
+            base_attrs["gen_ai.agent.name"] = agent_name
+        if input_tokens is not None:
+            token_histogram.record(
+                int(input_tokens),
+                attributes={**base_attrs, GenAIAttributes.GEN_AI_TOKEN_TYPE: "input"},
+            )
+        if output_tokens is not None:
+            token_histogram.record(
+                int(output_tokens),
+                attributes={**base_attrs, GenAIAttributes.GEN_AI_TOKEN_TYPE: "output"},
+            )
```
♻️ Duplicate comments (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
2-4: Long-term direction: prefer using incubating attributes directly in instrumentation.
You're already moving callers to import `gen_ai_attributes` directly (good). Keeping these re-exports for now is fine, but plan to deprecate/remove once all instrumentations rely on the incubating package.
Also applies to: 40-78
🧹 Nitpick comments (40)
packages/opentelemetry-instrumentation-vertexai/tests/disabled_test_gemini.py (3)
35-36: Migration to GEN_AI_* keys looks correct; consider referencing SpanAttributes for stability
The key renames to GEN_AI_* align with the PR objective and the semconv exports. To reduce direct coupling to the incubating OTel module and be consistent with the repository convention referenced by @nirga, prefer accessing these keys via SpanAttributes. This keeps tests resilient if the underlying source of the constants changes.
Apply this diff within the selected ranges:
```diff
-        in vertexai_span.attributes[f"{GenAIAttributes.GEN_AI_PROMPT}.0.user"]
+        in vertexai_span.attributes[f"{SpanAttributes.GEN_AI_PROMPT}.0.user"]
@@
-        vertexai_span.attributes[GenAIAttributes.GEN_AI_REQUEST_MODEL]
+        vertexai_span.attributes[SpanAttributes.GEN_AI_REQUEST_MODEL]
@@
-        vertexai_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS]
+        vertexai_span.attributes[SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS]
@@
-        vertexai_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]
+        vertexai_span.attributes[SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]
@@
-        vertexai_span.attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content"]
+        vertexai_span.attributes[f"{SpanAttributes.GEN_AI_COMPLETION}.0.content"]
```
Also applies to: 38-39, 46-48, 50-52, 54-56
86-89: Keep attribute access consistent via SpanAttributes
Same rationale as earlier: swap direct GenAIAttributes usage for SpanAttributes to avoid tying tests to the incubating import path.
```diff
-    vertexai_span.attributes[GenAIAttributes.GEN_AI_REQUEST_MODEL]
+    vertexai_span.attributes[SpanAttributes.GEN_AI_REQUEST_MODEL]
@@
-    vertexai_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS]
+    vertexai_span.attributes[SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS]
@@
-    vertexai_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]
+    vertexai_span.attributes[SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]
```
Also applies to: 95-97, 99-101
146-149: Use SpanAttributes for GEN_AI_* keys in the third test as well
For consistency across the file, mirror the same change here.
```diff
-    vertexai_span.attributes[GenAIAttributes.GEN_AI_REQUEST_MODEL]
+    vertexai_span.attributes[SpanAttributes.GEN_AI_REQUEST_MODEL]
@@
-    vertexai_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS]
+    vertexai_span.attributes[SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS]
@@
-    vertexai_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]
+    vertexai_span.attributes[SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]
```
Also applies to: 155-157, 159-161
packages/opentelemetry-instrumentation-ollama/tests/test_generation.py (2)
23-36: Drop redundant f-strings around constant keys
f"{GenAIAttributes.GEN_AI_SYSTEM}" and similar are already strings; f-strings add noise without benefit. Prefer passing constants directly for clarity and to avoid accidental formatting bugs.
Example for this block:
```diff
-    assert ollama_span.attributes.get(f"{GenAIAttributes.GEN_AI_SYSTEM}") == "ollama"
+    assert ollama_span.attributes.get(GenAIAttributes.GEN_AI_SYSTEM) == "ollama"
@@
-    assert (
-        ollama_span.attributes.get(f"{SpanAttributes.LLM_REQUEST_TYPE}") == "completion"
-    )
+    assert ollama_span.attributes.get(SpanAttributes.LLM_REQUEST_TYPE) == "completion"
-    assert not ollama_span.attributes.get(f"{SpanAttributes.LLM_IS_STREAMING}")
+    assert not ollama_span.attributes.get(SpanAttributes.LLM_IS_STREAMING)
-    assert ollama_span.attributes.get(f"{GenAIAttributes.GEN_AI_REQUEST_MODEL}") == "llama3"
+    assert ollama_span.attributes.get(GenAIAttributes.GEN_AI_REQUEST_MODEL) == "llama3"
-    assert (
-        ollama_span.attributes.get(f"{GenAIAttributes.GEN_AI_PROMPT}.0.content")
-        == "Tell me a joke about OpenTelemetry"
-    )
+    assert ollama_span.attributes.get(f"{GenAIAttributes.GEN_AI_PROMPT}.0.content") \
+        == "Tell me a joke about OpenTelemetry"
-    assert (
-        ollama_span.attributes.get(f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content")
-        == response["response"]
-    )
+    assert ollama_span.attributes.get(f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content") \
+        == response["response"]
```
Apply the same pattern across the file.
74-92: Reduce flakiness: don't assume log ordering; assert by event name instead
Relying on logs[0]/logs[1] ordering can be brittle across exporters. Resolve by selecting logs by EventAttributes.EVENT_NAME.
For example:
```python
logs = log_exporter.get_finished_logs()
by_name = {log.log_record.attributes.get(EventAttributes.EVENT_NAME): log for log in logs}
assert_message_in_logs(by_name["gen_ai.user.message"], "gen_ai.user.message", {...})
assert_message_in_logs(by_name["gen_ai.choice"], "gen_ai.choice", choice_event)
```
Also applies to: 116-129, 202-219, 250-263, 332-349, 378-391, 468-485, 518-531
packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/span_utils.py (2)
9-14: Consolidate GenAI imports for consistency
Minor nit: importing `GEN_AI_RESPONSE_ID` from the submodule while importing `gen_ai_attributes` from the package root is slightly inconsistent. If the aggregator re-exports the constant, prefer a single import source for readability; otherwise, this is fine to keep as-is.
50-58: Support multiple max-tokens parameter names (Groq typically uses `max_tokens`)
Using only `max_tokens_to_sample` is Anthropic-specific. Groq/OpenAI-style clients generally use `max_tokens`. Consider a tolerant fallback to capture whichever the upstream SDK provides.
Apply this diff:
```diff
-    set_span_attribute(
-        span, GenAIAttributes.GEN_AI_REQUEST_MAX_TOKENS, kwargs.get("max_tokens_to_sample")
-    )
+    set_span_attribute(
+        span,
+        GenAIAttributes.GEN_AI_REQUEST_MAX_TOKENS,
+        kwargs.get("max_tokens") or kwargs.get("max_output_tokens") or kwargs.get("max_tokens_to_sample"),
+    )
```
packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/custom_llm_instrumentor.py (6)
10-12: Avoid depending on incubating internals; prefer SpanAttributes for stability
Using `opentelemetry.semconv._incubating.attributes` tightly couples us to an internal path that can change. We already import `SpanAttributes`, which exposes the same constants and is the stable facade used across the repo.
Apply this diff to drop the direct incubating import:
```diff
-from opentelemetry.semconv._incubating.attributes import (
-    gen_ai_attributes as GenAIAttributes,
-)
```
I've proposed replacements below to use `SpanAttributes.*` where `GenAIAttributes.*` is referenced.
148-151: Use SpanAttributes.* instead of GenAIAttributes.*
Stay consistent with the rest of the codebase and reduce direct dependency on incubating internals.
```diff
-    _set_span_attribute(span, GenAIAttributes.GEN_AI_SYSTEM, instance.__class__.__name__)
+    _set_span_attribute(span, SpanAttributes.GEN_AI_SYSTEM, instance.__class__.__name__)
     _set_span_attribute(span, SpanAttributes.LLM_REQUEST_TYPE, llm_request_type.value)
     _set_span_attribute(
-        span, GenAIAttributes.GEN_AI_REQUEST_MODEL, instance.metadata.model_name
+        span, SpanAttributes.GEN_AI_REQUEST_MODEL, instance.metadata.model_name
     )
```
176-178: Use SpanAttributes for response model attribute
Consistency with the rest of the instrumentation and avoidance of incubating internals.
```diff
 _set_span_attribute(
-    span, GenAIAttributes.GEN_AI_RESPONSE_MODEL, instance.metadata.model_name
+    span, SpanAttributes.GEN_AI_RESPONSE_MODEL, instance.metadata.model_name
 )
```
180-184: Flatten nested condition and use SpanAttributes in completion content
Minor readability nit and consistency improvement.
```diff
-    if should_send_prompts():
-        if llm_request_type == LLMRequestTypeValues.COMPLETION:
-            _set_span_attribute(
-                span, f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content", response.text
-            )
+    if should_send_prompts() and llm_request_type == LLMRequestTypeValues.COMPLETION:
+        _set_span_attribute(
+            span, f"{SpanAttributes.GEN_AI_COMPLETION}.0.content", response.text
+        )
```
160-163: Flatten nested condition (ruff SIM102)
Combine the conditions to reduce nesting and match ruff's SIM102 recommendation.
```diff
-    if should_send_prompts():
-        # TODO: add support for chat
-        if llm_request_type == LLMRequestTypeValues.COMPLETION:
+    # TODO: add support for chat
+    if should_send_prompts() and llm_request_type == LLMRequestTypeValues.COMPLETION:
```
167-169: Use SpanAttributes for prompt attribute; keep `.0.user` suffix
The `.0.user` suffix is consistent with how COMPLETION requests are instrumented elsewhere. Swap in the stable `SpanAttributes` constant for the prefix.
Locations to update:
- packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/custom_llm_instrumentor.py around lines 167–169
Recommended diff:
```diff
-    f"{GenAIAttributes.GEN_AI_PROMPT}.0.user",
+    f"{SpanAttributes.GEN_AI_PROMPT}.0.user",
```
packages/opentelemetry-instrumentation-mistralai/tests/test_chat.py (4)
43-48: Make total token assertion resilient when a usage attribute is missing
If either input or output token usage is absent (common in some async/streaming providers), the current sum will raise a TypeError (None + int). Default missing token attributes to 0 in the equality check to keep tests robust while still validating the accounting.
Apply this pattern at each occurrence:
```diff
-    ) == mistral_span.attributes.get(
-        GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS
-    ) + mistral_span.attributes.get(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS)
+    ) == (
+        (mistral_span.attributes.get(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS) or 0)
+        + (mistral_span.attributes.get(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS) or 0)
+    )
```
Also applies to: 80-86, 133-139, 195-200, 236-242, 293-299, 353-358, 393-398, 447-452, 506-511, 546-551, 602-607
496-496: Use the GenAI constant for the request model key for consistency
Avoid hardcoded strings for attribute keys; use the same GenAIAttributes constant you use elsewhere in this file.
```diff
-    assert mistral_span.attributes.get("gen_ai.request.model") == "mistral-tiny"
+    assert mistral_span.attributes.get(GenAIAttributes.GEN_AI_REQUEST_MODEL) == "mistral-tiny"
```
Apply at all three locations.
Also applies to: 544-544, 600-600
351-353: Fix copy/paste in comment and clarify provider
These async tests are for Mistral, not Ollama. Also, consider keeping the comment but ensure the total-token assertion won't fail when prompt tokens are missing (see prior suggestion).
```diff
-    # For some reason, async ollama chat doesn't report prompt token usage back
+    # For some reason, async Mistral chat sometimes doesn't report prompt (input) token usage back
```
Also applies to: 391-393, 445-447
27-27: Avoid unnecessary f-strings around attribute key constants
f"{GenAIAttributes.GEN_AI_SYSTEM}" is identical to GenAIAttributes.GEN_AI_SYSTEM. Using the constant directly is cleaner and avoids accidental formatting bugs.
Example edit:
```diff
-    assert mistral_span.attributes.get(f"{GenAIAttributes.GEN_AI_SYSTEM}") == "MistralAI"
+    assert mistral_span.attributes.get(GenAIAttributes.GEN_AI_SYSTEM) == "MistralAI"
```
Apply similarly to GEN_AI_REQUEST_MODEL, GEN_AI_PROMPT, GEN_AI_COMPLETION, etc.
Also applies to: 31-31, 35-35, 39-39, 73-73, 77-77, 183-183, 187-187, 191-191, 229-229, 233-233, 286-286, 290-290, 336-336, 340-340, 343-343, 348-348, 384-384, 388-388, 438-438, 442-442, 493-493, 498-498, 502-502, 541-541, 597-597
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py (2)
151-163: Prefer enum value for system attribute to avoid magic strings
Use the GenAI system enum for consistency with other tests and modules.
Apply this diff in both places:
```diff
-    _set_span_attribute(
-        span,
-        GenAIAttributes.GEN_AI_SYSTEM,
-        "openai",
-    )
+    _set_span_attribute(
+        span,
+        GenAIAttributes.GEN_AI_SYSTEM,
+        GenAIAttributes.GenAiSystemValues.OPENAI.value,
+    )
```
Also applies to: 271-279
176-185: Inconsistent event emission for run instructions; guard with should_emit_events()
Unlike the assistant-instructions block above, this unconditionally emits an event and also sets span attributes. Align with the pattern: emit events when enabled, otherwise set span attributes, not both.
Apply this refactor:
```diff
-    _set_span_attribute(
-        span, f"{GenAIAttributes.GEN_AI_PROMPT}.{prompt_index}.role", "system"
-    )
-    _set_span_attribute(
-        span,
-        f"{GenAIAttributes.GEN_AI_PROMPT}.{prompt_index}.content",
-        run["instructions"],
-    )
-    emit_event(MessageEvent(content=run["instructions"], role="system"))
-    prompt_index += 1
+    if should_emit_events():
+        emit_event(MessageEvent(content=run["instructions"], role="system"))
+    else:
+        _set_span_attribute(
+            span, f"{GenAIAttributes.GEN_AI_PROMPT}.{prompt_index}.role", "system"
+        )
+        _set_span_attribute(
+            span,
+            f"{GenAIAttributes.GEN_AI_PROMPT}.{prompt_index}.content",
+            run["instructions"],
+        )
+    prompt_index += 1
```
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)
180-186: Guard `usage` attribute access for older SDKs
When `RESPONSES_AVAILABLE` is `False`, `traced_response.usage` is a plain `Dict[str, Any]`, so expressions like `usage.input_tokens` will raise an attribute error. To ensure compatibility, normalize `usage` via `model_as_dict` or switch to dict-style access.
Pay special attention to these lines in packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:
- 182: `usage.input_tokens`
- 183: `usage.output_tokens`
- 185: `usage.total_tokens`
- 187: `usage.input_tokens_details`
- 191: `usage.input_tokens_details.cached_tokens`
For example:
```python
if usage := traced_response.usage:
    usage_dict = model_as_dict(usage)
    _set_span_attribute(span, GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, usage_dict.get("input_tokens"))
    …
```
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (1)
736-874: Minor duplication: consider extracting shared post-stream handling.
`_build_from_streaming_response` and `_abuild_from_streaming_response` duplicate logic for shared attributes, token/choice/duration metrics, response attribute setting, and span closure. Optional: extract a small helper to reduce duplication and future drift (a sketch follows below).
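A hypothetical sketch of such a helper; every name here (`_finalize_streaming_span` and its parameters) is illustrative rather than taken from chat_wrappers.py.

```python
def _finalize_streaming_span(span, shared_attributes, duration_histogram, duration, set_response_attrs):
    # record the request duration once for both the sync and async paths
    if duration_histogram and duration is not None:
        duration_histogram.record(duration, attributes=shared_attributes)
    # apply response attributes via a caller-supplied callable, then close the span
    set_response_attrs(span)
    span.end()
```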
115-131: Set GEN_AI_SYSTEM using canonical enum value (when available).Right now
vendorvalues like "openai" (lowercase) or "AWS" are set directly. The spec exposes canonical system enum values (e.g.,GenAiSystemValues.OPENAI). To avoid casing drift and improve consistency across providers, normalize to the enum value when present and fall back to the raw string.Apply this diff in-place:
```diff
- _set_span_attribute(span, GenAIAttributes.GEN_AI_SYSTEM, vendor)
+ # Normalize to spec enum value when available; fallback to string
+ system_enum = getattr(getattr(GenAIAttributes, "GenAiSystemValues", object()), vendor.upper(), None)
+ _set_span_attribute(
+     span,
+     GenAIAttributes.GEN_AI_SYSTEM,
+     system_enum.value if system_enum is not None else vendor,
+ )
```

Optionally, to keep the call-site lean, add a small helper and use it here:
```python
# Define once near other helpers in this module (outside this diff hunk)
def _normalize_gen_ai_system(vendor: str) -> str:
    enum_cls = getattr(GenAIAttributes, "GenAiSystemValues", None)
    if enum_cls:
        maybe = getattr(enum_cls, vendor.upper(), None)
        if maybe:
            return maybe.value
    return vendor
```

Then change the set call to:
```python
_set_span_attribute(span, GenAIAttributes.GEN_AI_SYSTEM, _normalize_gen_ai_system(vendor))
```

packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (1)
142-145: Inline the generations selection (Ruff SIM108).
Minor readability nit: prefer a ternary expression.
```diff
- if hasattr(response, "generations"):
-     generations = response.generations  # Cohere v5
- else:
-     generations = response  # Cohere v4
+ generations = response.generations if hasattr(response, "generations") else response  # Cohere v5 vs v4
```

packages/opentelemetry-instrumentation-cohere/tests/test_rerank.py (1)
41-45: Prefer enum value for GEN_AI_SYSTEM in assertions.
Use the spec enum to avoid casing drift and keep tests resilient to normalization changes.
```diff
- assert cohere_span.attributes.get(GenAIAttributes.GEN_AI_SYSTEM) == "Cohere"
+ assert cohere_span.attributes.get(GenAIAttributes.GEN_AI_SYSTEM) == GenAIAttributes.GenAiSystemValues.COHERE.value
```

Repeat similarly for the other two occurrences in this file.
Also applies to: 121-126, 185-190
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
42-78: Style: normalize spacing around '=' for readability (PEP 8).
Several assignments have inconsistent spacing. Not a blocker, but easy to tidy.
```diff
- GEN_AI_AGENT_DESCRIPTION=GenAIAttributes.GEN_AI_AGENT_DESCRIPTION
- GEN_AI_AGENT_ID= GenAIAttributes.GEN_AI_AGENT_ID
- GEN_AI_AGENT_NAME= GenAIAttributes.GEN_AI_AGENT_NAME
- GEN_AI_COMPLETION=GenAIAttributes.GEN_AI_COMPLETION
- GEN_AI_CONVERSATION_ID=GenAIAttributes.GEN_AI_CONVERSATION_ID
- GEN_AI_DATA_SOURCE_ID=GenAIAttributes.GEN_AI_DATA_SOURCE_ID
- GEN_AI_OPENAI_REQUEST_SERVICE_TIER=GenAIAttributes.GEN_AI_OPENAI_REQUEST_SERVICE_TIER
- GEN_AI_OPENAI_RESPONSE_SERVICE_TIER=GenAIAttributes.GEN_AI_OPENAI_RESPONSE_SERVICE_TIER
- GEN_AI_OPENAI_RESPONSE_SYSTEM_FINGERPRINT=GenAIAttributes.GEN_AI_OPENAI_RESPONSE_SYSTEM_FINGERPRINT
- GEN_AI_OUTPUT_TYPE=GenAIAttributes.GEN_AI_OUTPUT_TYPE
- GEN_AI_PROMPT=GenAIAttributes.GEN_AI_PROMPT
- GEN_AI_REQUEST_CHOICE_COUNT=GenAIAttributes.GEN_AI_REQUEST_CHOICE_COUNT
- GEN_AI_REQUEST_ENCODING_FORMATS= GenAIAttributes.GEN_AI_REQUEST_ENCODING_FORMATS
- GEN_AI_REQUEST_FREQUENCY_PENALTY= GenAIAttributes.GEN_AI_REQUEST_FREQUENCY_PENALTY
- GEN_AI_REQUEST_MAX_TOKENS=GenAIAttributes.GEN_AI_REQUEST_MAX_TOKENS
- GEN_AI_REQUEST_MODEL=GenAIAttributes.GEN_AI_REQUEST_MODEL
- GEN_AI_REQUEST_PRESENCE_PENALTY=GenAIAttributes.GEN_AI_REQUEST_PRESENCE_PENALTY
- GEN_AI_REQUEST_SEED=GenAIAttributes.GEN_AI_REQUEST_SEED
- GEN_AI_REQUEST_STOP_SEQUENCES=GenAIAttributes.GEN_AI_REQUEST_STOP_SEQUENCES
- GEN_AI_REQUEST_TEMPERATURE=GenAIAttributes.GEN_AI_REQUEST_TEMPERATURE
- GEN_AI_REQUEST_TOP_K=GenAIAttributes.GEN_AI_REQUEST_TOP_K
- GEN_AI_REQUEST_TOP_P=GenAIAttributes.GEN_AI_REQUEST_TOP_P
+ GEN_AI_AGENT_DESCRIPTION = GenAIAttributes.GEN_AI_AGENT_DESCRIPTION
+ GEN_AI_AGENT_ID = GenAIAttributes.GEN_AI_AGENT_ID
+ GEN_AI_AGENT_NAME = GenAIAttributes.GEN_AI_AGENT_NAME
+ GEN_AI_COMPLETION = GenAIAttributes.GEN_AI_COMPLETION
+ GEN_AI_CONVERSATION_ID = GenAIAttributes.GEN_AI_CONVERSATION_ID
+ GEN_AI_DATA_SOURCE_ID = GenAIAttributes.GEN_AI_DATA_SOURCE_ID
+ GEN_AI_OPENAI_REQUEST_SERVICE_TIER = GenAIAttributes.GEN_AI_OPENAI_REQUEST_SERVICE_TIER
+ GEN_AI_OPENAI_RESPONSE_SERVICE_TIER = GenAIAttributes.GEN_AI_OPENAI_RESPONSE_SERVICE_TIER
+ GEN_AI_OPENAI_RESPONSE_SYSTEM_FINGERPRINT = GenAIAttributes.GEN_AI_OPENAI_RESPONSE_SYSTEM_FINGERPRINT
+ GEN_AI_OUTPUT_TYPE = GenAIAttributes.GEN_AI_OUTPUT_TYPE
+ GEN_AI_PROMPT = GenAIAttributes.GEN_AI_PROMPT
+ GEN_AI_REQUEST_CHOICE_COUNT = GenAIAttributes.GEN_AI_REQUEST_CHOICE_COUNT
+ GEN_AI_REQUEST_ENCODING_FORMATS = GenAIAttributes.GEN_AI_REQUEST_ENCODING_FORMATS
+ GEN_AI_REQUEST_FREQUENCY_PENALTY = GenAIAttributes.GEN_AI_REQUEST_FREQUENCY_PENALTY
+ GEN_AI_REQUEST_MAX_TOKENS = GenAIAttributes.GEN_AI_REQUEST_MAX_TOKENS
+ GEN_AI_REQUEST_MODEL = GenAIAttributes.GEN_AI_REQUEST_MODEL
+ GEN_AI_REQUEST_PRESENCE_PENALTY = GenAIAttributes.GEN_AI_REQUEST_PRESENCE_PENALTY
+ GEN_AI_REQUEST_SEED = GenAIAttributes.GEN_AI_REQUEST_SEED
+ GEN_AI_REQUEST_STOP_SEQUENCES = GenAIAttributes.GEN_AI_REQUEST_STOP_SEQUENCES
+ GEN_AI_REQUEST_TEMPERATURE = GenAIAttributes.GEN_AI_REQUEST_TEMPERATURE
+ GEN_AI_REQUEST_TOP_K = GenAIAttributes.GEN_AI_REQUEST_TOP_K
+ GEN_AI_REQUEST_TOP_P = GenAIAttributes.GEN_AI_REQUEST_TOP_P
```

packages/opentelemetry-instrumentation-replicate/opentelemetry/instrumentation/replicate/__init__.py (1)
124-132: Use canonical enum for GEN_AI_SYSTEM (if available).
Minor consistency tweak: prefer the spec enum over a raw string.
```diff
- GenAIAttributes.GEN_AI_SYSTEM: "Replicate",
+ GenAIAttributes.GEN_AI_SYSTEM: GenAIAttributes.GenAiSystemValues.REPLICATE.value,
```

packages/opentelemetry-instrumentation-langchain/tests/test_tool_calls.py (1)
59-67: Nit: reduce repetition when building attribute keys.
The repeated f-strings for keys like `f"{GenAIAttributes.GEN_AI_PROMPT}.N.role/content"` and `f"{GenAIAttributes.GEN_AI_COMPLETION}.M.tool_calls.K.*"` can be slightly DRYed in tests with small helpers (e.g., `prompt_key(i, suffix)`, `completion_tool_call_key(i, j, suffix)`); see the sketch below. Not required, but would improve readability.
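For illustration, the helpers could look like this (a sketch using the names suggested above; assumes GenAIAttributes is already imported in the test module):

```python
def prompt_key(i, suffix):
    # e.g. prompt_key(0, "role") -> "gen_ai.prompt.0.role"
    return f"{GenAIAttributes.GEN_AI_PROMPT}.{i}.{suffix}"


def completion_tool_call_key(i, j, suffix):
    # e.g. completion_tool_call_key(0, 0, "name") -> "gen_ai.completion.0.tool_calls.0.name"
    return f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.tool_calls.{j}.{suffix}"
```

Assertions then read as `span.attributes[prompt_key(0, "content")]` instead of repeating the full f-string at each call site.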
packages/opentelemetry-instrumentation-anthropic/tests/utils.py (1)
60-63: Optional: use the constant for system attribute for consistency.
You assert against the literal "gen_ai.system" key; consider using GenAIAttributes.GEN_AI_SYSTEM as elsewhere in the repo.
Apply this diff:
```diff
- assert all(
-     data_point.attributes.get("gen_ai.system") == "anthropic"
-     for data_point in metric.data.data_points
- )
+ assert all(
+     data_point.attributes.get(GenAIAttributes.GEN_AI_SYSTEM) == "anthropic"
+     for data_point in metric.data.data_points
+ )
```

packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py (1)
126-129: Safer assertion on first datapoint access.
Indexing `metric.data.data_points[0]` assumes at least one datapoint; guard it to avoid potential IndexError in edge cases.
Apply this diff:
```diff
- assert (
-     metric.data.data_points[0].attributes[GenAIAttributes.GEN_AI_SYSTEM]
-     == "bedrock"
- )
+ # Ensure at least one datapoint exists before indexing
+ assert metric.data.data_points, f"No datapoints for metric {metric.name}"
+ assert (
+     metric.data.data_points[0].attributes[GenAIAttributes.GEN_AI_SYSTEM]
+     == "bedrock"
+ )
```

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (2)
185-191: Use SpanAttributes for cache token attributes to avoid potential AttributeError
GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS and GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS are guaranteed on SpanAttributes (string constants). Depending on your semconv version, they may not exist on GenAIAttributes. For consistency with streaming.py and to avoid runtime attribute errors, use SpanAttributes here too.
Proposed change:
```diff
- set_span_attribute(
-     span, GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS, cache_read_tokens
- )
+ set_span_attribute(
+     span, SpanAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS, cache_read_tokens
+ )
- set_span_attribute(
-     span,
-     GenAIAttributes.GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS,
-     cache_creation_tokens,
- )
+ set_span_attribute(
+     span,
+     SpanAttributes.GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS,
+     cache_creation_tokens,
+ )
```

And likewise for the synchronous path:
```diff
- set_span_attribute(
-     span, GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS, cache_read_tokens
- )
+ set_span_attribute(
+     span, SpanAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS, cache_read_tokens
+ )
- set_span_attribute(
-     span,
-     GenAIAttributes.GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS,
-     cache_creation_tokens,
- )
+ set_span_attribute(
+     span,
+     SpanAttributes.GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS,
+     cache_creation_tokens,
+ )
```

Also applies to: 276-283
101-110: Nit: avoid mutable default arguments
`metric_attributes: dict = {}` creates a shared default across calls. Prefer a None default and initialize inside.
Minimal patch:
```diff
-async def _aset_token_usage(..., metric_attributes: dict = {}, token_histogram: Histogram = None, choice_counter: Counter = None):
+async def _aset_token_usage(..., metric_attributes: dict | None = None, token_histogram: Histogram | None = None, choice_counter: Counter | None = None):
+    metric_attributes = metric_attributes or {}
-def _set_token_usage(..., metric_attributes: dict = {}, token_histogram: Histogram = None, choice_counter: Counter = None):
+def _set_token_usage(..., metric_attributes: dict | None = None, token_histogram: Histogram | None = None, choice_counter: Counter | None = None):
+    metric_attributes = metric_attributes or {}
```

Also applies to: 194-203
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (1)
105-112: Prefer isinstance over direct type comparison for ints
`type(x) is int` is brittle; use `isinstance(x, int)`. Same for completion_tokens.
Apply:
```diff
- if token_histogram and type(input_tokens) is int and input_tokens >= 0:
+ if token_histogram and isinstance(input_tokens, int) and input_tokens >= 0:
  ...
- if token_histogram and type(completion_tokens) is int and completion_tokens >= 0:
+ if token_histogram and isinstance(completion_tokens, int) and completion_tokens >= 0:
```

Also applies to: 114-121
packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (3)
1294-1299: Prefer constants over string literals for usage keys (and consider aligning total_tokens constant)
Use GenAIAttributes/SpanAttributes constants instead of raw strings to reduce drift and ease refactors. Also prefer the SpanAttributes constant over the "llm.usage.total_tokens" literal.
Apply these diffs:
```diff
@@
-    assert anthropic_span.attributes["gen_ai.usage.input_tokens"] == 514
-    assert (
-        anthropic_span.attributes["gen_ai.usage.output_tokens"]
-        + anthropic_span.attributes["gen_ai.usage.input_tokens"]
-        == anthropic_span.attributes["llm.usage.total_tokens"]
-    )
+    assert anthropic_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 514
+    assert (
+        anthropic_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]
+        + anthropic_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS]
+        == anthropic_span.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS]
+    )
```

The same hunk applies verbatim at the two later occurrences. If verification (see prior comment) confirms a GenAI total tokens attribute is available and set by the instrumentation, switch the right-hand side to GenAIAttributes.GEN_AI_USAGE_TOTAL_TOKENS instead of SpanAttributes.LLM_USAGE_TOTAL_TOKENS.
Also applies to: 1469-1474, 1605-1610
64-66: Nit: Use direct indexing instead of .get for required attributes to fail fast
Using dict-style indexing will surface missing attributes as KeyError instead of silently returning None, making failures more actionable in tests.
Example change (apply similarly to other occurrences):
```diff
- assert (
-     anthropic_span.attributes.get(f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content")
-     == response.content[0].text
- )
+ assert (
+     anthropic_span.attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content"]
+     == response.content[0].text
+ )
```

Also applies to: 247-249, 528-529, 713-714, 900-901, 1078-1079
71-76: Optional: factor usage-token sum assertion into a small helper to reduce duplication
This assertion pattern repeats throughout the file. A tiny helper improves readability and maintenance.
Add once near the bottom (outside current ranges):
```python
def assert_usage_sums(span):
    assert span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] >= 0
    assert span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] >= 0
    assert (
        span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]
        + span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS]
        == span.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS]
    )
```

And replace here as:
```diff
- assert (
-     anthropic_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]
-     + anthropic_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS]
-     == anthropic_span.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS]
- )
+ assert_usage_sums(anthropic_span)
```

If we later migrate total_tokens to GenAI, the change is centralized.
packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py (3)
135-146: Prompt attribute keys migrated to GEN_AI_PROMPT — avoid shadowing built-in `input`
The rename to gen-ai prompt keys is correct. Minor nit: the loop variable name `input` shadows the built-in. Rename to avoid confusion.
Apply this diff:
```diff
- for index, input in enumerate(prompt):
+ for index, message in enumerate(prompt):
      _set_span_attribute(
          span,
-         f"{GenAIAttributes.GEN_AI_PROMPT}.{index}.user",
-         input,
+         f"{GenAIAttributes.GEN_AI_PROMPT}.{index}.user",
+         message,
      )
```
563-570: Use consistent system casing ("watsonx") to match metric_shared_attributes
Start span sets GEN_AI_SYSTEM to "Watsonx" (capitalized), while metrics use "watsonx". Suggest using lowercase across the board.
Apply this diff:
```diff
- attributes={
-     GenAIAttributes.GEN_AI_SYSTEM: "Watsonx",
+ attributes={
+     GenAIAttributes.GEN_AI_SYSTEM: "watsonx",
      SpanAttributes.LLM_REQUEST_TYPE: LLMRequestTypeValues.COMPLETION.value,
  },
```
442-445: Unify GEN_AI_SYSTEM value casing
Confirmed inconsistent values in this file:
- packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py:443 uses `"watsonx"`
- packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py:567 uses `"Watsonx"`

To avoid metric/attribute fragmentation, pick a single canonical casing (e.g. `"Watsonx"`) and update the lowercase occurrence:

```diff
--- a/packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py
+++ b/packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py
@@ -440,7 +440,7 @@ def _build_attributes(...):
     GenAIAttributes.GEN_AI_RESPONSE_MODEL: response_model,
-    GenAIAttributes.GEN_AI_SYSTEM: "watsonx",
+    GenAIAttributes.GEN_AI_SYSTEM: "Watsonx",
     "stream": is_streaming,
 }
```

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (2)
517-523: Confirm prompt attribute structure vs. semconv; indexing may cause high cardinality
You emit `...GEN_AI_PROMPT.{i}.role/content`. Many semconvs prefer arrays (e.g., `...prompt.roles: []`, `...prompt.contents: []`) over dynamic keys. Verify this matches the GenAI spec you're targeting; if not, consider array attributes to reduce attribute cardinality.
Would you like me to propose an alternative that emits arrays instead of index-suffixed keys? An illustrative sketch follows.
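For illustration only: the `.roles`/`.contents` key names below are hypothetical (not spec-defined constants), and `messages`/`otel_span` are placeholders for whatever this module already tracks:

```python
# Hypothetical array-style alternative to index-suffixed keys; OTel attribute
# values may be sequences of primitives, so lists of strings are valid here.
roles = [message.get("role", "user") for message in messages]
contents = [str(message.get("content", "")) for message in messages]
otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_PROMPT}.roles", roles)
otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_PROMPT}.contents", contents)
```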
426-439: Use the set_span_attribute helper for GenAI tool fields
All direct calls to `span.set_attribute` on tool-specific completion attributes should use the `set_span_attribute(span, key, value)` helper so that null or empty values are skipped.
• File: packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py
– Lines 426–439 (FunctionTool fields)
– Lines 471–501 (WebSearchTool, FileSearchTool, ComputerTool fields)

Example diff:
```diff
@@ -426,7 +426,7 @@
- span.set_attribute(
-     f"{GenAIAttributes.GEN_AI_COMPLETION}.tool.name", tool_name
- )
+ set_span_attribute(
+     span, f"{GenAIAttributes.GEN_AI_COMPLETION}.tool.name", tool_name
+ )
@@ -430,7 +430,7 @@
- span.set_attribute(
-     f"{GenAIAttributes.GEN_AI_COMPLETION}.tool.type", "FunctionTool"
- )
+ set_span_attribute(
+     span, f"{GenAIAttributes.GEN_AI_COMPLETION}.tool.type", "FunctionTool"
+ )
```

...and so on for `.tool.description` and `.tool.strict_json_schema`, and likewise in the 471–501 block.
GEN_AI_COMPLETION_TOOL_NAME), so formatted strings are required for now.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (1)
225-238: Bug: type check uses `is str` instead of `isinstance` (will mis-handle content)
`if generation.message.content is str` will almost always be false and cause strings to be JSON-encoded (quoted) incorrectly. Flake8 E721 also flags this.
Apply this fix:
```diff
- if generation.message.content is str:
-     _set_span_attribute(
-         span,
-         f"{prefix}.content",
-         generation.message.content,
-     )
- else:
-     _set_span_attribute(
-         span,
-         f"{prefix}.content",
-         json.dumps(
-             generation.message.content, cls=CallbackFilteredJSONEncoder
-         ),
-     )
+ content = generation.message.content
+ if isinstance(content, str):
+     _set_span_attribute(
+         span,
+         f"{prefix}.content",
+         content,
+     )
+ else:
+     _set_span_attribute(
+         span,
+         f"{prefix}.content",
+         json.dumps(content, cls=CallbackFilteredJSONEncoder),
+     )
```
♻️ Duplicate comments (2)
packages/opentelemetry-instrumentation-anthropic/tests/utils.py (1)
1-3: Import fix resolves prior F821 (undefined name) for GenAIAttributes
Good catch adding the incubating GenAIAttributes import; this unblocks lint/tests and aligns with PR goals.
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)
181-193: Back-compat bug: usage fields may be dicts; attribute access will raise; also support legacy prompt/completion token names
On older OpenAI SDKs (`RESPONSES_AVAILABLE=False`), `usage` and `input_tokens_details` are dicts. Accessing `usage.input_tokens` / `usage.input_tokens_details.cached_tokens` will raise. Also, some providers still emit `prompt_tokens`/`completion_tokens`. Handle both object and dict shapes and map legacy names.
Apply this diff:
```diff
- if usage := traced_response.usage:
-     _set_span_attribute(span, GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, usage.input_tokens)
-     _set_span_attribute(span, GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, usage.output_tokens)
-     _set_span_attribute(
-         span, SpanAttributes.LLM_USAGE_TOTAL_TOKENS, usage.total_tokens
-     )
-     if usage.input_tokens_details:
-         _set_span_attribute(
-             span,
-             GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS,
-             usage.input_tokens_details.cached_tokens,
-         )
+ if usage := traced_response.usage:
+     # usage can be a pydantic model or a plain dict (older SDKs)
+     is_dict = isinstance(usage, dict)
+
+     input_tokens = (usage.get("input_tokens") if is_dict else getattr(usage, "input_tokens", None))
+     # Back-compat: some SDKs used 'prompt_tokens'
+     if input_tokens is None and is_dict:
+         input_tokens = usage.get("prompt_tokens")
+     _set_span_attribute(span, GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, input_tokens)
+
+     output_tokens = (usage.get("output_tokens") if is_dict else getattr(usage, "output_tokens", None))
+     # Back-compat: some SDKs used 'completion_tokens'
+     if output_tokens is None and is_dict:
+         output_tokens = usage.get("completion_tokens")
+     _set_span_attribute(span, GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, output_tokens)
+
+     total_tokens = (usage.get("total_tokens") if is_dict else getattr(usage, "total_tokens", None))
+     _set_span_attribute(span, SpanAttributes.LLM_USAGE_TOTAL_TOKENS, total_tokens)
+
+     input_tokens_details = (usage.get("input_tokens_details") if is_dict else getattr(usage, "input_tokens_details", None))
+     if input_tokens_details:
+         details_is_dict = isinstance(input_tokens_details, dict)
+         cached_tokens = (
+             input_tokens_details.get("cached_tokens") if details_is_dict
+             else getattr(input_tokens_details, "cached_tokens", None)
+         )
+         _set_span_attribute(
+             span,
+             GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS,
+             cached_tokens,
+         )
```
🧹 Nitpick comments (5)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
2-4: Fix F401: GenAIAttributes imported but unused (make it an explicit re-export).
Linters (Ruff/Flake8) flag this as unused. If the intent is to re-export GenAIAttributes from this package, annotate the import with a noqa or add it to `__all__`.
Apply this minimal diff to silence F401 while keeping the public alias:
```diff
-from opentelemetry.semconv._incubating.attributes import (
-    gen_ai_attributes as GenAIAttributes,
-)
+from opentelemetry.semconv._incubating.attributes import (
+    gen_ai_attributes as GenAIAttributes,  # noqa: F401 - re-exported
+)
```

Optionally, also define `__all__` to make the re-export explicit:
```python
__all__ = [
    "Meters",
    "SpanAttributes",
    "Events",
    "EventAttributes",
    "LLMRequestTypeValues",
    "TraceloopSpanKindValues",
    "GenAIAttributes",
]
```

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (3)
76-79: Question: Should response model be set at request time?
You set both request and response model to the same value during request parameter extraction. If the response model can differ (provider-side routing), consider deferring setting the response model until a response is available to avoid misleading spans.
Apply this minimal change if you prefer to avoid setting response model at request time:
```diff
 _set_span_attribute(span, GenAIAttributes.GEN_AI_REQUEST_MODEL, model)
 # response is not available for LLM requests (as opposed to chat)
-_set_span_attribute(span, GenAIAttributes.GEN_AI_RESPONSE_MODEL, model)
+# Consider setting GEN_AI_RESPONSE_MODEL when a response is received.
```
97-114: Add a TODO to migrate function metadata to GenAIAttributes
We couldn't find any GenAIAttributes constant for request/functions metadata, so continue using `SpanAttributes.LLM_REQUEST_FUNCTIONS` for now. To avoid mixing conventions once an equivalent lands, please:
- In packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (at lines 97–114 and 151–161), add a TODO comment above each usage of `SpanAttributes.LLM_REQUEST_FUNCTIONS` noting to switch to the new GenAI attribute once it's available.
343-361: Use GenAI token attributes instead of LLM ones for consistency
In packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py replace the mixed families in your histogram calls:
• At the input-tokens record
• At the output-tokens record

Example diff:
```diff
--- a/packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py
+++ b/packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py
@@ -346,7 +346,7 @@ def _record_token_usage(span: Span, model_name: str, record_token_usage: boo
-    token_histogram.record(
-        input_tokens,
-        attributes={
-            GenAIAttributes.GEN_AI_SYSTEM: vendor,
-            SpanAttributes.LLM_TOKEN_TYPE: "input",
-            SpanAttributes.LLM_RESPONSE_MODEL: model_name,
-        },
-    )
+    token_histogram.record(
+        input_tokens,
+        attributes={
+            GenAIAttributes.GEN_AI_SYSTEM: vendor,
+            GenAIAttributes.GEN_AI_TOKEN_TYPE: "input",
+            GenAIAttributes.GEN_AI_RESPONSE_MODEL: model_name,
+        },
+    )
@@ -356,7 +356,7 @@ def _record_token_usage(span: Span, model_name: str, record_token_usage: boo
-    token_histogram.record(
-        output_tokens,
-        attributes={
-            GenAIAttributes.GEN_AI_SYSTEM: vendor,
-            SpanAttributes.LLM_TOKEN_TYPE: "output",
-            SpanAttributes.LLM_RESPONSE_MODEL: model_name,
-        },
-    )
+    token_histogram.record(
+        output_tokens,
+        attributes={
+            GenAIAttributes.GEN_AI_SYSTEM: vendor,
+            GenAIAttributes.GEN_AI_TOKEN_TYPE: "output",
+            GenAIAttributes.GEN_AI_RESPONSE_MODEL: model_name,
+        },
+    )
```

This aligns with other instrumentation (e.g. OpenAI-agents) and avoids mixing GEN_AI_* and LLM_* families.
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)
181-187: Optional: unify total tokens key with GenAI namespace (if available)
You're mixing `GenAIAttributes.*` for input/output tokens and `SpanAttributes.LLM_USAGE_TOTAL_TOKENS` for totals. If the incubating GenAI namespace defines a total tokens key, consider switching for consistency; otherwise keeping the stable key is fine.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (9)
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (9 hunks)
- packages/opentelemetry-instrumentation-anthropic/tests/utils.py (4 hunks)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (5 hunks)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (11 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py (6 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/test_llms.py (17 hunks)
- packages/opentelemetry-instrumentation-ollama/tests/test_generation.py (12 hunks)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (7 hunks)
- packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py
- packages/opentelemetry-instrumentation-langchain/tests/test_llms.py
- packages/opentelemetry-instrumentation-ollama/tests/test_generation.py
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit Inference Engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
- packages/opentelemetry-instrumentation-anthropic/tests/utils.py
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py
- packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py
🧬 Code Graph Analysis (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)
_set_span_attribute (33-40)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
SpanAttributes(39-212)
packages/opentelemetry-instrumentation-anthropic/tests/utils.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
Meters(9-36)
🪛 Ruff (0.12.2)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py
3-3: opentelemetry.semconv._incubating.attributes.gen_ai_attributes imported but unused
(F401)
🪛 Flake8 (7.2.0)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py
[error] 2-2: 'opentelemetry.semconv._incubating.attributes.gen_ai_attributes as GenAIAttributes' imported but unused
(F401)
🔇 Additional comments (16)
packages/opentelemetry-instrumentation-anthropic/tests/utils.py (4)
22-25: Token type attribute migrated to GEN_AI_TOKEN_TYPE — LGTM
Using GenAIAttributes.GEN_AI_TOKEN_TYPE with values ["output", "input"] matches the new semconv naming.
27-29: Response model attribute migration — LGTM
GenAIAttributes.GEN_AI_RESPONSE_MODEL usage is correct.
38-40: Choices metric now keyed by GEN_AI_RESPONSE_MODEL — LGTM
51-55: Duration metric attribute lookup updated — LGTM
Using `.get` with GenAIAttributes.GEN_AI_RESPONSE_MODEL while allowing error.type fallback is appropriate.
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (7)
18-20: Good move: centralized alias for GenAI attributes
Importing the incubating GenAI attributes under a single alias keeps usages consistent and makes future migrations easier.
89-96: LGTM: request parameter keys migrated to GenAI
max_tokens/temperature/top_p mappings to GEN_AI_REQUEST_* look correct and backward-compatible with common provider param names.
130-136: LGTM: prompt attributes migrated to GenAI
Using GEN_AI_PROMPT.{i}.role/content is aligned with the incubating GenAI semconv.
168-171: LGTM: chat prompt message mapping + tool calls
- Roles/content now under GEN_AI_PROMPT.
- Tool calls recorded under the same GEN_AI_PROMPT prefix, keeping request context cohesive.
Also applies to: 179-180, 189-192, 197-199
211-213: LGTM: completion prefix migrated to GenAI
The switch to the GEN_AI_COMPLETION prefix is consistent with the rest of the migration.
147-148: Verify params source for set_request_params (serialized.kwargs vs kwargs)
You're passing `serialized.get("kwargs", {})` into set_request_params instead of the function's kwargs argument. If providers set request controls under the actual kwargs param, this could miss model/params. If intentional (LangChain specific), consider a short comment.
Would you like me to search the repo for other call sites to confirm whether serialized["kwargs"] is always present and preferred here?
323-341: No GenAI total-tokens attribute found—keeping LLM_USAGE_TOTAL_TOKENS is correct
Verified that `GenAIAttributes` does not define a `GEN_AI_USAGE_TOTAL_TOKENS` constant and that `SpanAttributes.LLM_USAGE_TOTAL_TOKENS` remains the standard total-tokens key across all instrumentations and tests. No change required.
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (5)
41-43: Adopts incubating GenAI attributes alias — LGTM
Using a single namespace alias for incubating GenAI attributes keeps the call sites consistent and readable.
177-180: System/model/ID attributes migrated to GenAI namespace — looks good
These keys align with the GenAI semantic conventions and keep the span payload coherent.
224-227: Prompt attribute migration to GenAI — LGTM
All prompt content/role writes are correctly moved under `GEN_AI_PROMPT` with consistent indexing and gating by `should_send_prompts()`.
Also applies to: 231-234, 254-259, 265-271, 283-295
300-304: Completion and tool_calls attribute migration to GenAI — LGTM
Completion role/content and tool_calls are consistently emitted under `GEN_AI_COMPLETION`. Arguments are serialized; IDs and names are well-scoped. Reasoning summary mapped as expected.
Also applies to: 315-327, 332-340, 344-352, 356-368, 371-372
190-191: Verify incubating constant existence and implement fallback
I couldn't locate a definition for `GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS` in the incubating semantic-conventions package. Please confirm that this constant is published in opentelemetry-semantic-conventions; if it's not present, fall back to using `SpanAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS` to prevent runtime errors.
• File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
Lines: 190–191

```python
GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS,
usage.input_tokens_details.cached_tokens,
```
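If the fallback is needed, a defensive sketch using only the constants named above (`cache_read_key` is an illustrative local name):

```python
# Prefer the incubating constant when the installed semconv publishes it;
# otherwise fall back to the SpanAttributes string constant.
cache_read_key = getattr(
    GenAIAttributes,
    "GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS",
    SpanAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS,
)
_set_span_attribute(span, cache_read_key, usage.input_tokens_details.cached_tokens)
```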
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (7)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)
340-356: Missing usage attribute tests for OpenAI Agents instrumentation
We didn't find any tests in packages/opentelemetry-instrumentation-openai-agents that assert the span attributes for prompt, completion, or total tokens (`GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS`, `GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS`, `SpanAttributes.LLM_USAGE_TOTAL_TOKENS`). Without these, changes to token extraction in `_hooks.py` may regress unnoticed.
Please add coverage in tests/test_openai_agents.py, for example:
- Mock an OpenAI Agents response with a `.usage` object
- Instrument and capture spans
- Assert `agent_span.attributes[...]` for input_tokens, output_tokens, and total_tokens (see the sketch below)
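A minimal sketch of such a test; the `exporter` and `run_instrumented_agent` fixtures are assumptions standing in for whatever this suite already uses to capture spans and drive an agent run:

```python
from types import SimpleNamespace


def test_agent_span_records_usage(exporter, run_instrumented_agent):
    # Hypothetical fixtures: "exporter" collects finished spans and
    # "run_instrumented_agent" drives an agent whose mocked response
    # carries the usage object below. GenAIAttributes/SpanAttributes are
    # assumed imported at module top, as elsewhere in these tests.
    usage = SimpleNamespace(input_tokens=24, output_tokens=11, total_tokens=35)
    run_instrumented_agent(usage=usage)

    agent_span = exporter.get_finished_spans()[-1]
    assert agent_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 24
    assert agent_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 11
    assert agent_span.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS] == 35
```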
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py (1)
231-242: Swap and use correct input/output token attributes in assistant_wrappers.py
The current implementation in assistant_wrappers.py inverts prompt/completion and still emits the legacy names. To align with the rest of the OpenAI instrumentation (see responses_wrappers.py and event_handler_wrapper.py) and with GenAI conventions—and to satisfy the existing assistant tests that assert on GEN_AI_USAGE_INPUT_TOKENS and GEN_AI_USAGE_OUTPUT_TOKENS—please replace the two span attribute calls as follows:

```diff
--- a/packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py
+++ b/packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py
@@ -232,9 +232,9 @@ def _wrap_run(run: Dict[str, Any], span: Span, extra: Context) -> Any:
     if run.get("usage"):
         usage_dict = model_as_dict(run.get("usage"))
-        _set_span_attribute(
-            span,
-            GenAIAttributes.GEN_AI_USAGE_PROMPT_TOKENS,
-            usage_dict.get("completion_tokens"),
-        )
-        _set_span_attribute(
-            span,
-            GenAIAttributes.GEN_AI_USAGE_COMPLETION_TOKENS,
-            usage_dict.get("prompt_tokens"),
-        )
+        _set_span_attribute(
+            span,
+            GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS,
+            usage_dict.get("prompt_tokens"),
+        )
+        _set_span_attribute(
+            span,
+            GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS,
+            usage_dict.get("completion_tokens"),
+        )
```

• Confirmed that GEN_AI_USAGE_INPUT_TOKENS and GEN_AI_USAGE_OUTPUT_TOKENS are already used in responses_wrappers.py (lines 182–183) and in tests under tests/traces/test_assistant.py.
• This swap prevents broken spans in assistant traces and brings consistency across all GenAI attribute emissions.
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (1)
57-68: Align LangChain tests with GenAI attribute keys for completions and usage
The instrumentation now emits completions under `GenAIAttributes.GEN_AI_COMPLETION` and usage tokens under `GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS` / `GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS` (and retains total tokens as `SpanAttributes.LLM_USAGE_TOTAL_TOKENS`). Update the assertions in packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (lines 57–68):
• Replace `SpanAttributes.LLM_COMPLETIONS` → `GenAIAttributes.GEN_AI_COMPLETION`
• Replace `SpanAttributes.LLM_USAGE_PROMPT_TOKENS` → `GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS`
• Replace `SpanAttributes.LLM_USAGE_COMPLETION_TOKENS` → `GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS`
• Replace `SpanAttributes.LLM_USAGE_CACHE_READ_INPUT_TOKENS` → `GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS`
• Keep `SpanAttributes.LLM_USAGE_TOTAL_TOKENS` as-is

Example diff:
```diff
- assert openai_span.attributes[f"{SpanAttributes.LLM_COMPLETIONS}.0.content"] == response
- assert openai_span.attributes[f"{SpanAttributes.LLM_COMPLETIONS}.0.role"] == "assistant"
+ assert openai_span.attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content"] == response
+ assert openai_span.attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.0.role"] == "assistant"
- assert openai_span.attributes[SpanAttributes.LLM_USAGE_PROMPT_TOKENS] == 24
- assert openai_span.attributes[SpanAttributes.LLM_USAGE_COMPLETION_TOKENS] == 11
- assert openai_span.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS] == 35
- assert openai_span.attributes[SpanAttributes.LLM_USAGE_CACHE_READ_INPUT_TOKENS] == 0
+ assert openai_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 24
+ assert openai_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 11
+ assert openai_span.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS] == 35
+ assert openai_span.attributes[GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS] == 0
```

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (1)
211-239: Bug: incorrect type check uses `is` instead of `isinstance` for message content
This causes string content to be treated as non-string, leading to unnecessary JSON encoding and potentially incorrect behavior.
```diff
- if generation.message.content is str:
+ if isinstance(generation.message.content, str):
      _set_span_attribute(
          span,
          f"{prefix}.content",
          generation.message.content,
      )
```

Additionally, guard generation_info which can be None to avoid AttributeError:
```diff
- if generation.generation_info.get("finish_reason"):
+ if getattr(generation, "generation_info", None) and generation.generation_info.get("finish_reason"):
      _set_span_attribute(
          span,
          f"{prefix}.finish_reason",
          generation.generation_info.get("finish_reason"),
      )
```

packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py (3)
1-4: Import GenAIAttributes and drop legacy SpanAttributes to align with PR semantics
Move off SpanAttributes and import GenAIAttributes directly; this also resolves the NameError on Line 44.
```diff
-from opentelemetry.semconv_ai import SpanAttributes
+from opentelemetry.semconv._incubating.attributes.gen_ai_attributes import GenAIAttributes
```
37-41: Update token usage assertions to GenAI semantics (prompt→input, completion→output)
These are still using legacy LLM_* keys. Align to GenAI usage keys per PR objectives.
```diff
- assert span.attributes.get(SpanAttributes.LLM_USAGE_PROMPT_TOKENS) > 0
- assert span.attributes.get(SpanAttributes.LLM_USAGE_COMPLETION_TOKENS) > 0
- assert span.attributes.get(SpanAttributes.LLM_USAGE_TOTAL_TOKENS) > 0
+ assert span.attributes.get(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS) > 0
+ assert span.attributes.get(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS) > 0
+ assert span.attributes.get(GenAIAttributes.GEN_AI_USAGE_TOTAL_TOKENS) > 0
```
66-69: Also update token usage assertions in the second test
Bring these in line with GenAI usage attributes.
```diff
- assert span.attributes.get(SpanAttributes.LLM_USAGE_PROMPT_TOKENS) > 0
- assert span.attributes.get(SpanAttributes.LLM_USAGE_COMPLETION_TOKENS) > 0
+ assert span.attributes.get(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS) > 0
+ assert span.attributes.get(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS) > 0
```
♻️ Duplicate comments (4)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (1)
39-41: Resolved: GenAIAttributes import correctly added
This addresses the prior undefined-name (F821) issue flagged in earlier reviews. Linting/runtime should now pass for GenAIAttributes references in this module.
packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py (1)
122-130: Same undefined name issue applies here.
The references to `GenAIAttributes` in this test will also error without the import fix suggested above.
After applying the import change, please run the tests locally to ensure no remaining references to `SpanAttributes` exist in this file.
111-115: Same fixes as above apply here.Import
GenAIAttributesand update completion/usage keys toGEN_AI_*in this test section as well.
162-166: Same fixes as above apply here (beta variant).Ensure the import is present and migrate completion/usage keys to
GEN_AI_*.
🧹 Nitpick comments (14)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (1)
178-191: Avoid mutable default arguments and tighten type hintsmetric_attributes defaults to {} in both paths. While not mutated here, mutable defaults are best avoided. Also, token_histogram and choice_counter are annotated as non-optional but default to None.
Proposed refactor:
- Use Optional[...] for the counters/histogram and metric_attributes.
- Normalize metric_attributes to {} inside the function.
Apply these diffs:
Async path:
async def _aset_token_usage( span, anthropic, request, response, - metric_attributes: dict = {}, - token_histogram: Histogram = None, - choice_counter: Counter = None, + metric_attributes: Optional[dict] = None, + token_histogram: Optional[Histogram] = None, + choice_counter: Optional[Counter] = None, ): import inspect + if metric_attributes is None: + metric_attributes = {}Sync path:
def _set_token_usage( span, anthropic, request, response, - metric_attributes: dict = {}, - token_histogram: Histogram = None, - choice_counter: Counter = None, + metric_attributes: Optional[dict] = None, + token_histogram: Optional[Histogram] = None, + choice_counter: Optional[Counter] = None, ): import inspect + if metric_attributes is None: + metric_attributes = {}Also applies to: 295-309
packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py (1)
151-156: LGTM after adding the missing import.Assertions correctly validate content presence and the absence of tool_calls for assistant-only content scenarios. With the
GenAIAttributesimport in place, this block looks good.Optionally, mirror the
call.argspattern here for readability, as suggested earlier.packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (5)
340-356: Use GenAI usage token attributes (input/output/total) instead of legacy LLM_USAGE_ here.*The PR objective is to replace prompt_tokens/completion_tokens with input_tokens/output_tokens; however, this block still writes LLM_USAGE_* attributes. Recommend migrating to GenAI usage attributes for consistency.
Apply this diff:
- if hasattr(usage, 'input_tokens') and usage.input_tokens is not None: - otel_span.set_attribute(SpanAttributes.LLM_USAGE_PROMPT_TOKENS, usage.input_tokens) - elif hasattr(usage, 'prompt_tokens') and usage.prompt_tokens is not None: - otel_span.set_attribute(SpanAttributes.LLM_USAGE_PROMPT_TOKENS, usage.prompt_tokens) + if hasattr(usage, 'input_tokens') and usage.input_tokens is not None: + otel_span.set_attribute(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, usage.input_tokens) + elif hasattr(usage, 'prompt_tokens') and usage.prompt_tokens is not None: + otel_span.set_attribute(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, usage.prompt_tokens) - if hasattr(usage, 'output_tokens') and usage.output_tokens is not None: - otel_span.set_attribute(SpanAttributes.LLM_USAGE_COMPLETION_TOKENS, usage.output_tokens) - elif hasattr(usage, 'completion_tokens') and usage.completion_tokens is not None: - otel_span.set_attribute(SpanAttributes.LLM_USAGE_COMPLETION_TOKENS, usage.completion_tokens) + if hasattr(usage, 'output_tokens') and usage.output_tokens is not None: + otel_span.set_attribute(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, usage.output_tokens) + elif hasattr(usage, 'completion_tokens') and usage.completion_tokens is not None: + otel_span.set_attribute(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, usage.completion_tokens) - if hasattr(usage, 'total_tokens') and usage.total_tokens is not None: - otel_span.set_attribute(SpanAttributes.LLM_USAGE_TOTAL_TOKENS, usage.total_tokens) + if hasattr(usage, 'total_tokens') and usage.total_tokens is not None: + otel_span.set_attribute(GenAIAttributes.GEN_AI_USAGE_TOTAL_TOKENS, usage.total_tokens)Note: This requires the import suggested above:
from opentelemetry.semconv_ai import GenAIAttributes
443-459: Mirror the GenAI usage token migration in the legacy fallback block.The legacy path still writes LLM_USAGE_* attributes. Align it with the GenAI usage attributes for consistency.
Apply this diff:
- if hasattr(usage, 'input_tokens') and usage.input_tokens is not None: - otel_span.set_attribute(SpanAttributes.LLM_USAGE_PROMPT_TOKENS, usage.input_tokens) - elif hasattr(usage, 'prompt_tokens') and usage.prompt_tokens is not None: - otel_span.set_attribute(SpanAttributes.LLM_USAGE_PROMPT_TOKENS, usage.prompt_tokens) + if hasattr(usage, 'input_tokens') and usage.input_tokens is not None: + otel_span.set_attribute(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, usage.input_tokens) + elif hasattr(usage, 'prompt_tokens') and usage.prompt_tokens is not None: + otel_span.set_attribute(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, usage.prompt_tokens) - if hasattr(usage, 'output_tokens') and usage.output_tokens is not None: - otel_span.set_attribute(SpanAttributes.LLM_USAGE_COMPLETION_TOKENS, usage.output_tokens) - elif hasattr(usage, 'completion_tokens') and usage.completion_tokens is not None: - otel_span.set_attribute(SpanAttributes.LLM_USAGE_COMPLETION_TOKENS, usage.completion_tokens) + if hasattr(usage, 'output_tokens') and usage.output_tokens is not None: + otel_span.set_attribute(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, usage.output_tokens) + elif hasattr(usage, 'completion_tokens') and usage.completion_tokens is not None: + otel_span.set_attribute(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, usage.completion_tokens) - if hasattr(usage, 'total_tokens') and usage.total_tokens is not None: - otel_span.set_attribute(SpanAttributes.LLM_USAGE_TOTAL_TOKENS, usage.total_tokens) + if hasattr(usage, 'total_tokens') and usage.total_tokens is not None: + otel_span.set_attribute(GenAIAttributes.GEN_AI_USAGE_TOTAL_TOKENS, usage.total_tokens)
304-334: Migrate completion payload attributes from LLM_COMPLETIONS → GenAI completion.You already use GEN_AI_* for tools; keep completions consistent with GenAI by switching off SpanAttributes.LLM_COMPLETIONS here.
Apply this diff:
- otel_span.set_attribute( - f"{SpanAttributes.LLM_COMPLETIONS}.{i}.content", content_text) - otel_span.set_attribute( - f"{SpanAttributes.LLM_COMPLETIONS}.{i}.role", getattr( - output, 'role', 'assistant')) + otel_span.set_attribute( + f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.content", content_text) + otel_span.set_attribute( + f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.role", getattr( + output, 'role', 'assistant')) @@ - otel_span.set_attribute(f"{SpanAttributes.LLM_COMPLETIONS}.{i}.role", "assistant") - otel_span.set_attribute( - f"{SpanAttributes.LLM_COMPLETIONS}.{i}.finish_reason", "tool_calls") - otel_span.set_attribute( - f"{SpanAttributes.LLM_COMPLETIONS}.{i}.tool_calls.0.name", tool_name) - otel_span.set_attribute( - f"{SpanAttributes.LLM_COMPLETIONS}.{i}.tool_calls.0.arguments", arguments) - otel_span.set_attribute( - f"{SpanAttributes.LLM_COMPLETIONS}.{i}.tool_calls.0.id", tool_call_id) + otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.role", "assistant") + otel_span.set_attribute( + f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.finish_reason", "tool_calls") + otel_span.set_attribute( + f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.tool_calls.0.name", tool_name) + otel_span.set_attribute( + f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.tool_calls.0.arguments", arguments) + otel_span.set_attribute( + f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.tool_calls.0.id", tool_call_id) @@ - otel_span.set_attribute(f"{SpanAttributes.LLM_COMPLETIONS}.{i}.content", output.text) - otel_span.set_attribute( - f"{SpanAttributes.LLM_COMPLETIONS}.{i}.role", getattr( - output, 'role', 'assistant')) + otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.content", output.text) + otel_span.set_attribute( + f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.role", getattr( + output, 'role', 'assistant')) @@ - otel_span.set_attribute( - f"{SpanAttributes.LLM_COMPLETIONS}.{i}.finish_reason", response.finish_reason) + otel_span.set_attribute( + f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.finish_reason", response.finish_reason)Note: Ensure the alias import exists:
from opentelemetry.semconv_ai import GenAIAttributes
409-441: Do the same GenAI completion migration in the legacy block.Mirror the change for the second completions block to keep attributes consistent.
Apply this diff:
- otel_span.set_attribute( - f"{SpanAttributes.LLM_COMPLETIONS}.{i}.content", content_text) - otel_span.set_attribute( - f"{SpanAttributes.LLM_COMPLETIONS}.{i}.role", getattr( - output, 'role', 'assistant')) + otel_span.set_attribute( + f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.content", content_text) + otel_span.set_attribute( + f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.role", getattr( + output, 'role', 'assistant')) @@ - otel_span.set_attribute(f"{SpanAttributes.LLM_COMPLETIONS}.{i}.role", "assistant") - otel_span.set_attribute( - f"{SpanAttributes.LLM_COMPLETIONS}.{i}.finish_reason", "tool_calls") - otel_span.set_attribute( - f"{SpanAttributes.LLM_COMPLETIONS}.{i}.tool_calls.0.name", tool_name) - otel_span.set_attribute( - f"{SpanAttributes.LLM_COMPLETIONS}.{i}.tool_calls.0.arguments", arguments) - otel_span.set_attribute( - f"{SpanAttributes.LLM_COMPLETIONS}.{i}.tool_calls.0.id", tool_call_id) + otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.role", "assistant") + otel_span.set_attribute( + f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.finish_reason", "tool_calls") + otel_span.set_attribute( + f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.tool_calls.0.name", tool_name) + otel_span.set_attribute( + f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.tool_calls.0.arguments", arguments) + otel_span.set_attribute( + f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.tool_calls.0.id", tool_call_id) @@ - otel_span.set_attribute(f"{SpanAttributes.LLM_COMPLETIONS}.{i}.content", output.text) - otel_span.set_attribute( - f"{SpanAttributes.LLM_COMPLETIONS}.{i}.role", getattr( - output, 'role', 'assistant')) + otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.content", output.text) + otel_span.set_attribute( + f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.role", getattr( + output, 'role', 'assistant')) @@ - otel_span.set_attribute( - f"{SpanAttributes.LLM_COMPLETIONS}.{i}.finish_reason", response.finish_reason) + otel_span.set_attribute( + f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.finish_reason", response.finish_reason)
10-11: Avoid private incubating module imports where possible.You import GEN_AI_COMPLETION from a private incubating path. Prefer stable exports to reduce churn and align with the PR’s GenAIAttributes usage.
Two options:
- Switch to the alias everywhere and drop the incubating import:
- Import:
from opentelemetry.semconv_ai import GenAIAttributes- Replace usages of
GEN_AI_COMPLETIONwithGenAIAttributes.GEN_AI_COMPLETION.- Or, if you keep the constant import, ensure all GenAI keys consistently use either the alias or constants, not a mix.
Happy to provide a follow-up diff if you want to unify the file one way or the other.
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py (1)
206-213: LGTM with a consistency suggestion: user/system message prompts use GEN_AI_PROMPT; consider migrating completions tooFor user/system messages, writing role/content under GEN_AI_PROMPT is correct. However, completion messages in this function still use the legacy LLM_COMPLETIONS prefix and a raw "gen_ai.response" string for IDs, which mixes namespaces.
Consider migrating completions to the GenAI response path for consistency, e.g.:
- Use a GEN_AI response base instead of LLM_COMPLETIONS (if available in GenAIAttributes, e.g., GenAIAttributes.GEN_AI_RESPONSE).
- Write role/content/id under f"{GenAIAttributes.GEN_AI_RESPONSE}.{completion_index}.*".
This avoids mixing old and new conventions in a single span.
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
2-4: Fix lint failure: imported alias is unused (F401) — explicitly re-export or silence.Ruff/Flake8 flags
GenAIAttributesas an unused import. If the intent is to re-export the incubating attributes through this module (recommended for callers), add an inline noqa or explicitly reference it.Apply one of the following:
Option A (noqa on import):
-from opentelemetry.semconv._incubating.attributes import ( - gen_ai_attributes as GenAIAttributes, -) +from opentelemetry.semconv._incubating.attributes import ( + gen_ai_attributes as GenAIAttributes, # noqa: F401 (public re-export) +)Option B (explicit re-export; also keeps linters happy):
```diff
-from opentelemetry.semconv._incubating.attributes import (
-    gen_ai_attributes as GenAIAttributes,
-)
+from opentelemetry.semconv._incubating.attributes import (
+    gen_ai_attributes as GenAIAttributes,  # noqa: F401 (public re-export)
+)
```

Option B (explicit re-export; also keeps linters happy):
116-118: Inconsistent prompt schema: use.role/.contentinstead of.user.Everywhere else you use
...prompt.{i}.roleand...prompt.{i}.content. Here it writes to...prompt.0.user, which is inconsistent and breaks consumers expectingrole/content.-def _set_prompt_span_attributes(span, request_body): - _set_span_attribute( - span, f"{GenAIAttributes.GEN_AI_PROMPT}.0.user", request_body.get("prompt") - ) +def _set_prompt_span_attributes(span, request_body): + _set_span_attribute( + span, f"{GenAIAttributes.GEN_AI_PROMPT}.0.content", request_body.get("prompt") + ) + _set_span_attribute(span, f"{GenAIAttributes.GEN_AI_PROMPT}.0.role", "user")
441-444: Amazon input prompt: align withrole/content(avoid.user).Same inconsistency as above: prefer
...prompt.0.content+...prompt.0.role = "user".- _set_span_attribute( - span, - f"{GenAIAttributes.GEN_AI_PROMPT}.0.user", - request_body.get("inputText"), - ) + _set_span_attribute( + span, + f"{GenAIAttributes.GEN_AI_PROMPT}.0.content", + request_body.get("inputText"), + ) + _set_span_attribute(span, f"{GenAIAttributes.GEN_AI_PROMPT}.0.role", "user")
546-558: Guard against None when summing token usage.If either
prompt_tokensorcompletion_tokensis None (possible in some provider responses),prompt_tokens + completion_tokensraises TypeError. Upstream callsites largely avoid this, but_set_imported_model_span_attributescan pass None.Consider normalizing to ints (default 0) before summing:
```python
# outside selected range; illustrative patch
pt = int(prompt_tokens or 0)
ct = int(completion_tokens or 0)
_set_span_attribute(span, SpanAttributes.LLM_USAGE_TOTAL_TOKENS, pt + ct)
```

Alternatively, skip the total attribute if either is missing.
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (1)
76-96: LGTM: migrate request model and request params to GEN_AI_* keys
The move to GEN_AI_REQUEST_MODEL/RESPONSE_MODEL/MAX_TOKENS/TEMPERATURE/TOP_P looks correct. Minor note: if no model is found, span_holder.request_model remains None; consider defaulting it to "unknown" for consistency.
```diff
 else:
     model = "unknown"
+    span_holder.request_model = model
```

packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py (1)
9-12: Avoid hardcoding API keys in code, even in tests
Use an env var (e.g., OPENAI_API_KEY="test-api-key") to prevent accidental leakage if the value changes in the future.
Example:
```python
import os

return OpenAI(
    api_key=os.getenv("OPENAI_API_KEY", "test-api-key"),
    base_url="https://api.deepseek.com/beta",
)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
packages/opentelemetry-instrumentation-alephalpha/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (31)
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (9 hunks)
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (11 hunks)
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (6 hunks)
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (3 hunks)
- packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py (3 hunks)
- packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (24 hunks)
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (24 hunks)
- packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/__init__.py (3 hunks)
- packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/__init__.py (3 hunks)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (5 hunks)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (10 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py (7 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (1 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py (3 hunks)
- packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py (2 hunks)
- packages/opentelemetry-instrumentation-llamaindex/tests/test_chroma_vector_store.py (2 hunks)
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1 hunks)
- packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (5 hunks)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (7 hunks)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (7 hunks)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py (3 hunks)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py (6 hunks)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (7 hunks)
- packages/opentelemetry-instrumentation-openai/tests/metrics/test_openai_metrics.py (5 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py (13 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (20 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_completions.py (10 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py (1 hunks)
- packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1 hunks)
- packages/opentelemetry-semantic-conventions-ai/pyproject.toml (1 hunks)
- packages/traceloop-sdk/tests/test_workflows.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (19)
- packages/opentelemetry-semantic-conventions-ai/pyproject.toml
- packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/__init__.py
- packages/opentelemetry-instrumentation-llamaindex/tests/test_chroma_vector_store.py
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
- packages/opentelemetry-instrumentation-openai/tests/metrics/test_openai_metrics.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_completions.py
- packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
- packages/traceloop-sdk/tests/test_workflows.py
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
- packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit Inference Engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.pypackages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/__init__.pypackages/opentelemetry-instrumentation-langchain/tests/test_langgraph.pypackages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.pypackages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.pypackages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.pypackages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.pypackages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.pypackages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.pypackages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.pypackages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.pypackages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py
🧠 Learnings (1)
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
PR: traceloop/openllmetry#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Semantic conventions must follow the OpenTelemetry GenAI specification
Applied to files:
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py
🧬 Code Graph Analysis (4)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (2)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (1)
set_span_attribute(21-25)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
SpanAttributes(67-240)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (2)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (1)
_set_span_attribute(55-57)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (2)
SpanAttributes(67-240)
LLMRequestTypeValues(272-277)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)
_set_span_attribute(30-37)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (1)
should_emit_events(174-181)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (2)
packages/opentelemetry-instrumentation-alephalpha/opentelemetry/instrumentation/alephalpha/__init__.py (1)
_set_span_attribute(63-67)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)
_set_span_attribute(30-37)
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py
- 56, 59, 111, 114, 162, 165: Undefined name GenAIAttributes (F821)

packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py
- 49, 51, 53, 55: Undefined name GenAIAttributes (F821)

packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py
- 44: Undefined name GenAIAttributes (F821)

packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py
- 52, 53, 54, 56, 59, 60, 61, 63, 66, 68, 71, 73, 76, 77, 78, 80, 82, 84, 87, 88, 89, 91, 122, 123, 124, 125, 126, 127, 129, 151, 152, 153, 155: Undefined name GenAIAttributes (F821)

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
- 238, 239, 242, 243: Undefined name GenAIAttributes (F821)

packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py
- 3: opentelemetry.semconv._incubating.attributes.gen_ai_attributes imported but unused (F401)
🪛 Flake8 (7.2.0)
packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py
- [error] 56, 59, 111, 114, 162, 165: undefined name 'GenAIAttributes' (F821)

packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py
- [error] 49, 51, 53, 55: undefined name 'GenAIAttributes' (F821)

packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py
- [error] 44: undefined name 'GenAIAttributes' (F821)

packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py
- [error] 52, 53, 54, 56, 59, 60, 61, 63, 66, 68, 71, 73, 76, 77, 78, 80, 82, 84, 87, 88, 89, 91, 122, 123, 124, 125, 126, 127, 129, 151, 152, 153, 155: undefined name 'GenAIAttributes' (F821)

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
- [error] 238, 239, 242, 243: undefined name 'GenAIAttributes' (F821)

packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py
- [error] 2: 'opentelemetry.semconv._incubating.attributes.gen_ai_attributes as GenAIAttributes' imported but unused (F401)
🔇 Additional comments (22)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (8)
229-231: Token type metric attribute migrated to GenAI key
Using GenAIAttributes.GEN_AI_TOKEN_TYPE with "input"/"output" aligns with the GenAI semantic conventions and keeps metric attributes consistent with the rest of the PR.
Also applies to: 256-258
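For reference, a minimal sketch of a metric point carrying this attribute (the instrument name and values are illustrative, not the instrumentation's exact code):

```python
from opentelemetry import metrics
from opentelemetry.semconv._incubating.attributes import (
    gen_ai_attributes as GenAIAttributes,
)

meter = metrics.get_meter(__name__)
token_histogram = meter.create_histogram("gen_ai.client.token.usage")

# One data point per token type, distinguished by GEN_AI_TOKEN_TYPE.
token_histogram.record(568, attributes={GenAIAttributes.GEN_AI_TOKEN_TYPE: "input"})
token_histogram.record(315, attributes={GenAIAttributes.GEN_AI_TOKEN_TYPE: "output"})
```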
279-282: Span attributes: input/output token usage keys migrated
Setting GEN_AI_USAGE_INPUT_TOKENS and GEN_AI_USAGE_OUTPUT_TOKENS on the span looks good and matches the new terminology.
286-292: Span attributes: cache token usage keys migrated
GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS and GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS updates are consistent and correct.
343-345: Sync path: token_type metric attribute migrated to GenAI key
Mirrors the async path; no issues spotted.
Also applies to: 370-372
393-396: Sync path: input/output token usage span attributes migrated
Consistent with async path; looks good.
400-406: Sync path: cache token usage span attributes migrated
Consistent with the async path and PR intent.
538-540: Set GEN_AI_SYSTEM attribute to 'Anthropic'
Correct usage of GenAI system identity for both sync and async wrappers.
Also applies to: 662-664
279-284: No GEN_AI_USAGE_TOTAL_TOKENS constant found—no changes needed
I searched the repo for GEN_AI_USAGE_TOTAL_TOKENS and didn't find any definition. Since that constant doesn't exist, the optional addition isn't applicable.
packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/__init__.py (1)
33-35: GenAI attributes import aligns with the migration — LGTM
Importing gen_ai_attributes as GenAIAttributes matches the GenAI semconv migration and keeps usage ergonomic at call sites.
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py (6)
23-25: LGTM: Using GenAIAttributes alias from semconv incubating attributes
Import aligns with the GenAI semantic conventions migration and keeps usage readable across the file.
154-169: LGTM: Migrated system/model attributes to GenAI keys
Setting GEN_AI_SYSTEM, GEN_AI_REQUEST_MODEL, and GEN_AI_RESPONSE_MODEL on the span follows the updated conventions and matches the assistant model source.
172-188: LGTM: Per-prompt attributes migrated to GEN_AI_PROMPT path
Storing role/content under f"{GenAIAttributes.GEN_AI_PROMPT}.{index}.*" for assistant and run instructions looks correct and consistent with the GenAI schema.
275-287: LGTM: Streaming wrapper uses GenAI system/model attributes
Using GEN_AI_REQUEST_MODEL, GEN_AI_SYSTEM, and GEN_AI_RESPONSE_MODEL here mirrors the non-streaming flow and maintains consistency.
296-303: LGTM: Streaming wrapper emits/records assistant instructions under GEN_AI_PROMPT
Per-prompt role/content storage for the assistant instructions is correct.
307-310: LGTM: Streaming wrapper records run instructions under GEN_AI_PROMPT
Role/content mapping for the run instructions under the GenAI prompt path looks good.
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (1)
85-89: LGTM: system/model/response attributes migrated to GenAI namespace.
Correct mapping to GEN_AI_SYSTEM, GEN_AI_REQUEST_MODEL, GEN_AI_RESPONSE_MODEL, and GEN_AI_RESPONSE_ID.
packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py (2)
7-9: LGTM: importing GenAIAttributes alias
Importing the incubating GenAIAttributes alias is correct and aligns the tests with the new semantic conventions.
51-58: LGTM: token_type and system attributes migrated to GEN_AI_* in metrics assertions
The updates to use GenAIAttributes.GEN_AI_TOKEN_TYPE and GenAIAttributes.GEN_AI_SYSTEM (and keeping the value checks) look good and are consistent across helpers and streaming variants.
Also applies to: 71-73, 103-112, 123-126, 133-139, 145-146
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (3)
18-20: LGTM: adopt GenAIAttributes alias
Switching to the incubating GenAIAttributes alias is correct and keeps this module aligned with the new GenAI semantic conventions.
130-136: LGTM: prompt role/content and tool_call_id migrated under GEN_AI_PROMPT path
Prompt attributes and tool call prefixing under GEN_AI_PROMPT are consistent with the rest of the migration. No functional issues spotted.
Also applies to: 168-199
321-340: Check GenAI total-tokens attribute for consistency
We currently emit input/output/cache-read tokens under GenAIAttributes.GEN_AI_USAGE_* but keep total tokens under SpanAttributes.LLM_USAGE_TOTAL_TOKENS. Please:
- Verify whether the OpenTelemetry GenAI semconv defines a GEN_AI_USAGE_TOTAL_TOKENS constant.
- If it exists, migrate for consistency:

```diff
 _set_span_attribute(
     span,
-    SpanAttributes.LLM_USAGE_TOTAL_TOKENS,
+    GenAIAttributes.GEN_AI_USAGE_TOTAL_TOKENS,
     total_tokens,
 )
```

- Otherwise, add a brief comment explaining why `LLM_USAGE_TOTAL_TOKENS` is intentionally retained.
- Apply the same adjustment in the block at lines 342–361.
packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py (1)
16-18: No change needed — instrument_legacy already emits GenAI attributes
Short reason: instrument_legacy instruments OpenAI with OpenAIInstrumentor and the instrumentation code sets GenAI attributes (GenAIAttributes.*) alongside the LLM SpanAttributes, so the test assertions are valid.
Evidence (key locations):
- packages/opentelemetry-instrumentation-openai/tests/conftest.py — instrument_legacy creates and instruments OpenAIInstrumentor.
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/config.py — Config.use_legacy_attributes exists (default True) but does not prevent GenAI attribute emission.
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py — _set_span_attribute writes attributes (used with GenAIAttributes keys).
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py — sets GenAIAttributes.GEN_AI_RESPONSE_MODEL and GEN_AI_USAGE_* (and also SpanAttributes.LLM_USAGE_TOTAL_TOKENS).
- packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py — the test asserts both SpanAttributes.LLM_USAGE_* and GenAIAttributes.GEN_AI_RESPONSE_MODEL.
Conclusion: the original concern is incorrect for this codebase; no fixture change is required.
Likely an incorrect or invalid review comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (6)
1661-1666: Leftover legacy usage keys: prompt_tokens/completion_tokens should be input_tokens/output_tokens.
These assertions still reference gen_ai.usage.prompt_tokens/completion_tokens and will diverge from instrumentation emitting input_tokens/output_tokens.
Apply this diff:
```diff
-    assert anthropic_span.attributes["gen_ai.usage.prompt_tokens"] == 568
+    assert anthropic_span.attributes["gen_ai.usage.input_tokens"] == 568
     assert (
-        anthropic_span.attributes["gen_ai.usage.completion_tokens"]
-        + anthropic_span.attributes["gen_ai.usage.prompt_tokens"]
+        anthropic_span.attributes["gen_ai.usage.output_tokens"]
+        + anthropic_span.attributes["gen_ai.usage.input_tokens"]
         == anthropic_span.attributes["llm.usage.total_tokens"]
     )
```
1821-1826: Same issue: replace prompt/completion_tokens with input/output_tokens.

```diff
-    assert anthropic_span.attributes["gen_ai.usage.prompt_tokens"] == 568
+    assert anthropic_span.attributes["gen_ai.usage.input_tokens"] == 568
     assert (
-        anthropic_span.attributes["gen_ai.usage.completion_tokens"]
-        + anthropic_span.attributes["gen_ai.usage.prompt_tokens"]
+        anthropic_span.attributes["gen_ai.usage.output_tokens"]
+        + anthropic_span.attributes["gen_ai.usage.input_tokens"]
         == anthropic_span.attributes["llm.usage.total_tokens"]
     )
```
2007-2012: Same issue: replace prompt/completion_tokens with input/output_tokens (streaming tools legacy).

```diff
-    assert anthropic_span.attributes["gen_ai.usage.prompt_tokens"] == 506
+    assert anthropic_span.attributes["gen_ai.usage.input_tokens"] == 506
     assert (
-        anthropic_span.attributes["gen_ai.usage.completion_tokens"]
-        + anthropic_span.attributes["gen_ai.usage.prompt_tokens"]
+        anthropic_span.attributes["gen_ai.usage.output_tokens"]
+        + anthropic_span.attributes["gen_ai.usage.input_tokens"]
         == anthropic_span.attributes["llm.usage.total_tokens"]
     )
```
2134-2139: Same issue: replace prompt/completion_tokens with input/output_tokens (tools streaming with events).

```diff
-    assert anthropic_span.attributes["gen_ai.usage.prompt_tokens"] == 514
+    assert anthropic_span.attributes["gen_ai.usage.input_tokens"] == 514
     assert (
-        anthropic_span.attributes["gen_ai.usage.completion_tokens"]
-        + anthropic_span.attributes["gen_ai.usage.prompt_tokens"]
+        anthropic_span.attributes["gen_ai.usage.output_tokens"]
+        + anthropic_span.attributes["gen_ai.usage.input_tokens"]
         == anthropic_span.attributes["llm.usage.total_tokens"]
     )
```
2247-2252: Same issue: replace prompt/completion_tokens with input/output_tokens (tools streaming with events/no-content).

```diff
-    assert anthropic_span.attributes["gen_ai.usage.prompt_tokens"] == 514
+    assert anthropic_span.attributes["gen_ai.usage.input_tokens"] == 514
     assert (
-        anthropic_span.attributes["gen_ai.usage.completion_tokens"]
-        + anthropic_span.attributes["gen_ai.usage.prompt_tokens"]
+        anthropic_span.attributes["gen_ai.usage.output_tokens"]
+        + anthropic_span.attributes["gen_ai.usage.input_tokens"]
         == anthropic_span.attributes["llm.usage.total_tokens"]
     )
```
2469-2471: Update legacy usage keys in tests
The test file packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py still contains assertions against the old gen_ai.usage.* keys. Please replace all occurrences of:
- gen_ai.usage.prompt_tokens
- gen_ai.usage.completion_tokens
with the new llm.usage.* equivalents. Specifically, update the assertions at:
- Lines 1661–1665
- Lines 1821–1825
- Lines 1924–1928
- Lines 2007–2011
- Lines 2134–2138
- Lines 2247–2251
Suggested diff for each block (using the first occurrence as an example):
```diff
-    assert anthropic_span.attributes["gen_ai.usage.prompt_tokens"] == 568
+    assert anthropic_span.attributes["llm.usage.prompt_tokens"] == 568
-    assert (
-        anthropic_span.attributes["gen_ai.usage.completion_tokens"]
-        + anthropic_span.attributes["gen_ai.usage.prompt_tokens"]
-        == anthropic_span.attributes["llm.usage.total_tokens"]
-    )
+    assert (
+        anthropic_span.attributes["llm.usage.completion_tokens"]
+        + anthropic_span.attributes["llm.usage.prompt_tokens"]
+        == anthropic_span.attributes["llm.usage.total_tokens"]
+    )
```

Apply the same replacement at the other five locations to fully remove legacy keys.
♻️ Duplicate comments (6)
packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py (1)
12-14: Fix import: aliasing the module as GenAIAttributes breaks constant access
You're importing the module gen_ai_attributes and aliasing it as GenAIAttributes, but the code expects the class/symbol GenAIAttributes (so that GenAIAttributes.GEN_AI_PROMPT resolves). Import the symbol directly to avoid runtime/name errors and to align with the rest of the repo.
Apply one of the following diffs (Option A recommended as the stable alias used across the repo):
Option A (preferred):
```diff
-from opentelemetry.semconv._incubating.attributes import (
-    gen_ai_attributes as GenAIAttributes,
-)
+from opentelemetry.semconv_ai import GenAIAttributes
```

Option B (if you want to stick to incubating paths):
```diff
-from opentelemetry.semconv._incubating.attributes import (
-    gen_ai_attributes as GenAIAttributes,
-)
+from opentelemetry.semconv._incubating.attributes.gen_ai_attributes import GenAIAttributes
```

To confirm the canonical import used elsewhere and catch this pattern if it appears again, run:
```bash
#!/bin/bash
# Inspect imports of GenAIAttributes across the repo
rg -n -C2 -P --type=py 'from\s+opentelemetry\.(semconv_ai|semconv\._incubating\.attributes(\.gen_ai_attributes)?)\s+import\s+GenAIAttributes'
# Find any remaining module-alias imports that could be incorrect
rg -n -C2 -P --type=py 'from\s+opentelemetry\.semconv\._incubating\.attributes\s+import\s*\(\s*gen_ai_attributes\s+as\s+GenAIAttributes'
```

packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (1)
6-8: Undefined name fixed: GenAIAttributes import added — LGTM
This resolves the prior F821 (Ruff/Flake8) by defining GenAIAttributes via the incubating semconv alias.
packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py (1)
3-5: LGTM: GenAIAttributes migration is correct and resolves earlier F821.
- Importing GenAIAttributes and switching to GEN_AI_PROMPT/GEN_AI_COMPLETION plus GEN_AI_USAGE_INPUT_TOKENS/GEN_AI_USAGE_OUTPUT_TOKENS is consistent with the PR objective.
- Keeping SpanAttributes.LLM_USAGE_TOTAL_TOKENS for totals is acceptable.
Also applies to: 57-63, 70-81, 114-132, 164-189
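For context, the migrated assertions mix the two namespaces roughly like this (a sketch; exact token counts vary per cassette):

```python
attrs = anthropic_span.attributes
assert attrs[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] > 0
assert attrs[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] > 0
# Totals intentionally remain on the legacy key for now.
assert (
    attrs[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS]
    + attrs[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]
    == attrs[SpanAttributes.LLM_USAGE_TOTAL_TOKENS]
)
```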
packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/__init__.py (2)
248-250: Resolved: GEN_AI_SYSTEM set via GenAIAttributes with canonical provider id "groq"
Lowercased provider id and switch to GenAIAttributes look correct and align with the repo-wide migration.
330-332: Async path mirrors the fix — LGTM
The async wrapper now sets GEN_AI_SYSTEM to "groq" via GenAIAttributes consistently with the sync path.
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)
10-12: Resolved: GenAIAttributes import added (addresses prior undefined symbol).
This fixes the earlier F821 lint/runtime errors flagged in the previous review. The alias usage below compiles.
🧹 Nitpick comments (7)
packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py (1)
41-45: Also migrate total tokens to GenAI for consistency
You migrated input/output token keys to GenAI, but total tokens still uses the legacy LLM key. For consistency with the PR objective and to avoid mixed semantics, switch to the GenAI total tokens attribute.
Apply:
```diff
-    assert span.attributes.get(SpanAttributes.LLM_USAGE_TOTAL_TOKENS) > 0
+    assert span.attributes.get(GenAIAttributes.GEN_AI_USAGE_TOTAL_TOKENS) > 0
```

Additionally, once updated, remove the now-unused import:
```diff
-from opentelemetry.semconv_ai import SpanAttributes
```

Please confirm your installed semconv version defines GEN_AI_USAGE_TOTAL_TOKENS under gen_ai_attributes.
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (1)
67-69: No GenAI total_tokens attribute emitted yet – continue using LLM_USAGE_TOTAL_TOKENS with a future plan to migrate
The instrumentation currently only sets SpanAttributes.LLM_USAGE_TOTAL_TOKENS (there is no GenAIAttributes.GEN_AI_USAGE_TOTAL_TOKENS in span_utils). If you'd like to adopt the new GenAI namespace later, you can:
• Add support for GenAIAttributes.GEN_AI_USAGE_TOTAL_TOKENS in your instrumentation code
• Update tests to prefer the GenAI total with an LLM fallback, for example in packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py, lines 67–69:

```diff
-    assert openai_span.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS] == 35
+    # Prefer GenAI total tokens once implemented, fallback to LLM total:
+    assert openai_span.attributes.get(
+        GenAIAttributes.GEN_AI_USAGE_TOTAL_TOKENS,
+        openai_span.attributes.get(SpanAttributes.LLM_USAGE_TOTAL_TOKENS),
+    ) == 35
```

This is an optional, forward-looking refactor to align with GenAI semantic conventions once total_tokens is emitted under that namespace.
packages/opentelemetry-instrumentation-anthropic/tests/test_completion.py (1)
41-43: Optional: Prefer constants over string literals for response IDs.
Consider using GenAIAttributes.GEN_AI_RESPONSE_ID instead of the string "gen_ai.response.id" to avoid drift if keys change again.
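A minimal before/after sketch of that swap (assuming the installed semconv exposes GEN_AI_RESPONSE_ID, as other files in this PR already use):

```python
# Before: a raw string literal drifts silently if the key is ever renamed
assert span.attributes["gen_ai.response.id"] == expected_id

# After: the constant tracks the convention
assert span.attributes[GenAIAttributes.GEN_AI_RESPONSE_ID] == expected_id
```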
packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/__init__.py (1)
247-251: Optional: factor shared span attributes to avoid duplication
Both wrappers build the same attributes dict. Consider a tiny helper to keep them in one place.
Apply this localized change in each start_span call:
```diff
-    span = tracer.start_span(
-        name,
-        kind=SpanKind.CLIENT,
-        attributes={
-            GenAIAttributes.GEN_AI_SYSTEM: "groq",
-            SpanAttributes.LLM_REQUEST_TYPE: LLMRequestTypeValues.COMPLETION.value,
-        },
-    )
+    span = tracer.start_span(
+        name,
+        kind=SpanKind.CLIENT,
+        attributes=_groq_span_base_attributes(),
+    )
```

Add this helper once in the module (e.g., near other small helpers):
```python
def _groq_span_base_attributes() -> dict:
    return {
        GenAIAttributes.GEN_AI_SYSTEM: "groq",
        SpanAttributes.LLM_REQUEST_TYPE: LLMRequestTypeValues.COMPLETION.value,
    }
```

Also applies to: 326-333
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (3)
158-161: Avoid writing completion.* tool attributes on a tool span.
These attributes are prefixed under completion.* but are being emitted on the tool/function span. This is semantically confusing. The tool span already carries top-level tool fields (gen_ai.tool.*). Completion tool calls belong under the response span as gen_ai.completion.{i}.tool_calls.* (which you already handle later).
Recommend removing these lines from the tool span:
- f"{GenAIAttributes.GEN_AI_COMPLETION}.tool.name": tool_name, - f"{GenAIAttributes.GEN_AI_COMPLETION}.tool.type": "FunctionTool", - f"{GenAIAttributes.GEN_AI_COMPLETION}.tool.strict_json_schema": True @@ - tool_attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.tool.description"] = descAlso applies to: 167-167
279-294: Model settings attributes migrated correctly; consider using the model alias for consistency.
Temperature, max_tokens, and top_p use GenAIAttributes. The model field still uses a string literal ("gen_ai.request.model"). For consistency with the rest of the repo and the second file, consider switching to the alias.
```diff
-    otel_span.set_attribute("gen_ai.request.model", response.model)
+    otel_span.set_attribute(GenAIAttributes.GEN_AI_REQUEST_MODEL, response.model)
```

Apply similarly where model is set later (Line 396 and Line 480).
474-479: Propagating model settings to agent spans is consistent with GenAI aliases.
Temperature, max_tokens, and top_p use GenAIAttributes; consider also switching the model literal to the alias, as noted above.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (13)
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (12 hunks)
- packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py (4 hunks)
- packages/opentelemetry-instrumentation-anthropic/tests/test_completion.py (1 hunks)
- packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (28 hunks)
- packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/__init__.py (3 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py (9 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (2 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py (4 hunks)
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (10 hunks)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py (7 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_completions.py (10 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py (3 hunks)
- packages/traceloop-sdk/tests/test_prompt_management.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- packages/traceloop-sdk/tests/test_prompt_management.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_completions.py
- packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit Inference Engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
- packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/__init__.py
- packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py
- packages/opentelemetry-instrumentation-anthropic/tests/test_completion.py
- packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py
- packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py
- packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
🧠 Learnings (1)
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
PR: traceloop/openllmetry#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Semantic conventions must follow the OpenTelemetry GenAI specification
Applied to files:
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py
🧬 Code Graph Analysis (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)
_set_span_attribute(30-37)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (1)
should_emit_events(174-181)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (1)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (1)
set_span_attribute(21-25)
🔇 Additional comments (23)
packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py (3)
3-5: Resolved: GenAI attributes import added correctly
This addresses the earlier NameError complaint and unblocks the test from import-time failure. The aliasing pattern is consistent with the broader migration.
48-48: LGTM: response model assertion updated to GenAI key
Using GEN_AI_RESPONSE_MODEL aligns with the new naming and the provider's model id.
70-73: LGTM: usage assertions migrated to GenAI keys in second test
Both input and output tokens now assert against the new GenAI attributes. Matches the migration.
packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py (3)
54-96: LGTM: Assertions correctly migrated to GEN_AI_PROMPT and validate content/tool_calls
The checks cover role, content, and tool_call metadata on the assistant flow and match the updated GenAI attribute namespace.
124-132: LGTM: Tool-only assistant message behavior is asserted under GEN_AI_PROMPT
Good coverage for the "no content, only tool_calls" case. Keys and values match expected naming.
153-158: LGTM: Content-only assistant message behavior is asserted under GEN_AI_PROMPT
Assertions align with the GenAI naming and validate that no tool_calls attributes are emitted.
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (3)
50-58: GenAI request.model and prompt attributes migration — LGTM
Using GenAIAttributes for request model and prompt list indices looks correct and consistent with the new semconv.
60-66: GenAI completion attributes migration — LGTM
Completion content and role assertions correctly use GenAIAttributes.
49-54: Keep LLM semconv for request type
The GenAI semantic conventions package does not define a GEN_AI_REQUEST_TYPE attribute; only llm.request.type is emitted by the instrumentation. Leave this assertion unchanged:

```python
assert openai_span.attributes[SpanAttributes.LLM_REQUEST_TYPE] == "chat"
```

Likely an incorrect or invalid review comment.
packages/opentelemetry-instrumentation-anthropic/tests/test_completion.py (1)
35-40: LGTM: Assertions updated to new GEN_AI_* keys.
Span assertions now use GenAIAttributes.GEN_AI_PROMPT and GEN_AI_COMPLETION. This aligns with the new semconv mapping. Nice cleanup.
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (1)
73-91: LGTM: Request attributes migrated to GenAI request keys.
GEN_AI_REQUEST_MODEL, GEN_AI_REQUEST_MAX_TOKENS, GEN_AI_REQUEST_TEMPERATURE, and GEN_AI_REQUEST_TOP_P usage looks consistent. Leaving LLM_* penalties and is_streaming as-is is fine given current semconv coverage.
packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (1)
108-113: LGTM (partial): Updated assertions to input_tokens/output_tokens and GEN_AI_* keys.
The migrated sections use GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS/OUTPUT_TOKENS and GEN_AI_PROMPT/COMPLETION correctly, while preserving LLM_USAGE_TOTAL_TOKENS. Good consistency across sync/async and streaming paths.
Also applies to: 155-160, 219-225, 291-296, 345-350, 406-411, 572-577, 627-632, 689-694, 757-762, 811-816, 876-881, 944-949, 991-996, 1049-1054, 1122-1127, 1176-1181, 1240-1245, 1297-1302, 1437-1442, 1539-1544
packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/__init__.py (1)
33-35: Confirm & Pin Minimum opentelemetry-semantic-conventions Version for GenAI Attributes
The import
```python
from opentelemetry.semconv._incubating.attributes import gen_ai_attributes as GenAIAttributes
```

requires a semconv release that actually ships the opentelemetry/semconv/_incubating/attributes.py module. In your pyproject.toml you currently have:
```
opentelemetry-semantic-conventions = ">=0.50b0"
opentelemetry-semantic-conventions-ai = "^0.4.12"
```
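Before bumping, one quick way to see whether an installed release ships the module is a guarded import (a sketch, not the project's actual pattern):

```python
# Probe for the incubating GenAI attributes before relying on them.
try:
    from opentelemetry.semconv._incubating.attributes import (
        gen_ai_attributes as GenAIAttributes,
    )
except ImportError:
    # Older opentelemetry-semantic-conventions release: the incubating
    # module is missing, so the pinned minimum would be too low.
    GenAIAttributes = None
```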
opentelemetry-semantic-conventionsversion first introduced theincubating.attributesmodule (and itsgen_ai_attributesfunction) and bump the minimum version accordingly to avoid runtime ImportErrors in older environments.• File to update:
packages/opentelemetry-instrumentation-groq/pyproject.toml
• Dependency entry:
opentelemetry-semantic-conventions = ">=<version-that-includes-gen_ai_attributes>"
• Code location relying on it:
packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/__init__.py
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (5)
240-246: Prompt attributes migration looks correct.
Recording role/content under GEN_AI_PROMPT.{i}.* aligns with GenAI semconv.
306-341: Completion attributes migration looks correct.
Content, role, and finish_reason are emitted under GEN_AI_COMPLETION.{i}.*, and tool_calls are structured under the correct prefix.
346-355: Usage tokens: correct input/output mapping and names.
Handles both input_tokens/output_tokens and legacy prompt/completion fields, mapping to GEN_AI_USAGE_INPUT_TOKENS and GEN_AI_USAGE_OUTPUT_TOKENS.
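The dual handling described here reduces to something like the following (a sketch; the usage field names are the ones this comment mentions, not a verified SDK contract):

```python
def _usage_tokens(usage):
    # Prefer the new field names, fall back to the legacy ones.
    input_tokens = getattr(usage, "input_tokens", None)
    if input_tokens is None:
        input_tokens = getattr(usage, "prompt_tokens", 0)
    output_tokens = getattr(usage, "output_tokens", None)
    if output_tokens is None:
        output_tokens = getattr(usage, "completion_tokens", 0)
    return int(input_tokens or 0), int(output_tokens or 0)
```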
411-444: Legacy path: completion attributes aligned with GenAI.
Even in the fallback branch, completions are emitted via GEN_AI_COMPLETION.{i}.*, consistent with the new naming.
450-458: Legacy path: usage tokens mapping correct.
Maintains input→input_tokens/prompt_tokens and output→output_tokens/completion_tokens mapping.
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py (5)
23-25: GenAIAttributes import added — good.
Consistent with other instrumentations in this PR.
156-168: System and model attributes correctly migrated to GenAI.
GEN_AI_SYSTEM, GEN_AI_REQUEST_MODEL, and GEN_AI_RESPONSE_MODEL usage aligns with the spec.
172-178: Per-prompt attributes correctly migrated.
Recording role/content under GEN_AI_PROMPT.{index}.* is correct, both for assistant config and run instructions.
Also applies to: 182-187
195-212: Completion prefix usage is correct.
Using prefix = f"{GenAIAttributes.GEN_AI_COMPLETION}.{completion_index}" and writing role/content under it is consistent with the GenAI conventions.
276-287: Stream wrapper: attributes migrated correctly.
System/model attributes and per-prompt system messages use the GenAI aliases consistently.
Also applies to: 296-310
No worries :)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
packages/opentelemetry-instrumentation-langchain/tests/test_structured_output.py (1)
92-104: IndexError risk: asserting 1 log but accessing logs[1]
assert len(logs) == 1 conflicts with logs[1]. Pick one behavior and align.
Option A: Expect both events (user + choice):
```diff
-    assert len(logs) == 1
+    assert len(logs) == 2
```

Option B: Only the user message is emitted when content is empty:
```diff
     assert len(logs) == 1
@@
-    # Validate AI choice Event
-    choice_event = {
-        "index": 0,
-        "finish_reason": "stop",
-        "message": {},
-    }
-    assert_message_in_logs(logs[1], "gen_ai.choice", choice_event)
+    # No choice event expected when content is empty
```

packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (1)
195-205: Test no longer validates issue #3350; restore serialization checks
The docstring says we're guarding against an "Invalid type dict" warning and expect serialized reasoning. Current assertions skip that and even assert absence in Line 197, which defeats the regression test.
Apply:
```diff
-    # Verify the reasoning attributes are properly set without causing warnings
-    # assert span.attributes["gen_ai.request.reasoning_effort"] == "medium"
-    # assert span.attributes["gen_ai.request.reasoning_summary"] == "auto"
-    # This should not cause an "Invalid type dict" warning and should contain serialized reasoning
-    assert "gen_ai.completion.0.reasoning" not in span.attributes
-    # The reasoning should be serialized as JSON since it contains complex data
-    # reasoning_attr = span.attributes["gen_ai.completion.0.reasoning"]
-    # assert isinstance(reasoning_attr, str)
-    # Should be valid JSON containing reasoning summary data
-    # import json
-    # parsed_reasoning = json.loads(reasoning_attr)
-    # assert isinstance(parsed_reasoning, (dict, list))  # Could be dict or list depending on response structure
+    # Verify the reasoning attributes are properly set without causing warnings
+    # Request fields (optional, depending on instrumentation contract):
+    # assert span.attributes.get("gen_ai.request.reasoning_effort") == "medium"
+    # assert span.attributes.get("gen_ai.request.reasoning_summary") == "auto"
+    # This should not cause an "Invalid type dict" warning and should contain serialized reasoning
+    assert "gen_ai.completion.0.reasoning" in span.attributes
+    # The reasoning should be serialized as JSON since it contains complex data
+    reasoning_attr = span.attributes["gen_ai.completion.0.reasoning"]
+    assert isinstance(reasoning_attr, str)
+    # Should be valid JSON containing reasoning summary data
+    import json
+    parsed_reasoning = json.loads(reasoning_attr)
+    assert isinstance(parsed_reasoning, (dict, list))
```

packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (1)
291-304: Avoid token-counting fallbacks returning 0 due to None/non-text blocks.
Filter to text blocks and skip falsy values before counting.
```diff
-    else:
-        raw_messages.extend([content.get("text") for content in message])
+    else:
+        raw_messages.extend(
+            [
+                c.get("text")
+                for c in message
+                if isinstance(c, dict) and c.get("type") == "text" and c.get("text")
+            ]
+        )
@@
-    completion_tokens = _count_anthropic_tokens(
-        [content.get("text") for content in response_body.get("content")]
-    )
+    completion_tokens = _count_anthropic_tokens(
+        [
+            c.get("text")
+            for c in response_body.get("content")
+            if isinstance(c, dict) and c.get("type") == "text" and c.get("text")
+        ]
+    )
@@
-    for message in messages:
-        count += anthropic_client.count_tokens(text=message)
+    for message in messages:
+        if not message:
+            continue
+        count += anthropic_client.count_tokens(text=message)
```

Also applies to: 323-329
♻️ Duplicate comments (6)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (1)
627-632: Fixed: use GenAIAttributes for completion path
This addresses the prior review about using the wrong constant source for completion paths. Good catch.
Also applies to: 803-807
packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py (1)
86-87: Previous concern resolved: legacy LLM_USAGE_* assertions have been migrated.
Also applies to: 169-170, 249-250, 319-320, 387-388, 461-462, 541-542, 620-621, 715-716, 798-799, 866-867, 969-970, 1047-1048, 1115-1116, 1203-1204
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (4)
470-476: Amazon inputText: use role/content keys, not ".user".

```diff
-    _set_span_attribute(
-        span,
-        f"{GenAIAttributes.GEN_AI_PROMPT}.0.user",
-        request_body.get("inputText"),
-    )
+    _set_span_attribute(
+        span,
+        f"{GenAIAttributes.GEN_AI_PROMPT}.0.content",
+        request_body.get("inputText"),
+    )
+    _set_span_attribute(span, f"{GenAIAttributes.GEN_AI_PROMPT}.0.role", "user")
```
41-50: Anthropic messages: index bug and potential serialization error.
Content is always written to index 0; also json.dumps without default may raise on non-serializable blocks.
Apply:
```diff
 _set_span_attribute(
     span,
-    f"{GenAIAttributes.GEN_AI_PROMPT}.0.content",
-    json.dumps(message.get("content")),
+    f"{GenAIAttributes.GEN_AI_PROMPT}.{idx}.content",
+    json.dumps(message.get("content"), default=str),
 )
```
246-254: Anthropic content branch: writing role into .content; also add safe JSON default.
First write should target .role; keep payload in .content with default=str to avoid serialization errors.
```diff
-    _set_span_attribute(
-        span, f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content", "assistant"
-    )
+    _set_span_attribute(
+        span, f"{GenAIAttributes.GEN_AI_COMPLETION}.0.role", "assistant"
+    )
     _set_span_attribute(
         span,
-        f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content",
-        json.dumps(response_body.get("content")),
+        f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content",
+        json.dumps(response_body.get("content"), default=str),
     )
```
147-151: Use role/content keys instead of deprecated ".user" for prompts.

```diff
-    _set_span_attribute(
-        span, f"{GenAIAttributes.GEN_AI_PROMPT}.0.user", request_body.get("prompt")
-    )
+    _set_span_attribute(
+        span, f"{GenAIAttributes.GEN_AI_PROMPT}.0.content", request_body.get("prompt")
+    )
+    _set_span_attribute(span, f"{GenAIAttributes.GEN_AI_PROMPT}.0.role", "user")
```
🧹 Nitpick comments (39)
packages/opentelemetry-instrumentation-langchain/tests/test_structured_output.py (1)
29-29: Avoid unused assignment; call directly
No need to bind to _; just invoke.

```diff
-    _ = model_with_structured_output.invoke(query)
+    model_with_structured_output.invoke(query)
```

packages/opentelemetry-instrumentation-openai/tests/traces/test_prompt_caching.py (6)
70-71: Prefer semconv constants over raw strings for token keys
Use
GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS and GEN_AI_USAGE_OUTPUT_TOKENS to avoid typos (like the one above) and track future renames centrally.

```diff
-    assert cache_creation_span.attributes["gen_ai.usage.input_tokens"] == 1149
-    assert cache_creation_span.attributes["gen_ai.usage.output_tokens"] == 315
+    assert cache_creation_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 1149
+    assert cache_creation_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 315
-    assert cache_read_span.attributes["gen_ai.usage.input_tokens"] == 1149
-    assert cache_read_span.attributes["gen_ai.usage.output_tokens"] == 353
+    assert cache_read_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 1149
+    assert cache_read_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 353
```

Also applies to: 74-75
131-134: Use semconv constants for usage keys
Keeps tests resilient to renames and avoids string typos.
- assert cache_creation_span.attributes["gen_ai.usage.input_tokens"] == 1149 - assert cache_creation_span.attributes["gen_ai.usage.output_tokens"] == 315 - assert cache_read_span.attributes["gen_ai.usage.input_tokens"] == 1149 - assert cache_read_span.attributes["gen_ai.usage.output_tokens"] == 353 + assert cache_creation_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 1149 + assert cache_creation_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 315 + assert cache_read_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 1149 + assert cache_read_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 353
221-224: Same constant usage recommendation here
Consistent with other tests.
- assert cache_creation_span.attributes["gen_ai.usage.input_tokens"] == 1149 - assert cache_creation_span.attributes["gen_ai.usage.output_tokens"] == 315 - assert cache_read_span.attributes["gen_ai.usage.input_tokens"] == 1149 - assert cache_read_span.attributes["gen_ai.usage.output_tokens"] == 353 + assert cache_creation_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 1149 + assert cache_creation_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 315 + assert cache_read_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 1149 + assert cache_read_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 353
315-316: Async variant: switch to constants for usage keys
Prevents drift and copy/paste errors.
- assert cache_creation_span.attributes["gen_ai.usage.input_tokens"] == 1150 - assert cache_creation_span.attributes["gen_ai.usage.output_tokens"] == 293 + assert cache_creation_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 1150 + assert cache_creation_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 293 - assert cache_read_span.attributes["gen_ai.usage.input_tokens"] == 1150 - assert cache_read_span.attributes["gen_ai.usage.output_tokens"] == 307 + assert cache_read_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 1150 + assert cache_read_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 307Also applies to: 318-319
376-380: Async-with-content: use constants
Same rationale as above.
- assert cache_creation_span.attributes["gen_ai.usage.input_tokens"] == 1150 - assert cache_creation_span.attributes["gen_ai.usage.output_tokens"] == 293 - assert cache_read_span.attributes["gen_ai.usage.input_tokens"] == 1150 - assert cache_read_span.attributes["gen_ai.usage.output_tokens"] == 307 + assert cache_creation_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 1150 + assert cache_creation_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 293 + assert cache_read_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 1150 + assert cache_read_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 307
468-471: Async-without-content: use constants
Keeps all test blocks consistent.
- assert cache_creation_span.attributes["gen_ai.usage.input_tokens"] == 1150 - assert cache_creation_span.attributes["gen_ai.usage.output_tokens"] == 293 - assert cache_read_span.attributes["gen_ai.usage.input_tokens"] == 1150 - assert cache_read_span.attributes["gen_ai.usage.output_tokens"] == 307 + assert cache_creation_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 1150 + assert cache_creation_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 293 + assert cache_read_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 1150 + assert cache_read_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 307packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (2)
38-43: Switch to GenAI constants looks good; consider a safer presence check

Using GenAIAttributes for prompt/completion is correct. For existence checks, prefer explicit key membership to avoid false positives on empty strings.
```diff
- assert open_ai_span.attributes.get(
-     f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content")
+ assert f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content" in open_ai_span.attributes
```
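For context, a minimal illustration of the false-positive risk (the attribute dict here is hypothetical):

```python
# Truthiness conflates "present but empty" with "missing"; membership does not.
attrs = {"gen_ai.completion.0.content": ""}

assert "gen_ai.completion.0.content" in attrs  # passes: the key exists
# assert attrs.get("gen_ai.completion.0.content")  # would fail: "" is falsy
```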
682-686: Ruff S101 in tests

Ruff flags `assert` usage (S101) in these test files. If your lint config doesn't already ignore S101 for tests, add a per-file ignore (e.g., in ruff.toml: `per-file-ignores = { "packages/**/tests/**/test_*.py" = ["S101"] }`).
Also applies to: 748-753, 751-756
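For reference, a minimal sketch of that ignore as a standalone `ruff.toml` (note that `ruff.toml` drops the `tool.ruff` prefix used in `pyproject.toml`; the glob is an assumption about this repo's layout):

```toml
[lint.per-file-ignores]
"packages/**/tests/**/test_*.py" = ["S101"]
```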
packages/traceloop-sdk/tests/test_prompt_management.py (2)
241-241: Avoid bare assert for presence check (ruff S101) and be explicit

Prevents Bandit/Ruff complaints and gives a clearer failure message.
```diff
- assert open_ai_span.attributes.get(f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content")
+ if f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content" not in open_ai_span.attributes:
+     pytest.fail("Missing completion content attribute on span")
```
272-279: Harden JSON validation and drop the redundant assertion

Catch None/TypeError and remove the no-op `assert True`.

```diff
- completion = open_ai_span.attributes.get(
-     f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content"
- )
- try:
-     json.loads(completion)
- except json.JSONDecodeError:
-     pytest.fail("Response is not valid JSON")
- assert True
+ completion = open_ai_span.attributes.get(
+     f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content"
+ )
+ if completion is None:
+     pytest.fail("Missing completion content attribute on span")
+ try:
+     json.loads(completion)
+ except (TypeError, json.JSONDecodeError):
+     pytest.fail("Response is not valid JSON")
```
packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py (2)
101-103: Uncomment input_tokens assertion for consistency

Only this test leaves the input_tokens check commented, while all others assert it. Either re-enable it or document why this legacy path differs. Prefer enabling to avoid regressions.
Apply:
```diff
- # assert cache_creation_span.attributes["gen_ai.usage.input_tokens"] == 1167
+ assert cache_creation_span.attributes["gen_ai.usage.input_tokens"] == 1167
```
190-201: Use GenAIAttributes constants instead of string literals for usage keys
Replace raw `gen_ai.usage.*` keys with the corresponding `GenAIAttributes.GEN_AI_USAGE_*` constants (e.g. `GEN_AI_USAGE_INPUT_TOKENS`, `GEN_AI_USAGE_OUTPUT_TOKENS`, `GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS`, `GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS`) in the files below; see the sketch after this list:
- packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py (190–201)
- packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (1661–1664, 1821–1824, 1924–1927, 2007–2010, 2134–2137, 2247–2250)
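A minimal sketch of the replacement pattern, assuming the alias import already used elsewhere in this PR; the span fixture and token counts are placeholders:

```python
from opentelemetry.semconv._incubating.attributes import (
    gen_ai_attributes as GenAIAttributes,
)

def assert_token_usage(span, input_tokens, output_tokens):
    # Constants instead of raw "gen_ai.usage.*" strings prevent typos and drift.
    assert span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == input_tokens
    assert span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == output_tokens
```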
packages/opentelemetry-instrumentation-langchain/tests/test_tool_calls.py (6)
36-74: Re-enable span assertions (don't leave commented checks) and bind result.

This test currently only checks logs; the core span validations are commented out. Either delete the dead code or reinstate minimal span assertions using GenAI attributes. Suggest reinstating with a bound `result`:

```diff
- _ = model_with_tools.invoke(query)
- # spans = span_exporter.get_finished_spans()
-
- # span_names = set(span.name for span in spans)
- # expected_spans = {"ChatOpenAI.chat"}
- # assert expected_spans.issubset(span_names)
-
- # chat_span = next(
- #     span for span in spans if span.name == "ChatOpenAI.chat"
- # )
-
- # assert chat_span.attributes[f"{SpanAttributes.LLM_REQUEST_FUNCTIONS}.0.name"] == "food_analysis"
- # assert json.loads(chat_span.attributes[f"{SpanAttributes.LLM_REQUEST_FUNCTIONS}.0.parameters"]) == {
- #     "properties": {
- #         "name": {"type": "string"},
- #         "healthy": {"type": "boolean"},
- #         "calories": {"type": "integer"},
- #         "taste_profile": {"type": "array", "items": {"type": "string"}},
- #     },
- #     "required": ["name", "healthy", "calories", "taste_profile"],
- #     "type": "object",
- # }
-
- # assert chat_span.attributes[f"{GenAIAttributes.GEN_AI_PROMPT}.0.content"] == query_text
- # assert (
- #     chat_span.attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.0.tool_calls.0.name"]
- #     == "food_analysis"
- # )
-
- # arguments = chat_span.attributes[
- #     f"{GenAIAttributes.GEN_AI_COMPLETION}.0.tool_calls.0.arguments"
- # ]
- # assert json.loads(arguments) == json.loads(
- #     result.model_dump()
- #     .get("additional_kwargs")
- #     .get("tool_calls")[0]
- #     .get("function")
- #     .get("arguments")
- # )
+ result = model_with_tools.invoke(query)
+ spans = span_exporter.get_finished_spans()
+ chat_span = next(span for span in spans if span.name == "ChatOpenAI.chat")
+ assert chat_span.attributes[f"{GenAIAttributes.GEN_AI_PROMPT}.0.content"] == query_text
+ assert (
+     chat_span.attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.0.tool_calls.0.name"]
+     == "food_analysis"
+ )
+ assert chat_span.attributes.get(
+     f"{GenAIAttributes.GEN_AI_COMPLETION}.0.tool_calls.0.arguments"
+ )
```
213-256: LGTM: migrated to GenAI prompt attributes and roles. Also remove the duplicate function-attributes block above.

Assertions on `GEN_AI_PROMPT` and roles look correct. There's a duplicate assertions block for `SpanAttributes.LLM_REQUEST_FUNCTIONS` right above (Lines 204–212) that repeats Lines 195–203. Remove the duplicate to cut noise.

```diff
- assert chat_span.attributes[f"{SpanAttributes.LLM_REQUEST_FUNCTIONS}.0.name"] == "get_weather"
- assert json.loads(chat_span.attributes[f"{SpanAttributes.LLM_REQUEST_FUNCTIONS}.0.parameters"]) == {
-     "properties": {
-         "location": {"type": "string"},
-     },
-     "required": ["location"],
-     "type": "object",
- }
```
496-523: Avoid brittle hard-coded Anthropic tool_call id; derive from result.

Provider IDs can change; prefer comparing to the returned result's id to keep tests stable.
```diff
- assert (
-     chat_span.attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.0.tool_calls.0.id"]
-     == "toolu_016q9vtSd8CY2vnZSpEp1j4o"
- )
+ expected_id = result.model_dump()["content"][1]["id"]
+ assert (
+     chat_span.attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.0.tool_calls.0.id"]
+     == expected_id
+ )
```
707-759: Same here: don't pin Anthropic tool_call id; use the response's id.

Stabilize by asserting against `result` rather than a literal.

```diff
- assert (
-     chat_span.attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.0.tool_calls.0.id"]
-     == "toolu_012guEZNJ5yH5jxHKWAkzCzh"
- )
+ expected_id = result.model_dump()["content"][1]["id"]
+ assert (
+     chat_span.attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.0.tool_calls.0.id"]
+     == expected_id
+ )
```
1048-1083: Relax OpenAI tool_call id equality; assert presence instead of fixed literals.

Recorded IDs tend to be unstable across providers/cassettes. Keep names/args exact, but make ID checks existence-based and non-brittle.
```diff
- assert (
-     chat_span.attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.0.tool_calls.0.id"]
-     == "call_EgULHWKqGjuB36aUeiOSpALZ"
- )
+ id0 = chat_span.attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.0.tool_calls.0.id"]
+ assert isinstance(id0, str) and id0
@@
- assert (
-     chat_span.attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.0.tool_calls.1.id"]
-     == "call_Xer9QGOTDMG2Bxn9AKGiVM14"
- )
+ id1 = chat_span.attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.0.tool_calls.1.id"]
+ assert isinstance(id1, str) and id1
```
812-812: Remove debug prints from tests.

Stray prints add noise to CI logs; rely on assertions instead.
```diff
- print(result.model_dump())
```
Also applies to: 959-959
packages/opentelemetry-instrumentation-ollama/tests/test_ollama_metrics.py (3)
72-73: Strengthen the model assertion for tighter signal validation

Instead of only checking non-null, assert the expected model value to catch mislabeling.
Apply:
```diff
- assert dp.attributes.get(GenAIAttributes.GEN_AI_RESPONSE_MODEL) is not None
+ assert dp.attributes.get(GenAIAttributes.GEN_AI_RESPONSE_MODEL) == "gemma3:1b"
```
49-49: Ruff S101 in tests: either ignore or annotate

Ruff flags `assert` (S101) here. In tests this is fine; consider a per-file ignore to keep CI green.

Options:
- In pyproject.toml:
```toml
[tool.ruff.lint.per-file-ignores]
"packages/**/tests/**/*.py" = ["S101"]
```
- Or annotate lines with `# noqa: S101`.

Also applies to: 72-73
96-104: Unify on GenAIAttributes for consistency with the PR direction

These checks still use `SpanAttributes.*`. For consistency with the updated tests above and the PR's move to GenAI keys, switch to `GenAIAttributes` (same underlying keys), then remove the now-unused `SpanAttributes` import.

Apply:
```diff
- assert dp.attributes.get(SpanAttributes.LLM_SYSTEM) == "Ollama", \
+ assert dp.attributes.get(GenAIAttributes.GEN_AI_SYSTEM) == "Ollama", \
@@
- model_name = dp.attributes.get(SpanAttributes.LLM_RESPONSE_MODEL)
+ model_name = dp.attributes.get(GenAIAttributes.GEN_AI_RESPONSE_MODEL)
```
And drop the import at the top:
```diff
- from opentelemetry.semconv_ai import SpanAttributes
```
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (7)
11-11: Avoid throwaway assignment for side-effect-only call

Call `openai_client.responses.create(...)` directly instead of assigning to `_`. It's a no-op variable and slightly obscures the intent.

Apply:
```diff
- _ = openai_client.responses.create(
+ openai_client.responses.create(
```
22-25: Preserve minimal coverage for prompt attributes (non-brittle)

Rather than fully commenting these out, assert presence and basic shape. This keeps signal without coupling to exact text.
Apply:
```diff
- # assert (
- #     span.attributes["gen_ai.prompt.0.content"] == "What is the capital of France?"
- # )
- # assert span.attributes["gen_ai.prompt.0.role"] == "user"
+ assert "gen_ai.prompt.0.content" in span.attributes
+ assert isinstance(span.attributes["gen_ai.prompt.0.content"], str) and span.attributes["gen_ai.prompt.0.content"]
+ assert "gen_ai.prompt.0.role" in span.attributes
```
35-35: Same nit: remove `_ =` for clarity

The second call also doesn't need assignment.
Apply:
```diff
- _ = openai_client.responses.create(
+ openai_client.responses.create(
```
62-78: Validate conversation-history shape without brittle literals

Keep checks lightweight: presence and serializability where applicable.
Apply:
```diff
- # assert (
- #     span.attributes["gen_ai.prompt.0.content"]
- #     == "Come up with an adjective in English. Respond with just one word."
- # )
- # assert span.attributes["gen_ai.prompt.0.role"] == "user"
- # assert json.loads(span.attributes["gen_ai.prompt.1.content"]) == [
- #     {
- #         "type": "output_text",
- #         "text": first_response.output[0].content[0].text,
- #     }
- # ]
- # assert span.attributes["gen_ai.prompt.1.role"] == "assistant"
- # assert (
- #     span.attributes["gen_ai.prompt.2.content"]
- #     == "Can you explain why you chose that word?"
- # )
- # assert span.attributes["gen_ai.prompt.2.role"] == "user"
+ assert "gen_ai.prompt.0.content" in span.attributes
+ assert "gen_ai.prompt.0.role" in span.attributes
+ assert "gen_ai.prompt.1.content" in span.attributes
+ # JSON-encoded assistant content should be parseable:
+ import json
+ json.loads(span.attributes["gen_ai.prompt.1.content"])
+ assert "gen_ai.prompt.1.role" in span.attributes
+ assert "gen_ai.prompt.2.content" in span.attributes
+ assert "gen_ai.prompt.2.role" in span.attributes
```
137-140: Don't assert specific tool_call IDs; assert existence and type

IDs vary across recordings/providers. Check that some tool_call id(s) exist and are non-empty strings.
Apply:
```diff
- # assert (
- #     span.attributes["gen_ai.completion.0.tool_calls.0.id"]
- #     == "fc_685ff89422ec819a977b2ea385bc9b6601c537ddeff5c2a2"
- # )
+ ids = [
+     v for k, v in span.attributes.items()
+     if k.startswith("gen_ai.completion.0.tool_calls.") and k.endswith(".id")
+ ]
+ assert ids and all(isinstance(v, str) and v for v in ids)
```
164-168: Reasoning request attributes: consider keeping tolerant presence checks

Given the test is gated by `is_reasoning_supported()`, you can safely assert the request effort (and optionally summary) without coupling to provider response structure. Example (optional):

Would you like me to add guarded assertions such as:
- If "gen_ai.request.reasoning_effort" in attributes, assert it equals the input value.
- Only assert absence of completion reasoning when summary is None.
I can send a patch if you confirm instrumentation guarantees for these keys.
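A hedged sketch of what those guarded assertions could look like; the attribute keys are the ones discussed above and should be treated as hypothetical until the instrumentation confirms them:

```python
def assert_reasoning_attributes(span, requested_effort, summary):
    # Hypothetical keys: "gen_ai.request.reasoning_effort" and the completion
    # reasoning attribute are not confirmed API; gate each check on presence.
    attrs = span.attributes
    if "gen_ai.request.reasoning_effort" in attrs:
        assert attrs["gen_ai.request.reasoning_effort"] == requested_effort
    if summary is None:
        assert "gen_ai.completion.0.reasoning" not in attrs
```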
171-171: Optional: usage token sanity check

If usage metrics are emitted for reasoning models, prefer a non-brittle lower bound (e.g., total_tokens > 0) over exact names. Skip if not guaranteed by the instrumentation.
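A minimal sketch of such a lower-bound check, assuming the span exposes `llm.usage.total_tokens` as elsewhere in this PR (skip or gate it if the instrumentation does not guarantee usage metrics):

```python
def assert_usage_sane(span):
    # Non-brittle sanity check: any positive total is acceptable.
    total = span.attributes.get("llm.usage.total_tokens")
    if total is not None:
        assert total > 0
```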
packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py (4)
88-88: Prefer enum value for gen_ai.system over string literal.

Avoid hardcoding "openai"; use the canonical enum to reduce brittleness.
```diff
- assert open_ai_span.attributes[GenAIAttributes.GEN_AI_SYSTEM] == "openai"
+ assert open_ai_span.attributes[GenAIAttributes.GEN_AI_SYSTEM] == GenAIAttributes.GenAiSystemValues.OPENAI.value
```
Also applies to: 171-171, 251-251, 321-321, 389-389, 463-463, 543-543, 623-623, 717-717, 801-801, 869-869, 971-971, 1049-1049, 1117-1117, 1205-1205
304-306: Unify attribute access: prefer constants over string keys.

Use SpanAttributes/GenAIAttributes constants instead of raw strings for request/response model/type to avoid typos.
```diff
- assert open_ai_span.attributes["llm.request.type"] == "chat"
- assert open_ai_span.attributes["gen_ai.request.model"] == "gpt-4-turbo-preview"
- assert open_ai_span.attributes["gen_ai.response.model"] == "gpt-4-turbo-preview"
+ assert open_ai_span.attributes[SpanAttributes.LLM_REQUEST_TYPE] == "chat"
+ assert open_ai_span.attributes[GenAIAttributes.GEN_AI_REQUEST_MODEL] == "gpt-4-turbo-preview"
+ assert open_ai_span.attributes[GenAIAttributes.GEN_AI_RESPONSE_MODEL] == "gpt-4-turbo-preview"
```
Apply similarly in the other polling/event blocks noted in the line ranges.
Also applies to: 384-386, 458-460
1-1: Silence Ruff Bandit S101 for test asserts (optional).

If CI flags S101 in tests, disable it at file level.
```diff
+ # ruff: noqa: S101  # allow assert in tests
  import pytest
```
142-143: Typos in comments: "cassetes" → "cassettes", "assistent" → "assistant".

```diff
- # Now that we have VCR cassetes recorded, we don't need to wait
+ # Now that we have VCR cassettes recorded, we don't need to wait
- # Validate assistent system message Event
+ # Validate assistant system message Event
```
Apply to all similar occurrences in this file.
Also applies to: 185-188, 289-291, 403-410, 471-473
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (6)
603-626: Broaden numeric guard for token histograms.

`type(x) is int` is too strict (it rejects int subclasses). Use `isinstance`.
```diff
- if (
-     metric_params.token_histogram
-     and type(prompt_tokens) is int
-     and prompt_tokens >= 0
- ):
+ if (
+     metric_params.token_histogram
+     and isinstance(prompt_tokens, int)
+     and prompt_tokens >= 0
+ ):
@@
- if (
-     metric_params.token_histogram
-     and type(completion_tokens) is int
-     and completion_tokens >= 0
- ):
+ if (
+     metric_params.token_histogram
+     and isinstance(completion_tokens, int)
+     and completion_tokens >= 0
+ ):
```
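One subtlety worth noting with the broader guard, shown as a small self-contained sketch: `bool` is a subclass of `int` in Python, so if booleans could ever reach this path, exclude them explicitly.

```python
def _valid_token_count(n):
    # isinstance accepts int subclasses (including bool), so exclude bool.
    return isinstance(n, int) and not isinstance(n, bool) and n >= 0

assert _valid_token_count(42)
assert not _valid_token_count(True)   # bool is an int subclass
assert not _valid_token_count(-1)
```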
185-192: Add completion role = "assistant" for Cohere generations.

```diff
  def _set_generations_span_attributes(span, response_body):
      for i, generation in enumerate(response_body.get("generations")):
+         _set_span_attribute(
+             span, f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.role", "assistant"
+         )
          _set_span_attribute(
              span,
              f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.content",
              generation.get("text"),
          )
```
356-362: Add completion role = "assistant" for AI21 completions.

```diff
  def _set_span_completions_attributes(span, response_body):
      for i, completion in enumerate(response_body.get("completions")):
+         _set_span_attribute(
+             span, f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.role", "assistant"
+         )
          _set_span_attribute(
              span,
              f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.content",
              completion.get("data").get("text"),
          )
```
505-527: Add completion role = "assistant" for Amazon outputs.

```diff
  if "results" in response_body:
      for i, result in enumerate(response_body.get("results")):
+         _set_span_attribute(
+             span, f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.role", "assistant"
+         )
          _set_span_attribute(
              span,
              f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.content",
              result.get("outputText"),
          )
  elif "outputText" in response_body:
+     _set_span_attribute(
+         span, f"{GenAIAttributes.GEN_AI_COMPLETION}.0.role", "assistant"
+     )
      _set_span_attribute(
          span,
          f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content",
          response_body.get("outputText"),
      )
  elif "output" in response_body:
      msgs = response_body.get("output").get("message", {}).get("content", [])
      for idx, msg in enumerate(msgs):
+         _set_span_attribute(
+             span, f"{GenAIAttributes.GEN_AI_COMPLETION}.{idx}.role", "assistant"
+         )
          _set_span_attribute(
              span,
              f"{GenAIAttributes.GEN_AI_COMPLETION}.{idx}.content",
              msg.get("text"),
          )
```
561-566: Add completion role = "assistant" for imported models.

```diff
  def _set_imported_model_response_span_attributes(span, response_body):
+     _set_span_attribute(
+         span, f"{GenAIAttributes.GEN_AI_COMPLETION}.0.role", "assistant"
+     )
      _set_span_attribute(
          span,
          f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content",
          response_body.get("generation"),
      )
```
132-145: Consider aligning guardrail filter attributes with GenAI prompt/completion keys.

You're mixing LLM_* and GEN_AI_*; for consistency with this PR, consider writing filter metadata under GenAI prompt/completion.
```diff
- _set_span_attribute(
-     span,
-     f"{SpanAttributes.LLM_PROMPTS}.{PROMPT_FILTER_KEY}",
-     json.dumps(input_filters, default=str)
- )
+ _set_span_attribute(
+     span,
+     f"{GenAIAttributes.GEN_AI_PROMPT}.{PROMPT_FILTER_KEY}",
+     json.dumps(input_filters, default=str)
+ )
@@
- _set_span_attribute(
-     span,
-     f"{SpanAttributes.LLM_COMPLETIONS}.{CONTENT_FILTER_KEY}",
-     json.dumps(output_filters, default=str)
- )
+ _set_span_attribute(
+     span,
+     f"{GenAIAttributes.GEN_AI_COMPLETION}.{CONTENT_FILTER_KEY}",
+     json.dumps(output_filters, default=str)
+ )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (15)
- packages/opentelemetry-instrumentation-alephalpha/opentelemetry/instrumentation/alephalpha/__init__.py (4 hunks)
- packages/opentelemetry-instrumentation-alephalpha/opentelemetry/instrumentation/alephalpha/span_utils.py (3 hunks)
- packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py (12 hunks)
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/guardrail.py (5 hunks)
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (24 hunks)
- packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py (6 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/test_structured_output.py (3 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/test_tool_calls.py (6 hunks)
- packages/opentelemetry-instrumentation-ollama/tests/test_ollama_metrics.py (3 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py (15 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (20 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_prompt_caching.py (6 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (7 hunks)
- packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (0 hunks)
- packages/traceloop-sdk/tests/test_prompt_management.py (4 hunks)
💤 Files with no reviewable changes (1)
- packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py
🚧 Files skipped from review as they are similar to previous changes (4)
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/guardrail.py
- packages/opentelemetry-instrumentation-alephalpha/opentelemetry/instrumentation/alephalpha/__init__.py
- packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py
- packages/opentelemetry-instrumentation-alephalpha/opentelemetry/instrumentation/alephalpha/span_utils.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
- packages/traceloop-sdk/tests/test_prompt_management.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
- packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py
- packages/opentelemetry-instrumentation-langchain/tests/test_structured_output.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py
- packages/opentelemetry-instrumentation-langchain/tests/test_tool_calls.py
- packages/opentelemetry-instrumentation-ollama/tests/test_ollama_metrics.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_prompt_caching.py
🧬 Code graph analysis (4)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (1)
`_` (1094-1124)
packages/traceloop-sdk/tests/test_prompt_management.py (1)
openai_client(222-223)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (2)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
`SpanAttributes` (64-237)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/utils.py (1)
should_send_prompts(36-39)
packages/opentelemetry-instrumentation-ollama/tests/test_ollama_metrics.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (2)
`SpanAttributes` (64-237)
`Meters` (36-61)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
SpanAttributes(64-237)
🪛 Ruff (0.12.2)
packages/traceloop-sdk/tests/test_prompt_management.py
241-241: Use of assert detected
(S101)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
169-169: Use of assert detected
(S101)
197-197: Use of assert detected
(S101)
packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py
102-102: Use of assert detected
(S101)
105-105: Use of assert detected
(S101)
106-106: Use of assert detected
(S101)
109-109: Use of assert detected
(S101)
110-110: Use of assert detected
(S101)
111-111: Use of assert detected
(S101)
190-190: Use of assert detected
(S101)
191-191: Use of assert detected
(S101)
194-194: Use of assert detected
(S101)
195-195: Use of assert detected
(S101)
198-198: Use of assert detected
(S101)
199-199: Use of assert detected
(S101)
200-200: Use of assert detected
(S101)
377-377: Use of assert detected
(S101)
378-378: Use of assert detected
(S101)
381-381: Use of assert detected
(S101)
382-382: Use of assert detected
(S101)
385-385: Use of assert detected
(S101)
386-386: Use of assert detected
(S101)
387-387: Use of assert detected
(S101)
506-506: Use of assert detected
(S101)
507-507: Use of assert detected
(S101)
510-510: Use of assert detected
(S101)
511-511: Use of assert detected
(S101)
514-514: Use of assert detected
(S101)
515-515: Use of assert detected
(S101)
516-516: Use of assert detected
(S101)
596-596: Use of assert detected
(S101)
597-597: Use of assert detected
(S101)
600-600: Use of assert detected
(S101)
601-601: Use of assert detected
(S101)
604-604: Use of assert detected
(S101)
605-605: Use of assert detected
(S101)
606-606: Use of assert detected
(S101)
789-789: Use of assert detected
(S101)
790-790: Use of assert detected
(S101)
793-793: Use of assert detected
(S101)
794-794: Use of assert detected
(S101)
797-797: Use of assert detected
(S101)
798-798: Use of assert detected
(S101)
799-799: Use of assert detected
(S101)
921-921: Use of assert detected
(S101)
922-922: Use of assert detected
(S101)
925-925: Use of assert detected
(S101)
926-926: Use of assert detected
(S101)
929-929: Use of assert detected
(S101)
930-930: Use of assert detected
(S101)
931-931: Use of assert detected
(S101)
1014-1014: Use of assert detected
(S101)
1015-1015: Use of assert detected
(S101)
1018-1018: Use of assert detected
(S101)
1019-1019: Use of assert detected
(S101)
1022-1022: Use of assert detected
(S101)
1023-1023: Use of assert detected
(S101)
1024-1024: Use of assert detected
(S101)
1212-1212: Use of assert detected
(S101)
1213-1213: Use of assert detected
(S101)
1216-1216: Use of assert detected
(S101)
1217-1217: Use of assert detected
(S101)
1220-1220: Use of assert detected
(S101)
1221-1221: Use of assert detected
(S101)
1222-1222: Use of assert detected
(S101)
1345-1345: Use of assert detected
(S101)
1346-1346: Use of assert detected
(S101)
1349-1349: Use of assert detected
(S101)
1350-1350: Use of assert detected
(S101)
1353-1353: Use of assert detected
(S101)
1354-1354: Use of assert detected
(S101)
1355-1355: Use of assert detected
(S101)
1439-1439: Use of assert detected
(S101)
1440-1440: Use of assert detected
(S101)
1443-1443: Use of assert detected
(S101)
1444-1444: Use of assert detected
(S101)
1447-1447: Use of assert detected
(S101)
1448-1448: Use of assert detected
(S101)
1449-1449: Use of assert detected
(S101)
1648-1648: Use of assert detected
(S101)
1649-1649: Use of assert detected
(S101)
1652-1652: Use of assert detected
(S101)
1653-1653: Use of assert detected
(S101)
1656-1656: Use of assert detected
(S101)
1657-1657: Use of assert detected
(S101)
1658-1658: Use of assert detected
(S101)
packages/opentelemetry-instrumentation-langchain/tests/test_structured_output.py
38-38: Use of assert detected
(S101)
92-92: Use of assert detected
(S101)
packages/opentelemetry-instrumentation-langchain/tests/test_tool_calls.py
217-217: Use of assert detected
(S101)
219-219: Use of assert detected
(S101)
223-223: Use of assert detected
(S101)
225-225: Use of assert detected
(S101)
229-229: Use of assert detected
(S101)
237-237: Use of assert detected
(S101)
241-241: Use of assert detected
(S101)
243-243: Use of assert detected
(S101)
247-247: Use of assert detected
(S101)
248-248: Use of assert detected
(S101)
250-250: Use of assert detected
(S101)
254-254: Use of assert detected
(S101)
256-256: Use of assert detected
(S101)
500-500: Use of assert detected
(S101)
502-502: Use of assert detected
(S101)
506-506: Use of assert detected
(S101)
510-510: Use of assert detected
(S101)
514-514: Use of assert detected
(S101)
518-518: Use of assert detected
(S101)
711-711: Use of assert detected
(S101)
712-712: Use of assert detected
(S101)
714-714: Use of assert detected
(S101)
718-718: Use of assert detected
(S101)
726-726: Use of assert detected
(S101)
731-731: Use of assert detected
(S101)
732-732: Use of assert detected
(S101)
736-736: Use of assert detected
(S101)
738-738: Use of assert detected
(S101)
742-742: Use of assert detected
(S101)
746-746: Use of assert detected
(S101)
750-750: Use of assert detected
(S101)
754-754: Use of assert detected
(S101)
903-903: Use of assert detected
(S101)
904-904: Use of assert detected
(S101)
905-905: Use of assert detected
(S101)
1051-1051: Use of assert detected
(S101)
1053-1053: Use of assert detected
(S101)
1057-1057: Use of assert detected
(S101)
1061-1061: Use of assert detected
(S101)
1065-1065: Use of assert detected
(S101)
1070-1070: Use of assert detected
(S101)
1074-1074: Use of assert detected
(S101)
1078-1078: Use of assert detected
(S101)
packages/opentelemetry-instrumentation-ollama/tests/test_ollama_metrics.py
49-49: Use of assert detected
(S101)
72-72: Use of assert detected
(S101)
73-73: Use of assert detected
(S101)
packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py
67-67: Use of assert detected
(S101)
71-71: Use of assert detected
(S101)
75-75: Use of assert detected
(S101)
76-76: Use of assert detected
(S101)
80-80: Use of assert detected
(S101)
81-81: Use of assert detected
(S101)
82-82: Use of assert detected
(S101)
86-86: Use of assert detected
(S101)
87-87: Use of assert detected
(S101)
88-88: Use of assert detected
(S101)
94-94: Use of assert detected
(S101)
100-100: Use of assert detected
(S101)
164-164: Use of assert detected
(S101)
169-169: Use of assert detected
(S101)
170-170: Use of assert detected
(S101)
171-171: Use of assert detected
(S101)
245-245: Use of assert detected
(S101)
249-249: Use of assert detected
(S101)
250-250: Use of assert detected
(S101)
251-251: Use of assert detected
(S101)
319-319: Use of assert detected
(S101)
320-320: Use of assert detected
(S101)
321-321: Use of assert detected
(S101)
327-327: Use of assert detected
(S101)
333-333: Use of assert detected
(S101)
387-387: Use of assert detected
(S101)
388-388: Use of assert detected
(S101)
389-389: Use of assert detected
(S101)
461-461: Use of assert detected
(S101)
462-462: Use of assert detected
(S101)
463-463: Use of assert detected
(S101)
525-525: Use of assert detected
(S101)
529-529: Use of assert detected
(S101)
533-533: Use of assert detected
(S101)
534-534: Use of assert detected
(S101)
538-538: Use of assert detected
(S101)
539-539: Use of assert detected
(S101)
540-540: Use of assert detected
(S101)
541-541: Use of assert detected
(S101)
542-542: Use of assert detected
(S101)
543-543: Use of assert detected
(S101)
549-549: Use of assert detected
(S101)
555-555: Use of assert detected
(S101)
615-615: Use of assert detected
(S101)
620-620: Use of assert detected
(S101)
621-621: Use of assert detected
(S101)
622-622: Use of assert detected
(S101)
710-710: Use of assert detected
(S101)
715-715: Use of assert detected
(S101)
716-716: Use of assert detected
(S101)
717-717: Use of assert detected
(S101)
783-783: Use of assert detected
(S101)
787-787: Use of assert detected
(S101)
791-791: Use of assert detected
(S101)
792-792: Use of assert detected
(S101)
796-796: Use of assert detected
(S101)
798-798: Use of assert detected
(S101)
799-799: Use of assert detected
(S101)
800-800: Use of assert detected
(S101)
803-803: Use of assert detected
(S101)
807-807: Use of assert detected
(S101)
861-861: Use of assert detected
(S101)
866-866: Use of assert detected
(S101)
867-867: Use of assert detected
(S101)
868-868: Use of assert detected
(S101)
964-964: Use of assert detected
(S101)
969-969: Use of assert detected
(S101)
970-970: Use of assert detected
(S101)
971-971: Use of assert detected
(S101)
1033-1033: Use of assert detected
(S101)
1037-1037: Use of assert detected
(S101)
1041-1041: Use of assert detected
(S101)
1042-1042: Use of assert detected
(S101)
1046-1046: Use of assert detected
(S101)
1047-1047: Use of assert detected
(S101)
1048-1048: Use of assert detected
(S101)
1049-1049: Use of assert detected
(S101)
1052-1052: Use of assert detected
(S101)
1056-1056: Use of assert detected
(S101)
1110-1110: Use of assert detected
(S101)
1115-1115: Use of assert detected
(S101)
1116-1116: Use of assert detected
(S101)
1117-1117: Use of assert detected
(S101)
1198-1198: Use of assert detected
(S101)
1203-1203: Use of assert detected
(S101)
1204-1204: Use of assert detected
(S101)
1205-1205: Use of assert detected
(S101)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
41-41: Use of assert detected
(S101)
199-199: Use of assert detected
(S101)
200-200: Use of assert detected
(S101)
204-204: Use of assert detected
(S101)
211-211: Use of assert detected
(S101)
215-215: Use of assert detected
(S101)
264-264: Use of assert detected
(S101)
343-343: Use of assert detected
(S101)
417-417: Use of assert detected
(S101)
418-418: Use of assert detected
(S101)
422-422: Use of assert detected
(S101)
429-429: Use of assert detected
(S101)
433-433: Use of assert detected
(S101)
490-490: Use of assert detected
(S101)
576-576: Use of assert detected
(S101)
630-630: Use of assert detected
(S101)
806-806: Use of assert detected
(S101)
1378-1378: Use of assert detected
(S101)
1382-1382: Use of assert detected
(S101)
1386-1386: Use of assert detected
(S101)
1391-1391: Use of assert detected
(S101)
1392-1392: Use of assert detected
(S101)
1396-1396: Use of assert detected
(S101)
1400-1400: Use of assert detected
(S101)
1404-1404: Use of assert detected
(S101)
1405-1405: Use of assert detected
(S101)
1409-1409: Use of assert detected
(S101)
1445-1445: Use of assert detected
(S101)
1449-1449: Use of assert detected
(S101)
1453-1453: Use of assert detected
(S101)
1458-1458: Use of assert detected
(S101)
1459-1459: Use of assert detected
(S101)
1463-1463: Use of assert detected
(S101)
1467-1467: Use of assert detected
(S101)
1471-1471: Use of assert detected
(S101)
1472-1472: Use of assert detected
(S101)
1476-1476: Use of assert detected
(S101)
1620-1620: Use of assert detected
(S101)
packages/opentelemetry-instrumentation-openai/tests/traces/test_prompt_caching.py
70-70: Use of assert detected
(S101)
71-71: Use of assert detected
(S101)
72-72: Use of assert detected
(S101)
74-74: Use of assert detected
(S101)
75-75: Use of assert detected
(S101)
131-131: Use of assert detected
(S101)
132-132: Use of assert detected
(S101)
133-133: Use of assert detected
(S101)
134-134: Use of assert detected
(S101)
221-221: Use of assert detected
(S101)
222-222: Use of assert detected
(S101)
223-223: Use of assert detected
(S101)
224-224: Use of assert detected
(S101)
315-315: Use of assert detected
(S101)
316-316: Use of assert detected
(S101)
318-318: Use of assert detected
(S101)
319-319: Use of assert detected
(S101)
376-376: Use of assert detected
(S101)
377-377: Use of assert detected
(S101)
379-379: Use of assert detected
(S101)
380-380: Use of assert detected
(S101)
468-468: Use of assert detected
(S101)
469-469: Use of assert detected
(S101)
470-470: Use of assert detected
(S101)
471-471: Use of assert detected
(S101)
Hey @martimfasantos - all tests currently pass in main, and we keep them up to date. I can't merge this if they don't pass.
Looking at some of the test failures @martimfasantos - it seems you're using nonexistent span attributes.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (7)
packages/opentelemetry-instrumentation-mistralai/opentelemetry/instrumentation/mistralai/__init__.py (1)
183-197: Don't emit new token metrics under legacy keys

We're still writing `total_tokens` to `SpanAttributes.LLM_USAGE_TOTAL_TOKENS`, while the other token metrics already moved to the GenAI namespace. This mixes the legacy and GenAI specs on the same span and breaks the goal of this PR. Please switch `total_tokens` to the GenAI attribute (`GenAIAttributes.GEN_AI_USAGE_TOTAL_TOKENS`) so all usage numbers land under the new semantic convention.

Based on learnings
```diff
- _set_span_attribute(
-     span,
-     SpanAttributes.LLM_USAGE_TOTAL_TOKENS,
-     total_tokens,
- )
+ _set_span_attribute(
+     span,
+     GenAIAttributes.GEN_AI_USAGE_TOTAL_TOKENS,
+     total_tokens,
+ )
```
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (2)
243-251: Critical: Undefined name `GEN_AI_RESPONSE_ID` - missing import

Line 244 references `GEN_AI_RESPONSE_ID` directly instead of using `GenAIAttributes.GEN_AI_RESPONSE_ID`. This will cause a NameError at runtime.

Fix the attribute reference to use the imported GenAIAttributes:
```diff
- set_span_attribute(self._span, GEN_AI_RESPONSE_ID, self._complete_response.get("id"))
+ set_span_attribute(self._span, GenAIAttributes.GEN_AI_RESPONSE_ID, self._complete_response.get("id"))
```
398-399: Critical: Another undefined name `GEN_AI_RESPONSE_ID`

Line 399 has the same issue as line 244 - using `GEN_AI_RESPONSE_ID` directly instead of `GenAIAttributes.GEN_AI_RESPONSE_ID`.

Fix the attribute reference:
```diff
- set_span_attribute(self._span, GEN_AI_RESPONSE_ID, self._complete_response.get("id"))
+ set_span_attribute(self._span, GenAIAttributes.GEN_AI_RESPONSE_ID, self._complete_response.get("id"))
```
packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (1)
1661-1666: Legacy token attributes still referenced in newer tests

Lines 1661-1666 and other locations still reference `"gen_ai.usage.prompt_tokens"` and `"gen_ai.usage.completion_tokens"` as string literals. These should be updated to use the new input/output naming.

Update all occurrences of the legacy attribute names:
```diff
- assert anthropic_span.attributes["gen_ai.usage.prompt_tokens"] == 568
+ assert anthropic_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 568
  assert (
-     anthropic_span.attributes["gen_ai.usage.completion_tokens"]
-     + anthropic_span.attributes["gen_ai.usage.prompt_tokens"]
+     anthropic_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]
+     + anthropic_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS]
      == anthropic_span.attributes["llm.usage.total_tokens"]
  )
```
Apply similar changes to:
- Lines 1821-1826
- Lines 1924-1929
- Lines 2007-2012
- Lines 2134-2139
- Lines 2247-2252
Also applies to: 1821-1826, 1924-1929, 2007-2012, 2134-2139, 2247-2252
packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (2)
98-111: Critical issue: Undefined variable references.

Lines 98 and 109 reference `GEN_AI_PROMPT` directly instead of using the proper `GenAIAttributes.GEN_AI_PROMPT` reference. This will cause `NameError` at runtime.

Fix the undefined variable references:
```diff
  _set_span_attribute(
      span,
-     f"{GEN_AI_PROMPT}.0.role",
+     f"{GenAIAttributes.GEN_AI_PROMPT}.0.role",
      "user",
  )
@@
  _set_span_attribute(
      span,
-     f"{GEN_AI_PROMPT}.0.content",
+     f"{GenAIAttributes.GEN_AI_PROMPT}.0.content",
      dump_object(inputs),
  )
```
240-241: Critical issue: Undefined variable references in response functions.

Several functions reference `GEN_AI_COMPLETION`, `GEN_AI_RESPONSE_FINISH_REASONS`, etc. directly without the `GenAIAttributes` prefix, which will cause `NameError` at runtime.

Fix all undefined variable references:
```diff
- prefix = f"{GEN_AI_COMPLETION}.{index}"
+ prefix = f"{GenAIAttributes.GEN_AI_COMPLETION}.{index}"
@@
- _set_span_attribute(span, GEN_AI_RESPONSE_FINISH_REASONS, [finish_reason])
+ _set_span_attribute(span, GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS, [finish_reason])
```
And similar fixes for lines 283, 290, 296, and 298.
Also applies to: 283-283, 290-291, 296-298
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)
319-352: Align GenAI attributes to spec-defined names
Replace the flattened `GEN_AI_COMPLETION.*` keys (e.g. `gen_ai.completion.{i}.content`, `.role`, `.finish_reason`, and `.tool_calls.{j}.{…}`) with the structured semantic-convention fields defined in the GenAI spec (for example: `gen_ai.response.message.content`, `gen_ai.response.message.role`, `gen_ai.response.finish_reasons`, `gen_ai.tool.call.name`/`arguments`/`id`), or retain the existing `LLM_*` fallbacks if necessary.
Location: packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py lines 319–352 (and 341–347)
🧹 Nitpick comments (6)
packages/opentelemetry-instrumentation-mistralai/tests/test_chat.py (3)
509-509: Inconsistent attribute reference - use GenAIAttributes

Line 509 uses a string literal `"gen_ai.request.model"` instead of the `GenAIAttributes.GEN_AI_REQUEST_MODEL` constant, which is inconsistent with the pattern used elsewhere in the file.
"gen_ai.request.model"instead of theGenAIAttributes.GEN_AI_REQUEST_MODELconstant, which is inconsistent with the pattern used elsewhere in the file.Apply this diff to maintain consistency:
```diff
- assert mistral_span.attributes.get("gen_ai.request.model") == "mistral-tiny"
+ assert mistral_span.attributes.get(f"{GenAIAttributes.GEN_AI_REQUEST_MODEL}") == "mistral-tiny"
```
557-557: Inconsistent attribute reference - use GenAIAttributes

Line 557 uses a string literal `"gen_ai.request.model"` instead of the `GenAIAttributes.GEN_AI_REQUEST_MODEL` constant, which is inconsistent with the pattern used elsewhere in the file.
"gen_ai.request.model"instead of theGenAIAttributes.GEN_AI_REQUEST_MODELconstant, which is inconsistent with the pattern used elsewhere in the file.Apply this diff to maintain consistency:
```diff
- assert mistral_span.attributes.get("gen_ai.request.model") == "mistral-tiny"
+ assert mistral_span.attributes.get(f"{GenAIAttributes.GEN_AI_REQUEST_MODEL}") == "mistral-tiny"
```
616-616: Inconsistent attribute reference - use GenAIAttributes

Line 616 uses a string literal `"gen_ai.request.model"` instead of the `GenAIAttributes.GEN_AI_REQUEST_MODEL` constant, which is inconsistent with the pattern used elsewhere in the file.
"gen_ai.request.model"instead of theGenAIAttributes.GEN_AI_REQUEST_MODELconstant, which is inconsistent with the pattern used elsewhere in the file.Apply this diff to maintain consistency:
```diff
- assert mistral_span.attributes.get("gen_ai.request.model") == "mistral-tiny"
+ assert mistral_span.attributes.get(f"{GenAIAttributes.GEN_AI_REQUEST_MODEL}") == "mistral-tiny"
```
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (3)
141-144: Avoid duplicating agent identity keys on handoff spans.

You set both gen_ai.handoff.from_agent and GEN_AI_AGENT_NAME to the same value. This can confuse queries/dashboards. Recommend keeping only the explicit handoff field.
Apply this diff:
```diff
- handoff_attributes[GenAIAttributes.GEN_AI_AGENT_NAME] = from_agent
```
245-257: Normalize prompt content serialization for dict/list inputs.

In the dict branch you only json-dump when content is a dict. Use the same "not string → json" rule as above to handle lists/objects consistently and avoid invalid attribute types.
Apply this diff:
```diff
- content = message['content']
- if isinstance(content, dict):
-     content = json.dumps(content)
+ content = message['content']
+ if not isinstance(content, str):
+     content = json.dumps(content)
```
399-414: DRY up model settings mapping (duplicated in both branches).

Temperature/max_tokens/top_p/model are set in two places. Extract a small helper to reduce duplication and drift.
Example (outside this diff):
```python
def _apply_model_settings(otel_span, response):
    if getattr(response, "temperature", None) is not None:
        otel_span.set_attribute(GenAIAttributes.GEN_AI_REQUEST_TEMPERATURE, response.temperature)
    if getattr(response, "max_output_tokens", None) is not None:
        otel_span.set_attribute(GenAIAttributes.GEN_AI_REQUEST_MAX_TOKENS, response.max_output_tokens)
    if getattr(response, "top_p", None) is not None:
        otel_span.set_attribute(GenAIAttributes.GEN_AI_REQUEST_TOP_P, response.top_p)
    if getattr(response, "model", None):
        otel_span.set_attribute("gen_ai.request.model", response.model)
```
Then call `_apply_model_settings(otel_span, response)` in both locations.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (12)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py(9 hunks)packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py(6 hunks)packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py(28 hunks)packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/__init__.py(2 hunks)packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py(7 hunks)packages/opentelemetry-instrumentation-cohere/tests/test_chat.py(3 hunks)packages/opentelemetry-instrumentation-cohere/tests/test_rerank.py(3 hunks)packages/opentelemetry-instrumentation-mistralai/opentelemetry/instrumentation/mistralai/__init__.py(8 hunks)packages/opentelemetry-instrumentation-mistralai/tests/test_chat.py(12 hunks)packages/opentelemetry-instrumentation-mistralai/tests/test_embeddings.py(6 hunks)packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py(12 hunks)packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py(5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
- packages/opentelemetry-instrumentation-mistralai/tests/test_embeddings.py
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py
- packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
- packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
- packages/opentelemetry-instrumentation-mistralai/opentelemetry/instrumentation/mistralai/__init__.py
- packages/opentelemetry-instrumentation-cohere/tests/test_rerank.py
- packages/opentelemetry-instrumentation-cohere/tests/test_chat.py
- packages/opentelemetry-instrumentation-mistralai/tests/test_chat.py
- packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/__init__.py
🧠 Learnings (1)
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
PR: traceloop/openllmetry#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Semantic conventions must follow the OpenTelemetry GenAI specification
Applied to files:
packages/opentelemetry-instrumentation-mistralai/opentelemetry/instrumentation/mistralai/__init__.py
🧬 Code graph analysis (8)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (2)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (1)
`set_span_attribute` (21-25)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
SpanAttributes(64-237)
packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (2)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
`LLMRequestTypeValues` (269-274)
packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/utils.py (2)
`to_dict` (64-77)
`dump_object` (52-61)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (1)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (3)
`set_span_attribute` (21-25)
`error_metrics_attributes` (216-220)
`shared_metrics_attributes` (179-212)
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
SpanAttributes(64-237)
packages/opentelemetry-instrumentation-mistralai/opentelemetry/instrumentation/mistralai/__init__.py (2)
packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (1)
`_set_span_attribute` (18-22)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
LLMRequestTypeValues(269-274)
packages/opentelemetry-instrumentation-cohere/tests/test_rerank.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
SpanAttributes(64-237)
packages/opentelemetry-instrumentation-cohere/tests/test_chat.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
SpanAttributes(64-237)
packages/opentelemetry-instrumentation-mistralai/tests/test_chat.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
SpanAttributes(64-237)
🪛 Ruff (0.13.1)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py
228-228: Undefined name exception_counter
(F821)
229-229: Undefined name exception_counter
(F821)
230-230: Use raise without specifying exception name
Remove exception name
(TRY201)
231-231: Undefined name complete_response
(F821)
233-233: Undefined name complete_response
(F821)
234-234: Undefined name span
(F821)
234-234: Undefined name complete_response
(F821)
235-235: Undefined name duration_histogram
(F821)
236-236: Undefined name start_time
(F821)
237-237: Undefined name duration_histogram
(F821)
294-294: Undefined name span
(F821)
294-294: Undefined name complete_response
(F821)
🪛 Flake8 (7.2.0)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py
[error] 228-228: undefined name 'exception_counter'
(F821)
[error] 229-229: undefined name 'exception_counter'
(F821)
[error] 231-231: undefined name 'complete_response'
(F821)
[error] 233-233: undefined name 'complete_response'
(F821)
[error] 234-234: undefined name 'span'
(F821)
[error] 234-234: undefined name 'complete_response'
(F821)
[error] 235-235: undefined name 'duration_histogram'
(F821)
[error] 236-236: undefined name 'start_time'
(F821)
[error] 237-237: undefined name 'duration_histogram'
(F821)
[error] 294-294: undefined name 'span'
(F821)
[error] 294-294: undefined name 'complete_response'
(F821)
🔇 Additional comments (26)
packages/opentelemetry-instrumentation-mistralai/tests/test_chat.py (4)
7-8: LGTM! Consistent use of GenAIAttributes import

The import statement correctly aliases `gen_ai_attributes` as `GenAIAttributes`, following the pattern used across the repository for the migration to OpenTelemetry GenAI semantic conventions.
27-27: System identifier inconsistency persists - needs resolution

This test still asserts span attribute `GenAIAttributes.GEN_AI_SYSTEM == "MistralAI"` while log events expect `"mistral_ai"` (Line 647). This creates brittle test assertions due to casing inconsistency.
647-647: System identifier inconsistency persists in log assertions

The log assertions on Line 647 expect `GenAIAttributes.GEN_AI_SYSTEM == "mistral_ai"` while span assertions throughout the file expect `"MistralAI"`. This creates the same system identifier inconsistency issue flagged in previous reviews.
35-47: All token attributes are correctly set by instrumentation

The `_set_model_response_attributes` function populates `SpanAttributes.LLM_USAGE_TOTAL_TOKENS`, `GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS`, and `GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS`, so the test's assertion that total equals input + output tokens is valid.
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (2)
91-107: LGTM! Token attribute updates are correct

The updates from `LLM_USAGE_PROMPT_TOKENS`/`LLM_USAGE_COMPLETION_TOKENS` to `GEN_AI_USAGE_INPUT_TOKENS`/`GEN_AI_USAGE_OUTPUT_TOKENS` align with the PR's objective to adopt OpenTelemetry semantic conventions. The cache token attributes on lines 101-107 are also correctly updated.
114-124: LGTM! Token type attribute migration is correct

The token histogram recording correctly uses
GenAIAttributes.GEN_AI_TOKEN_TYPEinstead of the legacyLLM_TOKEN_TYPEfor both input (line 114) and output (line 123) tokens.packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (6)
39-41: LGTM! GenAIAttributes import added correctly

The import statement properly brings in the GenAIAttributes alias from the incubating semantic conventions, which is necessary for the attribute migrations throughout this file.
229-230: LGTM! Token type attributes updated in async token usage

The token histogram recording correctly uses `GenAIAttributes.GEN_AI_TOKEN_TYPE` for both input and output tokens in the async `_aset_token_usage` function.

Also applies to: 256-257
279-293: LGTM! Span attributes updated to GenAI namespace

The span attribute updates correctly migrate from legacy LLM attributes to GenAI equivalents:
- `GEN_AI_USAGE_INPUT_TOKENS` (line 279)
- `GEN_AI_USAGE_OUTPUT_TOKENS` (line 281)
- `GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS` (line 286)
- `GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS` (line 290)
343-344: LGTM! Sync token usage function updated consistently

The synchronous `_set_token_usage` function mirrors the async version's changes, correctly using `GenAIAttributes.GEN_AI_TOKEN_TYPE` for token histogram recording.

Also applies to: 370-371
393-406: LGTM! Sync span attributes updated consistently

The synchronous function's span attributes are correctly updated to match the async version, maintaining consistency across the codebase.
538-538: LGTM! System attribute updated to GenAI namespace

Both `_wrap` (line 538) and `_awrap` (line 662) correctly update the system attribute from the legacy name to `GenAIAttributes.GEN_AI_SYSTEM`.

Also applies to: 662-662
packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (2)
96-113: LGTM! Test assertions correctly updated for prompt and completion attributes

The test correctly validates the migrated attributes:
- `GenAIAttributes.GEN_AI_PROMPT` replaces legacy prompt attributes
- `GenAIAttributes.GEN_AI_COMPLETION` replaces legacy completion attributes
- `GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS` and `GEN_AI_USAGE_OUTPUT_TOKENS` replace the legacy token attributes
155-160: LGTM! Token attributes consistently updated across test variants

The test variants (with events, with/without content, multimodal) all correctly use the GenAIAttributes constants for token usage assertions.
Also applies to: 220-225, 291-296
packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/__init__.py (1)
32-34: LGTM! Consistent GenAI attributes import.

The import follows the expected pattern used across the codebase for accessing GenAI semantic conventions.
packages/opentelemetry-instrumentation-cohere/tests/test_rerank.py (1)
41-41: LGTM! Test updates correctly validate GenAI attributes.

The test assertions have been properly updated to validate the new GenAI attribute keys instead of the legacy LLM attributes. The migration is consistent and follows the expected pattern.
Also applies to: 44-45, 49-50, 55-56, 62-63, 66-67, 72-73, 77-78, 121-121, 124-125, 185-185, 188-189
packages/opentelemetry-instrumentation-cohere/tests/test_chat.py (2)
60-60: LGTM! Selective migration to GenAI attributes in legacy tests.

The test updates correctly migrate specific attributes (`GEN_AI_SYSTEM`, `GEN_AI_REQUEST_MODEL`, `GEN_AI_PROMPT`, `GEN_AI_COMPLETION`, and token usage attributes) to GenAI conventions while preserving legacy attributes where appropriate for backward compatibility testing.
GEN_AI_SYSTEM,GEN_AI_REQUEST_MODEL,GEN_AI_PROMPT,GEN_AI_COMPLETION, and token usage attributes) to GenAI conventions while preserving legacy attributes where appropriate for backward compatibility testing.Also applies to: 62-62, 64-65, 68-69, 71-71, 75-77, 527-527, 529-529, 556-556, 558-558
75-77: No changes needed: token calculation remains consistent.

Tests in span_utils.py now set LLM_USAGE_TOTAL_TOKENS to input_tokens + output_tokens while also recording GEN_AI_USAGE_INPUT_TOKENS and GEN_AI_USAGE_OUTPUT_TOKENS, matching the previous use of total_tokens from to_dict(token_count). All existing assertions verify this sum.
packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (5)
8-10: LGTM! Proper import of GenAI attributes.

The import follows the established pattern for accessing GenAI semantic conventions.
32-35: LGTM! Consistent migration of prompt attributes.

All prompt-related attributes have been properly migrated from `LLM_PROMPTS` to `GEN_AI_PROMPT` with consistent indexing and field naming.
LLM_PROMPTStoGEN_AI_PROMPTwith consistent indexing and field naming.Also applies to: 40-43, 45-48, 53-54, 79-83, 87-88, 92-94
134-146: LGTM! Request attributes properly migrated.

All request-related attributes have been consistently updated to use GenAI attribute keys while preserving the same functionality.
164-168: LGTM! Response ID attributes properly migrated.

The response ID attribute has been correctly updated to use the GenAI convention.
203-204: LGTM! Token usage attributes properly migrated.

The migration from `LLM_USAGE_COMPLETION_TOKENS` and `LLM_USAGE_PROMPT_TOKENS` to `GEN_AI_USAGE_OUTPUT_TOKENS` and `GEN_AI_USAGE_INPUT_TOKENS` is correct and follows the semantic conventions.

Also applies to: 208-210
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (3)
10-12: Stabilize GenAIAttributes import (prefer stable alias; keep incubating as fallback).

The private `_incubating` path is brittle across versions. Prefer the project-wide stable alias if available, and fall back to the incubating layout to avoid breakage in environments with different semconv bundles.

Apply this diff:
```diff
-from opentelemetry.semconv._incubating.attributes import (
-    gen_ai_attributes as GenAIAttributes
-)
+try:
+    # Preferred stable alias (when available)
+    from opentelemetry.semconv_ai import GenAIAttributes  # type: ignore[attr-defined]
+except Exception:
+    # Fallback to incubating layout
+    from opentelemetry.semconv._incubating.attributes import gen_ai_attributes as GenAIAttributes
```
357-366: Token usage mapping looks correct with backward-compat fallback.

input_tokens/prompt_tokens and output_tokens/completion_tokens are mapped to GEN_AI_USAGE_* consistently.
Consider also mapping total_tokens to a GenAI constant if one exists in your version for consistency with the new naming (fallback to LLM_USAGE_TOTAL_TOKENS otherwise).
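A hedged sketch of that fallback; the `getattr` guard is an assumption, since `GEN_AI_USAGE_TOTAL_TOKENS` may not exist in older semconv bundles:

```python
# Fall back to the legacy constant when the GenAI one is unavailable.
total_key = getattr(
    GenAIAttributes, "GEN_AI_USAGE_TOTAL_TOKENS", SpanAttributes.LLM_USAGE_TOTAL_TOKENS
)
otel_span.set_attribute(total_key, usage.total_tokens)
```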
491-496: Model settings propagation to agent span looks good.

Good reuse of the stored model settings with consistent GEN_AI_* keys.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (1)
180-188: Prompt/completion token rename is incomplete

The Cohere v4 and v2 paths still emit `SpanAttributes.LLM_USAGE_{PROMPT,COMPLETION}_TOKENS`, so we keep producing the legacy attribute names alongside the new GenAI ones. That undercuts the PR goal of aligning everything with `gen_ai.usage.{input,output}_tokens` and leaves downstream consumers with a mixed schema. Please switch these setters to the GenAI tokens just like you did for the billed-units block above.

Here's the adjustment:
```diff
     _set_span_attribute(
         span,
         SpanAttributes.LLM_USAGE_TOTAL_TOKENS,
         token_count_dict.get("total_tokens"),
     )
     _set_span_attribute(
         span,
-        SpanAttributes.LLM_USAGE_COMPLETION_TOKENS,
+        GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS,
         token_count_dict.get("response_tokens"),
     )
     _set_span_attribute(
         span,
-        SpanAttributes.LLM_USAGE_PROMPT_TOKENS,
+        GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS,
         token_count_dict.get("prompt_tokens"),
     )
@@
     _set_span_attribute(
         span,
         SpanAttributes.LLM_USAGE_TOTAL_TOKENS,
         input_tokens + output_tokens,
     )
     _set_span_attribute(
         span,
-        SpanAttributes.LLM_USAGE_COMPLETION_TOKENS,
+        GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS,
         output_tokens,
     )
     _set_span_attribute(
         span,
-        SpanAttributes.LLM_USAGE_PROMPT_TOKENS,
+        GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS,
         input_tokens,
     )
```

Also applies to: 229-237
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (2)
223-225: Bug: calling undefined method `_complete_instrumentation` (sync path).

`AnthropicStream` defines `_handle_completion`, not `_complete_instrumentation`. This raises AttributeError on normal completion.

Apply:
```diff
-        if not self._instrumentation_completed:
-            self._complete_instrumentation()
+        if not self._instrumentation_completed:
+            self._handle_completion()
```
384-387: Guard span.end() when span might be None (async error path).

Calling `self._span.end()` unconditionally will throw if `_span` is None.

Apply:
```diff
-        if self._span and self._span.is_recording():
-            self._span.set_status(Status(StatusCode.ERROR, str(e)))
-        self._span.end()
+        if self._span:
+            if self._span.is_recording():
+                self._span.set_status(Status(StatusCode.ERROR, str(e)))
+            self._span.end()
```
🧹 Nitpick comments (3)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (3)
399-401: Typo in comment.

"abuild_from_streaming_response" -> "build_from_streaming_response".
Apply:
```diff
-        # This mirrors the logic from abuild_from_streaming_response
+        # This mirrors the logic from build_from_streaming_response
```
162-165: Avoid mutable default args for kwargs.

Use `None` and normalize inside to prevent accidental cross-instance mutation.

Apply:
```diff
-        kwargs: dict = {},
+        kwargs: Optional[dict] = None,
@@
-        self._kwargs = kwargs
+        self._kwargs = kwargs or {}
```

Repeat for the async constructor.
Also applies to: 312-314
22-24: Guard unstable incubating import or pin semconv version
Wrap the `opentelemetry.semconv._incubating.attributes` import in a try/except (surfacing a clear ImportError) or pin your `opentelemetry-semconv` dependency to a compatible release to prevent future breakage.
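A minimal sketch of that guard, with the error message wording as an assumption for illustration:

```python
# Fail loudly with an actionable message if the private incubating module
# moves or the installed opentelemetry-semconv release is incompatible.
try:
    from opentelemetry.semconv._incubating.attributes import (
        gen_ai_attributes as GenAIAttributes,
    )
except ImportError as exc:
    raise ImportError(
        "opentelemetry-semconv does not expose the incubating gen_ai "
        "attributes; pin opentelemetry-semconv to a compatible release"
    ) from exc
```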
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (6 hunks)
- packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (9 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_prompt_caching.py (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_prompt_caching.py
- packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py
🧬 Code graph analysis (2)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (2)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (4)
set_span_attribute(21-25)
error_metrics_attributes(216-220)
shared_metrics_attributes(179-212)
count_prompt_tokens_from_request(224-242)

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (1)

_set_token_usage(296-406)
packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (2)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
LLMRequestTypeValues(269-274)

packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/utils.py (2)

to_dict(64-77)
dump_object(52-61)
🪛 Ruff (0.13.1)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py
230-230: Use raise without specifying exception name
Remove exception name
(TRY201)
286-286: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (11)
packages/opentelemetry-instrumentation-openai/tests/traces/test_prompt_caching.py (8)
70-71: Token attribute rename to input/output — LGTM

Assertions now target gen_ai.usage.input_tokens/output_tokens as per OTel GenAI semconv. Looks good.
Also applies to: 74-75
131-134: Events+content: usage token keys updated — LGTM

Values align with the recorded cassette; key names match the new convention.
221-224: No-content events: usage token keys updated — LGTM
315-319: Async legacy path: usage token keys updated — LGTM
376-380: Async + content: usage token keys updated — LGTM
468-471: Async + no-content: usage token keys updated — LGTM
72-72: Remove or guard cache_read_input_tokens assertion

This attribute is not part of the core rename and may not be emitted; direct indexing will KeyError when absent. Either drop it or guard with a default.
Apply one of these:
```diff
-    assert cache_creation_span.attributes["gen_ai.usage.cache_read_input_tokens"] == 0
+    assert cache_creation_span.attributes.get("gen_ai.usage.cache_read_input_tokens", 0) == 0
```

or simply remove the assertion line.
Run to confirm the attribute is produced anywhere:
```bash
#!/bin/bash
# List occurrences of cache read/creation usage attributes in code (not tests)
rg -nP --type=py -g '!**/tests/**' 'gen_ai\.usage\.(cache_(read|creation)_input_tokens)'
```
70-71: Optional: use semconv constants to avoid key typos

To reduce churn and prevent string-key typos, consider asserting via semconv constants (e.g., GEN_AI_USAGE_INPUT_TOKENS/GEN_AI_USAGE_OUTPUT_TOKENS) instead of string literals.
Please verify the exact constant names available in your semconv version:
Also applies to: 74-75, 131-134, 221-224, 315-319, 376-380, 468-471
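A hedged sketch of such an assertion helper, assuming the constant names shown above exist in the installed semconv (verify before adopting):

```python
from opentelemetry.semconv._incubating.attributes import (
    gen_ai_attributes as GenAIAttributes,
)


def assert_usage(span, expected_input: int, expected_output: int) -> None:
    # Constants fail at import time on a typo; raw string keys fail silently.
    attrs = span.attributes
    assert attrs[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == expected_input
    assert attrs[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == expected_output
```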
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (3)
109-126: LGTM on token-type histogram attributes.

Using `GenAIAttributes.GEN_AI_TOKEN_TYPE` with "input"/"output" looks correct and consistent.
127-137: Use `isinstance` and retain the finish-reason constant
- Replace the type check with:
```diff
-        if type(complete_response.get("events")) is list and choice_counter:
+        if isinstance(complete_response.get("events"), list) and choice_counter:
```
- Keep using `SpanAttributes.LLM_RESPONSE_FINISH_REASON: event.get("finish_reason")` here — `LLM_RESPONSE_STOP_REASON` applies only to non-streaming stop reasons.

Likely an incorrect or invalid review comment.
227-232: Exception handling leaks span and loses traceback.
- Use `raise` instead of `raise e` (TRY201).
- Mirror async path: set span status and end the span to avoid open spans on errors.
Apply:
```diff
-        except Exception as e:
-            attributes = error_metrics_attributes(e)
-            if self._exception_counter:
-                self._exception_counter.add(1, attributes=attributes)
-            raise e
+        except Exception as e:
+            # Handle errors during streaming
+            if not self._instrumentation_completed:
+                attributes = error_metrics_attributes(e)
+                if self._exception_counter:
+                    self._exception_counter.add(1, attributes=attributes)
+                if self._span and self._span.is_recording():
+                    self._span.set_status(Status(StatusCode.ERROR, str(e)))
+                    self._span.end()
+                self._instrumentation_completed = True
+            raise
```
@nirga tests and linting are passing now!
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
packages/traceloop-sdk/traceloop/sdk/tracing/manual.py (1)
61-62: Switch total tokens to GenAI attribute

We still emit `llm.usage.total_tokens` via `SpanAttributes.LLM_USAGE_TOTAL_TOKENS`, so the span carries a mix of legacy and GenAI semantic keys. The GenAI spec defines `gen_ai.usage.total_tokens`; keeping the old key defeats the PR goal and leaves downstream tooling looking for the new field empty. Please update this attribute to the GenAI constant.

```diff
-        self._span.set_attribute(
-            SpanAttributes.LLM_USAGE_TOTAL_TOKENS, usage.total_tokens
-        )
+        self._span.set_attribute(
+            GenAIAttributes.GEN_AI_USAGE_TOTAL_TOKENS, usage.total_tokens
+        )
```

packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
65-78: Address remaining legacy attribute references flagged in previous review.

Per the bot's verification, test files still reference legacy `SpanAttributes.LLM_USAGE_PROMPT_TOKENS`, `SpanAttributes.LLM_USAGE_COMPLETION_TOKENS`, and `SpanAttributes.LLM_COMPLETIONS` constants that should be updated to the new `GEN_AI_*` equivalents:

- packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py (lines ~38, 39, 67, 68)
- packages/traceloop-sdk/tests/test_prompt_management.py (lines ~257, 273)

Complete the migration by updating these test references.
Based on learnings
packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (1)
172-239: Migrate token-usage span attributes to GenAI conventions

In packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (lines 172–239), you're still setting spans with the legacy `SpanAttributes.LLM_USAGE_*` keys. Replace those with the GenAI semantic convention constants — `GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS`, `GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS`, and `GenAIAttributes.GEN_AI_USAGE_TOTAL_TOKENS` — to match the rest of the instrumentation and satisfy existing tests.
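A minimal sketch of the requested change, assuming the `token_count_dict` shape used elsewhere in this file; the helper name and the string total-tokens key are illustrative:

```python
from opentelemetry.semconv._incubating.attributes import (
    gen_ai_attributes as GenAIAttributes,
)


def _set_usage_attributes(span, token_count_dict: dict) -> None:
    # Emit the GenAI usage keys instead of the legacy LLM_USAGE_* ones.
    input_tokens = token_count_dict.get("prompt_tokens")
    output_tokens = token_count_dict.get("response_tokens")
    if input_tokens is not None:
        span.set_attribute(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, input_tokens)
    if output_tokens is not None:
        span.set_attribute(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, output_tokens)
    if input_tokens is not None and output_tokens is not None:
        # Spec-level total; whether to also keep the legacy total key is a
        # project decision.
        span.set_attribute("gen_ai.usage.total_tokens", input_tokens + output_tokens)
```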
packages/opentelemetry-instrumentation-langchain/tests/test_documents_chains.py (1)

99-119: Don't drop the `gen_ai.choice` log assertion

The LangChain instrumentation still emits both the user message and the choice events; forcing `len(logs) == 1` and commenting out the `gen_ai.choice` verification means this test no longer catches regressions where the completion log disappears. That log is part of the GenAI trace semantics we rely on downstream. Please keep the expectation at two logs and update the `gen_ai.choice` payload assertion (if the payload changed) instead of removing it.

packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py (1)
49-76: Update test assertions to expect emitted metrics

In test_langchain_metrics.py, `found_token_metric` and `found_duration_metric` are initialized to False and asserted to remain False, but callback_handler.py calls `token_histogram.record()` and `duration_histogram.record()`. Adjust these flags to True and assert metrics are emitted.
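A minimal sketch of the flipped assertions, assuming the test's `reader` fixture is an SDK `InMemoryMetricReader`; the metric names are illustrative and should come from the project's `Meters` constants:

```python
def assert_metrics_emitted(reader) -> None:
    found_token_metric = False
    found_duration_metric = False
    data = reader.get_metrics_data()
    for resource_metrics in (data.resource_metrics if data else []):
        for scope_metrics in resource_metrics.scope_metrics:
            for metric in scope_metrics.metrics:
                if metric.name == "gen_ai.client.token.usage":
                    found_token_metric = True
                elif metric.name == "gen_ai.client.operation.duration":
                    found_duration_metric = True
    # Require that the histograms were recorded, not that they are absent.
    assert found_token_metric, "expected a token usage metric"
    assert found_duration_metric, "expected an operation duration metric"
```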
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (1)

104-116: Migrate legacy function attributes to GenAI conventions
Replace uses of `SpanAttributes.LLM_REQUEST_FUNCTIONS.{i}` with a dedicated `execute_tool` span (name="execute_tool {tool.name}", kind=INTERNAL) that populates the standardized attributes:

- gen_ai.tool.name → tool_function.get("name")
- gen_ai.tool.description → tool_function.get("description")
- gen_ai.tool.type → one of "function" | "extension" | "datastore"
- gen_ai.tool.call.id → unique call identifier
Record the function's parameters in `gen_ai.input.messages` per the GenAI messages schema instead of a separate `.parameters` attribute.
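A hedged sketch of that dedicated span; the tracer acquisition and the `tool_function` dict shape are assumptions for illustration, not the instrumentation's actual plumbing:

```python
from opentelemetry import trace
from opentelemetry.trace import SpanKind

tracer = trace.get_tracer(__name__)


def record_tool_execution(tool_function: dict, call_id: str) -> None:
    name = tool_function.get("name", "unknown_tool")
    # One execute_tool span per invocation, per the GenAI span conventions.
    with tracer.start_as_current_span(
        f"execute_tool {name}", kind=SpanKind.INTERNAL
    ) as span:
        span.set_attribute("gen_ai.tool.name", name)
        span.set_attribute("gen_ai.tool.type", "function")
        span.set_attribute("gen_ai.tool.call.id", call_id)
        description = tool_function.get("description")
        if description:
            span.set_attribute("gen_ai.tool.description", description)
```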
♻️ Duplicate comments (2)
packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (1)
1297-1302: Use GenAIAttributes constants instead of string literals.

Lines 1297-1302 (and similar patterns at 1437-1442, 1539-1544, 1662-1664, 1818-1820, 1920-1922, 2002-2004, 2128-2130, 2240-2242) use string literals for token attribute keys. The rest of the file consistently uses `GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS` and `GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS` constants.

Apply this pattern consistently:
```diff
-    assert anthropic_span.attributes["gen_ai.usage.input_tokens"] == 514
+    assert anthropic_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 514
     assert (
-        anthropic_span.attributes["gen_ai.usage.output_tokens"]
-        + anthropic_span.attributes["gen_ai.usage.input_tokens"]
+        anthropic_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]
+        + anthropic_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS]
         == anthropic_span.attributes["llm.usage.total_tokens"]
     )
```

As per coding guidelines.
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (1)
237-246: Remove duplicate metrics recording block.

Lines 237-246 duplicate the metrics recording from lines 228-235, causing duration to be recorded twice and `response_id` to be set redundantly.

Remove the duplicate block:
```diff
         if self._duration_histogram:
             duration = time.time() - self._start_time
             self._duration_histogram.record(
                 duration,
                 attributes=metric_attributes,
             )
-        # This mirrors the logic from build_from_streaming_response
-        metric_attributes = shared_metrics_attributes(self._complete_response)
-        set_span_attribute(self._span, GenAIAttributes.GEN_AI_RESPONSE_ID, self._complete_response.get("id"))
-
-        if self._duration_histogram:
-            duration = time.time() - self._start_time
-            self._duration_histogram.record(
-                duration,
-                attributes=metric_attributes,
-            )
-
         # Calculate token usage
```
🧹 Nitpick comments (6)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (1)
219-224: Use bare `raise` to preserve exception context.

Line 222 uses `raise e`, which replaces the original traceback. Use bare `raise` to preserve the full exception context.

```diff
     except Exception as e:
         attributes = error_metrics_attributes(e)
         if self._exception_counter:
             self._exception_counter.add(1, attributes=attributes)
-        raise e
+        raise

 _process_response_item(item, self._complete_response)
```

packages/opentelemetry-instrumentation-openai/tests/traces/test_prompt_caching.py (1)
15-500: Ensure test coverage validates actual prompt caching behavior.

All six test functions follow the same pattern: they make two identical requests and assert token counts on the resulting spans. However, the tests don't explicitly validate that prompt caching actually occurred (e.g., checking cache hit/miss indicators or verifying that the second request was faster/cheaper due to caching).
Consider adding assertions that:
- Verify cache hit/miss status attributes if exposed by the OpenAI API.
- Check that the second span reflects cached behavior (e.g., lower latency, cache-specific attributes).
- Validate the response IDs differ between cache creation and read spans (currently checked, which is good).
If the OpenAI API exposes cache-related metadata, verify it's being captured:
```bash
#!/bin/bash
# Description: Check OpenAI API response structure for cache-related fields.
# Look for usage, cache, or token-related response handling in the instrumentation
ast-grep --pattern $'usage = $$$
$$$
cache$$$
$$$'

# Search for response.usage parsing in the instrumentation
rg -n 'response.*usage|usage.*cache|\.usage\.' --type=py -g '!test*' packages/opentelemetry-instrumentation-openai/
```

packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (1)
438-456: Consider using the GenAIAttributes alias for consistency.

This function imports `GEN_AI_AGENT_NAME` directly, while the rest of the file uses the `GenAIAttributes` module alias established at line 14. For consistency, consider using `GenAIAttributes.GEN_AI_AGENT_NAME` instead of the direct import.

Apply this diff to align with the file's import pattern:
```diff
 def test_agent_name_propagation_to_agent_spans(exporter, test_agent):
     """Test that agent name is set into agent spans through the context propagation mechanism."""
-    from opentelemetry.semconv._incubating.attributes.gen_ai_attributes import GEN_AI_AGENT_NAME

     query = "What is AI?"

     Runner.run_sync(
         test_agent,
         query,
     )

     spans = exporter.get_finished_spans()

     # Find the agent span
     agent_spans = [s for s in spans if s.name == "testAgent.agent"]
     assert len(agent_spans) == 1, f"Expected 1 agent span, got {len(agent_spans)}"

     agent_span = agent_spans[0]

     # Verify the agent span has the agent name attribute set via context propagation
-    assert GEN_AI_AGENT_NAME in agent_span.attributes, f"Agent span should have {GEN_AI_AGENT_NAME} attribute"
-    assert agent_span.attributes[GEN_AI_AGENT_NAME] == "testAgent", (
-        f"Expected agent name 'testAgent', got '{agent_span.attributes[GEN_AI_AGENT_NAME]}'"
+    assert GenAIAttributes.GEN_AI_AGENT_NAME in agent_span.attributes, f"Agent span should have {GenAIAttributes.GEN_AI_AGENT_NAME} attribute"
+    assert agent_span.attributes[GenAIAttributes.GEN_AI_AGENT_NAME] == "testAgent", (
+        f"Expected agent name 'testAgent', got '{agent_span.attributes[GenAIAttributes.GEN_AI_AGENT_NAME]}'"
     )
```
763-778: The async callbacks for streaming do not implement choice-event emission hooks, so tests correctly omit those assertions. To support gen_ai.choice in async streaming, implement corresponding async chunk handlers (e.g., `on_chat_model_new_token`/`on_llm_new_token`) in TraceloopCallbackHandler; otherwise, add a note to the docs clarifying that async streaming does not emit choice events.

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (2)
54-59: Review empty string handling.

The helper function `_set_span_attribute` sets attributes even when the value is an empty string (line 59). This differs from typical OpenTelemetry practice where empty strings are often treated similarly to `None` and skipped.
Nonevalues:def _set_span_attribute(span: Span, key: str, value: Any) -> None: if value is not None: - if value != "": - span.set_attribute(key, value) - else: - span.set_attribute(key, "") + if value != "": + span.set_attribute(key, value)This would align with the common pattern seen in other instrumentation packages (see
/packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py).
292-324: Improve error handling specificity.

The code catches a bare `Exception` (line 321), which is flagged by static analysis. While the try-except prevents crashes when processing usage metadata, catching all exceptions can hide unexpected errors.

Consider catching specific exceptions related to attribute access and data processing:
```diff
     try:
         for generations in response.generations:
             for generation in generations:
                 if (
                     hasattr(generation, "message")
                     and hasattr(generation.message, "usage_metadata")
                     and generation.message.usage_metadata is not None
                 ):
                     input_tokens += (
                         generation.message.usage_metadata.get("input_tokens")
                         or generation.message.usage_metadata.get("prompt_tokens")
                         or 0
                     )
                     output_tokens += (
                         generation.message.usage_metadata.get("output_tokens")
                         or generation.message.usage_metadata.get("completion_tokens")
                         or 0
                     )
                     total_tokens = input_tokens + output_tokens

                     if generation.message.usage_metadata.get("input_token_details"):
                         input_token_details = generation.message.usage_metadata.get(
                             "input_token_details", {}
                         )
                         cache_read_tokens += input_token_details.get("cache_read", 0)
-    except Exception as e:
+    except (AttributeError, TypeError, KeyError) as e:
         # If there's any issue processing usage metadata, continue without it
         print(f"DEBUG: Error processing usage metadata: {e}")
-        pass
```
passafter the print statement, and consider using a logger instead ofBased on coding guidelines.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (24)
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (7 hunks)
- packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (35 hunks)
- packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py (15 hunks)
- packages/opentelemetry-instrumentation-anthropic/tests/test_thinking.py (0 hunks)
- packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (8 hunks)
- packages/opentelemetry-instrumentation-cohere/tests/test_chat.py (3 hunks)
- packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py (0 hunks)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (4 hunks)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (13 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py (10 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/test_chains.py (14 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/test_documents_chains.py (3 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/test_lcel.py (17 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/test_llms.py (43 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/test_structured_output.py (4 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py (4 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/test_tool_calls.py (19 hunks)
- packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (5 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_chat_response_format.py (0 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_prompt_caching.py (6 hunks)
- packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1 hunks)
- packages/traceloop-sdk/tests/test_manual.py (2 hunks)
- packages/traceloop-sdk/tests/test_tasks.py (1 hunks)
- packages/traceloop-sdk/traceloop/sdk/tracing/manual.py (3 hunks)
💤 Files with no reviewable changes (3)
- packages/opentelemetry-instrumentation-anthropic/tests/test_thinking.py
- packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_chat_response_format.py
🚧 Files skipped from review as they are similar to previous changes (3)
- packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py
- packages/opentelemetry-instrumentation-cohere/tests/test_chat.py
- packages/traceloop-sdk/tests/test_manual.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
- packages/opentelemetry-instrumentation-langchain/tests/test_documents_chains.py
- packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py
- packages/opentelemetry-instrumentation-langchain/tests/test_tool_calls.py
- packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py
- packages/opentelemetry-instrumentation-langchain/tests/test_chains.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_prompt_caching.py
- packages/opentelemetry-instrumentation-langchain/tests/test_structured_output.py
- packages/traceloop-sdk/tests/test_tasks.py
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
- packages/traceloop-sdk/traceloop/sdk/tracing/manual.py
- packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py
- packages/opentelemetry-instrumentation-langchain/tests/test_llms.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
- packages/opentelemetry-instrumentation-langchain/tests/test_lcel.py
- packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py
- packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py
🧬 Code graph analysis (11)
packages/opentelemetry-instrumentation-langchain/tests/test_documents_chains.py (1)
packages/opentelemetry-instrumentation-langchain/tests/test_chains.py (1)
assert_message_in_logs(818-826)
packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (3)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (1)
_set_span_attribute(54-59)

packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (2)

LLMRequestTypeValues(277-282)
SpanAttributes(64-245)

packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/utils.py (2)

to_dict(64-77)
dump_object(52-61)
packages/opentelemetry-instrumentation-langchain/tests/test_tool_calls.py (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (1)
_(1094-1124)
packages/opentelemetry-instrumentation-langchain/tests/test_chains.py (1)
packages/opentelemetry-instrumentation-langchain/tests/test_documents_chains.py (1)
assert_message_in_logs(155-163)
packages/opentelemetry-instrumentation-langchain/tests/test_structured_output.py (1)
packages/opentelemetry-instrumentation-langchain/tests/test_chains.py (1)
assert_message_in_logs(818-826)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (2)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (4)
set_span_attribute(21-25)
error_metrics_attributes(216-220)
shared_metrics_attributes(179-212)
count_prompt_tokens_from_request(224-242)

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (1)

_set_token_usage(296-406)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (3)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (3)
_set_span_attribute(54-59)
set_chat_response_usage(280-373)
set_chat_response(206-277)

packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)

SpanAttributes(64-245)

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/utils.py (1)

should_emit_events(80-87)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)
_set_span_attribute(30-37)

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/utils.py (1)

CallbackFilteredJSONEncoder(22-46)
packages/opentelemetry-instrumentation-langchain/tests/test_llms.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
SpanAttributes(64-245)
packages/opentelemetry-instrumentation-langchain/tests/test_lcel.py (1)
packages/opentelemetry-instrumentation-langchain/tests/test_chains.py (1)
assert_message_in_logs(818-826)
packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py (2)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
Meters(36-61)

packages/opentelemetry-instrumentation-milvus/tests/conftest.py (1)

reader(37-41)
🪛 Ruff (0.13.1)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py
222-222: Use raise without specifying exception name
Remove exception name
(TRY201)
278-278: Do not catch blind exception: Exception
(BLE001)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py
321-321: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (36)
packages/traceloop-sdk/tests/test_tasks.py (1)
25-29: Robust span selection looks good.

Filtering for `answer_question.task` before asserting keeps the test stable when extra spans appear. Nice resilience improvement.

packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (2)
69-71: Good practice maintaining backward compatibility.

Retaining the legacy `LLM_USAGE_CACHE_*` constants with identical values ensures existing instrumentation code won't break during migration to the `GEN_AI_*` namespace.
65-67: Confirm missing upstream definitions — additions are appropriate

The attributes `gen_ai.usage.cache_creation_input_tokens` and `gen_ai.usage.cache_read_input_tokens` aren't defined in OpenTelemetry semantic conventions v1.28.0+, so defining them here is correct.
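For reference, a sketch of how such constants can be kept in `semconv_ai` until upstream defines them; the names mirror the attribute strings quoted above:

```python
# Custom extensions of the gen_ai.usage.* namespace; not (yet) part of the
# upstream OpenTelemetry GenAI semantic conventions.
GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS = "gen_ai.usage.cache_creation_input_tokens"
GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS = "gen_ai.usage.cache_read_input_tokens"
```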
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (1)

497-504: LGTM: Defensive attribute handling.

The `__getattr__` and stub `_complete_instrumentation` methods in both wrapper classes prevent AttributeError when accessing these attributes, since the actual instrumentation logic resides in the wrapped stream objects (`AnthropicStream` and `AnthropicAsyncStream`).

Also applies to: 554-561
packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (7)
8-10: LGTM!

Import correctly adds GenAIAttributes from the incubating OTel semantic conventions.
26-113: LGTM!

All prompt content attributes correctly migrated to `GenAIAttributes.GEN_AI_PROMPT` across completion, chat, rerank, and embedding request types.
132-155: Request attributes correctly migrated; TODO noted for penalty attributes.

Model, temperature, top_p, top_k, and stop_sequences now use `GenAIAttributes.GEN_AI_REQUEST_*`. The TODO at lines 149-155 correctly flags that frequency and presence penalty migration is pending.
242-283: LGTM!

Chat response attributes correctly migrated to the `GenAIAttributes.GEN_AI_COMPLETION` prefix and `GEN_AI_RESPONSE_FINISH_REASONS`.
286-297: LGTM!

Generations response attributes correctly use `GenAIAttributes.GEN_AI_RESPONSE_ID` and the `GEN_AI_COMPLETION` prefix.
299-314: LGTM!

Rerank response attributes correctly use `GenAIAttributes.GEN_AI_RESPONSE_ID` and the `GEN_AI_COMPLETION` prefix.
58-77: Retain legacy function request attributes. No GenAI semantic conventions for functions or tools exist in packages/opentelemetry-semantic-conventions-ai, so keeping `SpanAttributes.LLM_REQUEST_FUNCTIONS` is correct until new conventions are defined.

Likely an incorrect or invalid review comment.
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (4)
13-15: LGTM! Clean module alias for GenAI attributes.

The import creates a clear module alias that centralizes access to GenAI semantic convention constants throughout the test file.
141-146: Past issue resolved: token attributes correctly updated.

The test now correctly asserts the new `GEN_AI_USAGE_INPUT_TOKENS` and `GEN_AI_USAGE_OUTPUT_TOKENS` attributes, aligning with the OpenTelemetry semantic conventions migration.
300-303: LGTM! Metrics test updated for new token type semantics.

The assertion correctly validates token types as "output" and "input", aligning with the migration from completion_tokens/prompt_tokens to output_tokens/input_tokens.
200-210: Verify attribute key construction with module constants.

I couldn't find the GenAIAttributes definition in this repo (it's imported from opentelemetry.semconv._incubating.attributes), so confirm GEN_AI_COMPLETION is a string constant or a string-valued Enum member — the f-string f"{GenAIAttributes.GEN_AI_COMPLETION}.tool.name" requires a string (use .value or str() if it's an Enum). If it's not string-valued, update the instrumentation/tests to use the string value so the dotted-key attribute names match.
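A quick illustrative check, assuming only that the constant should behave as a plain string:

```python
from opentelemetry.semconv._incubating.attributes import (
    gen_ai_attributes as GenAIAttributes,
)

key = GenAIAttributes.GEN_AI_COMPLETION
# Dotted-key construction only works on a str (or a str-valued Enum accessed
# via .value); fail fast otherwise.
assert isinstance(key, str), f"expected str, got {type(key)!r}"
tool_name_key = f"{key}.tool.name"  # e.g. "gen_ai.completion.tool.name"
```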
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (3)
58-60: LGTM! GenAIAttributes import aligns with OTel semantic conventions.

The import correctly introduces the GenAIAttributes alias for accessing gen_ai_attributes from the incubating semantic conventions, supporting the migration from legacy LLM_* attributes to standardized GEN_AI_* attributes.
626-631: Good addition for backward compatibility.

Calling `set_chat_response` in both event emission and legacy attribute modes ensures span attributes are consistently populated regardless of configuration, maintaining compatibility for downstream consumers.
359-360: LGTM! Attribute key migration correctly implements OTel semantic conventions.

All attribute keys have been consistently migrated from legacy `SpanAttributes.LLM_*` to standardized `GenAIAttributes.GEN_AI_*` naming:

- LLM_SYSTEM → GEN_AI_SYSTEM
- LLM_USAGE_PROMPT_TOKENS → GEN_AI_USAGE_INPUT_TOKENS
- LLM_USAGE_COMPLETION_TOKENS → GEN_AI_USAGE_OUTPUT_TOKENS
- Token and duration histogram attributes updated consistently
This aligns with the PR objective to adopt OpenTelemetry's GenAI semantic conventions.
Also applies to: 555-564, 593-622, 635-641
packages/opentelemetry-instrumentation-langchain/tests/test_structured_output.py (2)
29-38: LGTM! Test updated for GenAI attribute conventions.

The test correctly:
- Discards the unused result value
- Uses `GenAIAttributes.GEN_AI_PROMPT` for prompt content assertions
- Aligns with the migration to OpenTelemetry semantic conventions
54-75: Verify removal of choice event assertions in structured output tests.

Both `test_structured_output_with_events_with_content` and `test_structured_output_with_events_with_no_content` have commented-out assertions for AI choice events, reducing test coverage. Expected log counts dropped from 2 to 1.

Are choice events no longer emitted for structured outputs, or is this incomplete test migration? If this is expected behavior, consider documenting why choice events aren't validated for structured outputs.
Based on the PR context indicating "some tests are outdated and will not be further updated," please confirm whether:
- Structured output flows intentionally skip choice event emission
- This test coverage gap is acceptable for this PR scope
Also applies to: 79-105
packages/opentelemetry-instrumentation-langchain/tests/test_tool_calls.py (2)
217-270: LGTM! GenAI attribute migrations correctly applied in active tests.

All active span assertions correctly use `GenAIAttributes.GEN_AI_PROMPT` and `GenAIAttributes.GEN_AI_COMPLETION` with proper indexed paths for messages, roles, content, and tool calls. The migration is consistent with OpenTelemetry semantic conventions.

Also applies to: 505-529, 722-772, 921-923, 1070-1104
31-79: Align `test_tool_calls` with skip strategy or update assertions

Span-based checks in `test_tool_calls` (lines 37–74) are fully commented out, leaving only log validation, while all other `test_tool_calls_*` tests are skipped with reason="Direct model invocations do not create langchain spans". Either add the same `@pytest.mark.skip` decorator here or refactor this test to assert the intended current span/log behavior.

packages/opentelemetry-instrumentation-langchain/tests/test_chains.py (1)
126-133: LGTM! Attribute key migrations correctly implement OTel conventions.

The test assertions correctly migrate from `SpanAttributes.LLM_*` to `GenAIAttributes.GEN_AI_*`:

- LLM_REQUEST_MODEL → GEN_AI_REQUEST_MODEL
- LLM_RESPONSE_MODEL → GEN_AI_RESPONSE_MODEL
- LLM_PROMPTS → GEN_AI_PROMPT

Consistent across synchronous and asynchronous test variants.
Also applies to: 216-222, 337-343
packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py (2)
7-10: LGTM! Import changes and attribute migrations align with OTel conventions.

The test correctly:

- Removes unused `SpanAttributes` import
- Introduces `GenAIAttributes` alias
- Migrates assertions to use `GEN_AI_TOKEN_TYPE` and `GEN_AI_SYSTEM`

This addresses the past review comment requesting replacement of `SpanAttributes.LLM_SYSTEM` references.

Also applies to: 51-73, 103-125
196-272: LGTM! Langgraph test correctly validates metrics with GenAI attributes.

The `test_langgraph_metrics` test correctly:

- Expects metrics to BE generated (unlike the chain tests)
- Uses `GenAIAttributes.GEN_AI_SYSTEM` and `GEN_AI_TOKEN_TYPE`
- Validates token usage, duration, and generation choices metrics
This test's success suggests metrics work for direct OpenAI client invocations but may have issues with LangChain chain integrations (see previous comment).
packages/opentelemetry-instrumentation-langchain/tests/test_llms.py (5)
209-209: Clarify commented assertion pattern.

Multiple tests have commented-out assertions checking for `gen_ai.choice` events with comments like "logs[2] does not exist" or "logs[1] may not exist". This pattern appears consistently throughout the file.

Are these commented assertions intentional? If choice events are no longer emitted in the new GenAI attribute model, consider removing these commented lines entirely rather than leaving them in the code. If they represent known test gaps or future work, document this with a TODO comment or tracking issue.
Also applies to: 253-253, 357-357, 406-406, 567-567, 627-627, 766-766, 818-818, 964-964, 1022-1022
289-301: Validate new workflow output approach.

The test now validates completion content through the workflow span's `TRACELOOP_ENTITY_OUTPUT` attribute rather than directly checking the completion span attributes. This is a change in validation strategy.

Ensure this new validation approach is consistent across all similar tests and that it correctly captures the AI response content in all scenarios.
341-341: Verify log count reduction is correct.

Multiple tests have been updated to assert `len(logs) == 2` instead of the previous count (likely 3). This suggests that one event type (likely the `gen_ai.choice` event) is no longer emitted.

Confirm that the reduction in log count is expected behavior with the new GenAI attribute model. The PR description mentions that tests are "outdated and will not be further updated" - if these log count changes reflect known behavioral changes, document this in the test or PR comments.
Also applies to: 392-392, 539-539, 612-612, 748-748, 804-804, 946-946, 1008-1008, 1113-1113, 1189-1189, 1422-1422, 1501-1501
1071-1071: LGTM: System attribute correctly migrated.

The assertion correctly validates that the `GEN_AI_SYSTEM` attribute is set to the expected vendor name based on the LLM type.
145-154: Attribute migration verified — the instrumentation in `span_utils.py` and `callback_handler.py` correctly sets `GenAIAttributes.GEN_AI_REQUEST_MODEL`, `GEN_AI_SYSTEM`, as well as prompt (.0.content) and completion (.0.content) attributes via `_set_span_attribute`, so the tests will pass as written.

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (6)
18-20: LGTM: GenAI attributes imported correctly.

The import of `gen_ai_attributes as GenAIAttributes` aligns with the PR's objective to adopt OpenTelemetry's standardized GenAI semantic conventions.
78-80: LGTM: Request and response model attributes migrated.

The migration from legacy attribute names to `GEN_AI_REQUEST_MODEL` and `GEN_AI_RESPONSE_MODEL` is correct and aligns with OpenTelemetry semantic conventions.
91-97: LGTM: Request parameter attributes migrated.

All request parameters (`GEN_AI_REQUEST_MAX_TOKENS`, `GEN_AI_REQUEST_TEMPERATURE`, `GEN_AI_REQUEST_TOP_P`) are correctly migrated to use GenAI attributes.
213-243: LGTM: Improved content extraction in response handling.

The refactored `set_chat_response` function now has more robust content extraction logic that tries multiple sources (`generation.text`, `generation.message.content`) and handles both string and structured content. The migration to the `GEN_AI_COMPLETION` prefix is correct.
334-350: LGTM: Token usage attributes correctly migrated.

The migration from legacy token attribute names to `GEN_AI_USAGE_INPUT_TOKENS`, `GEN_AI_USAGE_OUTPUT_TOKENS`, and `GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS` is correct and aligns with OpenTelemetry semantic conventions.
353-373: LGTM: Token histogram attributes migrated.

The token histogram recording correctly uses the new GenAI attributes (`GEN_AI_SYSTEM`, `GEN_AI_TOKEN_TYPE`, `GEN_AI_RESPONSE_MODEL`) for both input and output token metrics.
hey guys, what's the latest on this? would love to see it in main

@nirga any updates on this?

sorry for the delay here folks @martimfasantos

any updates here? @nirga
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py (1)
143-155: Keep prompts on GenAI attribute keys

When the prompt is a plain string we still emit `SpanAttributes.LLM_PROMPTS.0.user`, so half the calls use the legacy field. That breaks the rename goal and makes downstream data inconsistent. Please route both branches through the GenAI prompt attribute.

Apply this diff to fix the issue:
```diff
     if isinstance(prompt, list):
         for index, input_text in enumerate(prompt):
             _set_span_attribute(
                 span,
-                f"{SpanAttributes.GEN_AI_PROMPT}.{index}.user",
+                f"{GenAIAttributes.GEN_AI_PROMPT}.{index}.user",
                 str(input_text).strip(),
             )
     else:
         _set_span_attribute(
             span,
-            f"{SpanAttributes.LLM_PROMPTS}.0.user",
+            f"{GenAIAttributes.GEN_AI_PROMPT}.0.user",
             str(prompt).strip(),
         )
```
♻️ Duplicate comments (3)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (2)
10-12: Critical: Incorrect import causes undefined name errors.

The import aliases the `gen_ai_attributes` module as `GenAIAttributes`, but the code uses it as a class with constants (e.g., `GenAIAttributes.GEN_AI_AGENT_NAME`). This will fail at runtime and in linting.

Apply this diff to import the correct class:
```diff
-from opentelemetry.semconv._incubating.attributes import (
-    gen_ai_attributes as GenAIAttributes
-)
+from opentelemetry.semconv_ai import GenAIAttributes
```
163-165: Remove non-spec `gen_ai.completion.tool.*` attributes.

The GenAI semantic conventions don't define a `gen_ai.completion.tool.*` namespace — tools are captured via `gen_ai.tool.*` (line 161) or `LLM_REQUEST_FUNCTIONS` (lines 358-376). These attributes can fail semantic convention validation.

Based on learnings
Apply this diff to remove the non-spec attributes:
```diff
     tool_attributes = {
         SpanAttributes.TRACELOOP_SPAN_KIND: TraceloopSpanKindValues.TOOL.value,
         "gen_ai.tool.name": tool_name,
         "gen_ai.system": "openai_agents",
-        f"{GenAIAttributes.GEN_AI_COMPLETION}.tool.name": tool_name,
-        f"{GenAIAttributes.GEN_AI_COMPLETION}.tool.type": "FunctionTool",
-        f"{GenAIAttributes.GEN_AI_COMPLETION}.tool.strict_json_schema": True
     }

     if hasattr(span_data, 'description') and span_data.description:
         # Only use description if it's not a generic class description
         desc = span_data.description
         if desc and not desc.startswith("Represents a Function Span"):
-            tool_attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.tool.description"] = desc
+            tool_attributes["gen_ai.tool.description"] = desc
```

Also applies to: 172-172
packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py (1)
166-201: Map remaining params to GenAI request attributes
`random_seed`, `max_new_tokens`, and `repetition_penalty` are still exported under the old Watsonx-specific `SpanAttributes`. The GenAI spec now provides direct equivalents, so these should ride the new keys to finish the migration. Only `decoding_method` and `min_new_tokens` need to stay on the legacy namespace for now. Based on learnings.

Apply this diff to fix the issue:
```diff
     _set_span_attribute(
         span,
         SpanAttributes.LLM_DECODING_METHOD,
         modelParameters.get("decoding_method", None),
     )
     _set_span_attribute(
         span,
-        SpanAttributes.LLM_RANDOM_SEED,
+        GenAIAttributes.GEN_AI_REQUEST_SEED,
         modelParameters.get("random_seed", None),
     )
     _set_span_attribute(
         span,
-        SpanAttributes.LLM_MAX_NEW_TOKENS,
+        GenAIAttributes.GEN_AI_REQUEST_MAX_TOKENS,
         modelParameters.get("max_new_tokens", None),
     )
     _set_span_attribute(
         span,
         SpanAttributes.LLM_MIN_NEW_TOKENS,
         modelParameters.get("min_new_tokens", None),
     )
     _set_span_attribute(
         span, GenAIAttributes.GEN_AI_REQUEST_TOP_K, modelParameters.get("top_k", None)
     )
     _set_span_attribute(
         span,
-        SpanAttributes.LLM_REPETITION_PENALTY,
+        GenAIAttributes.GEN_AI_REQUEST_FREQUENCY_PENALTY,
         modelParameters.get("repetition_penalty", None),
     )
```
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (1)
438-438: Consider using the existing GenAIAttributes alias for consistency.

The `GEN_AI_AGENT_NAME` constant is imported directly here, but it's already available via the `GenAIAttributes` alias imported at Line 13. For consistency with the rest of the file, consider using `GenAIAttributes.GEN_AI_AGENT_NAME` directly instead of this additional import.

Apply this diff to use the existing alias:
```diff
-    from opentelemetry.semconv._incubating.attributes.gen_ai_attributes import GEN_AI_AGENT_NAME
```

Then update references in the function to use `GenAIAttributes.GEN_AI_AGENT_NAME`.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/__init__.py (3 hunks)
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (12 hunks)
- packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (5 hunks)
- packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py (14 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/__init__.pypackages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.pypackages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.pypackages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
🧠 Learnings (1)
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
PR: traceloop/openllmetry#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Semantic conventions must follow the OpenTelemetry GenAI specification
Applied to files:
- packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/__init__.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
- packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py (2)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
SpanAttributes(64-245)

packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (1)

_set_span_attribute(18-22)
🔇 Additional comments (9)
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (7)
13-14: LGTM! Import aligns with the GenAI attribute migration.

The import correctly introduces the `GenAIAttributes` alias, providing access to the new OpenTelemetry GenAI semantic conventions as intended by this PR.
114-115: LGTM! Agent attributes correctly migrated.

The assertions now properly reference the GenAI semantic convention constants for agent name and system attributes.
137-138: LGTM! Prompt attributes correctly migrated.

The test assertions now properly use the GenAI semantic convention constants for prompt attributes.
141-146: LGTM! Usage token attributes successfully migrated.

The test assertions now correctly use the new GenAI semantic convention constants (`GEN_AI_USAGE_INPUT_TOKENS` and `GEN_AI_USAGE_OUTPUT_TOKENS`), replacing the legacy prompt/completion token naming. This addresses the previous review feedback and aligns with the OpenTelemetry specification.
149-151: LGTM! Completion attributes correctly migrated.

The test assertions properly use the GenAI semantic convention constants for completion attributes.
200-210: LGTM! Tool attributes correctly migrated.

The tool-related assertions now properly reference GenAI semantic convention constants, including tool name, type, description, and JSON schema attributes.
300-303: LGTM! Metrics token type correctly migrated.

The metrics assertion properly uses the GenAI semantic convention constant for token type, with values ("output", "input") that align with the new token naming conventions.
packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/__init__.py (2)
28-30: LGTM: Import aligns with OpenTelemetry GenAI semantic conventions.

The import of `GenAIAttributes` is correctly structured and follows the migration to OpenTelemetry's standardized GenAI attributes.

Based on learnings.
184-188: Original review comment flagged a valid concern but verification confirms no action is needed.

The partial migration in this file is intentional and consistent across the entire codebase. Here's what verification revealed:
GenAI spec does NOT define a REQUEST_TYPE equivalent: The GenAI semantic conventions use `gen_ai.operation.name` for operation types (different naming pattern), not `gen_ai.request.type`.

This pattern is systematic, not an oversight: Across 18+ instrumentation packages (vertexai, mistralai, ollama, groq, cohere, anthropic, watsonx, and others), the same pattern exists — `GenAIAttributes.GEN_AI_SYSTEM` is migrated while `SpanAttributes.LLM_REQUEST_TYPE` remains unchanged.

LLM_REQUEST_TYPE is maintained intentionally: Since it has no direct GenAI counterpart, it's preserved during the transition from LLM to GenAI semantic conventions.
The code is correct as-is and requires no changes.
nirga left a comment
Sorry for the delay in approving this @martimfasantos! I was out for a couple of weeks and this has slipped between the cracks

This pull request updates the token usage terminology across multiple packages and test files. The changes involve replacing `prompt_tokens` with `input_tokens` and `completion_tokens` with `output_tokens` to align with updated naming conventions present in the Documentation. Additionally, all related assertions in the tests have been modified to reflect these updates.

Screenshots
feat(instrumentation): ... or fix(instrumentation): ...
Renamed
prompt_tokenstoinput_tokensandcompletion_tokenstooutput_tokensacross various test files and packages to align with updated naming conventions.prompt_tokenstoinput_tokensandcompletion_tokenstooutput_tokensinSpanAttributesin__init__.py.test_completion.py,test_messages.py, andtest_prompt_caching.pyto useinput_tokensandoutput_tokens.test_completion.pyfor AlephAlpha, Anthropic, OpenAI, and Together packages.test_prompt_caching.pyfor Anthropic and OpenAI to reflect new token terminology.test_chat.pyandtest_completion.pyfor Together package to use updated token names.This description was created by
for 3dad6f9. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit
New Features
Refactor
Breaking Changes
Tests
Chores