
Conversation

@martimfasantos
Contributor

@martimfasantos martimfasantos commented Jul 15, 2025

This pull request updates the token usage terminology across multiple packages and test files, replacing prompt_tokens with input_tokens and completion_tokens with output_tokens to align with the naming conventions used in the documentation. All related test assertions have been updated to match.
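For illustration, a minimal before/after of the kind of test assertion this touches (the span fixture name and token counts here are illustrative, not from a specific test):

# Before: legacy prompt/completion terminology
assert span.attributes["gen_ai.usage.prompt_tokens"] == 514
assert span.attributes["gen_ai.usage.completion_tokens"] == 9

# After: input/output terminology, matching the documentation
assert span.attributes["gen_ai.usage.input_tokens"] == 514
assert span.attributes["gen_ai.usage.output_tokens"] == 9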

Screenshots

[screenshot]
  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

Important

Renamed prompt_tokens to input_tokens and completion_tokens to output_tokens across various test files and packages to align with updated naming conventions.

  • Terminology Update:
    • Renamed prompt_tokens to input_tokens and completion_tokens to output_tokens in SpanAttributes in __init__.py.
    • Updated assertions in test files like test_completion.py, test_messages.py, and test_prompt_caching.py to use input_tokens and output_tokens.
  • Test Files:
    • Modified token usage assertions in test_completion.py for AlephAlpha, Anthropic, OpenAI, and Together packages.
    • Updated test_prompt_caching.py for Anthropic and OpenAI to reflect new token terminology.
    • Adjusted test_chat.py and test_completion.py for Together package to use updated token names.
  • Misc:
    • Ensured all related test assertions and logs reflect the new token terminology.

This description was created by Ellipsis for 3dad6f9. It will automatically update as commits are pushed.

Summary by CodeRabbit

  • New Features

    • Introduced a unified GenAI attributes surface for telemetry and streaming first-token metrics.
  • Refactor

    • Migrated telemetry attribute names from legacy LLM_* to GEN_AI_* across instrumentations; token usage is now reported as gen_ai.usage.input_tokens/gen_ai.usage.output_tokens.
  • Breaking Changes

    • Legacy LLM_* span attribute constants removed or deprecated — integrations should adopt the new GenAI attribute keys.
  • Tests

    • Test suites updated to expect GEN_AI_* names; some assertions relaxed to reduce brittleness.
  • Chores

    • Updated semantic-conventions and SDK dependency declarations.

@CLAassistant

CLAassistant commented Jul 15, 2025

CLA assistant check
All committers have signed the CLA.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed everything up to 3dad6f9 in 1 minute and 35 seconds.
  • Reviewed 591 lines of code in 7 files
  • Skipped 0 files when reviewing.
  • Skipped posting 13 draft comments. View those below.
1. packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py:50
  • Draft comment:
    Updated constants: 'LLM_USAGE_COMPLETION_TOKENS' now uses 'gen_ai.usage.output_tokens' and 'LLM_USAGE_PROMPT_TOKENS' uses 'gen_ai.usage.input_tokens'. This aligns with the new naming conventions.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is purely informative, explaining the changes made to constants and their alignment with new naming conventions. It does not provide any actionable feedback or suggestions for improvement.
2. packages/opentelemetry-instrumentation-together/tests/test_chat.py:37
  • Draft comment:
    Test assertions now use the updated attribute names (input_tokens/output_tokens) for token usage. Verify consistency with new semantic conventions.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the PR author to verify consistency with new semantic conventions, which falls under the rule of not asking the author to double-check things. It doesn't provide a specific suggestion or point out a specific issue with the code.
3. packages/opentelemetry-instrumentation-together/tests/test_completion.py:37
  • Draft comment:
    Ensure that the test for Together completions correctly sums 'gen_ai.usage.input_tokens' and 'gen_ai.usage.output_tokens' to match total tokens.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
4. packages/opentelemetry-instrumentation-openai/tests/traces/test_prompt_caching.py:70
  • Draft comment:
    The OpenAI prompt caching tests have been updated to use new input/output token attribute names. Confirm that these changes are in sync with semantic conventions.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the PR author to confirm that changes are in sync with semantic conventions, which is against the rules. It does not provide a specific suggestion or ask for a specific test to be written. Therefore, it should be removed.
5. packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py:54
  • Draft comment:
    Consider updating the accompanying documentation or changelog to mention that the legacy token-related attribute names have been replaced with 'input_tokens' and 'output_tokens'.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment suggests updating documentation or changelog, which is not allowed as per the rules. It doesn't provide a specific code suggestion or ask for a test to be written.
6. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:109
  • Draft comment:
    Typographical error: "cassete" should be "cassette" in the comment.
  • Reason this comment was not posted:
    Comment was on unchanged code.
7. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:195
  • Draft comment:
    Typo/grammatical note: The comment currently says "but not wrote to it," which could be updated to "but not write to it," for clarity.
  • Reason this comment was not posted:
    Comment was on unchanged code.
8. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:604
  • Draft comment:
    Typographical suggestion: The comment reads "but not wrote to it," which could be clearer. Consider revising it to something like "but did not write to it," to improve clarity.
  • Reason this comment was not posted:
    Comment was on unchanged code.
9. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:798
  • Draft comment:
    Typo in comment: "but not wrote to it" is ungrammatical. Consider using "but not writing to it" or "but not writing into it" for clarity.
  • Reason this comment was not posted:
    Comment was on unchanged code.
10. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:931
  • Draft comment:
    Typo in comment: Consider updating "but not wrote to it" to "but did not write to it" for clearer grammar.
  • Reason this comment was not posted:
    Comment was on unchanged code.
11. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:1025
  • Draft comment:
    Typo in comment: Consider changing 'wrote to it' to 'written to it' for clarity.
  • Reason this comment was not posted:
    Comment was on unchanged code.
12. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:1029
  • Draft comment:
    Typographical error: 'cassete' appears to be misspelled. Consider updating it to 'cassette'.
  • Reason this comment was not posted:
    Comment was on unchanged code.
13. packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:1362
  • Draft comment:
    Typo: 'cassete' should be spelled as 'cassette' in the comment.
  • Reason this comment was not posted:
    Comment was on unchanged code.

Workflow ID: wflow_oahtGC07iVEgOTYG


Member

@nirga nirga left a comment


Thanks @martimfasantos! In this case I'd remove them from the semantic conventions and use them directly from the OTel package - that way they'll always be up to date (when we originally built this there weren't any semconvs for it).
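A minimal sketch of that direction, using the OTel incubating module that comes up later in this thread (record_usage is a hypothetical helper, not an existing API):

from opentelemetry.semconv._incubating.attributes import (
    gen_ai_attributes as GenAIAttributes,
)

def record_usage(span, prompt_tokens: int, completion_tokens: int) -> None:
    # Read the attribute keys straight from the OTel package, so they
    # stay in sync with the upstream semantic conventions.
    span.set_attribute(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, prompt_tokens)
    span.set_attribute(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, completion_tokens)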

@coderabbitai

coderabbitai bot commented Jul 19, 2025

Walkthrough

Mass rename of span and metric attribute keys: legacy SpanAttributes LLM_* constants replaced by GenAIAttributes GEN_AI_* equivalents across instrumentations and tests (prompts→GEN_AI_PROMPT, completions→GEN_AI_COMPLETION, prompt_tokens→input_tokens, completion_tokens→output_tokens). Mostly import/constant substitutions; one event handler now calls span.end().

Changes

Cohort / File(s) Summary
Semantic conventions core
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py, packages/opentelemetry-semantic-conventions-ai/pyproject.toml
Removed many LLM_* public constants, added GenAI usage-cache constants; pyproject dependencies updated (opentelemetry-sdk, opentelemetry-semantic-conventions).
Instrumentation modules (bulk)
packages/*/opentelemetry/instrumentation/** (alephalpha, anthropic, bedrock, cohere, crewai, google_generativeai, groq, haystack, langchain, llamaindex, mistralai, ollama, openai*, replicate, sagemaker, together, transformers, vertexai, watsonx, etc.)
Replaced SpanAttributes.LLM_* usages with GenAIAttributes.GEN_AI_* for prompts, completions, request/response model/system/params, token-type and usage keys; added gen_ai_attributes as GenAIAttributes imports.
Span utils, streaming & helpers
.../span_utils.py, .../streaming.py, .../utils.py, .../guardrail.py
Internal attribute keys migrated to GenAIAttributes.* (prompt/completion prefixes, usage/input/output tokens, token-type, response id); minor local refinements in streaming/error handling and where response id/duration attributes are set.
OpenAI shared & wrappers
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/**
Comprehensive migration to GenAIAttributes.* for prompts/completions/system/request/response/usage/cache; streaming and metric paths updated; event_handler_wrapper.on_end now calls self._span.end().
Tests across packages
packages/*/tests/** (extensive)
Test assertions updated to use GenAIAttributes.GEN_AI_* and token key renames (gen_ai.usage.prompt_tokens → gen_ai.usage.input_tokens, gen_ai.usage.completion_tokens → gen_ai.usage.output_tokens); total-token assertions adjusted to sum input+output; some tests reduced or disabled granular payload assertions.
TraceLoop SDK & manual tracing
packages/traceloop-sdk/*
Manual tracing APIs and tests updated to emit GEN_AI_* keys for requests, responses, usage, and system; signatures unchanged.

Sequence Diagram(s)

sequenceDiagram
  participant Instr as Instrumentation
  participant GenAI as gen_ai_attributes (GenAIAttributes)
  participant Span as OpenTelemetry Span

  Note over Instr,GenAI: import gen_ai_attributes as GenAIAttributes
  Instr->>GenAI: read constants (GEN_AI_SYSTEM, GEN_AI_PROMPT, GEN_AI_COMPLETION, GEN_AI_USAGE_*)
  Instr->>Span: start span with attributes (GenAIAttributes.GEN_AI_SYSTEM, GenAIAttributes.GEN_AI_REQUEST_MODEL, ...)
  activate Span
  Instr->>Span: set prompt attributes → GenAIAttributes.GEN_AI_PROMPT.*
  Instr->>Span: set completion attributes → GenAIAttributes.GEN_AI_COMPLETION.*
  Instr->>Span: set usage attributes → GEN_AI_USAGE_INPUT_TOKENS / GEN_AI_USAGE_OUTPUT_TOKENS
  Instr->>Span: record token histograms (GEN_AI_TOKEN_TYPE, GEN_AI_RESPONSE_MODEL)
  alt event-handler path
    Instr->>Span: event handler may call self._span.end()
  end
  Instr->>Span: end span
  deactivate Span

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Potential attention areas:

  • Semantic-conventions changes in semconv_ai/__init__.py (removals/additions) — verify public API compatibility.
  • OpenAI shared/wrappers and streaming paths — many touchpoints and the added span.end() call.
  • Langchain, openai_agents, and other high-surface modules where prompt/tool-call nesting was migrated — ensure nested keys and indexing are correct.
  • Tests that were relaxed/disabled — validate intentionality and test coverage impact.

Possibly related PRs

Suggested reviewers

  • nirga
  • galkleinman
  • dinmukhamedm

Poem

I hopped through code and swapped each key,
From LLM to GenAI — the traces agree.
Input and output tokens now neatly align,
I thumped with joy — the spans look fine.
Carrots counted, attributes sung: thump-thump, merge done! 🥕🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 29.41%, below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The pull request title "fix(instrumentation): updated GenAI attributes to use OTel's" clearly relates to the main objective of the changeset: migrating GenAI span attribute keys across instrumentation packages from legacy LLM_* keys to OpenTelemetry's GEN_AI_* semantic conventions, including renaming token usage attributes from prompt/completion to input/output. The title could be more complete ("OTel's" reads truncated without "semantic conventions"), but a reviewer scanning the repository history would immediately understand the PR's scope.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🔭 Outside diff range comments (12)
packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py (1)

210-225: Replace LLM token attributes with GEN_AI equivalents

The semantic-conventions package exports GEN_AI_USAGE_INPUT_TOKENS and GEN_AI_USAGE_OUTPUT_TOKENS, so we should stop using the old LLM_USAGE_PROMPT_TOKENS and LLM_USAGE_COMPLETION_TOKENS. The total-tokens attribute remains LLM_USAGE_TOTAL_TOKENS (no GEN_AI_USAGE_TOTAL_TOKENS is defined).

Locations to update:

  • packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py lines ~210–225
  • packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py lines ~324–336

Suggested diff:

-    _set_span_attribute(
-        span,
-        SpanAttributes.LLM_USAGE_PROMPT_TOKENS,
-        stream_response.get("input_token_count"),
-    )
+    _set_span_attribute(
+        span,
+        SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS,
+        stream_response.get("input_token_count"),
+    )

-    _set_span_attribute(
-        span,
-        SpanAttributes.LLM_USAGE_COMPLETION_TOKENS,
-        stream_response.get("generated_token_count"),
-    )
+    _set_span_attribute(
+        span,
+        SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS,
+        stream_response.get("generated_token_count"),
+    )

     total_token = stream_response.get("input_token_count") + stream_response.get(
         "generated_token_count"
     )
-    _set_span_attribute(
-        span,
-        SpanAttributes.LLM_USAGE_TOTAL_TOKENS,
-        total_token,
-    )
+    _set_span_attribute(
+        span,
+        SpanAttributes.LLM_USAGE_TOTAL_TOKENS,
+        total_token,
+    )

Please apply the same replacements in the second block (lines 324–336) and update any related tests to expect the new GEN_AI_USAGE_* constants.

packages/traceloop-sdk/traceloop/sdk/tracing/manual.py (1)

49-65: Align usage attributes with Gen AI semantic conventions

The Gen AI spec defines gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, and the cache‐related usage attributes, but does not yet provide gen_ai.usage.total_tokens or a gen_ai.request.type. To stay consistent:

• Replace LLM prompt/completion and cache‐input tokens with the GEN_AI equivalents
• Keep LLM_USAGE_TOTAL_TOKENS and LLM_REQUEST_TYPE as they have no Gen AI counterparts

Locations to update (packages/traceloop-sdk/traceloop/sdk/tracing/manual.py, lines 49–65):

-        self._span.set_attribute(
-            SpanAttributes.LLM_USAGE_PROMPT_TOKENS, usage.prompt_tokens
-        )
-        self._span.set_attribute(
-            SpanAttributes.LLM_USAGE_COMPLETION_TOKENS, usage.completion_tokens
-        )
+        self._span.set_attribute(
+            SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS, usage.prompt_tokens
+        )
+        self._span.set_attribute(
+            SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, usage.completion_tokens
+        )
         self._span.set_attribute(
             SpanAttributes.LLM_USAGE_TOTAL_TOKENS, usage.total_tokens
         )
-        self._span.set_attribute(
-            SpanAttributes.LLM_USAGE_CACHE_CREATION_INPUT_TOKENS,
-            usage.cache_creation_input_tokens,
-        )
-        self._span.set_attribute(
-            SpanAttributes.LLM_USAGE_CACHE_READ_INPUT_TOKENS,
-            usage.cache_read_input_tokens,
-        )
+        self._span.set_attribute(
+            SpanAttributes.GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS,
+            usage.cache_creation_input_tokens,
+        )
+        self._span.set_attribute(
+            SpanAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS,
+            usage.cache_read_input_tokens,
+        )

No change needed for LLM_REQUEST_TYPE—the semantic conventions only define "llm.request.type" at this time.

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (1)

479-497: None > 0 guard needed before metric recording

prompt_tokens or completion_tokens can be None when the provider omits usage stats.
Comparing None > 0 raises TypeError.

-            if prompt_tokens > 0:
+            if isinstance(prompt_tokens, (int, float)) and prompt_tokens > 0:
@@
-            if completion_tokens > 0:
+            if isinstance(completion_tokens, (int, float)) and completion_tokens > 0:
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py (1)

226-233: Inconsistent token usage attribute naming.

The token usage attributes still use the legacy LLM_USAGE_* naming instead of the newer GEN_AI_USAGE_* conventions. Based on the PR objectives to update token terminology, these should be updated to align with the new naming conventions.

Apply this diff to align with the new token usage attribute naming:

         _set_span_attribute(
             span,
-            SpanAttributes.LLM_USAGE_COMPLETION_TOKENS,
+            SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS,
             usage_dict.get("completion_tokens"),
         )
         _set_span_attribute(
             span,
-            SpanAttributes.LLM_USAGE_PROMPT_TOKENS,
+            SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS,
             usage_dict.get("prompt_tokens"),
         )
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (1)

402-416: Inconsistent token usage attribute naming.

The token usage span attributes still use the legacy LLM_USAGE_* naming instead of the newer GEN_AI_USAGE_* conventions, which is inconsistent with the overall migration to GenAI semantic conventions.

Consider updating these attributes to align with the new naming conventions:

             set_span_attribute(
                 span,
-                SpanAttributes.LLM_USAGE_PROMPT_TOKENS,
+                SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS,
                 input_tokens,
             )
             set_span_attribute(
                 span,
-                SpanAttributes.LLM_USAGE_COMPLETION_TOKENS,
+                SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS,
                 output_tokens,
             )
packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/span_utils.py (1)

80-89: Inconsistent attribute naming - some LLM_* attributes remain.

While the response model was updated, the token usage attributes and finish reason attribute still use the legacy LLM_* naming instead of the newer GEN_AI_* conventions.

Consider updating these remaining attributes for consistency:

         span.set_attribute(
-            SpanAttributes.LLM_USAGE_COMPLETION_TOKENS, usage.completion_tokens
+            SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, usage.completion_tokens
         )
-        span.set_attribute(SpanAttributes.LLM_USAGE_PROMPT_TOKENS, usage.prompt_tokens)
+        span.set_attribute(SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS, usage.prompt_tokens)

And for the finish reason attribute:

         span.set_attribute(
-            SpanAttributes.LLM_RESPONSE_FINISH_REASON, choices[0].finish_reason
+            SpanAttributes.GEN_AI_RESPONSE_FINISH_REASONS, choices[0].finish_reason
         )
packages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/span_utils.py (1)

88-109: Incomplete semantic convention migration for token usage attributes

While GEN_AI_RESPONSE_MODEL has been correctly updated, the token usage attributes on lines 97, 102, and 107 still use the legacy LLM_USAGE_* naming convention. Based on the PR objectives to update token usage terminology, these should be updated to:

  • LLM_USAGE_PROMPT_TOKENS → GEN_AI_USAGE_INPUT_TOKENS
  • LLM_USAGE_COMPLETION_TOKENS → GEN_AI_USAGE_OUTPUT_TOKENS
  • LLM_USAGE_TOTAL_TOKENS → GEN_AI_USAGE_TOTAL_TOKENS

Suggested diff:
    _set_span_attribute(
        span,
-       SpanAttributes.LLM_USAGE_TOTAL_TOKENS,
+       SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS,
        input_tokens + output_tokens,
    )
    _set_span_attribute(
        span,
-       SpanAttributes.LLM_USAGE_COMPLETION_TOKENS,
+       SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS,
        output_tokens,
    )
    _set_span_attribute(
        span,
-       SpanAttributes.LLM_USAGE_PROMPT_TOKENS,
+       SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS,
        input_tokens,
    )
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py (1)

902-911: Update token usage attribute names in streaming tests

The new semantic conventions define GEN_AI_USAGE_INPUT_TOKENS and GEN_AI_USAGE_OUTPUT_TOKENS, so the legacy LLM_USAGE_* constants should be replaced accordingly. Since there is no GEN_AI_USAGE_TOTAL_TOKENS constant, compute the total by summing input and output tokens.

• File: packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py (around lines 902–911)

  • Replace SpanAttributes.LLM_USAGE_PROMPT_TOKENS → SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS
  • Replace SpanAttributes.LLM_USAGE_COMPLETION_TOKENS → SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS
  • Update total‐tokens assertion to sum the two new attributes instead of using LLM_USAGE_TOTAL_TOKENS

Suggested diff:

- assert (
-     bedrock_span.attributes[SpanAttributes.LLM_USAGE_PROMPT_TOKENS]
-     == inputTokens
- )
- assert (
-     bedrock_span.attributes[SpanAttributes.LLM_USAGE_COMPLETION_TOKENS]
-     == outputTokens
- )
- assert (
-     bedrock_span.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS]
-     == inputTokens + outputTokens
- )
+ assert bedrock_span.attributes[SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS] == inputTokens
+ assert bedrock_span.attributes[SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == outputTokens
+ assert (
+     bedrock_span.attributes[SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS]
+     + bedrock_span.attributes[SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]
+     == inputTokens + outputTokens
+ )
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (2)

222-230: Token usage attributes not updated to new semantic conventions

Similar to other files in this PR, the token usage attributes still use legacy LLM_USAGE_* naming:

  • LLM_USAGE_PROMPT_TOKENS (line 222)
  • LLM_USAGE_COMPLETION_TOKENS (line 224)
  • LLM_USAGE_TOTAL_TOKENS (line 228)

For consistency with the PR objectives, these should be updated to use GEN_AI_USAGE_* equivalents with the new "input/output" terminology instead of "prompt/completion".


76-90: Update semantic convention prefixes for penalty attributes and confirm streaming attribute

The semantic conventions for frequency and presence penalties have GEN_AI_REQUEST_* equivalents and should replace the legacy LLM_* prefixes. The streaming attribute currently has no GEN_AI_* equivalent in the semantic conventions—continue using LLM_IS_STREAMING or propose adding GEN_AI_REQUEST_STREAMING.

• In packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py

  • Line 85: replace SpanAttributes.LLM_FREQUENCY_PENALTY with SpanAttributes.GEN_AI_REQUEST_FREQUENCY_PENALTY
  • Line 88: replace SpanAttributes.LLM_PRESENCE_PENALTY with SpanAttributes.GEN_AI_REQUEST_PRESENCE_PENALTY
  • Line 90: retain SpanAttributes.LLM_IS_STREAMING (no GEN_AI_REQUEST_STREAMING found in opentelemetry-semantic-conventions-ai)

Suggested diff:

- set_span_attribute(
-     span, SpanAttributes.LLM_FREQUENCY_PENALTY, kwargs.get("frequency_penalty")
- )
+ set_span_attribute(
+     span, SpanAttributes.GEN_AI_REQUEST_FREQUENCY_PENALTY, kwargs.get("frequency_penalty")
+ )

- set_span_attribute(
-     span, SpanAttributes.LLM_PRESENCE_PENALTY, kwargs.get("presence_penalty")
- )
+ set_span_attribute(
+     span, SpanAttributes.GEN_AI_REQUEST_PRESENCE_PENALTY, kwargs.get("presence_penalty")
+ )
packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/span_utils.py (1)

86-140: Migrate legacy LLM_USAGE_* attributes to the new naming convention

The old token-usage constants (LLM_USAGE_COMPLETION_TOKENS, LLM_USAGE_PROMPT_TOKENS, LLM_USAGE_TOTAL_TOKENS) are still in use across multiple instrumentation packages (e.g., groq/span_utils.py, together/span_utils.py, vertexai/span_utils.py, etc.). All of these must be updated to the new attribute names alongside the already-migrated GEN_AI_TOKEN_TYPE and GEN_AI_RESPONSE_MODEL.

Please update every occurrence in your span utilities—including tests—to use the correct, non-legacy constants. You can locate all remaining uses with:

rg -l 'LLM_USAGE_.*TOKEN' packages/opentelemetry-instrumentation-*/

• Replace LLM_USAGE_PROMPT_TOKENS
• Replace LLM_USAGE_COMPLETION_TOKENS
• Replace LLM_USAGE_TOTAL_TOKENS

with their new equivalents in each span_utils.py (and related test files).

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)

287-300: _get_vendor_from_url returns capital-cased values that violate the GenAI enum

GenAISystemValues in the incubating semconv are lower-case ("openai", "aws", "azure", …).
Here we return "AWS", "Azure", etc.; this will produce spans that fail semantic-convention
validation and break downstream queries that expect the canonical values.

-    if "openai.azure.com" in base_url:
-        return "Azure"
-    elif "amazonaws.com" in base_url or "bedrock" in base_url:
-        return "AWS"
-    elif "googleapis.com" in base_url or "vertex" in base_url:
-        return "Google"
-    elif "openrouter.ai" in base_url:
-        return "OpenRouter"
+    if "openai.azure.com" in base_url:
+        return "azure"
+    elif "amazonaws.com" in base_url or "bedrock" in base_url:
+        return "aws"
+    elif "googleapis.com" in base_url or "vertex" in base_url:
+        return "google"
+    elif "openrouter.ai" in base_url:
+        return "openrouter"

A one-line return vendor.lower() after detection would also work.
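For reference, a sketch of that one-line variant (only the branches shown in the snippet above; the fallback value is an assumption, since the original default is not shown):

def _get_vendor_from_url(base_url: str) -> str:
    if "openai.azure.com" in base_url:
        vendor = "Azure"
    elif "amazonaws.com" in base_url or "bedrock" in base_url:
        vendor = "AWS"
    elif "googleapis.com" in base_url or "vertex" in base_url:
        vendor = "Google"
    elif "openrouter.ai" in base_url:
        vendor = "OpenRouter"
    else:
        vendor = "openai"  # assumed default; not part of the original snippet
    # Normalize once at the end instead of lower-casing every literal.
    return vendor.lower()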

♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (1)

269-276: Repeat comment: use enum for vendor

Same concern as above—replace the literal "AWS" with the enum value to stay future-proof.

🧹 Nitpick comments (13)
packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/custom_llm_instrumentor.py (1)

177-181: Simplify nested if statements.

The static analysis tool correctly identified nested if statements that can be combined for better readability.

Apply this diff to combine the nested if statements:

-    if should_send_prompts():
-        if llm_request_type == LLMRequestTypeValues.COMPLETION:
-            _set_span_attribute(
-                span, f"{SpanAttributes.GEN_AI_COMPLETION}.0.content", response.text
-            )
+    if should_send_prompts() and llm_request_type == LLMRequestTypeValues.COMPLETION:
+        _set_span_attribute(
+            span, f"{SpanAttributes.GEN_AI_COMPLETION}.0.content", response.text
+        )
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (1)

51-52: Prefer the official enum to avoid string drift

"AWS" is hard-coded while elsewhere the enum GenAIAttributes.GenAiSystemValues.AWS_BEDROCK.value is used. Using the enum keeps tests aligned if the canonical value ever changes.

-assert bedrock_span.attributes[SpanAttributes.GEN_AI_SYSTEM] == "AWS"
+assert (
+    bedrock_span.attributes[SpanAttributes.GEN_AI_SYSTEM]
+    == GenAIAttributes.GenAiSystemValues.AWS_BEDROCK.value
+)
packages/opentelemetry-instrumentation-cohere/tests/test_completion.py (1)

53-56: Minor consistency nit

Consider using GenAIAttributes.GenAiSystemValues.COHERE.value instead of the literal "Cohere" for the vendor check.
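That is, something like the following (assuming the incubating enum exposes a COHERE member with the lower-case value "cohere", as it does AWS_BEDROCK above; the span variable name is illustrative):

assert cohere_span.attributes[SpanAttributes.GEN_AI_SYSTEM] == (
    GenAIAttributes.GenAiSystemValues.COHERE.value  # "cohere"
)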

packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py (2)

145-153: Duplication across tests

The same literal prompt string & model ID are asserted in ~20 places. Parametrize via pytest.mark.parametrize or a fixture to reduce duplication and improve maintainability.
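A sketch of the fixture approach (the literals, fixture names, and exporter harness here are illustrative, not taken from the actual test file):

import pytest
from opentelemetry.semconv._incubating.attributes import (
    gen_ai_attributes as GenAIAttributes,
)

EXPECTED_PROMPT = "Tell me a joke about OpenTelemetry"  # shared literal, defined once
EXPECTED_MODEL = "text-embedding-ada-002"               # shared literal, defined once

@pytest.fixture
def embeddings_span(exporter):
    # Single place that locates the span under test.
    return exporter.get_finished_spans()[0]

def test_embeddings_attributes(embeddings_span):
    assert (
        embeddings_span.attributes[f"{GenAIAttributes.GEN_AI_PROMPT}.0.content"]
        == EXPECTED_PROMPT
    )
    assert (
        embeddings_span.attributes[GenAIAttributes.GEN_AI_REQUEST_MODEL]
        == EXPECTED_MODEL
    )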


321-327: Enum vs literal vendor value

Throughout the file vendor strings ("openai", etc.) are hard-coded. Prefer the enum values from GenAIAttributes for future resiliency.

packages/opentelemetry-instrumentation-mistralai/tests/test_chat.py (2)

27-41: Repeated literal "MistralAI" vendor

Same comment as other files – consider centralising vendor constant via enum to avoid divergence.

Also applies to: 73-89, 126-143, 179-197


176-197: Large duplication – consider parametrised fixtures

The chat & streaming variants share extensive duplicated assertions differing only by the fixture used (instrument_*). A helper asserting common span attributes would shrink the test file and make future convention changes easier.

packages/opentelemetry-instrumentation-vertexai/opentelemetry/instrumentation/vertexai/span_utils.py (2)

25-29: Minor naming mismatch in prompt attribute key

Other instrumentations store the user prompt under …prompt.0.content, but here it is stored under …prompt.0.user.
If this was accidental, switching to content keeps cross-provider parity.

-            f"{SpanAttributes.GEN_AI_PROMPT}.0.user",
+            f"{SpanAttributes.GEN_AI_PROMPT}.0.content",

57-66: Unused parameter (llm_model) in set_response_attributes

llm_model is accepted but never referenced. Either remove the argument or use it (e.g., set GEN_AI_RESPONSE_MODEL).
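The second option might look like this (a sketch; the signature and helper names are assumed from the comment, not the actual file):

def set_response_attributes(span, llm_model, response):
    # Put the otherwise-unused parameter to work as the response model.
    _set_span_attribute(span, GenAIAttributes.GEN_AI_RESPONSE_MODEL, llm_model)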

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (1)

273-276: Mixed attribute families in the same span

GEN_AI_SYSTEM replaces LLM_SYSTEM, but the next line still sets LLM_REQUEST_TYPE. Consider moving that one to a GEN_AI_REQUEST_TYPE constant (if defined) for consistency.

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)

391-402: Hard-coded strings for usage metrics – prefer the SpanAttributes constants

metric_shared_attributes builds the dict with string literals
"gen_ai.operation.name" etc., but uses SpanAttributes.GEN_AI_SYSTEM
and GEN_AI_RESPONSE_MODEL. For the two usage keys introduced in this
PR we now have SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS /
GEN_AI_USAGE_OUTPUT_TOKENS. Using the constants avoids typos and keeps
refactors mechanical.

packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (2)

1291-1299: Raw "gen_ai.usage.*" literals – use the constants to avoid drift

These assertions use bare strings for the input / output token counters.
Now that SpanAttributes exposes
GEN_AI_USAGE_INPUT_TOKENS / GEN_AI_USAGE_OUTPUT_TOKENS, switch to:

-assert anthropic_span.attributes["gen_ai.usage.input_tokens"] == 514
+assert anthropic_span.attributes[SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 514

Besides eliminating magic strings, it guarantees the tests fail when the
spec keys change again.

Also applies to: 1468-1474, 1604-1610


433-468: Verbose prompt/completion assertions – extractor helper would shrink test noise

The repeated pattern

assert span.attributes[f"{SpanAttributes.GEN_AI_PROMPT}.{n}.role"] == ...
assert span.attributes[f"{SpanAttributes.GEN_AI_PROMPT}.{n}.content"] == ...

appears dozens of times. A small helper such as
assert_prompt(span, idx, role, content) would make the intentions
clearer and the diff noise smaller when the convention inevitably
evolves again.
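Such a helper could be as small as (a sketch mirroring the signature suggested above):

def assert_prompt(span, idx: int, role: str, content: str) -> None:
    # Check one prompt entry's role and content on a span.
    assert span.attributes[f"{SpanAttributes.GEN_AI_PROMPT}.{idx}.role"] == role
    assert span.attributes[f"{SpanAttributes.GEN_AI_PROMPT}.{idx}.content"] == content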


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 01b6fc1 and f8914d6.

📒 Files selected for processing (47)
  • packages/opentelemetry-instrumentation-alephalpha/opentelemetry/instrumentation/alephalpha/__init__.py (3 hunks)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (8 hunks)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (7 hunks)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (3 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (22 hunks)
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (23 hunks)
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py (3 hunks)
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py (15 hunks)
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py (12 hunks)
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py (17 hunks)
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (14 hunks)
  • packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (6 hunks)
  • packages/opentelemetry-instrumentation-cohere/tests/test_chat.py (3 hunks)
  • packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py (5 hunks)
  • packages/opentelemetry-instrumentation-google-generativeai/tests/test_generate_content.py (3 hunks)
  • packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/span_utils.py (9 hunks)
  • packages/opentelemetry-instrumentation-groq/tests/traces/test_chat_tracing.py (9 hunks)
  • packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (5 hunks)
  • packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (9 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_llms.py (16 hunks)
  • packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/span_utils.py (5 hunks)
  • packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py (6 hunks)
  • packages/opentelemetry-instrumentation-llamaindex/tests/test_chroma_vector_store.py (1 hunks)
  • packages/opentelemetry-instrumentation-mistralai/opentelemetry/instrumentation/mistralai/__init__.py (7 hunks)
  • packages/opentelemetry-instrumentation-mistralai/tests/test_chat.py (12 hunks)
  • packages/opentelemetry-instrumentation-ollama/opentelemetry/instrumentation/ollama/span_utils.py (8 hunks)
  • packages/opentelemetry-instrumentation-ollama/tests/test_chat.py (16 hunks)
  • packages/opentelemetry-instrumentation-ollama/tests/test_generation.py (12 hunks)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (5 hunks)
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (2 hunks)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (6 hunks)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py (5 hunks)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/event_handler_wrapper.py (2 hunks)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py (15 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py (13 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (19 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_completions.py (10 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py (9 hunks)
  • packages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/span_utils.py (5 hunks)
  • packages/opentelemetry-instrumentation-vertexai/opentelemetry/instrumentation/vertexai/span_utils.py (5 hunks)
  • packages/opentelemetry-instrumentation-vertexai/tests/disabled_test_bison.py (6 hunks)
  • packages/opentelemetry-instrumentation-vertexai/tests/disabled_test_gemini.py (3 hunks)
  • packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py (13 hunks)
  • packages/traceloop-sdk/tests/test_manual.py (1 hunks)
  • packages/traceloop-sdk/tests/test_privacy_no_prompts.py (1 hunks)
  • packages/traceloop-sdk/traceloop/sdk/tracing/manual.py (2 hunks)
✅ Files skipped from review due to trivial changes (5)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
  • packages/opentelemetry-instrumentation-google-generativeai/tests/test_generate_content.py
  • packages/opentelemetry-instrumentation-alephalpha/opentelemetry/instrumentation/alephalpha/__init__.py
  • packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py
  • packages/opentelemetry-instrumentation-ollama/tests/test_generation.py
🚧 Files skipped from review as they are similar to previous changes (41)
  • packages/opentelemetry-instrumentation-llamaindex/tests/test_chroma_vector_store.py
  • packages/opentelemetry-instrumentation-vertexai/tests/disabled_test_bison.py
  • packages/traceloop-sdk/tests/test_privacy_no_prompts.py
  • packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/span_utils.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
  • packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py
  • packages/opentelemetry-instrumentation-ollama/opentelemetry/instrumentation/ollama/span_utils.py
  • packages/traceloop-sdk/tests/test_manual.py
  • packages/opentelemetry-instrumentation-groq/tests/traces/test_chat_tracing.py
  • packages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/span_utils.py
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py
  • packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/span_utils.py
  • packages/opentelemetry-instrumentation-vertexai/tests/disabled_test_gemini.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/event_handler_wrapper.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_completions.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
  • packages/opentelemetry-instrumentation-mistralai/tests/test_chat.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py
  • packages/opentelemetry-instrumentation-mistralai/opentelemetry/instrumentation/mistralai/__init__.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py
  • packages/opentelemetry-instrumentation-cohere/tests/test_chat.py
  • packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
  • packages/opentelemetry-instrumentation-vertexai/opentelemetry/instrumentation/vertexai/span_utils.py
  • packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py
  • packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py
  • packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
  • packages/traceloop-sdk/traceloop/sdk/tracing/manual.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py
  • packages/opentelemetry-instrumentation-ollama/tests/test_chat.py
  • packages/opentelemetry-instrumentation-langchain/tests/test_llms.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (37-249)
🔇 Additional comments (2)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py (2)

99-109: Token attribute migration consistently applied.

The migration pattern is consistently applied here, matching the changes in the first test function. The same consideration about LLM_USAGE_TOTAL_TOKENS applies here as well.


158-168: Token attribute migration consistently applied across all test functions.

The migration is consistently implemented across all three test functions, maintaining the same logical assertions while updating to the new semantic convention attribute names.

@martimfasantos martimfasantos requested a review from nirga July 19, 2025 16:51
@martimfasantos
Contributor Author

Hi @nirga! Gentle ping on this PR that updates gen_ai fields to current standards. Happy to adjust anything if needed - just let me know how I can help!

Member

@nirga nirga left a comment


Hey @martimfasantos! Thanks :) But if you're doing this work, I'd rather switch to the official semantic conventions in place of ours. When we began this repo there weren't any, but now there are.

@nirga
Member

nirga commented Aug 3, 2025

@@ -1,4 +1,5 @@
from enum import Enum
import opentelemetry.semconv._incubating.attributes.gen_ai_attributes as otel_gen_ai_attributes
Contributor Author


Hi again @nirga, I’m not sure I’m following. Isn’t that what’s already being done here?

Member


Yes @martimfasantos - but ideally we should directly use the opentelemetry.semconv._incubating.attributes.gen_ai_attributes in the code - we want to remove our custom semconv AI long term.

Contributor Author


> Yes @martimfasantos - but ideally we should directly use the opentelemetry.semconv._incubating.attributes.gen_ai_attributes in the code - we want to remove our custom semconv AI long term.

@nirga I've just refactored the code to use opentelemetry.semconv._incubating.attributes.gen_ai_attributes directly. This aligns with the long-term goal.

@martimfasantos martimfasantos requested a review from nirga August 3, 2025 15:04

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)

1-616: Remove double-qualification of GenAIAttributes
The only instance of GenAIAttributes.GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS was found in responses_wrappers.py. Update it to a single qualification to prevent attribute lookup errors.

  • File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
    • Replace the double‐qualified constant:
    - GenAIAttributes.GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS
    + GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS

No other occurrences detected.

packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py (2)

353-362: Bug: _emit_response_events signature mismatch and missing event_logger propagation

Definition takes only response, but callers pass (responses, event_logger) in one place and just response in another. Also, emit_event calls here don't pass event_logger. This will raise a TypeError and drop events.

Apply this diff to fix the signature and both call sites:

-def _emit_response_events(response: dict):
+def _emit_response_events(response: dict, event_logger):
     for i, message in enumerate(response.get("results", [])):
-        emit_event(
+        emit_event(
             ChoiceEvent(
                 index=i,
                 message={"content": message.get("generated_text"), "role": "assistant"},
                 finish_reason=message.get("stop_reason", "unknown"),
             )
-        )
+        , event_logger)
-    if should_emit_events() and event_logger:
-        _emit_response_events(responses, event_logger)
+    if should_emit_events() and event_logger:
+        _emit_response_events(responses, event_logger)
-    if should_emit_events() and event_logger:
-        _emit_response_events(
+    if should_emit_events() and event_logger:
+        _emit_response_events(
             {
                 "results": [
                     {
                         "stop_reason": stream_stop_reason,
                         "generated_text": stream_generated_text,
                     }
                 ]
-            },
+            },
+            event_logger
         )

Also applies to: 507-509, 526-537


481-494: Bug: _handle_input arity mismatch (missing response_counter at call site)

Definition expects response_counter but the call omits it. This will raise a TypeError at runtime.

Minimal fix: pass the available response_counter from _wrap:

-    _handle_input(span, event_logger, name, instance, args, kwargs)
+    _handle_input(span, event_logger, name, instance, response_counter, args, kwargs)

Alternatively, remove the unused parameter from the function signature if not needed.

Also applies to: 572-574

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (1)

599-617: Guard metric recording against None values to prevent runtime errors

Histogram.record expects a numeric value. If input_tokens or output_tokens is None, this can raise at runtime.

Apply:

-        if token_histogram:
-            token_histogram.record(
-                input_tokens,
-                attributes={
-                    GenAIAttributes.GEN_AI_SYSTEM: "openai",
-                    GenAIAttributes.GEN_AI_TOKEN_TYPE: "input",
-                    GenAIAttributes.GEN_AI_RESPONSE_MODEL: model_name,
-                    "gen_ai.agent.name": agent_name,
-                },
-            )
-            token_histogram.record(
-                output_tokens,
-                attributes={
-                    GenAIAttributes.GEN_AI_SYSTEM: "openai",
-                    GenAIAttributes.GEN_AI_TOKEN_TYPE: "output",
-                    GenAIAttributes.GEN_AI_RESPONSE_MODEL: model_name,
-                    "gen_ai.agent.name": agent_name,
-                },
-            )
+        if token_histogram:
+            base_attrs = {
+                GenAIAttributes.GEN_AI_SYSTEM: "openai",
+                GenAIAttributes.GEN_AI_RESPONSE_MODEL: model_name,
+            }
+            if agent_name:
+                base_attrs["gen_ai.agent.name"] = agent_name
+            if input_tokens is not None:
+                token_histogram.record(
+                    int(input_tokens),
+                    attributes={**base_attrs, GenAIAttributes.GEN_AI_TOKEN_TYPE: "input"},
+                )
+            if output_tokens is not None:
+                token_histogram.record(
+                    int(output_tokens),
+                    attributes={**base_attrs, GenAIAttributes.GEN_AI_TOKEN_TYPE: "output"},
+                )
♻️ Duplicate comments (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)

2-4: Long-term direction: prefer using incubating attributes directly in instrumentation.

You’re already moving callers to import gen_ai_attributes directly (good). Keeping these re-exports for now is fine, but plan to deprecate/remove once all instrumentations rely on the incubating package.

Also applies to: 40-78

🧹 Nitpick comments (40)
packages/opentelemetry-instrumentation-vertexai/tests/disabled_test_gemini.py (3)

35-36: Migration to GEN_AI_* keys looks correct; consider referencing SpanAttributes for stability

The key renames to GEN_AI_* align with the PR objective and the semconv exports. To reduce direct coupling to the incubating OTel module and be consistent with the repository convention referenced by @nirga, prefer accessing these keys via SpanAttributes. This keeps tests resilient if the underlying source of the constants changes.

Apply this diff within the selected ranges:

-        in vertexai_span.attributes[f"{GenAIAttributes.GEN_AI_PROMPT}.0.user"]
+        in vertexai_span.attributes[f"{SpanAttributes.GEN_AI_PROMPT}.0.user"]
@@
-        vertexai_span.attributes[GenAIAttributes.GEN_AI_REQUEST_MODEL]
+        vertexai_span.attributes[SpanAttributes.GEN_AI_REQUEST_MODEL]
@@
-        vertexai_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS]
+        vertexai_span.attributes[SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS]
@@
-        vertexai_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]
+        vertexai_span.attributes[SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]
@@
-        vertexai_span.attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content"]
+        vertexai_span.attributes[f"{SpanAttributes.GEN_AI_COMPLETION}.0.content"]

Also applies to: 38-39, 46-48, 50-52, 54-56


86-89: Keep attribute access consistent via SpanAttributes

Same rationale as earlier: swap direct GenAIAttributes usage for SpanAttributes to avoid tying tests to the incubating import path.

-        vertexai_span.attributes[GenAIAttributes.GEN_AI_REQUEST_MODEL]
+        vertexai_span.attributes[SpanAttributes.GEN_AI_REQUEST_MODEL]
@@
-        vertexai_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS]
+        vertexai_span.attributes[SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS]
@@
-        vertexai_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]
+        vertexai_span.attributes[SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]

Also applies to: 95-97, 99-101


146-149: Use SpanAttributes for GEN_AI_* keys in the third test as well

For consistency across the file, mirror the same change here.

-        vertexai_span.attributes[GenAIAttributes.GEN_AI_REQUEST_MODEL]
+        vertexai_span.attributes[SpanAttributes.GEN_AI_REQUEST_MODEL]
@@
-        vertexai_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS]
+        vertexai_span.attributes[SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS]
@@
-        vertexai_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]
+        vertexai_span.attributes[SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]

Also applies to: 155-157, 159-161

packages/opentelemetry-instrumentation-ollama/tests/test_generation.py (2)

23-36: Drop redundant f-strings around constant keys

f"{GenAIAttributes.GEN_AI_SYSTEM}" and similar are already strings; f-strings add noise without benefit. Prefer passing constants directly for clarity and to avoid accidental formatting bugs.

Example for this block:

-    assert ollama_span.attributes.get(f"{GenAIAttributes.GEN_AI_SYSTEM}") == "ollama"
+    assert ollama_span.attributes.get(GenAIAttributes.GEN_AI_SYSTEM) == "ollama"
@@
-    assert (
-        ollama_span.attributes.get(f"{SpanAttributes.LLM_REQUEST_TYPE}") == "completion"
-    )
+    assert ollama_span.attributes.get(SpanAttributes.LLM_REQUEST_TYPE) == "completion"
-    assert not ollama_span.attributes.get(f"{SpanAttributes.LLM_IS_STREAMING}")
+    assert not ollama_span.attributes.get(SpanAttributes.LLM_IS_STREAMING)
-    assert ollama_span.attributes.get(f"{GenAIAttributes.GEN_AI_REQUEST_MODEL}") == "llama3"
+    assert ollama_span.attributes.get(GenAIAttributes.GEN_AI_REQUEST_MODEL) == "llama3"
-    assert (
-        ollama_span.attributes.get(f"{GenAIAttributes.GEN_AI_PROMPT}.0.content")
-        == "Tell me a joke about OpenTelemetry"
-    )
+    assert ollama_span.attributes.get(f"{GenAIAttributes.GEN_AI_PROMPT}.0.content") \
+        == "Tell me a joke about OpenTelemetry"
-    assert (
-        ollama_span.attributes.get(f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content")
-        == response["response"]
-    )
+    assert ollama_span.attributes.get(f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content") \
+        == response["response"]

Apply the same pattern across the file.


74-92: Reduce flakiness: don’t assume log ordering; assert by event name instead

Relying on logs[0]/logs[1] ordering can be brittle across exporters. Resolve by selecting logs by EventAttributes.EVENT_NAME.

For example:

logs = log_exporter.get_finished_logs()
by_name = {log.log_record.attributes.get(EventAttributes.EVENT_NAME): log for log in logs}
assert_message_in_logs(by_name["gen_ai.user.message"], "gen_ai.user.message", {...})
assert_message_in_logs(by_name["gen_ai.choice"], "gen_ai.choice", choice_event)

Also applies to: 116-129, 202-219, 250-263, 332-349, 378-391, 468-485, 518-531

packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/span_utils.py (2)

9-14: Consolidate GenAI imports for consistency

Minor nit: Importing GEN_AI_RESPONSE_ID from the submodule while importing gen_ai_attributes from the package root is slightly inconsistent. If the aggregator re-exports the constant, prefer a single import source for readability; otherwise, this is fine to keep as-is.


50-58: Support multiple max-tokens parameter names (Groq typically uses max_tokens)

Using only max_tokens_to_sample is Anthropic-specific. Groq/OpenAI-style clients generally use max_tokens. Consider a tolerant fallback to capture whichever the upstream SDK provides.

Apply this diff:

-    set_span_attribute(
-        span, GenAIAttributes.GEN_AI_REQUEST_MAX_TOKENS, kwargs.get("max_tokens_to_sample")
-    )
+    set_span_attribute(
+        span,
+        GenAIAttributes.GEN_AI_REQUEST_MAX_TOKENS,
+        kwargs.get("max_tokens") or kwargs.get("max_output_tokens") or kwargs.get("max_tokens_to_sample"),
+    )
packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/custom_llm_instrumentor.py (6)

10-12: Avoid depending on incubating internals; prefer SpanAttributes for stability

Using opentelemetry.semconv._incubating.attributes tightly couples us to an internal path that can change. We already import SpanAttributes, which exposes the same constants and is the stable facade used across the repo.

Apply this diff to drop the direct incubating import:

-from opentelemetry.semconv._incubating.attributes import (
-    gen_ai_attributes as GenAIAttributes,
-)

I’ve proposed replacements below to use SpanAttributes.* where GenAIAttributes.* is referenced.


148-151: Use SpanAttributes.* instead of GenAIAttributes.*

Stay consistent with the rest of the codebase and reduce direct dependency on incubating internals.

-    _set_span_attribute(span, GenAIAttributes.GEN_AI_SYSTEM, instance.__class__.__name__)
+    _set_span_attribute(span, SpanAttributes.GEN_AI_SYSTEM, instance.__class__.__name__)
     _set_span_attribute(span, SpanAttributes.LLM_REQUEST_TYPE, llm_request_type.value)
     _set_span_attribute(
-        span, GenAIAttributes.GEN_AI_REQUEST_MODEL, instance.metadata.model_name
+        span, SpanAttributes.GEN_AI_REQUEST_MODEL, instance.metadata.model_name
     )

176-178: Use SpanAttributes for response model attribute

Consistency with the rest of the instrumentation and avoidance of incubating internals.

     _set_span_attribute(
-        span, GenAIAttributes.GEN_AI_RESPONSE_MODEL, instance.metadata.model_name
+        span, SpanAttributes.GEN_AI_RESPONSE_MODEL, instance.metadata.model_name
     )

180-184: Flatten nested condition and use SpanAttributes in completion content

Minor readability nit and consistency improvement.

-    if should_send_prompts():
-        if llm_request_type == LLMRequestTypeValues.COMPLETION:
-            _set_span_attribute(
-                span, f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content", response.text
-            )
+    if should_send_prompts() and llm_request_type == LLMRequestTypeValues.COMPLETION:
+        _set_span_attribute(
+            span, f"{SpanAttributes.GEN_AI_COMPLETION}.0.content", response.text
+        )

160-163: Flatten nested condition (ruff SIM102)

Combine the conditions to reduce nesting and match ruff’s SIM102 recommendation.

-    if should_send_prompts():
-        # TODO: add support for chat
-        if llm_request_type == LLMRequestTypeValues.COMPLETION:
+    # TODO: add support for chat
+    if should_send_prompts() and llm_request_type == LLMRequestTypeValues.COMPLETION:

167-169: Use SpanAttributes for prompt attribute; keep .0.user suffix

The .0.user suffix is consistent with how COMPLETION requests are instrumented elsewhere. Swap in the stable SpanAttributes constant for the prefix:

Locations to update:

  • packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/custom_llm_instrumentor.py around lines 167–169

Recommended diff:

-                    f"{GenAIAttributes.GEN_AI_PROMPT}.0.user",
+                    f"{SpanAttributes.GEN_AI_PROMPT}.0.user",
packages/opentelemetry-instrumentation-mistralai/tests/test_chat.py (4)

43-48: Make total token assertion resilient when a usage attribute is missing

If either input or output token usage is absent (common in some async/streaming providers), the current sum will raise a TypeError (None + int). Default missing token attributes to 0 in the equality check to keep tests robust while still validating the accounting.

Apply this pattern at each occurrence:

-    ) == mistral_span.attributes.get(
-        GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS
-    ) + mistral_span.attributes.get(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS)
+    ) == (
+        (mistral_span.attributes.get(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS) or 0)
+        + (mistral_span.attributes.get(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS) or 0)
+    )

Also applies to: 80-86, 133-139, 195-200, 236-242, 293-299, 353-358, 393-398, 447-452, 506-511, 546-551, 602-607


496-496: Use the GenAI constant for the request model key for consistency

Avoid hardcoded strings for attribute keys; use the same GenAIAttributes constant you use elsewhere in this file.

-    assert mistral_span.attributes.get("gen_ai.request.model") == "mistral-tiny"
+    assert mistral_span.attributes.get(GenAIAttributes.GEN_AI_REQUEST_MODEL) == "mistral-tiny"

Apply at all three locations.

Also applies to: 544-544, 600-600


351-353: Fix copy/paste in comment and clarify provider

These async tests are for Mistral, not Ollama. Also, consider keeping the comment but ensure the total-token assertion won’t fail when prompt tokens are missing (see prior suggestion).

-    # For some reason, async ollama chat doesn't report prompt token usage back
+    # For some reason, async Mistral chat sometimes doesn't report prompt (input) token usage back

Also applies to: 391-393, 445-447


27-27: Avoid unnecessary f-strings around attribute key constants

f"{GenAIAttributes.GEN_AI_SYSTEM}" is identical to GenAIAttributes.GEN_AI_SYSTEM. Using the constant directly is cleaner and avoids accidental formatting bugs.

Example edit:

-    assert mistral_span.attributes.get(f"{GenAIAttributes.GEN_AI_SYSTEM}") == "MistralAI"
+    assert mistral_span.attributes.get(GenAIAttributes.GEN_AI_SYSTEM) == "MistralAI"

Apply similarly to GEN_AI_REQUEST_MODEL, GEN_AI_PROMPT, GEN_AI_COMPLETION, etc.

Also applies to: 31-31, 35-35, 39-39, 73-73, 77-77, 183-183, 187-187, 191-191, 229-229, 233-233, 286-286, 290-290, 336-336, 340-340, 343-343, 348-348, 384-384, 388-388, 438-438, 442-442, 493-493, 498-498, 502-502, 541-541, 597-597

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py (2)

151-163: Prefer enum value for system attribute to avoid magic strings

Use the GenAI system enum for consistency with other tests and modules.

Apply this diff in both places:

-        _set_span_attribute(
-            span,
-            GenAIAttributes.GEN_AI_SYSTEM,
-            "openai",
-        )
+        _set_span_attribute(
+            span,
+            GenAIAttributes.GEN_AI_SYSTEM,
+            GenAIAttributes.GenAiSystemValues.OPENAI.value,
+        )

Also applies to: 271-279


176-185: Inconsistent event emission for run instructions; guard with should_emit_events()

Unlike the assistant-instructions block above, this unconditionally emits an event and also sets span attributes. Align with the pattern: emit events when enabled, otherwise set span attributes, not both.

Apply this refactor:

-    _set_span_attribute(
-        span, f"{GenAIAttributes.GEN_AI_PROMPT}.{prompt_index}.role", "system"
-    )
-    _set_span_attribute(
-        span,
-        f"{GenAIAttributes.GEN_AI_PROMPT}.{prompt_index}.content",
-        run["instructions"],
-    )
-    emit_event(MessageEvent(content=run["instructions"], role="system"))
-    prompt_index += 1
+    if should_emit_events():
+        emit_event(MessageEvent(content=run["instructions"], role="system"))
+    else:
+        _set_span_attribute(
+            span, f"{GenAIAttributes.GEN_AI_PROMPT}.{prompt_index}.role", "system"
+        )
+        _set_span_attribute(
+            span,
+            f"{GenAIAttributes.GEN_AI_PROMPT}.{prompt_index}.content",
+            run["instructions"],
+        )
+        prompt_index += 1
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)

180-186: Guard usage attribute access for older SDKs
When RESPONSES_AVAILABLE is False, traced_response.usage is a plain Dict[str, Any], so expressions like usage.input_tokens will raise an AttributeError. To ensure compatibility, normalize usage via model_as_dict or switch to dict-style access.

Pay special attention to these lines in
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py

  • 182: usage.input_tokens
  • 183: usage.output_tokens
  • 185: usage.total_tokens
  • 187: usage.input_tokens_details
  • 191: usage.input_tokens_details.cached_tokens

For example:

if usage := traced_response.usage:
    usage_dict = model_as_dict(usage)
    _set_span_attribute(span, GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, usage_dict.get("input_tokens"))
    …
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (1)

736-874: Minor duplication: consider extracting shared post-stream handling.

_build_from_streaming_response and _abuild_from_streaming_response duplicate logic for shared attributes, token/choice/duration metrics, response attribute setting, and span closure. Optional: extract a small helper to reduce duplication and future drift.
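For illustration, a minimal sketch of such a helper; the name and parameters are hypothetical, not the module's actual internals:

import time

def _finalize_streaming_span(span, shared_attributes, duration_histogram, start_time):
    # Hypothetical shared tail for the sync and async streaming builders:
    # record the stream duration once, then close the span. Each generator
    # variant would call this after yielding its last chunk.
    if duration_histogram is not None and start_time is not None:
        duration_histogram.record(
            time.time() - start_time, attributes=shared_attributes
        )
    span.end()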

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)

115-131: Set GEN_AI_SYSTEM using canonical enum value (when available).

Right now vendor values like "openai" (lowercase) or "AWS" are set directly. The spec exposes canonical system enum values (e.g., GenAiSystemValues.OPENAI). To avoid casing drift and improve consistency across providers, normalize to the enum value when present and fall back to the raw string.

Apply this diff in-place:

-    _set_span_attribute(span, GenAIAttributes.GEN_AI_SYSTEM, vendor)
+    # Normalize to spec enum value when available; fallback to string
+    system_enum = getattr(getattr(GenAIAttributes, "GenAiSystemValues", object()), vendor.upper(), None)
+    _set_span_attribute(
+        span,
+        GenAIAttributes.GEN_AI_SYSTEM,
+        system_enum.value if system_enum is not None else vendor,
+    )

Optionally, to keep the call-site lean, add a small helper and use it here:

# Define once near other helpers in this module (outside this diff hunk)
def _normalize_gen_ai_system(vendor: str) -> str:
    enum_cls = getattr(GenAIAttributes, "GenAiSystemValues", None)
    if enum_cls:
        maybe = getattr(enum_cls, vendor.upper(), None)
        if maybe:
            return maybe.value
    return vendor

Then change the set call to:

_set_span_attribute(span, GenAIAttributes.GEN_AI_SYSTEM, _normalize_gen_ai_system(vendor))
packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (1)

142-145: Inline the generations selection (Ruff SIM108).

Minor readability nit: prefer a ternary expression.

-    if hasattr(response, "generations"):
-        generations = response.generations  # Cohere v5
-    else:
-        generations = response  # Cohere v4
+    generations = response.generations if hasattr(response, "generations") else response  # Cohere v5 vs v4
packages/opentelemetry-instrumentation-cohere/tests/test_rerank.py (1)

41-45: Prefer enum value for GEN_AI_SYSTEM in assertions.

Use the spec enum to avoid casing drift and keep tests resilient to normalization changes.

-    assert cohere_span.attributes.get(GenAIAttributes.GEN_AI_SYSTEM) == "Cohere"
+    assert cohere_span.attributes.get(GenAIAttributes.GEN_AI_SYSTEM) == GenAIAttributes.GenAiSystemValues.COHERE.value

Repeat similarly for the other two occurrences in this file.

Also applies to: 121-126, 185-190

packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)

42-78: Style: normalize spacing around '=' for readability (PEP 8).

Several assignments have inconsistent spacing. Not a blocker, but easy to tidy.

-    GEN_AI_AGENT_DESCRIPTION=GenAIAttributes.GEN_AI_AGENT_DESCRIPTION
-    GEN_AI_AGENT_ID= GenAIAttributes.GEN_AI_AGENT_ID
-    GEN_AI_AGENT_NAME= GenAIAttributes.GEN_AI_AGENT_NAME
-    GEN_AI_COMPLETION=GenAIAttributes.GEN_AI_COMPLETION
-    GEN_AI_CONVERSATION_ID=GenAIAttributes.GEN_AI_CONVERSATION_ID
-    GEN_AI_DATA_SOURCE_ID=GenAIAttributes.GEN_AI_DATA_SOURCE_ID
-    GEN_AI_OPENAI_REQUEST_SERVICE_TIER=GenAIAttributes.GEN_AI_OPENAI_REQUEST_SERVICE_TIER
-    GEN_AI_OPENAI_RESPONSE_SERVICE_TIER=GenAIAttributes.GEN_AI_OPENAI_RESPONSE_SERVICE_TIER
-    GEN_AI_OPENAI_RESPONSE_SYSTEM_FINGERPRINT=GenAIAttributes.GEN_AI_OPENAI_RESPONSE_SYSTEM_FINGERPRINT
-    GEN_AI_OUTPUT_TYPE=GenAIAttributes.GEN_AI_OUTPUT_TYPE
-    GEN_AI_PROMPT=GenAIAttributes.GEN_AI_PROMPT
-    GEN_AI_REQUEST_CHOICE_COUNT=GenAIAttributes.GEN_AI_REQUEST_CHOICE_COUNT
-    GEN_AI_REQUEST_ENCODING_FORMATS= GenAIAttributes.GEN_AI_REQUEST_ENCODING_FORMATS
-    GEN_AI_REQUEST_FREQUENCY_PENALTY= GenAIAttributes.GEN_AI_REQUEST_FREQUENCY_PENALTY
-    GEN_AI_REQUEST_MAX_TOKENS=GenAIAttributes.GEN_AI_REQUEST_MAX_TOKENS
-    GEN_AI_REQUEST_MODEL=GenAIAttributes.GEN_AI_REQUEST_MODEL
-    GEN_AI_REQUEST_PRESENCE_PENALTY=GenAIAttributes.GEN_AI_REQUEST_PRESENCE_PENALTY
-    GEN_AI_REQUEST_SEED=GenAIAttributes.GEN_AI_REQUEST_SEED
-    GEN_AI_REQUEST_STOP_SEQUENCES=GenAIAttributes.GEN_AI_REQUEST_STOP_SEQUENCES
-    GEN_AI_REQUEST_TEMPERATURE=GenAIAttributes.GEN_AI_REQUEST_TEMPERATURE
-    GEN_AI_REQUEST_TOP_K=GenAIAttributes.GEN_AI_REQUEST_TOP_K
-    GEN_AI_REQUEST_TOP_P=GenAIAttributes.GEN_AI_REQUEST_TOP_P
+    GEN_AI_AGENT_DESCRIPTION = GenAIAttributes.GEN_AI_AGENT_DESCRIPTION
+    GEN_AI_AGENT_ID = GenAIAttributes.GEN_AI_AGENT_ID
+    GEN_AI_AGENT_NAME = GenAIAttributes.GEN_AI_AGENT_NAME
+    GEN_AI_COMPLETION = GenAIAttributes.GEN_AI_COMPLETION
+    GEN_AI_CONVERSATION_ID = GenAIAttributes.GEN_AI_CONVERSATION_ID
+    GEN_AI_DATA_SOURCE_ID = GenAIAttributes.GEN_AI_DATA_SOURCE_ID
+    GEN_AI_OPENAI_REQUEST_SERVICE_TIER = GenAIAttributes.GEN_AI_OPENAI_REQUEST_SERVICE_TIER
+    GEN_AI_OPENAI_RESPONSE_SERVICE_TIER = GenAIAttributes.GEN_AI_OPENAI_RESPONSE_SERVICE_TIER
+    GEN_AI_OPENAI_RESPONSE_SYSTEM_FINGERPRINT = GenAIAttributes.GEN_AI_OPENAI_RESPONSE_SYSTEM_FINGERPRINT
+    GEN_AI_OUTPUT_TYPE = GenAIAttributes.GEN_AI_OUTPUT_TYPE
+    GEN_AI_PROMPT = GenAIAttributes.GEN_AI_PROMPT
+    GEN_AI_REQUEST_CHOICE_COUNT = GenAIAttributes.GEN_AI_REQUEST_CHOICE_COUNT
+    GEN_AI_REQUEST_ENCODING_FORMATS = GenAIAttributes.GEN_AI_REQUEST_ENCODING_FORMATS
+    GEN_AI_REQUEST_FREQUENCY_PENALTY = GenAIAttributes.GEN_AI_REQUEST_FREQUENCY_PENALTY
+    GEN_AI_REQUEST_MAX_TOKENS = GenAIAttributes.GEN_AI_REQUEST_MAX_TOKENS
+    GEN_AI_REQUEST_MODEL = GenAIAttributes.GEN_AI_REQUEST_MODEL
+    GEN_AI_REQUEST_PRESENCE_PENALTY = GenAIAttributes.GEN_AI_REQUEST_PRESENCE_PENALTY
+    GEN_AI_REQUEST_SEED = GenAIAttributes.GEN_AI_REQUEST_SEED
+    GEN_AI_REQUEST_STOP_SEQUENCES = GenAIAttributes.GEN_AI_REQUEST_STOP_SEQUENCES
+    GEN_AI_REQUEST_TEMPERATURE = GenAIAttributes.GEN_AI_REQUEST_TEMPERATURE
+    GEN_AI_REQUEST_TOP_K = GenAIAttributes.GEN_AI_REQUEST_TOP_K
+    GEN_AI_REQUEST_TOP_P = GenAIAttributes.GEN_AI_REQUEST_TOP_P
packages/opentelemetry-instrumentation-replicate/opentelemetry/instrumentation/replicate/__init__.py (1)

124-132: Use canonical enum for GEN_AI_SYSTEM (if available).

Minor consistency tweak: prefer the spec enum over a raw string.

-            GenAIAttributes.GEN_AI_SYSTEM: "Replicate",
+            GenAIAttributes.GEN_AI_SYSTEM: GenAIAttributes.GenAiSystemValues.REPLICATE.value,
packages/opentelemetry-instrumentation-langchain/tests/test_tool_calls.py (1)

59-67: Nit: reduce repetition when building attribute keys.

The repeated f-strings for keys like f"{GenAIAttributes.GEN_AI_PROMPT}.N.role/content" and f"{GenAIAttributes.GEN_AI_COMPLETION}.M.tool_calls.K.*" can be slightly DRYed in tests with small helpers (e.g., prompt_key(i, suffix), completion_tool_call_key(i, j, suffix)). Not required, but would improve readability.
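For instance, with the helper names from the suggestion above (illustrative only):

from opentelemetry.semconv._incubating.attributes import (
    gen_ai_attributes as GenAIAttributes,
)

def prompt_key(i: int, suffix: str) -> str:
    # e.g. prompt_key(0, "role") -> "gen_ai.prompt.0.role"
    return f"{GenAIAttributes.GEN_AI_PROMPT}.{i}.{suffix}"

def completion_tool_call_key(i: int, j: int, suffix: str) -> str:
    # e.g. completion_tool_call_key(0, 1, "name")
    #   -> "gen_ai.completion.0.tool_calls.1.name"
    return f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.tool_calls.{j}.{suffix}"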

packages/opentelemetry-instrumentation-anthropic/tests/utils.py (1)

60-63: Optional: use the constant for system attribute for consistency.

You assert against the literal "gen_ai.system" key; consider using GenAIAttributes.GEN_AI_SYSTEM as elsewhere in the repo.

Apply this diff:

-                assert all(
-                    data_point.attributes.get("gen_ai.system") == "anthropic"
-                    for data_point in metric.data.data_points
-                )
+                assert all(
+                    data_point.attributes.get(GenAIAttributes.GEN_AI_SYSTEM) == "anthropic"
+                    for data_point in metric.data.data_points
+                )
packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py (1)

126-129: Safer assertion on first datapoint access.

Indexing metric.data.data_points[0] assumes at least one datapoint; guard it to avoid potential IndexError in edge cases.

Apply this diff:

-                assert (
-                    metric.data.data_points[0].attributes[GenAIAttributes.GEN_AI_SYSTEM]
-                    == "bedrock"
-                )
+                # Ensure at least one datapoint exists before indexing
+                assert metric.data.data_points, f"No datapoints for metric {metric.name}"
+                assert (
+                    metric.data.data_points[0].attributes[GenAIAttributes.GEN_AI_SYSTEM]
+                    == "bedrock"
+                )
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (2)

185-191: Use SpanAttributes for cache token attributes to avoid potential AttributeError

GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS and GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS are guaranteed on SpanAttributes (string constants). Depending on your semconv version, they may not exist on GenAIAttributes. For consistency with streaming.py and to avoid runtime attribute errors, use SpanAttributes here too.

Proposed change:

-    set_span_attribute(
-        span, GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS, cache_read_tokens
-    )
+    set_span_attribute(
+        span, SpanAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS, cache_read_tokens
+    )
-    set_span_attribute(
-        span,
-        GenAIAttributes.GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS,
-        cache_creation_tokens,
-    )
+    set_span_attribute(
+        span,
+        SpanAttributes.GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS,
+        cache_creation_tokens,
+    )

And likewise for the synchronous path:

-    set_span_attribute(
-        span, GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS, cache_read_tokens
-    )
+    set_span_attribute(
+        span, SpanAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS, cache_read_tokens
+    )
-    set_span_attribute(
-        span,
-        GenAIAttributes.GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS,
-        cache_creation_tokens,
-    )
+    set_span_attribute(
+        span,
+        SpanAttributes.GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS,
+        cache_creation_tokens,
+    )

Also applies to: 276-283


101-110: Nit: avoid mutable default arguments

metric_attributes: dict = {} creates a shared default across calls. Prefer None default and initialize inside.

Minimal patch:

-async def _aset_token_usage(..., metric_attributes: dict = {}, token_histogram: Histogram = None, choice_counter: Counter = None):
+async def _aset_token_usage(..., metric_attributes: dict | None = None, token_histogram: Histogram | None = None, choice_counter: Counter | None = None):
+    metric_attributes = metric_attributes or {}
 
-def _set_token_usage(..., metric_attributes: dict = {}, token_histogram: Histogram = None, choice_counter: Counter = None):
+def _set_token_usage(..., metric_attributes: dict | None = None, token_histogram: Histogram | None = None, choice_counter: Counter | None = None):
+    metric_attributes = metric_attributes or {}

Also applies to: 194-203
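To see why the shared default matters, a tiny standalone repro (illustrative only):

def count(acc: dict = {}):
    # The default dict is created once, at function definition time,
    # so every call without an argument mutates the same object.
    acc["n"] = acc.get("n", 0) + 1
    return acc

print(count())  # {'n': 1}
print(count())  # {'n': 2} - state leaked across calls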

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (1)

105-112: Prefer isinstance over direct type comparison for ints

type(x) is int is brittle; use isinstance(x, int). Same for completion_tokens.

Apply:

-    if token_histogram and type(input_tokens) is int and input_tokens >= 0:
+    if token_histogram and isinstance(input_tokens, int) and input_tokens >= 0:
...
-    if token_histogram and type(completion_tokens) is int and completion_tokens >= 0:
+    if token_histogram and isinstance(completion_tokens, int) and completion_tokens >= 0:

Also applies to: 114-121
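The distinction is observable with int subclasses, e.g.:

class TokenCount(int):
    # An int subclass is rejected by `type(x) is int` but accepted by
    # isinstance(x, int), so the histogram record would be skipped.
    pass

n = TokenCount(7)
print(type(n) is int)      # False
print(isinstance(n, int))  # True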

packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (3)

1294-1299: Prefer constants over string literals for usage keys (and consider aligning total_tokens constant)

Use GenAIAttributes/SpanAttributes constants instead of raw strings to reduce drift and ease refactors. Also prefer the SpanAttributes constant over "llm.usage.total_tokens" literal.

Apply these diffs:

@@
-    assert anthropic_span.attributes["gen_ai.usage.input_tokens"] == 514
-    assert (
-        anthropic_span.attributes["gen_ai.usage.output_tokens"]
-        + anthropic_span.attributes["gen_ai.usage.input_tokens"]
-        == anthropic_span.attributes["llm.usage.total_tokens"]
-    )
+    assert anthropic_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 514
+    assert (
+        anthropic_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]
+        + anthropic_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS]
+        == anthropic_span.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS]
+    )
@@
-    assert anthropic_span.attributes["gen_ai.usage.input_tokens"] == 514
-    assert (
-        anthropic_span.attributes["gen_ai.usage.output_tokens"]
-        + anthropic_span.attributes["gen_ai.usage.input_tokens"]
-        == anthropic_span.attributes["llm.usage.total_tokens"]
-    )
+    assert anthropic_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 514
+    assert (
+        anthropic_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]
+        + anthropic_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS]
+        == anthropic_span.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS]
+    )
@@
-    assert anthropic_span.attributes["gen_ai.usage.input_tokens"] == 514
-    assert (
-        anthropic_span.attributes["gen_ai.usage.output_tokens"]
-        + anthropic_span.attributes["gen_ai.usage.input_tokens"]
-        == anthropic_span.attributes["llm.usage.total_tokens"]
-    )
+    assert anthropic_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 514
+    assert (
+        anthropic_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]
+        + anthropic_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS]
+        == anthropic_span.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS]
+    )

If verification (see prior comment) confirms a GenAI total tokens attribute is available and set by the instrumentation, switch the right-hand side to GenAIAttributes.GEN_AI_USAGE_TOTAL_TOKENS instead of SpanAttributes.LLM_USAGE_TOTAL_TOKENS.

Also applies to: 1469-1474, 1605-1610


64-66: Nit: Use direct indexing instead of .get for required attributes to fail fast

Using dict-style indexing will surface missing attributes as KeyError instead of silently returning None, making failures more actionable in tests.

Example change (apply similarly to other occurrences):

-    assert (
-        anthropic_span.attributes.get(f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content")
-        == response.content[0].text
-    )
+    assert (
+        anthropic_span.attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content"]
+        == response.content[0].text
+    )

Also applies to: 247-249, 528-529, 713-714, 900-901, 1078-1079


71-76: Optional: factor usage-token sum assertion into a small helper to reduce duplication

This assertion pattern repeats throughout the file. A tiny helper improves readability and maintenance.

Add once near the bottom (outside current ranges):

def assert_usage_sums(span):
    assert span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] >= 0
    assert span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] >= 0
    assert (
        span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]
        + span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS]
        == span.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS]
    )

And replace here as:

-    assert (
-        anthropic_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]
-        + anthropic_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS]
-        == anthropic_span.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS]
-    )
+    assert_usage_sums(anthropic_span)

If we later migrate total_tokens to GenAI, the change is centralized.

packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py (3)

135-146: Prompt attribute keys migrated to GEN_AI_PROMPT — avoid shadowing built-in input

The rename to gen-ai prompt keys is correct. Minor nit: the loop variable name input shadows the built-in. Rename to avoid confusion.

Apply this diff:

-        for index, input in enumerate(prompt):
+        for index, message in enumerate(prompt):
             _set_span_attribute(
                 span,
-                f"{GenAIAttributes.GEN_AI_PROMPT}.{index}.user",
-                input,
+                f"{GenAIAttributes.GEN_AI_PROMPT}.{index}.user",
+                message,
             )

563-570: Use consistent system casing ("watsonx") to match metric_shared_attributes

Start span sets GEN_AI_SYSTEM to "Watsonx" (capitalized), while metrics use "watsonx". Suggest using lowercase across the board.

Apply this diff:

-        attributes={
-            GenAIAttributes.GEN_AI_SYSTEM: "Watsonx",
+        attributes={
+            GenAIAttributes.GEN_AI_SYSTEM: "watsonx",
             SpanAttributes.LLM_REQUEST_TYPE: LLMRequestTypeValues.COMPLETION.value,
         },

442-445: Unify GEN_AI_SYSTEM value casing

Confirmed inconsistent values in this file:

  • packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py:443 uses "watsonx"
  • packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py:567 uses "Watsonx"

To avoid metric/attribute fragmentation, pick a single canonical casing (e.g. "Watsonx") and update the lowercase occurrence:

--- a/packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py
+++ b/packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py
@@ -440,7 +440,7 @@ def _build_attributes(...):
         GenAIAttributes.GEN_AI_RESPONSE_MODEL: response_model,
-        GenAIAttributes.GEN_AI_SYSTEM: "watsonx",
+        GenAIAttributes.GEN_AI_SYSTEM: "Watsonx",
         "stream": is_streaming,
     }
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (2)

517-523: Confirm prompt attribute structure vs. semconv; indexing may cause high cardinality

You emit ...GEN_AI_PROMPT.{i}.role/content. Many semconvs prefer arrays (e.g., ...prompt.roles: [], ...prompt.contents: []) over dynamic keys. Verify this matches the GenAI spec you’re targeting; if not, consider array attributes to reduce attribute-cardinality.

Would you like me to propose an alternative that emits arrays instead of index-suffixed keys?
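For reference, a hedged sketch of the array-based shape; the ".roles"/".contents" key names are illustrative, not spec-mandated:

from opentelemetry.semconv._incubating.attributes import (
    gen_ai_attributes as GenAIAttributes,
)

def set_prompt_arrays(span, messages):
    # Two fixed attribute keys holding parallel arrays, instead of a pair
    # of dynamically indexed keys per message.
    span.set_attribute(
        f"{GenAIAttributes.GEN_AI_PROMPT}.roles",
        [m.get("role", "") for m in messages],
    )
    span.set_attribute(
        f"{GenAIAttributes.GEN_AI_PROMPT}.contents",
        [str(m.get("content", "")) for m in messages],
    )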


426-439: Use the set_span_attribute helper for GenAI tool fields

All direct calls to span.set_attribute on tool-specific completion attributes should use the set_span_attribute(span, key, value) helper so that null or empty values are skipped.

• File: packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py
– Lines 426–439 (FunctionTool fields)
– Lines 471–501 (WebSearchTool, FileSearchTool, ComputerTool fields)

Example diff:

@@ -426,7 +426,7 @@
-   span.set_attribute(
-       f"{GenAIAttributes.GEN_AI_COMPLETION}.tool.name", tool_name
-   )
+   set_span_attribute(
+       span, f"{GenAIAttributes.GEN_AI_COMPLETION}.tool.name", tool_name
+   )

@@ -430,7 +430,7 @@
-   span.set_attribute(
-       f"{GenAIAttributes.GEN_AI_COMPLETION}.tool.type", "FunctionTool"
-   )
+   set_span_attribute(
+       span, f"{GenAIAttributes.GEN_AI_COMPLETION}.tool.type", "FunctionTool"
+   )

...and so on for `.tool.description` and `.tool.strict_json_schema`, and likewise in the 471–501 block.

Note: the current semconv package does not yet define dedicated constants for these tool-specific attributes (e.g. GEN_AI_COMPLETION_TOOL_NAME), so formatted strings are required for now.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (1)

225-238: Bug: type check uses is str instead of isinstance (will mis-handle content)

if generation.message.content is str will almost always be false and cause strings to be JSON-encoded (quoted) incorrectly. Flake8 E721 also flags this.

Apply this fix:

-                if generation.message.content is str:
-                    _set_span_attribute(
-                        span,
-                        f"{prefix}.content",
-                        generation.message.content,
-                    )
-                else:
-                    _set_span_attribute(
-                        span,
-                        f"{prefix}.content",
-                        json.dumps(
-                            generation.message.content, cls=CallbackFilteredJSONEncoder
-                        ),
-                    )
+                content = generation.message.content
+                if isinstance(content, str):
+                    _set_span_attribute(
+                        span,
+                        f"{prefix}.content",
+                        content,
+                    )
+                else:
+                    _set_span_attribute(
+                        span,
+                        f"{prefix}.content",
+                        json.dumps(content, cls=CallbackFilteredJSONEncoder),
+                    )
♻️ Duplicate comments (2)
packages/opentelemetry-instrumentation-anthropic/tests/utils.py (1)

1-3: Import fix resolves prior F821 (undefined name) for GenAIAttributes

Good catch adding the incubating GenAIAttributes import; this unblocks lint/tests and aligns with PR goals.

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)

181-193: Back-compat bug: usage fields may be dicts; attribute access will raise; also support legacy prompt/completion token names

On older OpenAI SDKs (RESPONSES_AVAILABLE=False), usage and input_tokens_details are dicts. Accessing usage.input_tokens / usage.input_tokens_details.cached_tokens will raise. Also, some providers still emit prompt_tokens/completion_tokens. Handle both object and dict shapes and map legacy names.

Apply this diff:

-    if usage := traced_response.usage:
-        _set_span_attribute(span, GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, usage.input_tokens)
-        _set_span_attribute(span, GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, usage.output_tokens)
-        _set_span_attribute(
-            span, SpanAttributes.LLM_USAGE_TOTAL_TOKENS, usage.total_tokens
-        )
-        if usage.input_tokens_details:
-            _set_span_attribute(
-                span,
-                GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS,
-                usage.input_tokens_details.cached_tokens,
-            )
+    if usage := traced_response.usage:
+        # usage can be a pydantic model or a plain dict (older SDKs)
+        is_dict = isinstance(usage, dict)
+
+        input_tokens = (usage.get("input_tokens") if is_dict else getattr(usage, "input_tokens", None))
+        # Back-compat: some SDKs used 'prompt_tokens'
+        if input_tokens is None and is_dict:
+            input_tokens = usage.get("prompt_tokens")
+        _set_span_attribute(span, GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, input_tokens)
+
+        output_tokens = (usage.get("output_tokens") if is_dict else getattr(usage, "output_tokens", None))
+        # Back-compat: some SDKs used 'completion_tokens'
+        if output_tokens is None and is_dict:
+            output_tokens = usage.get("completion_tokens")
+        _set_span_attribute(span, GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, output_tokens)
+
+        total_tokens = (usage.get("total_tokens") if is_dict else getattr(usage, "total_tokens", None))
+        _set_span_attribute(span, SpanAttributes.LLM_USAGE_TOTAL_TOKENS, total_tokens)
+
+        input_tokens_details = (usage.get("input_tokens_details") if is_dict else getattr(usage, "input_tokens_details", None))
+        if input_tokens_details:
+            details_is_dict = isinstance(input_tokens_details, dict)
+            cached_tokens = (
+                input_tokens_details.get("cached_tokens") if details_is_dict
+                else getattr(input_tokens_details, "cached_tokens", None)
+            )
+            _set_span_attribute(
+                span,
+                GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS,
+                cached_tokens,
+            )
🧹 Nitpick comments (5)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)

2-4: Fix F401: GenAIAttributes imported but unused (make it an explicit re-export).

Linters (Ruff/Flake8) flag this as unused. If the intent is to re-export GenAIAttributes from this package, annotate the import with a noqa or add it to __all__.

Apply this minimal diff to silence F401 while keeping the public alias:

-from opentelemetry.semconv._incubating.attributes import (
-    gen_ai_attributes as GenAIAttributes,
-)
+from opentelemetry.semconv._incubating.attributes import (
+    gen_ai_attributes as GenAIAttributes,  # noqa: F401 - re-exported
+)

Optionally, also define __all__ to make the re-export explicit:

__all__ = [
    "Meters",
    "SpanAttributes",
    "Events",
    "EventAttributes",
    "LLMRequestTypeValues",
    "TraceloopSpanKindValues",
    "GenAIAttributes",
]
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (3)

76-79: Question: Should response model be set at request time?

You set both request and response model to the same value during request parameter extraction. If the response model can differ (provider-side routing), consider deferring setting the response model until a response is available to avoid misleading spans.

Apply this minimal change if you prefer to avoid setting response model at request time:

 _set_span_attribute(span, GenAIAttributes.GEN_AI_REQUEST_MODEL, model)
 # response is not available for LLM requests (as opposed to chat)
-_set_span_attribute(span, GenAIAttributes.GEN_AI_RESPONSE_MODEL, model)
+# Consider setting GEN_AI_RESPONSE_MODEL when a response is received.

97-114: Add a TODO to migrate function metadata to GenAIAttributes

We couldn’t find any GenAIAttributes constant for request/functions metadata, so continue using SpanAttributes.LLM_REQUEST_FUNCTIONS for now. To avoid mixing conventions once an equivalent lands, please:

  • In packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (at lines 97–114 and 151–161), add a TODO comment above each usage of SpanAttributes.LLM_REQUEST_FUNCTIONS noting to switch to the new GenAI attribute once it’s available, as sketched below.
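A sketch of the suggested comment in place (the functions variable is a stand-in for whatever is serialized at that call site):

# TODO: move to the GenAI equivalent once the semconv defines a
# request-functions attribute (no GEN_AI_* constant exists today).
_set_span_attribute(span, SpanAttributes.LLM_REQUEST_FUNCTIONS, functions)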

343-361: Use GenAI token attributes instead of LLM ones for consistency
In packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py replace the mixed families in your histogram calls:

• At the input-tokens record
• At the output-tokens record

Example diff:

--- a/packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py
+++ b/packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py
@@ -346,7 +346,7 @@ def _record_token_usage(span: Span, model_name: str, record_token_usage: boo
-                token_histogram.record(
-                    input_tokens,
-                    attributes={
-                        GenAIAttributes.GEN_AI_SYSTEM: vendor,
-                        SpanAttributes.LLM_TOKEN_TYPE: "input",
-                        SpanAttributes.LLM_RESPONSE_MODEL: model_name,
-                    },
-                )
+                token_histogram.record(
+                    input_tokens,
+                    attributes={
+                        GenAIAttributes.GEN_AI_SYSTEM: vendor,
+                        GenAIAttributes.GEN_AI_TOKEN_TYPE: "input",
+                        GenAIAttributes.GEN_AI_RESPONSE_MODEL: model_name,
+                    },
+                )
@@ -356,7 +356,7 @@ def _record_token_usage(span: Span, model_name: str, record_token_usage: boo
-                token_histogram.record(
-                    output_tokens,
-                    attributes={
-                        GenAIAttributes.GEN_AI_SYSTEM: vendor,
-                        SpanAttributes.LLM_TOKEN_TYPE: "output",
-                        SpanAttributes.LLM_RESPONSE_MODEL: model_name,
-                    },
-                )
+                token_histogram.record(
+                    output_tokens,
+                    attributes={
+                        GenAIAttributes.GEN_AI_SYSTEM: vendor,
+                        GenAIAttributes.GEN_AI_TOKEN_TYPE: "output",
+                        GenAIAttributes.GEN_AI_RESPONSE_MODEL: model_name,
+                    },
+                )

This aligns with other instrumentation (e.g. OpenAI-agents) and avoids mixing GEN_AI_* and LLM_* families.

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)

181-187: Optional: unify total tokens key with GenAI namespace (if available)

You’re mixing GenAIAttributes.* for input/output tokens and SpanAttributes.LLM_USAGE_TOTAL_TOKENS for totals. If the incubating GenAI namespace defines a total tokens key, consider switching for consistency; otherwise keeping the stable key is fine.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c44af4c and 0129a0a.

📒 Files selected for processing (9)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (9 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/utils.py (4 hunks)
  • packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (5 hunks)
  • packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (11 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py (6 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_llms.py (17 hunks)
  • packages/opentelemetry-instrumentation-ollama/tests/test_generation.py (12 hunks)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (7 hunks)
  • packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py
  • packages/opentelemetry-instrumentation-langchain/tests/test_llms.py
  • packages/opentelemetry-instrumentation-ollama/tests/test_generation.py
  • packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit Inference Engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
  • packages/opentelemetry-instrumentation-anthropic/tests/utils.py
  • packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py
  • packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py
🧬 Code Graph Analysis (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)
  • _set_span_attribute (33-40)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (39-212)
packages/opentelemetry-instrumentation-anthropic/tests/utils.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • Meters (9-36)
🪛 Ruff (0.12.2)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py

3-3: opentelemetry.semconv._incubating.attributes.gen_ai_attributes imported but unused

(F401)

🪛 Flake8 (7.2.0)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py

[error] 2-2: 'opentelemetry.semconv._incubating.attributes.gen_ai_attributes as GenAIAttributes' imported but unused

(F401)

🔇 Additional comments (16)
packages/opentelemetry-instrumentation-anthropic/tests/utils.py (4)

22-25: Token type attribute migrated to GEN_AI_TOKEN_TYPE — LGTM

Using GenAIAttributes.GEN_AI_TOKEN_TYPE with values ["output", "input"] matches the new semconv naming.


27-29: Response model attribute migration — LGTM

GenAIAttributes.GEN_AI_RESPONSE_MODEL usage is correct.


38-40: Choices metric now keyed by GEN_AI_RESPONSE_MODEL — LGTM


51-55: Duration metric attribute lookup updated — LGTM

Using .get with GenAIAttributes.GEN_AI_RESPONSE_MODEL while allowing error.type fallback is appropriate.

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (7)

18-20: Good move: centralized alias for GenAI attributes

Importing the incubating GenAI attributes under a single alias keeps usages consistent and makes future migrations easier.


89-96: LGTM: request parameter keys migrated to GenAI

max_tokens/temperature/top_p mappings to GEN_AI_REQUEST_* look correct and backward-compatible with common provider param names.


130-136: LGTM: prompt attributes migrated to GenAI

Using GEN_AI_PROMPT.{i}.role/content is aligned with the incubating GenAI semconv.


168-171: LGTM: chat prompt message mapping + tool calls

  • Roles/content now under GEN_AI_PROMPT.
  • Tool calls recorded under the same GEN_AI_PROMPT prefix, keeping request context cohesive.

Also applies to: 179-180, 189-192, 197-199


211-213: LGTM: completion prefix migrated to GenAI

Switch to GEN_AI_COMPLETION prefix is consistent with the rest of the migration.


147-148: Verify params source for set_request_params (serialized.kwargs vs kwargs)

You’re passing serialized.get("kwargs", {}) into set_request_params instead of the function’s kwargs argument. If providers set request controls under the actual kwargs param, this could miss model/params. If intentional (LangChain specific), consider a short comment.

Would you like me to search the repo for other call sites to confirm whether serialized["kwargs"] is always present and preferred here?


323-341: No GenAI total-tokens attribute found—keeping LLM_USAGE_TOTAL_TOKENS is correct
Verified that GenAIAttributes does not define a GEN_AI_USAGE_TOTAL_TOKENS constant and that SpanAttributes.LLM_USAGE_TOTAL_TOKENS remains the standard total-tokens key across all instrumentations and tests. No change required.

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (5)

41-43: Adopts incubating GenAI attributes alias — LGTM

Using a single namespace alias for incubating GenAI attributes keeps the call sites consistent and readable.


177-180: System/model/ID attributes migrated to GenAI namespace — looks good

These keys align with the GenAI semantic conventions and keep the span payload coherent.


224-227: Prompt attribute migration to GenAI — LGTM

All prompt content/role writes are correctly moved under GEN_AI_PROMPT with consistent indexing and gating by should_send_prompts().

Also applies to: 231-234, 254-259, 265-271, 283-295


300-304: Completion and tool_calls attribute migration to GenAI — LGTM

Completion role/content and tool_calls are consistently emitted under GEN_AI_COMPLETION. Arguments are serialized; IDs and names are well-scoped. Reasoning summary mapped as expected.

Also applies to: 315-327, 332-340, 344-352, 356-368, 371-372


190-191: Verify incubating constant existence and implement fallback

I couldn’t locate a definition for GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS in the incubating semantic-conventions package. Please confirm that this constant is published in opentelemetry-semantic-conventions; if it’s not present, fall back to using SpanAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS to prevent runtime errors.

• File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
Lines: 190–191

    GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS,
    usage.input_tokens_details.cached_tokens,
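One defensive way to express that fallback (a sketch; assumes SpanAttributes mirrors the key, as it does elsewhere in this repo):

# Prefer the incubating constant when this semconv version ships it,
# otherwise fall back to the stable SpanAttributes mirror.
cache_read_key = getattr(
    GenAIAttributes,
    "GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS",
    SpanAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS,
)
_set_span_attribute(span, cache_read_key, usage.input_tokens_details.cached_tokens)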

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (7)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)

340-356: Missing usage attribute tests for OpenAI Agents instrumentation
We didn’t find any tests in packages/opentelemetry-instrumentation-openai-agents that assert the span attributes for prompt, completion, or total tokens (GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, SpanAttributes.LLM_USAGE_TOTAL_TOKENS). Without these, changes to token extraction in _hooks.py may regress unnoticed.

Please add coverage in tests/test_openai_agents.py, for example (sketched after this list):

  • Mock an OpenAI Agents response with a .usage object
  • Instrument and capture spans
  • Assert agent_span.attributes[...] for input_tokens, output_tokens, and total_tokens
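A minimal sketch of such a test; the span_exporter/run_agent fixtures and the FakeUsage stub are hypothetical stand-ins, not the package's actual helpers:

from dataclasses import dataclass

@dataclass
class FakeUsage:
    input_tokens: int = 24
    output_tokens: int = 11
    total_tokens: int = 35

def test_agent_span_records_usage(span_exporter, run_agent):  # hypothetical fixtures
    run_agent("hello", usage=FakeUsage())
    (agent_span,) = span_exporter.get_finished_spans()
    assert agent_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 24
    assert agent_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 11
    assert agent_span.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS] == 35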
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py (1)

231-242: Swap and use correct input/output token attributes in assistant_wrappers.py

The current implementation in assistant_wrappers.py inverts prompt/completion and still emits the legacy names. To align with the rest of the OpenAI instrumentation (see responses_wrappers.py and event_handler_wrapper.py) and with GenAI conventions—and to satisfy the existing assistant tests that assert on GEN_AI_USAGE_INPUT_TOKENS and GEN_AI_USAGE_OUTPUT_TOKENS—please replace the two span attribute calls as follows:

--- a/packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py
+++ b/packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py
@@ -232,9 +232,9 @@ def _wrap_run(run: Dict[str, Any], span: Span, extra: Context) -> Any:
         if run.get("usage"):
             usage_dict = model_as_dict(run.get("usage"))
-            _set_span_attribute(
-                span,
-                GenAIAttributes.GEN_AI_USAGE_PROMPT_TOKENS,
-                usage_dict.get("completion_tokens"),
-            )
-            _set_span_attribute(
-                span,
-                GenAIAttributes.GEN_AI_USAGE_COMPLETION_TOKENS,
-                usage_dict.get("prompt_tokens"),
-            )
+            _set_span_attribute(
+                span,
+                GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS,
+                usage_dict.get("prompt_tokens"),
+            )
+            _set_span_attribute(
+                span,
+                GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS,
+                usage_dict.get("completion_tokens"),
+            )

• Confirmed that GEN_AI_USAGE_INPUT_TOKENS and GEN_AI_USAGE_OUTPUT_TOKENS are already used in responses_wrappers.py (lines 182–183) and in tests under tests/traces/test_assistant.py.
• This swap prevents broken spans in assistant traces and brings consistency across all GenAI attribute emissions.

packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (1)

57-68: Align LangChain tests with GenAI attribute keys for completions and usage

The instrumentation now emits completions under GenAIAttributes.GEN_AI_COMPLETION and usage tokens under GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS (and retains total tokens as SpanAttributes.LLM_USAGE_TOTAL_TOKENS). Update the assertions in packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (lines 57–68):

• Replace SpanAttributes.LLM_COMPLETIONS → GenAIAttributes.GEN_AI_COMPLETION
• Replace SpanAttributes.LLM_USAGE_PROMPT_TOKENS → GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS
• Replace SpanAttributes.LLM_USAGE_COMPLETION_TOKENS → GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS
• Replace SpanAttributes.LLM_USAGE_CACHE_READ_INPUT_TOKENS → GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS
• Keep SpanAttributes.LLM_USAGE_TOTAL_TOKENS as-is

Example diff:

- assert openai_span.attributes[f"{SpanAttributes.LLM_COMPLETIONS}.0.content"] == response
- assert openai_span.attributes[f"{SpanAttributes.LLM_COMPLETIONS}.0.role"] == "assistant"
+ assert openai_span.attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content"] == response
+ assert openai_span.attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.0.role"] == "assistant"

- assert openai_span.attributes[SpanAttributes.LLM_USAGE_PROMPT_TOKENS] == 24
- assert openai_span.attributes[SpanAttributes.LLM_USAGE_COMPLETION_TOKENS] == 11
- assert openai_span.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS] == 35
- assert openai_span.attributes[SpanAttributes.LLM_USAGE_CACHE_READ_INPUT_TOKENS] == 0
+ assert openai_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 24
+ assert openai_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 11
+ assert openai_span.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS] == 35
+ assert openai_span.attributes[GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS] == 0
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (1)

211-239: Bug: incorrect type check uses is instead of isinstance for message content

This causes string content to be treated as non-string, leading to unnecessary JSON encoding and potentially incorrect behavior.

-                if generation.message.content is str:
+                if isinstance(generation.message.content, str):
                     _set_span_attribute(
                         span,
                         f"{prefix}.content",
                         generation.message.content,
                     )

Additionally, guard generation_info which can be None to avoid AttributeError:

-                if generation.generation_info.get("finish_reason"):
+                if getattr(generation, "generation_info", None) and generation.generation_info.get("finish_reason"):
                     _set_span_attribute(
                         span,
                         f"{prefix}.finish_reason",
                         generation.generation_info.get("finish_reason"),
                     )
packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py (3)

1-4: Import GenAIAttributes alongside SpanAttributes to align with PR semantics

Add the incubating GenAIAttributes import; this also resolves the NameError on Line 44. Keep the SpanAttributes import, since the total-tokens assertion below still relies on the legacy key.

 from opentelemetry.semconv_ai import SpanAttributes
+from opentelemetry.semconv._incubating.attributes import gen_ai_attributes as GenAIAttributes

37-41: Update token usage assertions to GenAI semantics (prompt→input, completion→output)

These are still using legacy LLM_* keys. Align the input/output assertions to GenAI usage keys per PR objectives; total tokens stays on SpanAttributes.LLM_USAGE_TOTAL_TOKENS, since the incubating namespace defines no total-tokens constant (see the span_utils verification above).

-    assert span.attributes.get(SpanAttributes.LLM_USAGE_PROMPT_TOKENS) > 0
-    assert span.attributes.get(SpanAttributes.LLM_USAGE_COMPLETION_TOKENS) > 0
+    assert span.attributes.get(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS) > 0
+    assert span.attributes.get(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS) > 0
     assert span.attributes.get(SpanAttributes.LLM_USAGE_TOTAL_TOKENS) > 0

66-69: Also update token usage assertions in the second test

Bring these in line with GenAI usage attributes.

-    assert span.attributes.get(SpanAttributes.LLM_USAGE_PROMPT_TOKENS) > 0
-    assert span.attributes.get(SpanAttributes.LLM_USAGE_COMPLETION_TOKENS) > 0
+    assert span.attributes.get(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS) > 0
+    assert span.attributes.get(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS) > 0
♻️ Duplicate comments (4)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (1)

39-41: Resolved: GenAIAttributes import correctly added

This addresses the prior undefined-name (F821) issue flagged in earlier reviews. Linting/runtime should now pass for GenAIAttributes references in this module.

packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py (1)

122-130: Same undefined name issue applies here.

The references to GenAIAttributes in this test will also error without the import fix suggested above.

After applying the import change, please run the tests locally to ensure no remaining references to SpanAttributes exist in this file.

packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py (2)

111-115: Same fixes as above apply here.

Import GenAIAttributes and update completion/usage keys to GEN_AI_* in this test section as well.


162-166: Same fixes as above apply here (beta variant).

Ensure the import is present and migrate completion/usage keys to GEN_AI_*.

🧹 Nitpick comments (14)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (1)

178-191: Avoid mutable default arguments and tighten type hints

metric_attributes defaults to {} in both paths. While not mutated here, mutable defaults are best avoided. Also, token_histogram and choice_counter are annotated as non-optional but default to None.

Proposed refactor:

  • Use Optional[...] for the counters/histogram and metric_attributes.
  • Normalize metric_attributes to {} inside the function.

Apply these diffs:

Async path:

 async def _aset_token_usage(
     span,
     anthropic,
     request,
     response,
-    metric_attributes: dict = {},
-    token_histogram: Histogram = None,
-    choice_counter: Counter = None,
+    metric_attributes: Optional[dict] = None,
+    token_histogram: Optional[Histogram] = None,
+    choice_counter: Optional[Counter] = None,
 ):
     import inspect
+    if metric_attributes is None:
+        metric_attributes = {}

Sync path:

 def _set_token_usage(
     span,
     anthropic,
     request,
     response,
-    metric_attributes: dict = {},
-    token_histogram: Histogram = None,
-    choice_counter: Counter = None,
+    metric_attributes: Optional[dict] = None,
+    token_histogram: Optional[Histogram] = None,
+    choice_counter: Optional[Counter] = None,
 ):
     import inspect
+    if metric_attributes is None:
+        metric_attributes = {}

Also applies to: 295-309

packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py (1)

151-156: LGTM after adding the missing import.

Assertions correctly validate content presence and the absence of tool_calls for assistant-only content scenarios. With the GenAIAttributes import in place, this block looks good.

Optionally, mirror the call.args pattern here for readability, as suggested earlier.

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (5)

340-356: Use GenAI usage token attributes (input/output/total) instead of legacy LLM_USAGE_* here.

The PR objective is to replace prompt_tokens/completion_tokens with input_tokens/output_tokens; however, this block still writes LLM_USAGE_* attributes. Recommend migrating to GenAI usage attributes for consistency.

Apply this diff:

-                        if hasattr(usage, 'input_tokens') and usage.input_tokens is not None:
-                            otel_span.set_attribute(SpanAttributes.LLM_USAGE_PROMPT_TOKENS, usage.input_tokens)
-                        elif hasattr(usage, 'prompt_tokens') and usage.prompt_tokens is not None:
-                            otel_span.set_attribute(SpanAttributes.LLM_USAGE_PROMPT_TOKENS, usage.prompt_tokens)
+                        if hasattr(usage, 'input_tokens') and usage.input_tokens is not None:
+                            otel_span.set_attribute(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, usage.input_tokens)
+                        elif hasattr(usage, 'prompt_tokens') and usage.prompt_tokens is not None:
+                            otel_span.set_attribute(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, usage.prompt_tokens)
 
-                        if hasattr(usage, 'output_tokens') and usage.output_tokens is not None:
-                            otel_span.set_attribute(SpanAttributes.LLM_USAGE_COMPLETION_TOKENS, usage.output_tokens)
-                        elif hasattr(usage, 'completion_tokens') and usage.completion_tokens is not None:
-                            otel_span.set_attribute(SpanAttributes.LLM_USAGE_COMPLETION_TOKENS, usage.completion_tokens)
+                        if hasattr(usage, 'output_tokens') and usage.output_tokens is not None:
+                            otel_span.set_attribute(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, usage.output_tokens)
+                        elif hasattr(usage, 'completion_tokens') and usage.completion_tokens is not None:
+                            otel_span.set_attribute(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, usage.completion_tokens)
 
-                        if hasattr(usage, 'total_tokens') and usage.total_tokens is not None:
-                            otel_span.set_attribute(SpanAttributes.LLM_USAGE_TOTAL_TOKENS, usage.total_tokens)
+                        if hasattr(usage, 'total_tokens') and usage.total_tokens is not None:
+                            otel_span.set_attribute(GenAIAttributes.GEN_AI_USAGE_TOTAL_TOKENS, usage.total_tokens)

Note: This requires the import suggested above:

from opentelemetry.semconv_ai import GenAIAttributes

443-459: Mirror the GenAI usage token migration in the legacy fallback block.

The legacy path still writes LLM_USAGE_* attributes. Align it with the GenAI usage attributes for consistency.

Apply this diff:

-                        if hasattr(usage, 'input_tokens') and usage.input_tokens is not None:
-                            otel_span.set_attribute(SpanAttributes.LLM_USAGE_PROMPT_TOKENS, usage.input_tokens)
-                        elif hasattr(usage, 'prompt_tokens') and usage.prompt_tokens is not None:
-                            otel_span.set_attribute(SpanAttributes.LLM_USAGE_PROMPT_TOKENS, usage.prompt_tokens)
+                        if hasattr(usage, 'input_tokens') and usage.input_tokens is not None:
+                            otel_span.set_attribute(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, usage.input_tokens)
+                        elif hasattr(usage, 'prompt_tokens') and usage.prompt_tokens is not None:
+                            otel_span.set_attribute(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, usage.prompt_tokens)
 
-                        if hasattr(usage, 'output_tokens') and usage.output_tokens is not None:
-                            otel_span.set_attribute(SpanAttributes.LLM_USAGE_COMPLETION_TOKENS, usage.output_tokens)
-                        elif hasattr(usage, 'completion_tokens') and usage.completion_tokens is not None:
-                            otel_span.set_attribute(SpanAttributes.LLM_USAGE_COMPLETION_TOKENS, usage.completion_tokens)
+                        if hasattr(usage, 'output_tokens') and usage.output_tokens is not None:
+                            otel_span.set_attribute(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, usage.output_tokens)
+                        elif hasattr(usage, 'completion_tokens') and usage.completion_tokens is not None:
+                            otel_span.set_attribute(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, usage.completion_tokens)
 
-                        if hasattr(usage, 'total_tokens') and usage.total_tokens is not None:
-                            otel_span.set_attribute(SpanAttributes.LLM_USAGE_TOTAL_TOKENS, usage.total_tokens)
+                        if hasattr(usage, 'total_tokens') and usage.total_tokens is not None:
+                            otel_span.set_attribute(GenAIAttributes.GEN_AI_USAGE_TOTAL_TOKENS, usage.total_tokens)

304-334: Migrate completion payload attributes from LLM_COMPLETIONS → GenAI completion.

You already use GEN_AI_* for tools; keep completions consistent with GenAI by switching off SpanAttributes.LLM_COMPLETIONS here.

Apply this diff:

-                                    otel_span.set_attribute(
-                                        f"{SpanAttributes.LLM_COMPLETIONS}.{i}.content", content_text)
-                                    otel_span.set_attribute(
-                                        f"{SpanAttributes.LLM_COMPLETIONS}.{i}.role", getattr(
-                                            output, 'role', 'assistant'))
+                                    otel_span.set_attribute(
+                                        f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.content", content_text)
+                                    otel_span.set_attribute(
+                                        f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.role", getattr(
+                                            output, 'role', 'assistant'))
@@
-                                otel_span.set_attribute(f"{SpanAttributes.LLM_COMPLETIONS}.{i}.role", "assistant")
-                                otel_span.set_attribute(
-                                    f"{SpanAttributes.LLM_COMPLETIONS}.{i}.finish_reason", "tool_calls")
-                                otel_span.set_attribute(
-                                    f"{SpanAttributes.LLM_COMPLETIONS}.{i}.tool_calls.0.name", tool_name)
-                                otel_span.set_attribute(
-                                    f"{SpanAttributes.LLM_COMPLETIONS}.{i}.tool_calls.0.arguments", arguments)
-                                otel_span.set_attribute(
-                                    f"{SpanAttributes.LLM_COMPLETIONS}.{i}.tool_calls.0.id", tool_call_id)
+                                otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.role", "assistant")
+                                otel_span.set_attribute(
+                                    f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.finish_reason", "tool_calls")
+                                otel_span.set_attribute(
+                                    f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.tool_calls.0.name", tool_name)
+                                otel_span.set_attribute(
+                                    f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.tool_calls.0.arguments", arguments)
+                                otel_span.set_attribute(
+                                    f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.tool_calls.0.id", tool_call_id)
@@
-                                otel_span.set_attribute(f"{SpanAttributes.LLM_COMPLETIONS}.{i}.content", output.text)
-                                otel_span.set_attribute(
-                                    f"{SpanAttributes.LLM_COMPLETIONS}.{i}.role", getattr(
-                                        output, 'role', 'assistant'))
+                                otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.content", output.text)
+                                otel_span.set_attribute(
+                                    f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.role", getattr(
+                                        output, 'role', 'assistant'))
@@
-                                otel_span.set_attribute(
-                                    f"{SpanAttributes.LLM_COMPLETIONS}.{i}.finish_reason", response.finish_reason)
+                                otel_span.set_attribute(
+                                    f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.finish_reason", response.finish_reason)

Note: Ensure the alias import exists:

from opentelemetry.semconv_ai import GenAIAttributes

409-441: Do the same GenAI completion migration in the legacy block.

Mirror the change for the second completions block to keep attributes consistent.

Apply this diff:

-                                    otel_span.set_attribute(
-                                        f"{SpanAttributes.LLM_COMPLETIONS}.{i}.content", content_text)
-                                    otel_span.set_attribute(
-                                        f"{SpanAttributes.LLM_COMPLETIONS}.{i}.role", getattr(
-                                            output, 'role', 'assistant'))
+                                    otel_span.set_attribute(
+                                        f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.content", content_text)
+                                    otel_span.set_attribute(
+                                        f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.role", getattr(
+                                            output, 'role', 'assistant'))
@@
-                                otel_span.set_attribute(f"{SpanAttributes.LLM_COMPLETIONS}.{i}.role", "assistant")
-                                otel_span.set_attribute(
-                                    f"{SpanAttributes.LLM_COMPLETIONS}.{i}.finish_reason", "tool_calls")
-                                otel_span.set_attribute(
-                                    f"{SpanAttributes.LLM_COMPLETIONS}.{i}.tool_calls.0.name", tool_name)
-                                otel_span.set_attribute(
-                                    f"{SpanAttributes.LLM_COMPLETIONS}.{i}.tool_calls.0.arguments", arguments)
-                                otel_span.set_attribute(
-                                    f"{SpanAttributes.LLM_COMPLETIONS}.{i}.tool_calls.0.id", tool_call_id)
+                                otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.role", "assistant")
+                                otel_span.set_attribute(
+                                    f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.finish_reason", "tool_calls")
+                                otel_span.set_attribute(
+                                    f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.tool_calls.0.name", tool_name)
+                                otel_span.set_attribute(
+                                    f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.tool_calls.0.arguments", arguments)
+                                otel_span.set_attribute(
+                                    f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.tool_calls.0.id", tool_call_id)
@@
-                                otel_span.set_attribute(f"{SpanAttributes.LLM_COMPLETIONS}.{i}.content", output.text)
-                                otel_span.set_attribute(
-                                    f"{SpanAttributes.LLM_COMPLETIONS}.{i}.role", getattr(
-                                        output, 'role', 'assistant'))
+                                otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.content", output.text)
+                                otel_span.set_attribute(
+                                    f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.role", getattr(
+                                        output, 'role', 'assistant'))
@@
-                                otel_span.set_attribute(
-                                    f"{SpanAttributes.LLM_COMPLETIONS}.{i}.finish_reason", response.finish_reason)
+                                otel_span.set_attribute(
+                                    f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.finish_reason", response.finish_reason)

10-11: Avoid private incubating module imports where possible.

You import GEN_AI_COMPLETION from a private incubating path. Prefer stable exports to reduce churn and align with the PR’s GenAIAttributes usage.

Two options:

  • Switch to the alias everywhere and drop the incubating import:
    • Import: from opentelemetry.semconv_ai import GenAIAttributes
    • Replace usages of GEN_AI_COMPLETION with GenAIAttributes.GEN_AI_COMPLETION.
  • Or, if you keep the constant import, ensure all GenAI keys consistently use either the alias or constants, not a mix.

Happy to provide a follow-up diff if you want to unify the file one way or the other.

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py (1)

206-213: LGTM with a consistency suggestion: user/system message prompts use GEN_AI_PROMPT; consider migrating completions too

For user/system messages, writing role/content under GEN_AI_PROMPT is correct. However, completion messages in this function still use the legacy LLM_COMPLETIONS prefix and a raw "gen_ai.response" string for IDs, which mixes namespaces.

Consider migrating completions to the GenAI response path for consistency, e.g.:

  • Use a GEN_AI response base instead of LLM_COMPLETIONS (if available in GenAIAttributes, e.g., GenAIAttributes.GEN_AI_RESPONSE).
  • Write role/content/id under f"{GenAIAttributes.GEN_AI_RESPONSE}.{completion_index}.*".

This avoids mixing old and new conventions in a single span.
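
Concretely, the migrated block could look roughly like this (a sketch under the comment's own assumption that a GEN_AI_RESPONSE base exists; completion_index, content, and message_id are placeholders):

# Hypothetical sketch: assumes a GEN_AI_RESPONSE base constant exists in the
# installed GenAIAttributes; fall back to GEN_AI_COMPLETION if it does not.
base = f"{GenAIAttributes.GEN_AI_RESPONSE}.{completion_index}"
_set_span_attribute(span, f"{base}.role", "assistant")
_set_span_attribute(span, f"{base}.content", content)
_set_span_attribute(span, f"{base}.id", message_id)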

packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)

2-4: Fix lint failure: imported alias is unused (F401) — explicitly re-export or silence.

Ruff/Flake8 flags GenAIAttributes as an unused import. If the intent is to re-export the incubating attributes through this module (recommended for callers), add an inline noqa or explicitly reference it.

Apply one of the following:

Option A (noqa on import):

-from opentelemetry.semconv._incubating.attributes import (
-    gen_ai_attributes as GenAIAttributes,
-)
+from opentelemetry.semconv._incubating.attributes import (
+    gen_ai_attributes as GenAIAttributes,  # noqa: F401  (public re-export)
+)

Option B (explicit re-export; also keeps linters happy):

 from opentelemetry.semconv._incubating.attributes import (
-    gen_ai_attributes as GenAIAttributes,
+    gen_ai_attributes as GenAIAttributes,
 )
+# Public re-export for downstreams to import from opentelemetry.semconv_ai
+GenAIAttributes  # touch symbol to satisfy F401
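
A third common pattern, if neither option above appeals: declare the re-export in __all__, which pyflakes/Ruff treat as a use of the name. A sketch, noting that any other public names in the module (e.g., SpanAttributes) would need to be listed as well:

from opentelemetry.semconv._incubating.attributes import (
    gen_ai_attributes as GenAIAttributes,
)

# Names listed in __all__ count as used for F401 and document the public surface.
__all__ = ["GenAIAttributes"]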
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (3)

116-118: Inconsistent prompt schema: use .role/.content instead of .user.

Everywhere else you use ...prompt.{i}.role and ...prompt.{i}.content. Here it writes to ...prompt.0.user, which is inconsistent and breaks consumers expecting role/content.

-def _set_prompt_span_attributes(span, request_body):
-    _set_span_attribute(
-        span, f"{GenAIAttributes.GEN_AI_PROMPT}.0.user", request_body.get("prompt")
-    )
+def _set_prompt_span_attributes(span, request_body):
+    _set_span_attribute(
+        span, f"{GenAIAttributes.GEN_AI_PROMPT}.0.content", request_body.get("prompt")
+    )
+    _set_span_attribute(span, f"{GenAIAttributes.GEN_AI_PROMPT}.0.role", "user")

441-444: Amazon input prompt: align with role/content (avoid .user).

Same inconsistency as above: prefer ...prompt.0.content + ...prompt.0.role = "user".

-        _set_span_attribute(
-            span,
-            f"{GenAIAttributes.GEN_AI_PROMPT}.0.user",
-            request_body.get("inputText"),
-        )
+        _set_span_attribute(
+            span,
+            f"{GenAIAttributes.GEN_AI_PROMPT}.0.content",
+            request_body.get("inputText"),
+        )
+        _set_span_attribute(span, f"{GenAIAttributes.GEN_AI_PROMPT}.0.role", "user")

546-558: Guard against None when summing token usage.

If either prompt_tokens or completion_tokens is None (possible in some provider responses), prompt_tokens + completion_tokens raises TypeError. Upstream callsites largely avoid this, but _set_imported_model_span_attributes can pass None.

Consider normalizing to ints (default 0) before summing:

# outside selected range; illustrative patch
pt = int(prompt_tokens or 0)
ct = int(completion_tokens or 0)
_set_span_attribute(span, SpanAttributes.LLM_USAGE_TOTAL_TOKENS, pt + ct)

Alternatively, skip the total attribute if either is missing.
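
A sketch of that alternative, under the same assumptions as the illustrative patch above:

# Emit the total only when both operands are present, avoiding TypeError on None.
if prompt_tokens is not None and completion_tokens is not None:
    _set_span_attribute(
        span, SpanAttributes.LLM_USAGE_TOTAL_TOKENS, prompt_tokens + completion_tokens
    )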

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (1)

76-96: LGTM: migrate request model and request params to GEN_AI_* keys

The move to GEN_AI_REQUEST_MODEL/RESPONSE_MODEL/MAX_TOKENS/TEMPERATURE/TOP_P looks correct. Minor note: if no model is found, span_holder.request_model remains None; consider defaulting it to "unknown" for consistency.

     else:
-        model = "unknown"
+        model = "unknown"
+        span_holder.request_model = model
packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py (1)

9-12: Avoid hardcoding API keys in code, even in tests

Use an env var (e.g., OPENAI_API_KEY="test-api-key") so a real key is never accidentally committed if the placeholder is later replaced.

Example:

import os
return OpenAI(
    api_key=os.getenv("OPENAI_API_KEY", "test-api-key"),
    base_url="https://api.deepseek.com/beta",
)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0129a0a and 0959a96.

⛔ Files ignored due to path filters (1)
  • packages/opentelemetry-instrumentation-alephalpha/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (31)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (9 hunks)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (11 hunks)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (6 hunks)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (3 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py (3 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (24 hunks)
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (24 hunks)
  • packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/__init__.py (3 hunks)
  • packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/__init__.py (3 hunks)
  • packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (5 hunks)
  • packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (10 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py (7 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (1 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py (3 hunks)
  • packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py (2 hunks)
  • packages/opentelemetry-instrumentation-llamaindex/tests/test_chroma_vector_store.py (2 hunks)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1 hunks)
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (5 hunks)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (7 hunks)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (7 hunks)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py (3 hunks)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py (6 hunks)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (7 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/metrics/test_openai_metrics.py (5 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py (13 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (20 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_completions.py (10 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py (1 hunks)
  • packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1 hunks)
  • packages/opentelemetry-semantic-conventions-ai/pyproject.toml (1 hunks)
  • packages/traceloop-sdk/tests/test_workflows.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (19)
  • packages/opentelemetry-semantic-conventions-ai/pyproject.toml
  • packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/__init__.py
  • packages/opentelemetry-instrumentation-llamaindex/tests/test_chroma_vector_store.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
  • packages/opentelemetry-instrumentation-openai/tests/metrics/test_openai_metrics.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_completions.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
  • packages/traceloop-sdk/tests/test_workflows.py
  • packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
  • packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit Inference Engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py
  • packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/__init__.py
  • packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py
  • packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py
  • packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py
  • packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py
  • packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py
🧠 Learnings (1)
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
PR: traceloop/openllmetry#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Semantic conventions must follow the OpenTelemetry GenAI specification

Applied to files:

  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py
🧬 Code Graph Analysis (4)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (2)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (1)
  • set_span_attribute (21-25)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (67-240)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (2)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (1)
  • _set_span_attribute (55-57)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (2)
  • SpanAttributes (67-240)
  • LLMRequestTypeValues (272-277)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)
  • _set_span_attribute (30-37)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (1)
  • should_emit_events (174-181)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (2)
packages/opentelemetry-instrumentation-alephalpha/opentelemetry/instrumentation/alephalpha/__init__.py (1)
  • _set_span_attribute (63-67)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)
  • _set_span_attribute (30-37)
🪛 Ruff (0.12.2)

packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py

F821 (undefined name GenAIAttributes): lines 56, 59, 111, 114, 162, 165

packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py

F821 (undefined name GenAIAttributes): lines 49, 51, 53, 55

packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py

F821 (undefined name GenAIAttributes): line 44

packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py

F821 (undefined name GenAIAttributes): lines 52-54, 56, 59-61, 63, 66, 68, 71, 73, 76-78, 80, 82, 84, 87-89, 91, 122-127, 129, 151-153, 155

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py

F821 (undefined name GenAIAttributes): lines 238, 239, 242, 243

packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py

F401: line 3: opentelemetry.semconv._incubating.attributes.gen_ai_attributes imported but unused

🪛 Flake8 (7.2.0)

Flake8 reports the same F821 undefined-name 'GenAIAttributes' errors at the same file/line locations as Ruff above, plus F401 at line 2 of packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py ('opentelemetry.semconv._incubating.attributes.gen_ai_attributes as GenAIAttributes' imported but unused).

🔇 Additional comments (22)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (8)

229-231: Token type metric attribute migrated to GenAI key

Using GenAIAttributes.GEN_AI_TOKEN_TYPE with "input"/"output" aligns with the GenAI semantic conventions and keeps metric attributes consistent with the rest of the PR.

Also applies to: 256-258
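
For reference, a sketch of the recorded shape this migration implies (the record call pattern and usage field names are assumptions, not the file's exact code):

# Illustrative only: how the token histogram attributes look after the migration.
token_histogram.record(
    response.usage.input_tokens,
    attributes={**metric_attributes, GenAIAttributes.GEN_AI_TOKEN_TYPE: "input"},
)
token_histogram.record(
    response.usage.output_tokens,
    attributes={**metric_attributes, GenAIAttributes.GEN_AI_TOKEN_TYPE: "output"},
)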


279-282: Span attributes: input/output token usage keys migrated

Setting GEN_AI_USAGE_INPUT_TOKENS and GEN_AI_USAGE_OUTPUT_TOKENS on the span looks good and matches the new terminology.


286-292: Span attributes: cache token usage keys migrated

GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS and GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS updates are consistent and correct.


343-345: Sync path: token_type metric attribute migrated to GenAI key

Mirrors the async path; no issues spotted.

Also applies to: 370-372


393-396: Sync path: input/output token usage span attributes migrated

Consistent with async path; looks good.


400-406: Sync path: cache token usage span attributes migrated

Consistent with the async path and PR intent.


538-540: Set GEN_AI_SYSTEM attribute to 'Anthropic'

Correct usage of GenAI system identity for both sync and async wrappers.

Also applies to: 662-664


279-284: No GEN_AI_USAGE_TOTAL_TOKENS constant found—no changes needed
I searched the repo for GEN_AI_USAGE_TOTAL_TOKENS and didn’t find any definition. Since that constant doesn’t exist, the optional addition isn’t applicable.

packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/__init__.py (1)

33-35: GenAI attributes import aligns with the migration — LGTM

Importing gen_ai_attributes as GenAIAttributes matches the GenAI semconv migration and keeps usage ergonomic at call sites.

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py (6)

23-25: LGTM: Using GenAIAttributes alias from semconv incubating attributes

Import aligns with the GenAI semantic conventions migration and keeps usage readable across the file.


154-169: LGTM: Migrated system/model attributes to GenAI keys

Setting GEN_AI_SYSTEM, GEN_AI_REQUEST_MODEL, and GEN_AI_RESPONSE_MODEL on the span follows the updated conventions and matches the assistant model source.


172-188: LGTM: Per-prompt attributes migrated to GEN_AI_PROMPT path

Storing role/content under f"{GenAIAttributes.GEN_AI_PROMPT}.{index}.*" for assistant and run instructions looks correct and consistent with the GenAI schema.


275-287: LGTM: Streaming wrapper uses GenAI system/model attributes

Using GEN_AI_REQUEST_MODEL, GEN_AI_SYSTEM, and GEN_AI_RESPONSE_MODEL here mirrors the non-streaming flow and maintains consistency.


296-303: LGTM: Streaming wrapper emits/records assistant instructions under GEN_AI_PROMPT

Per-prompt role/content storage for the assistant instructions is correct.


307-310: LGTM: Streaming wrapper records run instructions under GEN_AI_PROMPT

Role/content mapping for the run instructions under the GenAI prompt path looks good.

packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (1)

85-89: LGTM: system/model/response attributes migrated to GenAI namespace.

Correct mapping to GEN_AI_SYSTEM, GEN_AI_REQUEST_MODEL, GEN_AI_RESPONSE_MODEL, and GEN_AI_RESPONSE_ID.

packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py (2)

7-9: LGTM: importing GenAIAttributes alias

Importing the incubating GenAIAttributes alias is correct and aligns the tests with the new semantic conventions.


51-58: LGTM: token_type and system attributes migrated to GEN_AI_* in metrics assertions

The updates to use GenAIAttributes.GEN_AI_TOKEN_TYPE and GenAIAttributes.GEN_AI_SYSTEM (and keeping the value checks) look good and are consistent across helpers and streaming variants.

Also applies to: 71-73, 103-112, 123-126, 133-139, 145-146

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (3)

18-20: LGTM: adopt GenAIAttributes alias

Switching to the incubating GenAIAttributes alias is correct and keeps this module aligned with the new GenAI semantic conventions.


130-136: LGTM: prompt role/content and tool_call_id migrated under GEN_AI_PROMPT path

Prompt attributes and tool call prefixing under GEN_AI_PROMPT are consistent with the rest of the migration. No functional issues spotted.

Also applies to: 168-199


321-340: Check GenAI total-tokens attribute for consistency

We currently emit input/output/cache-read tokens under GenAIAttributes.GEN_AI_USAGE_* but keep total tokens under SpanAttributes.LLM_USAGE_TOTAL_TOKENS. Please:

  • Verify whether the OpenTelemetry GenAI semconv defines a GEN_AI_USAGE_TOTAL_TOKENS constant.
    • If it exists, migrate for consistency:

         _set_span_attribute(
             span,
-            SpanAttributes.LLM_USAGE_TOTAL_TOKENS,
+            GenAIAttributes.GEN_AI_USAGE_TOTAL_TOKENS,
             total_tokens,
         )

    • Otherwise, add a brief comment explaining why `LLM_USAGE_TOTAL_TOKENS` is intentionally retained.

  • Apply the same adjustment in the block at lines 342–361.

packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py (1)

16-18: No change needed — instrument_legacy already emits GenAI attributes

Short reason: instrument_legacy instruments OpenAI with OpenAIInstrumentor and the instrumentation code sets GenAI attributes (GenAIAttributes.*) alongside the LLM SpanAttributes, so the test assertions are valid.

Evidence (key locations):

  • packages/opentelemetry-instrumentation-openai/tests/conftest.py — instrument_legacy creates and instruments OpenAIInstrumentor.
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/config.py — Config.use_legacy_attributes exists (default True) but does not prevent GenAI attribute emission.
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py — _set_span_attribute writes attributes (used with GenAIAttributes keys).
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py — sets GenAIAttributes.GEN_AI_RESPONSE_MODEL and GEN_AI_USAGE_* (and also SpanAttributes.LLM_USAGE_TOTAL_TOKENS).
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py — the test asserts both SpanAttributes.LLM_USAGE_* and GenAIAttributes.GEN_AI_RESPONSE_MODEL.

Conclusion: the original concern is incorrect for this codebase; no fixture change is required.

Likely an incorrect or invalid review comment.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (6)

1661-1666: Leftover legacy usage keys: prompt_tokens/completion_tokens should be input_tokens/output_tokens.

These assertions still reference gen_ai.usage.prompt_tokens/completion_tokens and will diverge from instrumentation emitting input_tokens/output_tokens.

Apply this diff:

-    assert anthropic_span.attributes["gen_ai.usage.prompt_tokens"] == 568
+    assert anthropic_span.attributes["gen_ai.usage.input_tokens"] == 568
     assert (
-        anthropic_span.attributes["gen_ai.usage.completion_tokens"]
-        + anthropic_span.attributes["gen_ai.usage.prompt_tokens"]
+        anthropic_span.attributes["gen_ai.usage.output_tokens"]
+        + anthropic_span.attributes["gen_ai.usage.input_tokens"]
         == anthropic_span.attributes["llm.usage.total_tokens"]
     )

1821-1826: Same issue: replace prompt/completion_tokens with input/output_tokens.

-    assert anthropic_span.attributes["gen_ai.usage.prompt_tokens"] == 568
+    assert anthropic_span.attributes["gen_ai.usage.input_tokens"] == 568
     assert (
-        anthropic_span.attributes["gen_ai.usage.completion_tokens"]
-        + anthropic_span.attributes["gen_ai.usage.prompt_tokens"]
+        anthropic_span.attributes["gen_ai.usage.output_tokens"]
+        + anthropic_span.attributes["gen_ai.usage.input_tokens"]
         == anthropic_span.attributes["llm.usage.total_tokens"]
     )

2007-2012: Same issue: replace prompt/completion_tokens with input/output_tokens (streaming tools legacy).

-    assert anthropic_span.attributes["gen_ai.usage.prompt_tokens"] == 506
+    assert anthropic_span.attributes["gen_ai.usage.input_tokens"] == 506
     assert (
-        anthropic_span.attributes["gen_ai.usage.completion_tokens"]
-        + anthropic_span.attributes["gen_ai.usage.prompt_tokens"]
+        anthropic_span.attributes["gen_ai.usage.output_tokens"]
+        + anthropic_span.attributes["gen_ai.usage.input_tokens"]
         == anthropic_span.attributes["llm.usage.total_tokens"]
     )

2134-2139: Same issue: replace prompt/completion_tokens with input/output_tokens (tools streaming with events).

-    assert anthropic_span.attributes["gen_ai.usage.prompt_tokens"] == 514
+    assert anthropic_span.attributes["gen_ai.usage.input_tokens"] == 514
     assert (
-        anthropic_span.attributes["gen_ai.usage.completion_tokens"]
-        + anthropic_span.attributes["gen_ai.usage.prompt_tokens"]
+        anthropic_span.attributes["gen_ai.usage.output_tokens"]
+        + anthropic_span.attributes["gen_ai.usage.input_tokens"]
         == anthropic_span.attributes["llm.usage.total_tokens"]
     )

2247-2252: Same issue: replace prompt/completion_tokens with input/output_tokens (tools streaming with events/no-content).

-    assert anthropic_span.attributes["gen_ai.usage.prompt_tokens"] == 514
+    assert anthropic_span.attributes["gen_ai.usage.input_tokens"] == 514
     assert (
-        anthropic_span.attributes["gen_ai.usage.completion_tokens"]
-        + anthropic_span.attributes["gen_ai.usage.prompt_tokens"]
+        anthropic_span.attributes["gen_ai.usage.output_tokens"]
+        + anthropic_span.attributes["gen_ai.usage.input_tokens"]
         == anthropic_span.attributes["llm.usage.total_tokens"]
     )

2469-2471: Update legacy usage keys in tests

The test file packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py still contains assertions against the old gen_ai.usage.* keys. Please replace all occurrences of:

  • gen_ai.usage.prompt_tokens
  • gen_ai.usage.completion_tokens

with the new llm.usage.* equivalents. Specifically, update the assertions at:

  • Lines 1661–1665
  • Lines 1821–1825
  • Lines 1924–1928
  • Lines 2007–2011
  • Lines 2134–2138
  • Lines 2247–2251

Suggested diff for each block (using the first occurrence as an example):

- assert anthropic_span.attributes["gen_ai.usage.prompt_tokens"] == 568
+ assert anthropic_span.attributes["llm.usage.prompt_tokens"] == 568

- assert (
-     anthropic_span.attributes["gen_ai.usage.completion_tokens"]
-     + anthropic_span.attributes["gen_ai.usage.prompt_tokens"]
-     == anthropic_span.attributes["llm.usage.total_tokens"]
- )
+ assert (
+     anthropic_span.attributes["llm.usage.completion_tokens"]
+     + anthropic_span.attributes["llm.usage.prompt_tokens"]
+     == anthropic_span.attributes["llm.usage.total_tokens"]
+ )

Apply the same replacement at the other five locations to fully remove legacy keys.

♻️ Duplicate comments (6)
packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py (1)

12-14: Fix import: aliasing the module as GenAIAttributes breaks constant access

You’re importing the module gen_ai_attributes and aliasing it as GenAIAttributes, but the code expects the class/symbol GenAIAttributes (so that GenAIAttributes.GEN_AI_PROMPT resolves). Import the symbol directly to avoid runtime/name errors and to align with the rest of the repo.

Apply one of the following diffs (Option A recommended as the stable alias used across the repo):

Option A (preferred):

-from opentelemetry.semconv._incubating.attributes import (
-    gen_ai_attributes as GenAIAttributes,
-)
+from opentelemetry.semconv_ai import GenAIAttributes

Option B (if you want to stick to incubating paths):

-from opentelemetry.semconv._incubating.attributes import (
-    gen_ai_attributes as GenAIAttributes,
-)
+from opentelemetry.semconv._incubating.attributes.gen_ai_attributes import GenAIAttributes

To confirm the canonical import used elsewhere and catch this pattern if it appears again, run:

#!/bin/bash
# Inspect imports of GenAIAttributes across the repo
rg -n -C2 -P --type=py 'from\s+opentelemetry\.(semconv_ai|semconv\._incubating\.attributes(\.gen_ai_attributes)?)\s+import\s+GenAIAttributes'

# Find any remaining module-alias imports that could be incorrect
rg -n -C2 -P --type=py 'from\s+opentelemetry\.semconv\._incubating\.attributes\s+import\s*\(\s*gen_ai_attributes\s+as\s+GenAIAttributes'
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (1)

6-8: Undefined name fixed: GenAIAttributes import added — LGTM

This resolves the prior F821 (Ruff/Flake8) by defining GenAIAttributes via the incubating semconv alias.

packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py (1)

3-5: LGTM: GenAIAttributes migration is correct and resolves earlier F821.

  • Importing GenAIAttributes and switching to GEN_AI_PROMPT/GEN_AI_COMPLETION plus GEN_AI_USAGE_INPUT_TOKENS/GEN_AI_USAGE_OUTPUT_TOKENS is consistent with the PR objective.
  • Keeping SpanAttributes.LLM_USAGE_TOTAL_TOKENS for totals is acceptable.

Also applies to: 57-63, 70-81, 114-132, 164-189

packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/__init__.py (2)

248-250: Resolved: GEN_AI_SYSTEM set via GenAIAttributes with canonical provider id "groq"

The lowercased provider id and the switch to GenAIAttributes look correct and align with the repo-wide migration.


330-332: Async path mirrors the fix — LGTM

The async wrapper now sets GEN_AI_SYSTEM to "groq" via GenAIAttributes consistently with the sync path.

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)

10-12: Resolved: GenAIAttributes import added (addresses prior undefined symbol).

This fixes the earlier F821 lint/runtime errors flagged in the previous review. The alias usage below compiles.

🧹 Nitpick comments (7)
packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py (1)

41-45: Also migrate total tokens to GenAI for consistency

You migrated input/output token keys to GenAI, but total tokens still uses the legacy LLM key. For consistency with the PR objective and to avoid mixed semantics, switch to the GenAI total tokens attribute.

Apply:

-    assert span.attributes.get(SpanAttributes.LLM_USAGE_TOTAL_TOKENS) > 0
+    assert span.attributes.get(GenAIAttributes.GEN_AI_USAGE_TOTAL_TOKENS) > 0

Additionally, once updated, remove the now-unused import:

-from opentelemetry.semconv_ai import SpanAttributes

Please confirm your installed semconv version defines GEN_AI_USAGE_TOTAL_TOKENS under gen_ai_attributes.
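
One quick way to confirm, assuming the incubating module layout used elsewhere in this PR:

# Prints True only if the installed semconv defines the total-tokens constant.
from opentelemetry.semconv._incubating.attributes import gen_ai_attributes

print(hasattr(gen_ai_attributes, "GEN_AI_USAGE_TOTAL_TOKENS"))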

packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (1)

67-69: No GenAI total_tokens attribute emitted yet – continue using LLM_USAGE_TOTAL_TOKENS with a future plan to migrate

The instrumentation currently only sets SpanAttributes.LLM_USAGE_TOTAL_TOKENS (there is no GenAIAttributes.GEN_AI_USAGE_TOTAL_TOKENS in span_utils). If you’d like to adopt the new GenAI namespace later, you can:

• Add support for GenAIAttributes.GEN_AI_USAGE_TOTAL_TOKENS in your instrumentation code
• Update tests to prefer the GenAI total with an LLM fallback, for example in
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py Lines 67–69

-    assert openai_span.attributes[SpanAttributes.LLM_USAGE_TOTAL_TOKENS] == 35
+    # Prefer GenAI total tokens once implemented, fallback to LLM total:
+    assert openai_span.attributes.get(
+        GenAIAttributes.GEN_AI_USAGE_TOTAL_TOKENS,
+        openai_span.attributes.get(SpanAttributes.LLM_USAGE_TOTAL_TOKENS),
+    ) == 35

This is an optional, forward-looking refactor to align with GenAI semantic conventions once total_tokens is emitted under that namespace.

packages/opentelemetry-instrumentation-anthropic/tests/test_completion.py (1)

41-43: Optional: Prefer constants over string literals for response IDs.

Consider using GenAIAttributes.GEN_AI_RESPONSE_ID instead of the string "gen_ai.response.id" to avoid drift if keys change again.
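
For example (a sketch; expected_response_id stands in for the id recorded in the test cassette):

assert (
    anthropic_span.attributes[GenAIAttributes.GEN_AI_RESPONSE_ID]
    == expected_response_id
)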

packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/__init__.py (1)

247-251: Optional: factor shared span attributes to avoid duplication

Both wrappers build the same attributes dict. Consider a tiny helper to keep them in one place.

Apply this localized change in each start_span call:

-    span = tracer.start_span(
-        name,
-        kind=SpanKind.CLIENT,
-        attributes={
-            GenAIAttributes.GEN_AI_SYSTEM: "groq",
-            SpanAttributes.LLM_REQUEST_TYPE: LLMRequestTypeValues.COMPLETION.value,
-        },
-    )
+    span = tracer.start_span(
+        name,
+        kind=SpanKind.CLIENT,
+        attributes=_groq_span_base_attributes(),
+    )

Add this helper once in the module (e.g., near other small helpers):

def _groq_span_base_attributes() -> dict:
    return {
        GenAIAttributes.GEN_AI_SYSTEM: "groq",
        SpanAttributes.LLM_REQUEST_TYPE: LLMRequestTypeValues.COMPLETION.value,
    }

Also applies to: 326-333

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (3)

158-161: Avoid writing completion.* tool attributes on a tool span.

These attributes are prefixed under completion.* but are being emitted on the tool/function span. This is semantically confusing. The tool span already carries top-level tool fields (gen_ai.tool.*). Completion tool calls belong under the response span as gen_ai.completion.{i}.tool_calls.* (which you already handle later).

Recommend removing these lines from the tool span:

-                f"{GenAIAttributes.GEN_AI_COMPLETION}.tool.name": tool_name,
-                f"{GenAIAttributes.GEN_AI_COMPLETION}.tool.type": "FunctionTool",
-                f"{GenAIAttributes.GEN_AI_COMPLETION}.tool.strict_json_schema": True
@@
-                    tool_attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.tool.description"] = desc

Also applies to: 167-167


279-294: Model settings attributes migrated correctly; consider using the model alias for consistency.

Temperature, max_tokens, and top_p use GenAIAttributes. The model field still uses a string literal ("gen_ai.request.model"). For consistency with the rest of the repo and the second file, consider switching to the alias.

-                        otel_span.set_attribute("gen_ai.request.model", response.model)
+                        otel_span.set_attribute(GenAIAttributes.GEN_AI_REQUEST_MODEL, response.model)

Apply similarly where model is set later (Line 396 and Line 480).


474-479: Propagating model settings to agent spans is consistent with GenAI aliases.

Temperature, max_tokens, and top_p use GenAIAttributes; consider also switching the model literal to the alias, as noted above.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0959a96 and 9d4ad88.

📒 Files selected for processing (13)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (12 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py (4 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/test_completion.py (1 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (28 hunks)
  • packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/__init__.py (3 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py (9 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (2 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py (4 hunks)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (10 hunks)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py (7 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_completions.py (10 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py (3 hunks)
  • packages/traceloop-sdk/tests/test_prompt_management.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/traceloop-sdk/tests/test_prompt_management.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_completions.py
  • packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit Inference Engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/__init__.py
  • packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_completion.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py
  • packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
🧠 Learnings (1)
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
PR: traceloop/openllmetry#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Semantic conventions must follow the OpenTelemetry GenAI specification

Applied to files:

  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py
🧬 Code Graph Analysis (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)
  • _set_span_attribute (30-37)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (1)
  • should_emit_events (174-181)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (1)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (1)
  • set_span_attribute (21-25)
🔇 Additional comments (23)
packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py (3)

3-5: Resolved: GenAI attributes import added correctly

This addresses the earlier NameError complaint and unblocks the test from import-time failure. The aliasing pattern is consistent with the broader migration.


48-48: LGTM: response model assertion updated to GenAI key

Using GEN_AI_RESPONSE_MODEL aligns with the new naming and the provider’s model id.


70-73: LGTM: usage assertions migrated to GenAI keys in second test

Both input and output tokens now assert against the new GenAI attributes. Matches the migration.

packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py (3)

54-96: LGTM: Assertions correctly migrated to GEN_AI_PROMPT and validate content/tool_calls

The checks cover role, content, and tool_call metadata on the assistant flow and match the updated GenAI attribute namespace.


124-132: LGTM: Tool-only assistant message behavior is asserted under GEN_AI_PROMPT

Good coverage for the “no content, only tool_calls” case. Keys and values match expected naming.


153-158: LGTM: Content-only assistant message behavior is asserted under GEN_AI_PROMPT

Assertions align with the GenAI naming and validate that no tool_calls attributes are emitted.

packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (3)

50-58: GenAI request.model and prompt attributes migration — LGTM

Using GenAIAttributes for request model and prompt list indices looks correct and consistent with the new semconv.


60-66: GenAI completion attributes migration — LGTM

Completion content and role assertions correctly use GenAIAttributes.


49-54: Keep LLM semconv for request type
The GenAI semantic conventions package does not define a GEN_AI_REQUEST_TYPE attribute—only llm.request.type is emitted by the instrumentation. Leave this assertion unchanged:

    assert openai_span.attributes[SpanAttributes.LLM_REQUEST_TYPE] == "chat"

Likely an incorrect or invalid review comment.

packages/opentelemetry-instrumentation-anthropic/tests/test_completion.py (1)

35-40: LGTM: Assertions updated to new GEN_AI_ keys.*

Span assertions now use GenAIAttributes.GEN_AI_PROMPT and GEN_AI_COMPLETION. This aligns with the new semconv mapping. Nice cleanup.

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (1)

73-91: LGTM: Request attributes migrated to GenAI request keys.

GEN_AI_REQUEST_MODEL, GEN_AI_REQUEST_MAX_TOKENS, GEN_AI_REQUEST_TEMPERATURE, and GEN_AI_REQUEST_TOP_P usage looks consistent. Leaving LLM_* penalties and is_streaming as-is is fine given current semconv coverage.
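For context, the migrated request path boils down to roughly the following; a sketch with names abbreviated from span_utils.py, not a verbatim excerpt:

set_span_attribute(span, GenAIAttributes.GEN_AI_REQUEST_MODEL, kwargs.get("model"))
set_span_attribute(span, GenAIAttributes.GEN_AI_REQUEST_MAX_TOKENS, kwargs.get("max_tokens"))
set_span_attribute(span, GenAIAttributes.GEN_AI_REQUEST_TEMPERATURE, kwargs.get("temperature"))
set_span_attribute(span, GenAIAttributes.GEN_AI_REQUEST_TOP_P, kwargs.get("top_p"))
# Frequency/presence penalties and is_streaming stay on the legacy LLM_* keys.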

packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (1)

108-113: LGTM (partial): Updated assertions to input_tokens/output_tokens and GEN_AI_ keys.*

The migrated sections use GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS/OUTPUT_TOKENS and GEN_AI_PROMPT/COMPLETION correctly, while preserving LLM_USAGE_TOTAL_TOKENS. Good consistency across sync/async and streaming paths.

Also applies to: 155-160, 219-225, 291-296, 345-350, 406-411, 572-577, 627-632, 689-694, 757-762, 811-816, 876-881, 944-949, 991-996, 1049-1054, 1122-1127, 1176-1181, 1240-1245, 1297-1302, 1437-1442, 1539-1544

packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/__init__.py (1)

33-35: Confirm & Pin Minimum opentelemetry-semantic-conventions Version for GenAI Attributes

The import

from opentelemetry.semconv._incubating.attributes import gen_ai_attributes as GenAIAttributes

requires a semconv release that actually ships the opentelemetry/semconv/_incubating/attributes.py module. In your pyproject.toml you currently have:

  • opentelemetry-semantic-conventions = ">=0.50b0"
  • opentelemetry-semantic-conventions-ai = "^0.4.12"

Please verify which opentelemetry-semantic-conventions version first introduced the _incubating.attributes module (and its gen_ai_attributes submodule) and bump the minimum version accordingly to avoid runtime ImportErrors in older environments.

• File to update:
packages/opentelemetry-instrumentation-groq/pyproject.toml
• Dependency entry:
opentelemetry-semantic-conventions = ">=<version-that-includes-gen_ai_attributes>"
• Code location relying on it:
packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/__init__.py
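If pinning is awkward across environments, a guarded import is another option. A minimal sketch, assuming only that the attribute string values stay stable (the fallback class below is illustrative, not existing code in this repo):

try:
    from opentelemetry.semconv._incubating.attributes import (
        gen_ai_attributes as GenAIAttributes,
    )
except ImportError:
    # Fallback for semconv releases that predate the incubating module;
    # mirrors only the constants this instrumentation actually uses.
    class GenAIAttributes:
        GEN_AI_USAGE_INPUT_TOKENS = "gen_ai.usage.input_tokens"
        GEN_AI_USAGE_OUTPUT_TOKENS = "gen_ai.usage.output_tokens"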

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (5)

240-246: Prompt attributes migration looks correct.

Recording role/content under GEN_AI_PROMPT.{i}.* aligns with GenAI semconv.


306-341: Completion attributes migration looks correct.

Content, role, and finish_reason are emitted under GEN_AI_COMPLETION.{i}.*, and tool_calls are structured under the correct prefix.


346-355: Usage tokens: correct input/output mapping and names.

Handles both input_tokens/output_tokens and legacy prompt/completion fields, mapping to GEN_AI_USAGE_INPUT_TOKENS and GEN_AI_USAGE_OUTPUT_TOKENS.
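For reviewers skimming the hook, the mapping reduces to something like this sketch (the usage dict shape and helper name are illustrative, not the exact code in _hooks.py):

def record_usage(span, usage: dict) -> None:
    # Prefer the new field names; fall back to the legacy ones.
    input_tokens = usage.get("input_tokens", usage.get("prompt_tokens"))
    output_tokens = usage.get("output_tokens", usage.get("completion_tokens"))
    if input_tokens is not None:
        span.set_attribute(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, input_tokens)
    if output_tokens is not None:
        span.set_attribute(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, output_tokens)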


411-444: Legacy path: completion attributes aligned with GenAI.

Even in the fallback branch, completions are emitted via GEN_AI_COMPLETION.{i}.*, consistent with the new naming.


450-458: Legacy path: usage tokens mapping correct.

Maintains input→input_tokens/prompt_tokens and output→output_tokens/completion_tokens mapping.

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py (5)

23-25: GenAIAttributes import added — good.

Consistent with other instrumentations in this PR.


156-168: System and model attributes correctly migrated to GenAI.

GEN_AI_SYSTEM, GEN_AI_REQUEST_MODEL, and GEN_AI_RESPONSE_MODEL usage aligns with the spec.


172-178: Per-prompt attributes correctly migrated.

Recording role/content under GEN_AI_PROMPT.{index}.* is correct, both for assistant config and run instructions.

Also applies to: 182-187


195-212: Completion prefix usage is correct.

Using prefix = f"{GenAIAttributes.GEN_AI_COMPLETION}.{completion_index}" and writing role/content under it is consistent with the GenAI conventions.


276-287: Stream wrapper: attributes migrated correctly.

System/model attributes and per-prompt system messages use the GenAI aliases consistently.

Also applies to: 296-310

@martimfasantos martimfasantos requested a review from nirga September 6, 2025 17:55
@martimfasantos
Copy link
Contributor Author

> Thanks @martimfasantos! And sorry for the delay here - I missed the GitHub notification. Can you fix the lint + test errors? Unfortunately, since you're a first-time contributor, GitHub doesn't allow the CI to run automatically and I have to manually run it, but you can run it locally from the root directory:
>
> nx run-many -t lint
> nx run-many -t test

No worries :)
I have resolved all the linting errors and addressed some of the test failures. However, I believe several of the tests are significantly outdated due to changes in certain packages and would require extensive context-specific modifications. As such, I won't be updating them.

Here are the screenshots:
image
image

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
packages/opentelemetry-instrumentation-langchain/tests/test_structured_output.py (1)

92-104: IndexError risk: asserting 1 log but accessing logs[1]

assert len(logs) == 1 conflicts with logs[1]. Pick one behavior and align.

Option A: Expect both events (user + choice):

-    assert len(logs) == 1
+    assert len(logs) == 2

Option B: Only the user message is emitted when content is empty:

-    assert len(logs) == 1
+    assert len(logs) == 1
@@
-    # Validate AI choice Event
-    choice_event = {
-        "index": 0,
-        "finish_reason": "stop",
-        "message": {},
-    }
-    assert_message_in_logs(logs[1], "gen_ai.choice", choice_event)
+    # No choice event expected when content is empty
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (1)

195-205: Test no longer validates issue #3350; restore serialization checks

The docstring says we’re guarding against an “Invalid type dict” warning and expect serialized reasoning. Current assertions skip that and even assert absence in Line 197, which defeats the regression test.

Apply:

-    # Verify the reasoning attributes are properly set without causing warnings
-    # assert span.attributes["gen_ai.request.reasoning_effort"] == "medium"
-    # assert span.attributes["gen_ai.request.reasoning_summary"] == "auto"
-    # This should not cause an "Invalid type dict" warning and should contain serialized reasoning
-    assert "gen_ai.completion.0.reasoning" not in span.attributes
-    # The reasoning should be serialized as JSON since it contains complex data
-    # reasoning_attr = span.attributes["gen_ai.completion.0.reasoning"]
-    # assert isinstance(reasoning_attr, str)
-    # Should be valid JSON containing reasoning summary data
-    # import json
-    # parsed_reasoning = json.loads(reasoning_attr)
-    # assert isinstance(parsed_reasoning, (dict, list))  # Could be dict or list depending on response structure
+    # Verify the reasoning attributes are properly set without causing warnings
+    # Request fields (optional, depending on instrumentation contract):
+    # assert span.attributes.get("gen_ai.request.reasoning_effort") == "medium"
+    # assert span.attributes.get("gen_ai.request.reasoning_summary") == "auto"
+    # This should not cause an "Invalid type dict" warning and should contain serialized reasoning
+    assert "gen_ai.completion.0.reasoning" in span.attributes
+    # The reasoning should be serialized as JSON since it contains complex data
+    reasoning_attr = span.attributes["gen_ai.completion.0.reasoning"]
+    assert isinstance(reasoning_attr, str)
+    # Should be valid JSON containing reasoning summary data
+    import json
+    parsed_reasoning = json.loads(reasoning_attr)
+    assert isinstance(parsed_reasoning, (dict, list))
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (1)

291-304: Avoid token-counting fallbacks returning 0 due to None/non-text blocks.

Filter to text blocks and skip falsy values before counting.

-            else:
-                raw_messages.extend([content.get("text") for content in message])
+            else:
+                raw_messages.extend(
+                    [
+                        c.get("text")
+                        for c in message
+                        if isinstance(c, dict) and c.get("type") == "text" and c.get("text")
+                    ]
+                )
@@
-        completion_tokens = _count_anthropic_tokens(
-            [content.get("text") for content in response_body.get("content")]
-        )
+        completion_tokens = _count_anthropic_tokens(
+            [
+                c.get("text")
+                for c in response_body.get("content")
+                if isinstance(c, dict) and c.get("type") == "text" and c.get("text")
+            ]
+        )
@@
-        for message in messages:
-            count += anthropic_client.count_tokens(text=message)
+        for message in messages:
+            if not message:
+                continue
+            count += anthropic_client.count_tokens(text=message)

Also applies to: 323-329

♻️ Duplicate comments (6)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (1)

627-632: Fixed: use GenAIAttributes for completion path

This addresses the prior review about using the wrong constant source for completion paths. Good catch.

Also applies to: 803-807

packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py (1)

86-87: Previous concern resolved: legacy LLM_USAGE_ assertions have been migrated.*

Also applies to: 169-170, 249-250, 319-320, 387-388, 461-462, 541-542, 620-621, 715-716, 798-799, 866-867, 969-970, 1047-1048, 1115-1116, 1203-1204

packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (4)

470-476: Amazon inputText: use role/content keys, not ".user".

-        _set_span_attribute(
-            span,
-            f"{GenAIAttributes.GEN_AI_PROMPT}.0.user",
-            request_body.get("inputText"),
-        )
+        _set_span_attribute(
+            span,
+            f"{GenAIAttributes.GEN_AI_PROMPT}.0.content",
+            request_body.get("inputText"),
+        )
+        _set_span_attribute(span, f"{GenAIAttributes.GEN_AI_PROMPT}.0.role", "user")

41-50: Anthropic messages: index bug and potential serialization error.

Content is always written to index 0; also json.dumps without default may raise on non-serializable blocks.

Apply:

                 _set_span_attribute(
                     span,
-                    f"{GenAIAttributes.GEN_AI_PROMPT}.0.content",
-                    json.dumps(message.get("content")),
+                    f"{GenAIAttributes.GEN_AI_PROMPT}.{idx}.content",
+                    json.dumps(message.get("content"), default=str),
                 )

246-254: Anthropic content branch: writing role into .content; also add safe JSON default.

First write should target .role; keep payload in .content with default=str to avoid serialization errors.

-        _set_span_attribute(
-            span, f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content", "assistant"
-        )
+        _set_span_attribute(
+            span, f"{GenAIAttributes.GEN_AI_COMPLETION}.0.role", "assistant"
+        )
         _set_span_attribute(
             span,
-            f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content",
-            json.dumps(response_body.get("content")),
+            f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content",
+            json.dumps(response_body.get("content"), default=str),
         )

147-151: Use role/content keys instead of deprecated ".user" for prompts.

-    _set_span_attribute(
-        span, f"{GenAIAttributes.GEN_AI_PROMPT}.0.user", request_body.get("prompt")
-    )
+    _set_span_attribute(
+        span, f"{GenAIAttributes.GEN_AI_PROMPT}.0.content", request_body.get("prompt")
+    )
+    _set_span_attribute(span, f"{GenAIAttributes.GEN_AI_PROMPT}.0.role", "user")
🧹 Nitpick comments (39)
packages/opentelemetry-instrumentation-langchain/tests/test_structured_output.py (1)

29-29: Avoid unused assignment; call directly

No need to bind to _—just invoke.

-    _ = model_with_structured_output.invoke(query)
+    model_with_structured_output.invoke(query)
packages/opentelemetry-instrumentation-openai/tests/traces/test_prompt_caching.py (6)

70-71: Prefer semconv constants over raw strings for token keys

Use GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS and GEN_AI_USAGE_OUTPUT_TOKENS to avoid typos (like the one above) and track future renames centrally.

-    assert cache_creation_span.attributes["gen_ai.usage.input_tokens"] == 1149
-    assert cache_creation_span.attributes["gen_ai.usage.output_tokens"] == 315
+    assert cache_creation_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 1149
+    assert cache_creation_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 315
-    assert cache_read_span.attributes["gen_ai.usage.input_tokens"] == 1149
-    assert cache_read_span.attributes["gen_ai.usage.output_tokens"] == 353
+    assert cache_read_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 1149
+    assert cache_read_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 353

Also applies to: 74-75


131-134: Use semconv constants for usage keys

Keeps tests resilient to renames and avoids string typos.

-    assert cache_creation_span.attributes["gen_ai.usage.input_tokens"] == 1149
-    assert cache_creation_span.attributes["gen_ai.usage.output_tokens"] == 315
-    assert cache_read_span.attributes["gen_ai.usage.input_tokens"] == 1149
-    assert cache_read_span.attributes["gen_ai.usage.output_tokens"] == 353
+    assert cache_creation_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 1149
+    assert cache_creation_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 315
+    assert cache_read_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 1149
+    assert cache_read_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 353

221-224: Same constant usage recommendation here

Consistent with other tests.

-    assert cache_creation_span.attributes["gen_ai.usage.input_tokens"] == 1149
-    assert cache_creation_span.attributes["gen_ai.usage.output_tokens"] == 315
-    assert cache_read_span.attributes["gen_ai.usage.input_tokens"] == 1149
-    assert cache_read_span.attributes["gen_ai.usage.output_tokens"] == 353
+    assert cache_creation_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 1149
+    assert cache_creation_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 315
+    assert cache_read_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 1149
+    assert cache_read_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 353

315-316: Async variant: switch to constants for usage keys

Prevents drift and copy/paste errors.

-    assert cache_creation_span.attributes["gen_ai.usage.input_tokens"] == 1150
-    assert cache_creation_span.attributes["gen_ai.usage.output_tokens"] == 293
+    assert cache_creation_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 1150
+    assert cache_creation_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 293
-    assert cache_read_span.attributes["gen_ai.usage.input_tokens"] == 1150
-    assert cache_read_span.attributes["gen_ai.usage.output_tokens"] == 307
+    assert cache_read_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 1150
+    assert cache_read_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 307

Also applies to: 318-319


376-380: Async-with-content: use constants

Same rationale as above.

-    assert cache_creation_span.attributes["gen_ai.usage.input_tokens"] == 1150
-    assert cache_creation_span.attributes["gen_ai.usage.output_tokens"] == 293
-    assert cache_read_span.attributes["gen_ai.usage.input_tokens"] == 1150
-    assert cache_read_span.attributes["gen_ai.usage.output_tokens"] == 307
+    assert cache_creation_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 1150
+    assert cache_creation_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 293
+    assert cache_read_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 1150
+    assert cache_read_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 307

468-471: Async-without-content: use constants

Keeps all test blocks consistent.

-    assert cache_creation_span.attributes["gen_ai.usage.input_tokens"] == 1150
-    assert cache_creation_span.attributes["gen_ai.usage.output_tokens"] == 293
-    assert cache_read_span.attributes["gen_ai.usage.input_tokens"] == 1150
-    assert cache_read_span.attributes["gen_ai.usage.output_tokens"] == 307
+    assert cache_creation_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 1150
+    assert cache_creation_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 293
+    assert cache_read_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 1150
+    assert cache_read_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 307
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (2)

38-43: Switch to GenAI constants looks good; consider a safer presence check

Using GenAIAttributes for prompt/completion is correct. For existence checks, prefer explicit key membership to avoid false positives on empty strings.

-    assert open_ai_span.attributes.get(
-        f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content")
+    assert f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content" in open_ai_span.attributes

682-686: Ruff S101 in tests

Ruff flags assert usage (S101) in these test files. If your lint config doesn't already ignore S101 for tests, add a per-file ignore (e.g., in ruff.toml: per-file-ignores = { "packages/**/tests/**/test_*.py" = ["S101"] }).

Also applies to: 748-753, 751-756

packages/traceloop-sdk/tests/test_prompt_management.py (2)

241-241: Avoid bare assert for presence check (ruff S101) and be explicit

Prevents Bandit/Ruff complaints and gives a clearer failure message.

-    assert open_ai_span.attributes.get(f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content")
+    if f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content" not in open_ai_span.attributes:
+        pytest.fail("Missing completion content attribute on span")

272-279: Harden JSON validation and drop redundant assertion

Catch None/TypeError and remove no-op assert True.

-    completion = open_ai_span.attributes.get(
-        f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content"
-    )
-    try:
-        json.loads(completion)
-    except json.JSONDecodeError:
-        pytest.fail("Response is not valid JSON")
-    assert True
+    completion = open_ai_span.attributes.get(
+        f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content"
+    )
+    if completion is None:
+        pytest.fail("Missing completion content attribute on span")
+    try:
+        json.loads(completion)
+    except (TypeError, json.JSONDecodeError):
+        pytest.fail("Response is not valid JSON")
packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py (2)

101-103: Uncomment input_tokens assertion for consistency

Only this test leaves the input_tokens check commented, while all others assert it. Either re-enable it or document why this legacy path differs. Prefer enabling to avoid regressions.

Apply:

-    # assert cache_creation_span.attributes["gen_ai.usage.input_tokens"] == 1167
+    assert cache_creation_span.attributes["gen_ai.usage.input_tokens"] == 1167

190-201: Use GenAIAttributes constants instead of string literals for usage keys
Replace raw "gen_ai.usage.*" keys with the corresponding GenAIAttributes.GEN_AI_USAGE_* constants (e.g. GEN_AI_USAGE_INPUT_TOKENS, GEN_AI_USAGE_OUTPUT_TOKENS, GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS, GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS) in the files below (see the sketch after this list):

  • packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py (190–201)
  • packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (1661–1664, 1821–1824, 1924–1927, 2007–2010, 2134–2137, 2247–2250)
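A concrete before/after, assuming the GenAIAttributes alias already imported in these test modules (the token count shown comes from the existing assertions):

# Before: raw string keys are easy to typo and drift silently on rename.
assert span.attributes["gen_ai.usage.input_tokens"] == 1149

# After: the constant resolves to the same key and raises AttributeError
# if the constant is ever removed or renamed upstream.
assert span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 1149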
packages/opentelemetry-instrumentation-langchain/tests/test_tool_calls.py (6)

36-74: Re-enable span assertions (don’t leave commented checks) and bind result.

This test currently only checks logs; the core span validations are commented out. Either delete the dead code or reinstate minimal span assertions using GenAI attributes. Suggest reinstating with a bound result:

-    _ = model_with_tools.invoke(query)
-    # spans = span_exporter.get_finished_spans()
-
-    # span_names = set(span.name for span in spans)
-    # expected_spans = {"ChatOpenAI.chat"}
-    # assert expected_spans.issubset(span_names)
-
-    # chat_span = next(
-    #     span for span in spans if span.name == "ChatOpenAI.chat"
-    # )
-
-    # assert chat_span.attributes[f"{SpanAttributes.LLM_REQUEST_FUNCTIONS}.0.name"] == "food_analysis"
-    # assert json.loads(chat_span.attributes[f"{SpanAttributes.LLM_REQUEST_FUNCTIONS}.0.parameters"]) == {
-    #     "properties": {
-    #         "name": {"type": "string"},
-    #         "healthy": {"type": "boolean"},
-    #         "calories": {"type": "integer"},
-    #         "taste_profile": {"type": "array", "items": {"type": "string"}},
-    #     },
-    #     "required": ["name", "healthy", "calories", "taste_profile"],
-    #     "type": "object",
-    # }
-
-    # assert chat_span.attributes[f"{GenAIAttributes.GEN_AI_PROMPT}.0.content"] == query_text
-    # assert (
-    #     chat_span.attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.0.tool_calls.0.name"]
-    #     == "food_analysis"
-    # )
-
-    # arguments = chat_span.attributes[
-    #     f"{GenAIAttributes.GEN_AI_COMPLETION}.0.tool_calls.0.arguments"
-    # ]
-    # assert json.loads(arguments) == json.loads(
-    #     result.model_dump()
-    #     .get("additional_kwargs")
-    #     .get("tool_calls")[0]
-    #     .get("function")
-    #     .get("arguments")
-    # )
+    result = model_with_tools.invoke(query)
+    spans = span_exporter.get_finished_spans()
+    chat_span = next(span for span in spans if span.name == "ChatOpenAI.chat")
+    assert chat_span.attributes[f"{GenAIAttributes.GEN_AI_PROMPT}.0.content"] == query_text
+    assert (
+        chat_span.attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.0.tool_calls.0.name"]
+        == "food_analysis"
+    )
+    assert chat_span.attributes.get(
+        f"{GenAIAttributes.GEN_AI_COMPLETION}.0.tool_calls.0.arguments"
+    )

213-256: LGTM: migrated to GenAI prompt attributes and roles. Also remove duplicate function-attributes block above.

Assertions on GEN_AI_PROMPT and roles look correct. There’s a duplicate assertions block for SpanAttributes.LLM_REQUEST_FUNCTIONS right above (Lines 204–212) that repeats Lines 195–203. Remove the duplicate to cut noise.

-    assert chat_span.attributes[f"{SpanAttributes.LLM_REQUEST_FUNCTIONS}.0.name"] == "get_weather"
-    assert json.loads(chat_span.attributes[f"{SpanAttributes.LLM_REQUEST_FUNCTIONS}.0.parameters"]) == {
-        "properties": {
-            "location": {"type": "string"},
-        },
-        "required": ["location"],
-        "type": "object",
-    }

496-523: Avoid brittle hard-coded Anthropic tool_call id; derive from result.

Provider IDs can change; prefer comparing to the returned result’s id to keep tests stable.

-    assert (
-        chat_span.attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.0.tool_calls.0.id"]
-        == "toolu_016q9vtSd8CY2vnZSpEp1j4o"
-    )
+    expected_id = result.model_dump()["content"][1]["id"]
+    assert (
+        chat_span.attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.0.tool_calls.0.id"]
+        == expected_id
+    )

707-759: Same here: don’t pin Anthropic tool_call id; use the response’s id.

Stabilize by asserting against result rather than a literal.

-    assert (
-        chat_span.attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.0.tool_calls.0.id"]
-        == "toolu_012guEZNJ5yH5jxHKWAkzCzh"
-    )
+    expected_id = result.model_dump()["content"][1]["id"]
+    assert (
+        chat_span.attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.0.tool_calls.0.id"]
+        == expected_id
+    )

1048-1083: Relax OpenAI tool_call id equality; assert presence instead of fixed literals.

Recorded IDs tend to be unstable across providers/cassettes. Keep names/args exact, but make IDs existence checks non-brittle.

-    assert (
-        chat_span.attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.0.tool_calls.0.id"]
-        == "call_EgULHWKqGjuB36aUeiOSpALZ"
-    )
+    id0 = chat_span.attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.0.tool_calls.0.id"]
+    assert isinstance(id0, str) and id0
@@
-    assert (
-        chat_span.attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.0.tool_calls.1.id"]
-        == "call_Xer9QGOTDMG2Bxn9AKGiVM14"
-    )
+    id1 = chat_span.attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.0.tool_calls.1.id"]
+    assert isinstance(id1, str) and id1

812-812: Remove debug prints from tests.

Stray prints add noise to CI logs; rely on assertions instead.

-    print(result.model_dump())

Also applies to: 959-959

packages/opentelemetry-instrumentation-ollama/tests/test_ollama_metrics.py (3)

72-73: Strengthen the model assertion for tighter signal validation

Instead of only checking non-null, assert the expected model value to catch mislabeling.

Apply:

-            assert dp.attributes.get(GenAIAttributes.GEN_AI_RESPONSE_MODEL) is not None
+            assert dp.attributes.get(GenAIAttributes.GEN_AI_RESPONSE_MODEL) == "gemma3:1b"

49-49: Ruff S101 in tests: either ignore or annotate

Ruff flags assert (S101) here. In tests this is fine; consider per-file ignore to keep CI green.

Options:

  • In pyproject.toml:
    [tool.ruff.lint.per-file-ignores]
    "packages//tests/.py" = ["S101"]
  • Or annotate lines with # noqa: S101.

Also applies to: 72-73


96-104: Unify on GenAIAttributes for consistency with the PR direction

These checks still use SpanAttributes.*. For consistency with the updated tests above and the PR’s move to GenAI keys, switch to GenAIAttributes (same underlying keys), then remove the now-unused SpanAttributes import.

Apply:

-            assert dp.attributes.get(SpanAttributes.LLM_SYSTEM) == "Ollama", \
+            assert dp.attributes.get(GenAIAttributes.GEN_AI_SYSTEM) == "Ollama", \
@@
-            model_name = dp.attributes.get(SpanAttributes.LLM_RESPONSE_MODEL)
+            model_name = dp.attributes.get(GenAIAttributes.GEN_AI_RESPONSE_MODEL)

And drop the import at the top:

-from opentelemetry.semconv_ai import SpanAttributes
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (7)

11-11: Avoid throwaway assignment for side-effect-only call

Call openai_client.responses.create(...) directly instead of assigning to _. It’s a no-op variable and slightly obscures the intent.

Apply:

-    _ = openai_client.responses.create(
+    openai_client.responses.create(

22-25: Preserve minimal coverage for prompt attributes (non-brittle)

Rather than fully commenting these out, assert presence and basic shape. This keeps signal without coupling to exact text.

Apply:

-    # assert (
-    #     span.attributes["gen_ai.prompt.0.content"] == "What is the capital of France?"
-    # )
-    # assert span.attributes["gen_ai.prompt.0.role"] == "user"
+    assert "gen_ai.prompt.0.content" in span.attributes
+    assert isinstance(span.attributes["gen_ai.prompt.0.content"], str) and span.attributes["gen_ai.prompt.0.content"]
+    assert "gen_ai.prompt.0.role" in span.attributes

35-35: Same nit: remove _ = for clarity

The second call also doesn’t need assignment.

Apply:

-    _ = openai_client.responses.create(
+    openai_client.responses.create(

62-78: Validate conversation-history shape without brittle literals

Keep checks lightweight: presence and serializability where applicable.

Apply:

-    # assert (
-    #     span.attributes["gen_ai.prompt.0.content"]
-    #     == "Come up with an adjective in English. Respond with just one word."
-    # )
-    # assert span.attributes["gen_ai.prompt.0.role"] == "user"
-    # assert json.loads(span.attributes["gen_ai.prompt.1.content"]) == [
-    #     {
-    #         "type": "output_text",
-    #         "text": first_response.output[0].content[0].text,
-    #     }
-    # ]
-    # assert span.attributes["gen_ai.prompt.1.role"] == "assistant"
-    # assert (
-    #     span.attributes["gen_ai.prompt.2.content"]
-    #     == "Can you explain why you chose that word?"
-    # )
-    # assert span.attributes["gen_ai.prompt.2.role"] == "user"
+    assert "gen_ai.prompt.0.content" in span.attributes
+    assert "gen_ai.prompt.0.role" in span.attributes
+    assert "gen_ai.prompt.1.content" in span.attributes
+    # JSON-encoded assistant content should be parseable:
+    import json
+    json.loads(span.attributes["gen_ai.prompt.1.content"])
+    assert "gen_ai.prompt.1.role" in span.attributes
+    assert "gen_ai.prompt.2.content" in span.attributes
+    assert "gen_ai.prompt.2.role" in span.attributes

137-140: Don’t assert specific tool_call IDs; assert existence and type

IDs vary across recordings/providers. Check that some tool_call id(s) exist and are non-empty strings.

Apply:

-    # assert (
-    #     span.attributes["gen_ai.completion.0.tool_calls.0.id"]
-    #     == "fc_685ff89422ec819a977b2ea385bc9b6601c537ddeff5c2a2"
-    # )
+    ids = [
+        v for k, v in span.attributes.items()
+        if k.startswith("gen_ai.completion.0.tool_calls.") and k.endswith(".id")
+    ]
+    assert ids and all(isinstance(v, str) and v for v in ids)

164-168: Reasoning request attributes: consider keeping tolerant presence checks

Given the test is gated by is_reasoning_supported(), you can safely assert the request effort (and optionally summary) without coupling to provider response structure. Example (optional):

Would you like me to add guarded assertions such as:

  • If "gen_ai.request.reasoning_effort" in attributes, assert it equals the input value.
  • Only assert absence of completion reasoning when summary is None.

I can send a patch if you confirm instrumentation guarantees for these keys.
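Roughly what those guarded assertions could look like; a sketch only, since the instrumentation's guarantees for these keys are unconfirmed (requested_effort and requested_summary stand in for the values passed in the request):

requested_effort = "medium"
requested_summary = None
effort = span.attributes.get("gen_ai.request.reasoning_effort")
if effort is not None:
    # Only assert when the instrumentation actually emits the request field.
    assert effort == requested_effort
if requested_summary is None:
    # Absence of completion reasoning is only meaningful without a summary.
    assert "gen_ai.completion.0.reasoning" not in span.attributes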


171-171: Optional: usage token sanity check

If usage metrics are emitted for reasoning models, prefer a non-brittle lower bound (e.g., total_tokens > 0) over exact names. Skip if not guaranteed by the instrumentation.
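For example, a sketch assuming the total is still recorded under the legacy key (as the rest of this PR keeps it):

total = span.attributes.get(SpanAttributes.LLM_USAGE_TOTAL_TOKENS)
if total is not None:
    # Lower bound only; exact token counts are brittle across re-recordings.
    assert total > 0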

packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py (4)

88-88: Prefer enum value for gen_ai.system over string literal.

Avoid hardcoding "openai"; use the canonical enum to reduce brittleness.

-    assert open_ai_span.attributes[GenAIAttributes.GEN_AI_SYSTEM] == "openai"
+    assert open_ai_span.attributes[GenAIAttributes.GEN_AI_SYSTEM] == GenAIAttributes.GenAiSystemValues.OPENAI.value

Also applies to: 171-171, 251-251, 321-321, 389-389, 463-463, 543-543, 623-623, 717-717, 801-801, 869-869, 971-971, 1049-1049, 1117-1117, 1205-1205


304-306: Unify attribute access: prefer constants over string keys.

Use SpanAttributes/GenAIAttributes constants instead of raw strings for request/response model/type to avoid typos.

-    assert open_ai_span.attributes["llm.request.type"] == "chat"
-    assert open_ai_span.attributes["gen_ai.request.model"] == "gpt-4-turbo-preview"
-    assert open_ai_span.attributes["gen_ai.response.model"] == "gpt-4-turbo-preview"
+    assert open_ai_span.attributes[SpanAttributes.LLM_REQUEST_TYPE] == "chat"
+    assert open_ai_span.attributes[GenAIAttributes.GEN_AI_REQUEST_MODEL] == "gpt-4-turbo-preview"
+    assert open_ai_span.attributes[GenAIAttributes.GEN_AI_RESPONSE_MODEL] == "gpt-4-turbo-preview"

Apply similarly in the other polling/event blocks noted in the line ranges.

Also applies to: 384-386, 458-460


1-1: Silence Ruff Bandit S101 for test asserts (optional).

If CI flags S101 in tests, disable it at file level.

+ # ruff: noqa: S101  # allow assert in tests
  import pytest

142-143: Typos in comments: “cassetes” → “cassettes”, “assistent” → “assistant”.

-        # Now that we have VCR cassetes recorded, we don't need to wait
+        # Now that we have VCR cassettes recorded, we don't need to wait
-    # Validate assistent system message Event
+    # Validate assistant system message Event

Apply to all similar occurrences in this file.

Also applies to: 185-188, 289-291, 403-410, 471-473

packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (6)

603-626: Broaden numeric guard for token histograms.

type(x) is int is too strict (e.g., numpy ints). Use isinstance.

-    if (
-        metric_params.token_histogram
-        and type(prompt_tokens) is int
-        and prompt_tokens >= 0
-    ):
+    if (
+        metric_params.token_histogram
+        and isinstance(prompt_tokens, int)
+        and prompt_tokens >= 0
+    ):
@@
-    if (
-        metric_params.token_histogram
-        and type(completion_tokens) is int
-        and completion_tokens >= 0
-    ):
+    if (
+        metric_params.token_histogram
+        and isinstance(completion_tokens, int)
+        and completion_tokens >= 0
+    ):

185-192: Add completion role = "assistant" for Cohere generations.

 def _set_generations_span_attributes(span, response_body):
     for i, generation in enumerate(response_body.get("generations")):
+        _set_span_attribute(
+            span, f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.role", "assistant"
+        )
         _set_span_attribute(
             span,
             f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.content",
             generation.get("text"),
         )

356-362: Add completion role = "assistant" for AI21 completions.

 def _set_span_completions_attributes(span, response_body):
     for i, completion in enumerate(response_body.get("completions")):
+        _set_span_attribute(
+            span, f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.role", "assistant"
+        )
         _set_span_attribute(
             span,
             f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.content",
             completion.get("data").get("text"),
         )

505-527: Add completion role = "assistant" for Amazon outputs.

     if "results" in response_body:
         for i, result in enumerate(response_body.get("results")):
+            _set_span_attribute(
+                span, f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.role", "assistant"
+            )
             _set_span_attribute(
                 span,
                 f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.content",
                 result.get("outputText"),
             )
     elif "outputText" in response_body:
+        _set_span_attribute(
+            span, f"{GenAIAttributes.GEN_AI_COMPLETION}.0.role", "assistant"
+        )
         _set_span_attribute(
             span,
             f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content",
             response_body.get("outputText"),
         )
     elif "output" in response_body:
         msgs = response_body.get("output").get("message", {}).get("content", [])
         for idx, msg in enumerate(msgs):
+            _set_span_attribute(
+                span, f"{GenAIAttributes.GEN_AI_COMPLETION}.{idx}.role", "assistant"
+            )
             _set_span_attribute(
                 span,
                 f"{GenAIAttributes.GEN_AI_COMPLETION}.{idx}.content",
                 msg.get("text"),
             )

561-566: Add completion role = "assistant" for imported models.

 def _set_imported_model_response_span_attributes(span, response_body):
+    _set_span_attribute(
+        span, f"{GenAIAttributes.GEN_AI_COMPLETION}.0.role", "assistant"
+    )
     _set_span_attribute(
         span,
         f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content",
         response_body.get("generation"),
     )

132-145: Consider aligning guardrail filter attributes with GenAI prompt/completion keys.

You’re mixing LLM_* and GEN_AI_*; for consistency with this PR, consider writing filter metadata under GenAI prompt/completion.

-        _set_span_attribute(
-            span,
-            f"{SpanAttributes.LLM_PROMPTS}.{PROMPT_FILTER_KEY}",
-            json.dumps(input_filters, default=str)
-        )
+        _set_span_attribute(
+            span,
+            f"{GenAIAttributes.GEN_AI_PROMPT}.{PROMPT_FILTER_KEY}",
+            json.dumps(input_filters, default=str)
+        )
@@
-        _set_span_attribute(
-            span,
-            f"{SpanAttributes.LLM_COMPLETIONS}.{CONTENT_FILTER_KEY}",
-            json.dumps(output_filters, default=str)
-        )
+        _set_span_attribute(
+            span,
+            f"{GenAIAttributes.GEN_AI_COMPLETION}.{CONTENT_FILTER_KEY}",
+            json.dumps(output_filters, default=str)
+        )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f7fb725 and a0e5e06.

📒 Files selected for processing (15)
  • packages/opentelemetry-instrumentation-alephalpha/opentelemetry/instrumentation/alephalpha/__init__.py (4 hunks)
  • packages/opentelemetry-instrumentation-alephalpha/opentelemetry/instrumentation/alephalpha/span_utils.py (3 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py (12 hunks)
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/guardrail.py (5 hunks)
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (24 hunks)
  • packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py (6 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_structured_output.py (3 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_tool_calls.py (6 hunks)
  • packages/opentelemetry-instrumentation-ollama/tests/test_ollama_metrics.py (3 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py (15 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (20 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_prompt_caching.py (6 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (7 hunks)
  • packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (0 hunks)
  • packages/traceloop-sdk/tests/test_prompt_management.py (4 hunks)
💤 Files with no reviewable changes (1)
  • packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/guardrail.py
  • packages/opentelemetry-instrumentation-alephalpha/opentelemetry/instrumentation/alephalpha/__init__.py
  • packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/span_utils.py
  • packages/opentelemetry-instrumentation-alephalpha/opentelemetry/instrumentation/alephalpha/span_utils.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/traceloop-sdk/tests/test_prompt_management.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py
  • packages/opentelemetry-instrumentation-langchain/tests/test_structured_output.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py
  • packages/opentelemetry-instrumentation-langchain/tests/test_tool_calls.py
  • packages/opentelemetry-instrumentation-ollama/tests/test_ollama_metrics.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_prompt_caching.py
🧬 Code graph analysis (4)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (1)
  • _ (1094-1124)
packages/traceloop-sdk/tests/test_prompt_management.py (1)
  • openai_client (222-223)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (2)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-237)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/utils.py (1)
  • should_send_prompts (36-39)
packages/opentelemetry-instrumentation-ollama/tests/test_ollama_metrics.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (2)
  • SpanAttributes (64-237)
  • Meters (36-61)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-237)
🪛 Ruff (0.12.2)

Every finding below is S101 ("Use of assert detected"); the flagged lines per file are:

packages/traceloop-sdk/tests/test_prompt_management.py

  • 241

packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py

  • 169, 197

packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py

  • 102, 105-106, 109-111, 190-191, 194-195, 198-200, 377-378, 381-382, 385-387, 506-507, 510-511, 514-516, 596-597, 600-601, 604-606, 789-790, 793-794, 797-799, 921-922, 925-926, 929-931, 1014-1015, 1018-1019, 1022-1024, 1212-1213, 1216-1217, 1220-1222, 1345-1346, 1349-1350, 1353-1355, 1439-1440, 1443-1444, 1447-1449, 1648-1649, 1652-1653, 1656-1658

packages/opentelemetry-instrumentation-langchain/tests/test_structured_output.py

  • 38, 92

packages/opentelemetry-instrumentation-langchain/tests/test_tool_calls.py

  • 217, 219, 223, 225, 229, 237, 241, 243, 247-248, 250, 254, 256, 500, 502, 506, 510, 514, 518, 711-712, 714, 718, 726, 731-732, 736, 738, 742, 746, 750, 754, 903-905, 1051, 1053, 1057, 1061, 1065, 1070, 1074, 1078

packages/opentelemetry-instrumentation-ollama/tests/test_ollama_metrics.py

  • 49, 72-73

packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py

  • 67, 71, 75-76, 80-82, 86-88, 94, 100, 164, 169-171, 245, 249-251, 319-321, 327, 333, 387-389, 461-463, 525, 529, 533-534, 538-543, 549, 555, 615, 620-622, 710, 715-717, 783, 787, 791-792, 796, 798-800, 803, 807, 861, 866-868, 964, 969-971, 1033, 1037, 1041-1042, 1046-1049, 1052, 1056, 1110, 1115-1117, 1198, 1203-1205

packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py

  • 41, 199-200, 204, 211, 215, 264, 343, 417-418, 422, 429, 433, 490, 576, 630, 806, 1378, 1382, 1386, 1391-1392, 1396, 1400, 1404-1405, 1409, 1445, 1449, 1453, 1458-1459, 1463, 1467, 1471-1472, 1476, 1620

packages/opentelemetry-instrumentation-openai/tests/traces/test_prompt_caching.py

  • 70-72, 74-75, 131-134, 221-224, 315-316, 318-319, 376-377, 379-380, 468-471

@nirga
Copy link
Member

nirga commented Sep 7, 2025

Hey @martimfasantos - all tests currently pass in main, and we keep them up to date. I can't merge this if they don't pass.

@nirga
Copy link
Member

nirga commented Sep 7, 2025

Looking at some of the test failures @martimfasantos - it seems like you're using non-existent span attributes

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (7)
packages/opentelemetry-instrumentation-mistralai/opentelemetry/instrumentation/mistralai/__init__.py (1)

183-197: Don’t emit new token metrics under legacy keys

We’re still writing total_tokens to SpanAttributes.LLM_USAGE_TOTAL_TOKENS, while the other token metrics already moved to the GenAI namespace. This mixes the legacy and GenAI specs on the same span and breaks the goal of this PR. Please switch total_tokens to the GenAI attribute (GenAIAttributes.GEN_AI_USAGE_TOTAL_TOKENS) so all usage numbers land under the new semantic convention.

Based on learnings

-    _set_span_attribute(
-        span,
-        SpanAttributes.LLM_USAGE_TOTAL_TOKENS,
-        total_tokens,
-    )
+    _set_span_attribute(
+        span,
+        GenAIAttributes.GEN_AI_USAGE_TOTAL_TOKENS,
+        total_tokens,
+    )
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (2)

243-251: Critical: Undefined name GEN_AI_RESPONSE_ID - missing import

Line 244 references GEN_AI_RESPONSE_ID directly instead of using GenAIAttributes.GEN_AI_RESPONSE_ID. This will cause a NameError at runtime.

Fix the attribute reference to use the imported GenAIAttributes:

-        set_span_attribute(self._span, GEN_AI_RESPONSE_ID, self._complete_response.get("id"))
+        set_span_attribute(self._span, GenAIAttributes.GEN_AI_RESPONSE_ID, self._complete_response.get("id"))

398-399: Critical: Another undefined name GEN_AI_RESPONSE_ID

Line 399 has the same issue as line 244 - using GEN_AI_RESPONSE_ID directly instead of GenAIAttributes.GEN_AI_RESPONSE_ID.

Fix the attribute reference:

-        set_span_attribute(self._span, GEN_AI_RESPONSE_ID, self._complete_response.get("id"))
+        set_span_attribute(self._span, GenAIAttributes.GEN_AI_RESPONSE_ID, self._complete_response.get("id"))
packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (1)

1661-1666: Legacy token attributes still referenced in newer tests

Lines 1661-1666 and other locations still reference "gen_ai.usage.prompt_tokens" and "gen_ai.usage.completion_tokens" as string literals. These should be updated to use the new input/output naming.

Update all occurrences of the legacy attribute names:

-    assert anthropic_span.attributes["gen_ai.usage.prompt_tokens"] == 568
+    assert anthropic_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 568
     assert (
-        anthropic_span.attributes["gen_ai.usage.completion_tokens"]
-        + anthropic_span.attributes["gen_ai.usage.prompt_tokens"]
+        anthropic_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]
+        + anthropic_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS]
         == anthropic_span.attributes["llm.usage.total_tokens"]
     )

Apply similar changes to:

  • Lines 1821-1826
  • Lines 1924-1929
  • Lines 2007-2012
  • Lines 2134-2139
  • Lines 2247-2252

Also applies to: 1821-1826, 1924-1929, 2007-2012, 2134-2139, 2247-2252

packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (2)

98-111: Critical issue: Undefined variable references.

Lines 98 and 109 reference GEN_AI_PROMPT directly instead of using the proper GenAIAttributes.GEN_AI_PROMPT reference. This will cause NameError at runtime.

Fix the undefined variable references:

             _set_span_attribute(
                 span,
-                f"{GEN_AI_PROMPT}.0.role",
+                f"{GenAIAttributes.GEN_AI_PROMPT}.0.role",
                 "user",
             )
             _set_span_attribute(
                 span,
-                f"{GEN_AI_PROMPT}.0.content",
+                f"{GenAIAttributes.GEN_AI_PROMPT}.0.content",
                 dump_object(inputs),
             )

240-241: Critical issue: Undefined variable references in response functions.

Several functions reference GEN_AI_COMPLETION, GEN_AI_RESPONSE_FINISH_REASONS, etc. directly without the GenAIAttributes prefix, which will cause NameError at runtime.

Fix all undefined variable references:

-    prefix = f"{GEN_AI_COMPLETION}.{index}"
+    prefix = f"{GenAIAttributes.GEN_AI_COMPLETION}.{index}"
-        _set_span_attribute(span, GEN_AI_RESPONSE_FINISH_REASONS, [finish_reason])
+        _set_span_attribute(span, GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS, [finish_reason])

And similar fixes for lines 283, 290, 296, and 298.

Also applies to: 283-283, 290-291, 296-298

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)

319-352: Align GenAI attributes to spec-defined names
Replace the flattened GEN_AI_COMPLETION.* keys—e.g. gen_ai.completion.{i}.content, .role, .finish_reason, and .tool_calls.{j}.{…}—with the structured semantic-convention fields defined in the GenAI spec (for example: gen_ai.response.message.content, gen_ai.response.message.role, gen_ai.response.finish_reasons, gen_ai.tool.call.name/arguments/id), or retain the existing LLM_* fallbacks if necessary.
Location: packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py lines 319–352 (and 341–347)
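
For illustration, a minimal sketch of the structured shape, assuming the spec field names quoted above and hypothetical pre-parsed message/tool-call dicts (nothing here is verified against a released semconv package):

# Sketch only: attribute keys come from the review text above, and the
# message/call dicts are hypothetical parsed response objects.
def set_structured_response_attributes(otel_span, message):
    # One structured message per span instead of flattened
    # gen_ai.completion.{i}.* keys.
    otel_span.set_attribute("gen_ai.response.message.role", message["role"])
    otel_span.set_attribute("gen_ai.response.message.content", message["content"])
    otel_span.set_attribute("gen_ai.response.finish_reasons", [message["finish_reason"]])


def set_tool_call_attributes(tool_span, call):
    # The spec models one execute_tool span per call, so these keys are
    # set once per tool-call span rather than indexed on the LLM span.
    tool_span.set_attribute("gen_ai.tool.call.id", call["id"])
    tool_span.set_attribute("gen_ai.tool.call.name", call["name"])
    tool_span.set_attribute("gen_ai.tool.call.arguments", call["arguments"])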

🧹 Nitpick comments (6)
packages/opentelemetry-instrumentation-mistralai/tests/test_chat.py (3)

509-509: Inconsistent attribute reference - use GenAIAttributes

Line 509 uses a string literal "gen_ai.request.model" instead of the GenAIAttributes.GEN_AI_REQUEST_MODEL constant, which is inconsistent with the pattern used elsewhere in the file.

Apply this diff to maintain consistency:

-    assert mistral_span.attributes.get("gen_ai.request.model") == "mistral-tiny"
+    assert mistral_span.attributes.get(GenAIAttributes.GEN_AI_REQUEST_MODEL) == "mistral-tiny"

557-557: Inconsistent attribute reference - use GenAIAttributes

Line 557 uses a string literal "gen_ai.request.model" instead of the GenAIAttributes.GEN_AI_REQUEST_MODEL constant, which is inconsistent with the pattern used elsewhere in the file.

Apply this diff to maintain consistency:

-    assert mistral_span.attributes.get("gen_ai.request.model") == "mistral-tiny"
+    assert mistral_span.attributes.get(GenAIAttributes.GEN_AI_REQUEST_MODEL) == "mistral-tiny"

616-616: Inconsistent attribute reference - use GenAIAttributes

Line 616 uses a string literal "gen_ai.request.model" instead of the GenAIAttributes.GEN_AI_REQUEST_MODEL constant, which is inconsistent with the pattern used elsewhere in the file.

Apply this diff to maintain consistency:

-    assert mistral_span.attributes.get("gen_ai.request.model") == "mistral-tiny"
+    assert mistral_span.attributes.get(GenAIAttributes.GEN_AI_REQUEST_MODEL) == "mistral-tiny"
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (3)

141-144: Avoid duplicating agent identity keys on handoff spans.

You set both gen_ai.handoff.from_agent and GEN_AI_AGENT_NAME to the same value. This can confuse queries/dashboards. Recommend keeping only the explicit handoff field.

Apply this diff:

-                handoff_attributes[GenAIAttributes.GEN_AI_AGENT_NAME] = from_agent

245-257: Normalize prompt content serialization for dict/list inputs.

In the dict branch you only json-dump when content is a dict. Use the same “not string → json” rule as above to handle lists/objects consistently and avoid invalid attribute types.

Apply this diff:

-                                content = message['content']
-                                if isinstance(content, dict):
-                                    content = json.dumps(content)
+                                content = message['content']
+                                if not isinstance(content, str):
+                                    content = json.dumps(content)

399-414: DRY up model settings mapping (duplicated in both branches).

Temperature/max_tokens/top_p/model are set in two places. Extract a small helper to reduce duplication and drift.

Example (outside this diff):

def _apply_model_settings(otel_span, response):
    if getattr(response, "temperature", None) is not None:
        otel_span.set_attribute(GenAIAttributes.GEN_AI_REQUEST_TEMPERATURE, response.temperature)
    if getattr(response, "max_output_tokens", None) is not None:
        otel_span.set_attribute(GenAIAttributes.GEN_AI_REQUEST_MAX_TOKENS, response.max_output_tokens)
    if getattr(response, "top_p", None) is not None:
        otel_span.set_attribute(GenAIAttributes.GEN_AI_REQUEST_TOP_P, response.top_p)
    if getattr(response, "model", None):
        otel_span.set_attribute("gen_ai.request.model", response.model)

Then call _apply_model_settings(otel_span, response) in both locations.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a0e5e06 and 857150b.

📒 Files selected for processing (12)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (9 hunks)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (6 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (28 hunks)
  • packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/__init__.py (2 hunks)
  • packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (7 hunks)
  • packages/opentelemetry-instrumentation-cohere/tests/test_chat.py (3 hunks)
  • packages/opentelemetry-instrumentation-cohere/tests/test_rerank.py (3 hunks)
  • packages/opentelemetry-instrumentation-mistralai/opentelemetry/instrumentation/mistralai/__init__.py (8 hunks)
  • packages/opentelemetry-instrumentation-mistralai/tests/test_chat.py (12 hunks)
  • packages/opentelemetry-instrumentation-mistralai/tests/test_embeddings.py (6 hunks)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (12 hunks)
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-mistralai/tests/test_embeddings.py
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py
  • packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
  • packages/opentelemetry-instrumentation-mistralai/opentelemetry/instrumentation/mistralai/__init__.py
  • packages/opentelemetry-instrumentation-cohere/tests/test_rerank.py
  • packages/opentelemetry-instrumentation-cohere/tests/test_chat.py
  • packages/opentelemetry-instrumentation-mistralai/tests/test_chat.py
  • packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/__init__.py
🧠 Learnings (1)
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
PR: traceloop/openllmetry#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Semantic conventions must follow the OpenTelemetry GenAI specification

Applied to files:

  • packages/opentelemetry-instrumentation-mistralai/opentelemetry/instrumentation/mistralai/__init__.py
🧬 Code graph analysis (8)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (2)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (1)
  • set_span_attribute (21-25)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-237)
packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (2)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • LLMRequestTypeValues (269-274)
packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/utils.py (2)
  • to_dict (64-77)
  • dump_object (52-61)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (1)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (3)
  • set_span_attribute (21-25)
  • error_metrics_attributes (216-220)
  • shared_metrics_attributes (179-212)
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-237)
packages/opentelemetry-instrumentation-mistralai/opentelemetry/instrumentation/mistralai/__init__.py (2)
packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (1)
  • _set_span_attribute (18-22)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • LLMRequestTypeValues (269-274)
packages/opentelemetry-instrumentation-cohere/tests/test_rerank.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-237)
packages/opentelemetry-instrumentation-cohere/tests/test_chat.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-237)
packages/opentelemetry-instrumentation-mistralai/tests/test_chat.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-237)
🪛 Ruff (0.13.1)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py

228-229: Undefined name exception_counter (F821)
230: Use raise without specifying exception name; remove exception name (TRY201)
231, 233: Undefined name complete_response (F821)
234: Undefined names span and complete_response (F821)
235, 237: Undefined name duration_histogram (F821)
236: Undefined name start_time (F821)
294: Undefined names span and complete_response (F821)

🪛 Flake8 (7.2.0)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py

[error] 228-229: undefined name 'exception_counter' (F821)
[error] 231, 233: undefined name 'complete_response' (F821)
[error] 234: undefined names 'span' and 'complete_response' (F821)
[error] 235, 237: undefined name 'duration_histogram' (F821)
[error] 236: undefined name 'start_time' (F821)
[error] 294: undefined names 'span' and 'complete_response' (F821)

🔇 Additional comments (26)
packages/opentelemetry-instrumentation-mistralai/tests/test_chat.py (4)

7-8: LGTM! Consistent use of GenAIAttributes import

The import statement correctly aliases gen_ai_attributes as GenAIAttributes, following the pattern used across the repository for the migration to OpenTelemetry GenAI semantic conventions.


27-27: System identifier inconsistency persists - needs resolution

This test still asserts span attribute GenAIAttributes.GEN_AI_SYSTEM == "MistralAI" while log events expect "mistral_ai" (Line 647). This creates brittle test assertions due to casing inconsistency.


647-647: System identifier inconsistency persists in log assertions

The log assertions on Line 647 expect GenAIAttributes.GEN_AI_SYSTEM == "mistral_ai" while span assertions throughout the file expect "MistralAI". This creates the same system identifier inconsistency issue flagged in previous reviews.


35-47: All token attributes are correctly set by instrumentation
The _set_model_response_attributes function populates SpanAttributes.LLM_USAGE_TOTAL_TOKENS, GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, and GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, so the test’s assertion that total equals input + output tokens is valid.

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (2)

91-107: LGTM! Token attribute updates are correct

The updates from LLM_USAGE_PROMPT_TOKENS/LLM_USAGE_COMPLETION_TOKENS to GEN_AI_USAGE_INPUT_TOKENS/GEN_AI_USAGE_OUTPUT_TOKENS align with the PR's objective to adopt OpenTelemetry semantic conventions. The cache token attributes on lines 101-107 are also correctly updated.


114-124: LGTM! Token type attribute migration is correct

The token histogram recording correctly uses GenAIAttributes.GEN_AI_TOKEN_TYPE instead of the legacy LLM_TOKEN_TYPE for both input (line 114) and output (line 123) tokens.

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (6)

39-41: LGTM! GenAIAttributes import added correctly

The import statement properly brings in the GenAIAttributes alias from the incubating semantic conventions, which is necessary for the attribute migrations throughout this file.


229-230: LGTM! Token type attributes updated in async token usage

The token histogram recording correctly uses GenAIAttributes.GEN_AI_TOKEN_TYPE for both input and output tokens in the async _aset_token_usage function.

Also applies to: 256-257


279-293: LGTM! Span attributes updated to GenAI namespace

The span attribute updates correctly migrate from legacy LLM attributes to GenAI equivalents:

  • GEN_AI_USAGE_INPUT_TOKENS (line 279)
  • GEN_AI_USAGE_OUTPUT_TOKENS (line 281)
  • GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS (line 286)
  • GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS (line 290)

343-344: LGTM! Sync token usage function updated consistently

The synchronous _set_token_usage function mirrors the async version's changes, correctly using GenAIAttributes.GEN_AI_TOKEN_TYPE for token histogram recording.

Also applies to: 370-371


393-406: LGTM! Sync span attributes updated consistently

The synchronous function's span attributes are correctly updated to match the async version, maintaining consistency across the codebase.


538-538: LGTM! System attribute updated to GenAI namespace

Both _wrap (line 538) and _awrap (line 662) correctly update the system attribute from the legacy name to GenAIAttributes.GEN_AI_SYSTEM.

Also applies to: 662-662

packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (2)

96-113: LGTM! Test assertions correctly updated for prompt and completion attributes

The test correctly validates the migrated attributes:

  • GenAIAttributes.GEN_AI_PROMPT replaces legacy prompt attributes
  • GenAIAttributes.GEN_AI_COMPLETION replaces legacy completion attributes
  • GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS and GEN_AI_USAGE_OUTPUT_TOKENS replace the legacy token attributes

155-160: LGTM! Token attributes consistently updated across test variants

The test variants (with events, with/without content, multimodal) all correctly use the GenAIAttributes constants for token usage assertions.

Also applies to: 220-225, 291-296

packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/__init__.py (1)

32-34: LGTM! Consistent GenAI attributes import.

The import follows the expected pattern used across the codebase for accessing GenAI semantic conventions.

packages/opentelemetry-instrumentation-cohere/tests/test_rerank.py (1)

41-41: LGTM! Test updates correctly validate GenAI attributes.

The test assertions have been properly updated to validate the new GenAI attribute keys instead of the legacy LLM attributes. The migration is consistent and follows the expected pattern.

Also applies to: 44-45, 49-50, 55-56, 62-63, 66-67, 72-73, 77-78, 121-121, 124-125, 185-185, 188-189

packages/opentelemetry-instrumentation-cohere/tests/test_chat.py (2)

60-60: LGTM! Selective migration to GenAI attributes in legacy tests.

The test updates correctly migrate specific attributes (GEN_AI_SYSTEM, GEN_AI_REQUEST_MODEL, GEN_AI_PROMPT, GEN_AI_COMPLETION, and token usage attributes) to GenAI conventions while preserving legacy attributes where appropriate for backward compatibility testing.

Also applies to: 62-62, 64-65, 68-69, 71-71, 75-77, 527-527, 529-529, 556-556, 558-558


75-77: No changes needed: token calculation remains consistent. span_utils.py now sets LLM_USAGE_TOTAL_TOKENS to input_tokens + output_tokens while also recording GEN_AI_USAGE_INPUT_TOKENS and GEN_AI_USAGE_OUTPUT_TOKENS, matching the previous use of total_tokens from to_dict(token_count). All existing test assertions verify this sum.

packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (5)

8-10: LGTM! Proper import of GenAI attributes.

The import follows the established pattern for accessing GenAI semantic conventions.


32-35: LGTM! Consistent migration of prompt attributes.

All prompt-related attributes have been properly migrated from LLM_PROMPTS to GEN_AI_PROMPT with consistent indexing and field naming.

Also applies to: 40-43, 45-48, 53-54, 79-83, 87-88, 92-94


134-146: LGTM! Request attributes properly migrated.

All request-related attributes have been consistently updated to use GenAI attribute keys while preserving the same functionality.


164-168: LGTM! Response ID attributes properly migrated.

The response ID attribute has been correctly updated to use the GenAI convention.


203-204: LGTM! Token usage attributes properly migrated.

The migration from LLM_USAGE_COMPLETION_TOKENS and LLM_USAGE_PROMPT_TOKENS to GEN_AI_USAGE_OUTPUT_TOKENS and GEN_AI_USAGE_INPUT_TOKENS is correct and follows the semantic conventions.

Also applies to: 208-210

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (3)

10-12: Stabilize GenAIAttributes import (prefer stable alias; keep incubating as fallback).

The private _incubating path is brittle across versions. Prefer the project-wide stable alias if available, and fall back to the incubating layout to avoid breakage in environments with different semconv bundles.

Apply this diff:

-from opentelemetry.semconv._incubating.attributes import (
-    gen_ai_attributes as GenAIAttributes
-)
+try:
+    # Preferred stable alias (when available)
+    from opentelemetry.semconv_ai import GenAIAttributes  # type: ignore[attr-defined]
+except Exception:
+    # Fallback to incubating layout
+    from opentelemetry.semconv._incubating.attributes import gen_ai_attributes as GenAIAttributes

357-366: Token usage mapping looks correct with backward-compat fallback.

input_tokens/prompt_tokens and output_tokens/completion_tokens are mapped to GEN_AI_USAGE_* consistently.

Consider also mapping total_tokens to a GenAI constant if one exists in your version for consistency with the new naming (fallback to LLM_USAGE_TOTAL_TOKENS otherwise).
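
One way to express that fallback, sketched under the assumption that GEN_AI_USAGE_TOTAL_TOKENS may or may not exist in the installed semconv release (usage here stands in for the parsed usage object):

# getattr falls back to the legacy key when the GenAI constant is absent.
total_tokens_key = getattr(
    GenAIAttributes,
    "GEN_AI_USAGE_TOTAL_TOKENS",
    SpanAttributes.LLM_USAGE_TOTAL_TOKENS,
)
otel_span.set_attribute(total_tokens_key, usage.total_tokens)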


491-496: Model settings propagation to agent span looks good.

Good reuse of the stored model settings with consistent GEN_AI_* keys.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (1)

180-188: Prompt/completion token rename is incomplete

The Cohere v4 and v2 paths still emit SpanAttributes.LLM_USAGE_{PROMPT,COMPLETION}_TOKENS, so we keep producing the legacy attribute names alongside the new GenAI ones. That undercuts the PR goal of aligning everything with gen_ai.usage.{input,output}_tokens and leaves downstream consumers with a mixed schema. Please switch these setters to the GenAI tokens just like you did for the billed-units block above.

Here’s the adjustment:

         _set_span_attribute(
             span,
             SpanAttributes.LLM_USAGE_TOTAL_TOKENS,
             token_count_dict.get("total_tokens"),
         )
         _set_span_attribute(
             span,
-            SpanAttributes.LLM_USAGE_COMPLETION_TOKENS,
+            GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS,
             token_count_dict.get("response_tokens"),
         )
         _set_span_attribute(
             span,
-            SpanAttributes.LLM_USAGE_PROMPT_TOKENS,
+            GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS,
             token_count_dict.get("prompt_tokens"),
         )
@@
         _set_span_attribute(
             span,
             SpanAttributes.LLM_USAGE_TOTAL_TOKENS,
             input_tokens + output_tokens,
         )
         _set_span_attribute(
             span,
-            SpanAttributes.LLM_USAGE_COMPLETION_TOKENS,
+            GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS,
             output_tokens,
         )
         _set_span_attribute(
             span,
-            SpanAttributes.LLM_USAGE_PROMPT_TOKENS,
+            GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS,
             input_tokens,
         )

Also applies to: 229-237

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (2)

223-225: Bug: calling undefined method _complete_instrumentation (sync path).

AnthropicStream defines _handle_completion, not _complete_instrumentation. This raises AttributeError on normal completion.

Apply:

-            if not self._instrumentation_completed:
-                self._complete_instrumentation()
+            if not self._instrumentation_completed:
+                self._handle_completion()

384-387: Guard span.end() when span might be None (async error path).

Calling self._span.end() unconditionally will throw if _span is None.

Apply:

-                if self._span and self._span.is_recording():
-                    self._span.set_status(Status(StatusCode.ERROR, str(e)))
-                self._span.end()
+                if self._span:
+                    if self._span.is_recording():
+                        self._span.set_status(Status(StatusCode.ERROR, str(e)))
+                    self._span.end()
🧹 Nitpick comments (3)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (3)

399-401: Typo in comment.

“abuild_from_streaming_response” -> “build_from_streaming_response”.

Apply:

-        # This mirrors the logic from abuild_from_streaming_response
+        # This mirrors the logic from build_from_streaming_response

162-165: Avoid mutable default args for kwargs.

Use None and normalize inside to prevent accidental cross‑instance mutation.

Apply:

-        kwargs: dict = {},
+        kwargs: Optional[dict] = None,
@@
-        self._kwargs = kwargs
+        self._kwargs = kwargs or {}

Repeat for the async constructor, and add from typing import Optional if it isn't already imported.

Also applies to: 312-314


22-24: Guard unstable incubating import or pin semconv version
Wrap the opentelemetry.semconv._incubating.attributes import in a try/except (surfacing a clear ImportError) or pin your opentelemetry-semconv dependency to a compatible release to prevent future breakage.
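
A minimal sketch of that guard, assuming the incubating module path shown in the diff hunks above:

try:
    from opentelemetry.semconv._incubating.attributes import (
        gen_ai_attributes as GenAIAttributes,
    )
except ImportError as exc:  # layout moved or the attributes were stabilized
    raise ImportError(
        "opentelemetry-semantic-conventions does not expose the incubating "
        "gen_ai attributes; pin a compatible release"
    ) from exc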

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 857150b and 8acb4c7.

📒 Files selected for processing (3)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (6 hunks)
  • packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (9 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_prompt_caching.py (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_prompt_caching.py
  • packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py
🧬 Code graph analysis (2)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (2)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (4)
  • set_span_attribute (21-25)
  • error_metrics_attributes (216-220)
  • shared_metrics_attributes (179-212)
  • count_prompt_tokens_from_request (224-242)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (1)
  • _set_token_usage (296-406)
packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (2)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • LLMRequestTypeValues (269-274)
packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/utils.py (2)
  • to_dict (64-77)
  • dump_object (52-61)
🪛 Ruff (0.13.1)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py

230-230: Use raise without specifying exception name

Remove exception name

(TRY201)


286-286: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (11)
packages/opentelemetry-instrumentation-openai/tests/traces/test_prompt_caching.py (8)

70-71: Token attribute rename to input/output — LGTM

Assertions now target gen_ai.usage.input_tokens/output_tokens as per OTel GenAI semconv. Looks good.

Also applies to: 74-75


131-134: Events+content: usage token keys updated — LGTM

Values align with the recorded cassette; key names match the new convention.


221-224: No-content events: usage token keys updated — LGTM


315-319: Async legacy path: usage token keys updated — LGTM


376-380: Async + content: usage token keys updated — LGTM


468-471: Async + no-content: usage token keys updated — LGTM


72-72: Remove or guard cache_read_input_tokens assertion

This attribute is not part of the core rename and may not be emitted; direct indexing will KeyError when absent. Either drop it or guard with a default.

Apply one of these:

-    assert cache_creation_span.attributes["gen_ai.usage.cache_read_input_tokens"] == 0
+    assert cache_creation_span.attributes.get("gen_ai.usage.cache_read_input_tokens", 0) == 0

or simply remove the assertion line.

Run to confirm the attribute is produced anywhere:

#!/bin/bash
# List occurrences of cache read/creation usage attributes in code (not tests)
rg -nP --type=py -g '!**/tests/**' 'gen_ai\.usage\.(cache_(read|creation)_input_tokens)'

70-71: Optional: use semconv constants to avoid key typos

To reduce churn and prevent string-key typos, consider asserting via semconv constants (e.g., GEN_AI_USAGE_INPUT_TOKENS/GEN_AI_USAGE_OUTPUT_TOKENS) instead of string literals.

Please verify the exact constant names available in your semconv version:
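
For instance, a quick sketch (assuming the incubating layout; adjust the import to match the installed release, and the token values below are illustrative):

from opentelemetry.semconv._incubating.attributes import (
    gen_ai_attributes as GenAIAttributes,
)

# List the usage constants this release actually exposes.
print([name for name in dir(GenAIAttributes) if name.startswith("GEN_AI_USAGE_")])

# If present, the assertions become typo-proof, e.g.:
# assert span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 1070
# assert span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS] == 9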

Also applies to: 74-75, 131-134, 221-224, 315-319, 376-380, 468-471

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (3)

109-126: LGTM on token-type histogram attributes.

Using GenAIAttributes.GEN_AI_TOKEN_TYPE with "input"/"output" looks correct and consistent.


127-137: Use isinstance and retain the finish-reason constant

  • Replace the type check with:
-    if type(complete_response.get("events")) is list and choice_counter:
+    if isinstance(complete_response.get("events"), list) and choice_counter:
  • Keep using SpanAttributes.LLM_RESPONSE_FINISH_REASON: event.get("finish_reason") here—LLM_RESPONSE_STOP_REASON applies only to non-streaming stop reasons.

Likely an incorrect or invalid review comment.


227-232: Exception handling leaks span and loses traceback.

  • Use raise instead of raise e (TRY201).
  • Mirror async path: set span status and end the span to avoid open spans on errors.

Apply:

-        except Exception as e:
-            attributes = error_metrics_attributes(e)
-            if self._exception_counter:
-                self._exception_counter.add(1, attributes=attributes)
-            raise e
+        except Exception as e:
+            # Handle errors during streaming
+            if not self._instrumentation_completed:
+                attributes = error_metrics_attributes(e)
+                if self._exception_counter:
+                    self._exception_counter.add(1, attributes=attributes)
+                if self._span and self._span.is_recording():
+                    self._span.set_status(Status(StatusCode.ERROR, str(e)))
+                    self._span.end()
+                self._instrumentation_completed = True
+            raise

@martimfasantos
Copy link
Contributor Author

@nirga tests and linting are passing now!

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
packages/traceloop-sdk/traceloop/sdk/tracing/manual.py (1)

61-62: Switch total tokens to GenAI attribute

We still emit llm.usage.total_tokens via SpanAttributes.LLM_USAGE_TOTAL_TOKENS, so the span carries a mix of legacy and GenAI semantic keys. The GenAI spec defines gen_ai.usage.total_tokens; keeping the old key defeats the PR goal and leaves downstream tooling looking for the new field empty. Please update this attribute to the GenAI constant.

-        self._span.set_attribute(
-            SpanAttributes.LLM_USAGE_TOTAL_TOKENS, usage.total_tokens
-        )
+        self._span.set_attribute(
+            GenAIAttributes.GEN_AI_USAGE_TOTAL_TOKENS, usage.total_tokens
+        )
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)

65-78: Address remaining legacy attribute references flagged in previous review.

Per the bot's verification, test files still reference legacy SpanAttributes.LLM_USAGE_PROMPT_TOKENS, SpanAttributes.LLM_USAGE_COMPLETION_TOKENS, and SpanAttributes.LLM_COMPLETIONS constants that should be updated to the new GEN_AI_* equivalents:

  • packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py (lines ~38, 39, 67, 68)
  • packages/traceloop-sdk/tests/test_prompt_management.py (lines ~257, 273)

Complete the migration by updating these test references.

Based on learnings

packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (1)

172-239: Migrate token-usage span attributes to GenAI conventions
In packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (lines 172–239), you’re still setting spans with the legacy SpanAttributes.LLM_USAGE_* keys. Replace those with the GenAI semantic convention constants—GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, and GenAIAttributes.GEN_AI_USAGE_TOTAL_TOKENS—to match the rest of the instrumentation and satisfy existing tests.
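
A minimal sketch of the replacement, assuming the GenAIAttributes alias is already imported at the top of span_utils.py:

# GEN_AI_USAGE_TOTAL_TOKENS is the constant named in this review and may
# not exist in older semconv releases.
_set_span_attribute(span, GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, input_tokens)
_set_span_attribute(span, GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, output_tokens)
_set_span_attribute(
    span, GenAIAttributes.GEN_AI_USAGE_TOTAL_TOKENS, input_tokens + output_tokens
)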

packages/opentelemetry-instrumentation-langchain/tests/test_documents_chains.py (1)

99-119: Don't drop the gen_ai.choice log assertion

The LangChain instrumentation still emits both the user message and the choice events; forcing len(logs) == 1 and commenting out the gen_ai.choice verification means this test no longer catches regressions where the completion log disappears. That log is part of the GenAI trace semantics we rely on downstream. Please keep the expectation at two logs and update the gen_ai.choice payload assertion (if the payload changed) instead of removing it.
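
A hedged sketch of the restored expectation; the fixture and payload names here (log_exporter, user_prompt, expected_choice_payload) are assumptions, and the user-message event name follows the usual GenAI event naming:

logs = log_exporter.get_finished_logs()
assert len(logs) == 2

# First log: the user message event; second: the completion choice.
assert_message_in_logs(logs[0], "gen_ai.user.message", {"content": user_prompt})
assert_message_in_logs(logs[1], "gen_ai.choice", expected_choice_payload)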

packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py (1)

49-76: Update test assertions to expect emitted metrics
In test_langchain_metrics.py, found_token_metric and found_duration_metric are initialized to False and then asserted to still be False, yet callback_handler.py calls token_histogram.record() and duration_histogram.record(). Update the assertions to expect both flags to become True, i.e., assert that the metrics are actually emitted, as sketched below.
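
A sketch of the corrected expectation; reader is the metrics fixture referenced in the code-graph section below, and the Meters constant names are assumed from semconv_ai:

found_token_metric = False
found_duration_metric = False

for rm in reader.get_metrics_data().resource_metrics:
    for scope_metrics in rm.scope_metrics:
        for metric in scope_metrics.metrics:
            if metric.name == Meters.LLM_TOKEN_USAGE:
                found_token_metric = True
            elif metric.name == Meters.LLM_OPERATION_DURATION:
                found_duration_metric = True

# The callback handler records both instruments, so both must be found.
assert found_token_metric is True
assert found_duration_metric is True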

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (1)

104-116: Migrate legacy function attributes to GenAI conventions
Replace uses of SpanAttributes.LLM_REQUEST_FUNCTIONS.{i} with a dedicated execute_tool span (name="execute_tool {tool.name}", kind=INTERNAL) that populates the standardized attributes:

  • gen_ai.tool.name → tool_function.get("name")
  • gen_ai.tool.description → tool_function.get("description")
  • gen_ai.tool.type → one of "function"|"extension"|"datastore"
  • gen_ai.tool.call.id → unique call identifier
    Record the function's parameters in gen_ai.input.messages per the GenAI messages schema instead of a separate .parameters attribute; a sketch of the suggested span follows.
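
A hedged sketch of such a span; tracer, tool_function, and call_id are assumed names:

from opentelemetry import trace
from opentelemetry.trace import SpanKind

tracer = trace.get_tracer(__name__)

# Sketch only: tool_function and call_id are hypothetical inputs.
with tracer.start_as_current_span(
    f"execute_tool {tool_function.get('name')}", kind=SpanKind.INTERNAL
) as tool_span:
    tool_span.set_attribute("gen_ai.tool.name", tool_function.get("name"))
    tool_span.set_attribute("gen_ai.tool.description", tool_function.get("description"))
    tool_span.set_attribute("gen_ai.tool.type", "function")
    tool_span.set_attribute("gen_ai.tool.call.id", call_id)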
♻️ Duplicate comments (2)
packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (1)

1297-1302: Use GenAIAttributes constants instead of string literals.

Lines 1297-1302 (and similar patterns at 1437-1442, 1539-1544, 1662-1664, 1818-1820, 1920-1922, 2002-2004, 2128-2130, 2240-2242) use string literals for token attribute keys. The rest of the file consistently uses GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS and GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS constants.

Apply this pattern consistently:

-    assert anthropic_span.attributes["gen_ai.usage.input_tokens"] == 514
+    assert anthropic_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS] == 514
     assert (
-        anthropic_span.attributes["gen_ai.usage.output_tokens"]
-        + anthropic_span.attributes["gen_ai.usage.input_tokens"]
+        anthropic_span.attributes[GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS]
+        + anthropic_span.attributes[GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS]
         == anthropic_span.attributes["llm.usage.total_tokens"]
     )

As per coding guidelines.

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (1)

237-246: Remove duplicate metrics recording block.

Lines 237-246 duplicate the metrics recording from lines 228-235, causing duration to be recorded twice and response_id to be set redundantly.

Remove the duplicate block:

         if self._duration_histogram:
             duration = time.time() - self._start_time
             self._duration_histogram.record(
                 duration,
                 attributes=metric_attributes,
             )
 
-            # This mirrors the logic from build_from_streaming_response
-            metric_attributes = shared_metrics_attributes(self._complete_response)
-            set_span_attribute(self._span, GenAIAttributes.GEN_AI_RESPONSE_ID, self._complete_response.get("id"))
-
-            if self._duration_histogram:
-                duration = time.time() - self._start_time
-                self._duration_histogram.record(
-                    duration,
-                    attributes=metric_attributes,
-                )
-
             # Calculate token usage
🧹 Nitpick comments (6)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (1)

219-224: Use bare raise to preserve exception context.

Line 222 uses raise e which replaces the original traceback. Use bare raise to preserve the full exception context.

         except Exception as e:
             attributes = error_metrics_attributes(e)
             if self._exception_counter:
                 self._exception_counter.add(1, attributes=attributes)
-            raise e
+            raise
         _process_response_item(item, self._complete_response)
packages/opentelemetry-instrumentation-openai/tests/traces/test_prompt_caching.py (1)

15-500: Ensure test coverage validates actual prompt caching behavior.

All six test functions follow the same pattern: they make two identical requests and assert token counts on the resulting spans. However, the tests don't explicitly validate that prompt caching actually occurred (e.g., checking cache hit/miss indicators or verifying that the second request was faster/cheaper due to caching).

Consider adding assertions that:

  1. Verify cache hit/miss status attributes if exposed by the OpenAI API.
  2. Check that the second span reflects cached behavior (e.g., lower latency, cache-specific attributes); a sketch follows after this list.
  3. Validate the response IDs differ between cache creation and read spans (currently checked, which is good).
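
A hedged sketch of point 2, reusing the span variable names from this test and the cache attribute key discussed elsewhere in this review:

# Cache read tokens should be zero on the first (creation) request and
# positive on the second (read) request if caching actually kicked in.
assert cache_creation_span.attributes.get("gen_ai.usage.cache_read_input_tokens", 0) == 0
assert cache_read_span.attributes.get("gen_ai.usage.cache_read_input_tokens", 0) > 0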

If the OpenAI API exposes cache-related metadata, verify it's being captured:

#!/bin/bash
# Description: Check OpenAI API response structure for cache-related fields.

# Look for usage, cache, or token-related response handling in the instrumentation
ast-grep --pattern $'usage = $$$
$$$
cache$$$
$$$'

# Search for response.usage parsing in the instrumentation
rg -n 'response.*usage|usage.*cache|\.usage\.' --type=py -g '!test*' packages/opentelemetry-instrumentation-openai/
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (1)

438-456: Consider using the GenAIAttributes alias for consistency.

This function imports GEN_AI_AGENT_NAME directly, while the rest of the file uses the GenAIAttributes module alias established at line 14. For consistency, consider using GenAIAttributes.GEN_AI_AGENT_NAME instead of the direct import.

Apply this diff to align with the file's import pattern:

 def test_agent_name_propagation_to_agent_spans(exporter, test_agent):
     """Test that agent name is set into agent spans through the context propagation mechanism."""
-    from opentelemetry.semconv._incubating.attributes.gen_ai_attributes import GEN_AI_AGENT_NAME
 
     query = "What is AI?"
     Runner.run_sync(
         test_agent,
         query,
     )
     spans = exporter.get_finished_spans()
 
     # Find the agent span
     agent_spans = [s for s in spans if s.name == "testAgent.agent"]
     assert len(agent_spans) == 1, f"Expected 1 agent span, got {len(agent_spans)}"
     agent_span = agent_spans[0]
 
     # Verify the agent span has the agent name attribute set via context propagation
-    assert GEN_AI_AGENT_NAME in agent_span.attributes, f"Agent span should have {GEN_AI_AGENT_NAME} attribute"
-    assert agent_span.attributes[GEN_AI_AGENT_NAME] == "testAgent", (
-        f"Expected agent name 'testAgent', got '{agent_span.attributes[GEN_AI_AGENT_NAME]}'"
+    assert GenAIAttributes.GEN_AI_AGENT_NAME in agent_span.attributes, f"Agent span should have {GenAIAttributes.GEN_AI_AGENT_NAME} attribute"
+    assert agent_span.attributes[GenAIAttributes.GEN_AI_AGENT_NAME] == "testAgent", (
+        f"Expected agent name 'testAgent', got '{agent_span.attributes[GenAIAttributes.GEN_AI_AGENT_NAME]}'"
     )
packages/opentelemetry-instrumentation-langchain/tests/test_chains.py (1)

763-778: The async callbacks for streaming do not implement choice-event emission hooks, so tests correctly omit those assertions. To support gen_ai.choice in async streaming, implement corresponding async chunk handlers (e.g., on_chat_model_new_token/on_llm_new_token) in TraceloopCallbackHandler; otherwise, add a note to the docs clarifying that async streaming does not emit choice events.

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (2)

54-59: Review empty string handling.

The helper function _set_span_attribute sets attributes even when the value is an empty string (line 59). This differs from typical OpenTelemetry practice where empty strings are often treated similarly to None and skipped.

Consider whether empty strings should be set as attributes or skipped like None values:

 def _set_span_attribute(span: Span, key: str, value: Any) -> None:
     if value is not None:
-        if value != "":
-            span.set_attribute(key, value)
-        else:
-            span.set_attribute(key, "")
+        if value != "":
+            span.set_attribute(key, value)

This would align with the common pattern seen in other instrumentation packages (see /packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py).


292-324: Improve error handling specificity.

The code catches a bare Exception (line 321) which is flagged by static analysis. While the try-except prevents crashes when processing usage metadata, catching all exceptions can hide unexpected errors.

Consider catching specific exceptions related to attribute access and data processing:

     try:
         for generations in response.generations:
             for generation in generations:
                 if (
                     hasattr(generation, "message")
                     and hasattr(generation.message, "usage_metadata")
                     and generation.message.usage_metadata is not None
                 ):
                     input_tokens += (
                         generation.message.usage_metadata.get("input_tokens")
                         or generation.message.usage_metadata.get("prompt_tokens")
                         or 0
                     )
                     output_tokens += (
                         generation.message.usage_metadata.get("output_tokens")
                         or generation.message.usage_metadata.get("completion_tokens")
                         or 0
                     )
                     total_tokens = input_tokens + output_tokens
 
                     if generation.message.usage_metadata.get("input_token_details"):
                         input_token_details = generation.message.usage_metadata.get(
                             "input_token_details", {}
                         )
                         cache_read_tokens += input_token_details.get("cache_read", 0)
-    except Exception as e:
+    except (AttributeError, TypeError, KeyError) as e:
         # If there's any issue processing usage metadata, continue without it
         print(f"DEBUG: Error processing usage metadata: {e}")
-        pass

Also, remove the redundant pass after the print statement, and consider using a logger instead of print for consistency with the rest of the codebase.

Based on coding guidelines.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 8acb4c7 and 93f9d82.

📒 Files selected for processing (24)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (7 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (35 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py (15 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/test_thinking.py (0 hunks)
  • packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (8 hunks)
  • packages/opentelemetry-instrumentation-cohere/tests/test_chat.py (3 hunks)
  • packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py (0 hunks)
  • packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (4 hunks)
  • packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (13 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py (10 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_chains.py (14 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_documents_chains.py (3 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_lcel.py (17 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_llms.py (43 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_structured_output.py (4 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py (4 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_tool_calls.py (19 hunks)
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (5 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat_response_format.py (0 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_prompt_caching.py (6 hunks)
  • packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1 hunks)
  • packages/traceloop-sdk/tests/test_manual.py (2 hunks)
  • packages/traceloop-sdk/tests/test_tasks.py (1 hunks)
  • packages/traceloop-sdk/traceloop/sdk/tracing/manual.py (3 hunks)
💤 Files with no reviewable changes (3)
  • packages/opentelemetry-instrumentation-anthropic/tests/test_thinking.py
  • packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat_response_format.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/opentelemetry-instrumentation-langchain/tests/test_tool_call_content.py
  • packages/opentelemetry-instrumentation-cohere/tests/test_chat.py
  • packages/traceloop-sdk/tests/test_manual.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-langchain/tests/test_documents_chains.py
  • packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py
  • packages/opentelemetry-instrumentation-langchain/tests/test_tool_calls.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py
  • packages/opentelemetry-instrumentation-langchain/tests/test_chains.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_prompt_caching.py
  • packages/opentelemetry-instrumentation-langchain/tests/test_structured_output.py
  • packages/traceloop-sdk/tests/test_tasks.py
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py
  • packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
  • packages/traceloop-sdk/traceloop/sdk/tracing/manual.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
  • packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py
  • packages/opentelemetry-instrumentation-langchain/tests/test_llms.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
  • packages/opentelemetry-instrumentation-langchain/tests/test_lcel.py
  • packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py
  • packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py
🧬 Code graph analysis (11)
packages/opentelemetry-instrumentation-langchain/tests/test_documents_chains.py (1)
packages/opentelemetry-instrumentation-langchain/tests/test_chains.py (1)
  • assert_message_in_logs (818-826)
packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (3)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (1)
  • _set_span_attribute (54-59)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (2)
  • LLMRequestTypeValues (277-282)
  • SpanAttributes (64-245)
packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/utils.py (2)
  • to_dict (64-77)
  • dump_object (52-61)
packages/opentelemetry-instrumentation-langchain/tests/test_tool_calls.py (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (1)
  • _ (1094-1124)
packages/opentelemetry-instrumentation-langchain/tests/test_chains.py (1)
packages/opentelemetry-instrumentation-langchain/tests/test_documents_chains.py (1)
  • assert_message_in_logs (155-163)
packages/opentelemetry-instrumentation-langchain/tests/test_structured_output.py (1)
packages/opentelemetry-instrumentation-langchain/tests/test_chains.py (1)
  • assert_message_in_logs (818-826)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (2)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (4)
  • set_span_attribute (21-25)
  • error_metrics_attributes (216-220)
  • shared_metrics_attributes (179-212)
  • count_prompt_tokens_from_request (224-242)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (1)
  • _set_token_usage (296-406)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (3)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (3)
  • _set_span_attribute (54-59)
  • set_chat_response_usage (280-373)
  • set_chat_response (206-277)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-245)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/utils.py (1)
  • should_emit_events (80-87)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)
  • _set_span_attribute (30-37)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/utils.py (1)
  • CallbackFilteredJSONEncoder (22-46)
packages/opentelemetry-instrumentation-langchain/tests/test_llms.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-245)
packages/opentelemetry-instrumentation-langchain/tests/test_lcel.py (1)
packages/opentelemetry-instrumentation-langchain/tests/test_chains.py (1)
  • assert_message_in_logs (818-826)
packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py (2)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • Meters (36-61)
packages/opentelemetry-instrumentation-milvus/tests/conftest.py (1)
  • reader (37-41)
🪛 Ruff (0.13.1)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py

222-222: Use raise without specifying exception name

Remove exception name

(TRY201)


278-278: Do not catch blind exception: Exception

(BLE001)

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py

321-321: Do not catch blind exception: Exception

(BLE001)
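For context on the two rule codes, here is a minimal sketch of the fixes Ruff is asking for (function names and bodies are illustrative, not the flagged code itself):

```python
# Illustrative only: the shapes Ruff wants, not the flagged streaming code.

def reraise_correctly(items):
    """TRY201: inside an `except` block, a bare `raise` preserves the
    original exception and traceback; `raise exc` is redundant."""
    try:
        return items[0]
    except IndexError:
        # Bad:  `except IndexError as exc: raise exc`
        raise  # Good: re-raise the active exception unchanged


def narrow_exception_handling(payload, record_exception):
    """BLE001: catch the exceptions you actually expect rather than the
    blind `except Exception`, or at minimum record before swallowing."""
    try:
        return int(payload)
    except (ValueError, TypeError) as exc:  # narrowed from `Exception`
        record_exception(exc)
        return None
```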

🔇 Additional comments (36)
packages/traceloop-sdk/tests/test_tasks.py (1)

25-29: Robust span selection looks good.

Filtering for answer_question.task before asserting keeps the test stable when extra spans appear. Nice resilience improvement.
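As a rough sketch of that pattern (assuming the in-memory `exporter` fixture these tests typically use):

```python
def test_answer_question_span(exporter):  # sketch, not the real test
    spans = exporter.get_finished_spans()
    task_spans = [s for s in spans if s.name == "answer_question.task"]
    assert len(task_spans) == 1  # stable even when unrelated spans appear
```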

packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (2)

69-71: Good practice maintaining backward compatibility.

Retaining the legacy LLM_USAGE_CACHE_* constants with identical values ensures existing instrumentation code won't break during migration to the GEN_AI_* namespace.


65-67: Confirm missing upstream definitions — additions are appropriate
The attributes gen_ai.usage.cache_creation_input_tokens and gen_ai.usage.cache_read_input_tokens aren’t defined in OpenTelemetry semantic conventions v1.28.0+, so defining them here is correct.
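A plausible shape for these additions on the semconv_ai side (the class context is an assumption; the key strings are quoted from this comment and the one above):

```python
class SpanAttributes:
    # GenAI-namespaced cache-token keys, added here because they are not
    # defined upstream in OTel semantic conventions v1.28.0+.
    GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS = (
        "gen_ai.usage.cache_creation_input_tokens"
    )
    GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS = "gen_ai.usage.cache_read_input_tokens"

    # Legacy aliases retained with identical values so existing
    # instrumentation keeps working during the migration.
    LLM_USAGE_CACHE_CREATION_INPUT_TOKENS = GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS
    LLM_USAGE_CACHE_READ_INPUT_TOKENS = GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS
```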

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (1)

497-504: LGTM: Defensive attribute handling.

The __getattr__ and stub _complete_instrumentation methods in both wrapper classes prevent AttributeError when accessing these attributes, since the actual instrumentation logic resides in the wrapped stream objects (AnthropicStream and AnthropicAsyncStream).

Also applies to: 554-561
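A condensed sketch of the delegation pattern being reviewed (class and attribute names are illustrative; the real implementation lives in streaming.py):

```python
class _StreamWrapper:
    """Wraps an underlying stream and defers unknown attribute access to it,
    so instrumentation state attached to the inner object stays reachable."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        # Called only when normal lookup fails; forwards to the wrapped
        # stream instead of raising AttributeError on instrumentation-only
        # fields.
        return getattr(self._wrapped, name)

    def _complete_instrumentation(self):
        # Stub: the wrapped AnthropicStream performs the real finalization.
        pass
```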

packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (7)

8-10: LGTM!

Import correctly adds GenAIAttributes from the incubating OTel semantic conventions.


26-113: LGTM!

All prompt content attributes correctly migrated to GenAIAttributes.GEN_AI_PROMPT across completion, chat, rerank, and embedding request types.


132-155: Request attributes correctly migrated; TODO noted for penalty attributes.

Model, temperature, top_p, top_k, and stop_sequences now use GenAIAttributes.GEN_AI_REQUEST_*. The TODO at lines 149-155 correctly flags that frequency and presence penalty migration is pending.


242-283: LGTM!

Chat response attributes correctly migrated to GenAIAttributes.GEN_AI_COMPLETION prefix and GEN_AI_RESPONSE_FINISH_REASONS.


286-297: LGTM!

Generations response attributes correctly use GenAIAttributes.GEN_AI_RESPONSE_ID and GEN_AI_COMPLETION prefix.


299-314: LGTM!

Rerank response attributes correctly use GenAIAttributes.GEN_AI_RESPONSE_ID and GEN_AI_COMPLETION prefix.


58-77: Retain legacy function request attributes. No GenAI semantic conventions for functions or tools exist in packages/opentelemetry-semantic-conventions-ai, so keeping SpanAttributes.LLM_REQUEST_FUNCTIONS is correct until new conventions are defined.

Likely an incorrect or invalid review comment.

packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (4)

13-15: LGTM! Clean module alias for GenAI attributes.

The import creates a clear module alias that centralizes access to GenAI semantic convention constants throughout the test file.


141-146: Past issue resolved: token attributes correctly updated.

The test now correctly asserts the new GEN_AI_USAGE_INPUT_TOKENS and GEN_AI_USAGE_OUTPUT_TOKENS attributes, aligning with the OpenTelemetry semantic conventions migration.


300-303: LGTM! Metrics test updated for new token type semantics.

The assertion correctly validates token types as "output" and "input", aligning with the migration from completion_tokens/prompt_tokens to output_tokens/input_tokens.


200-210: Verify attribute key construction with module constants.

I couldn't find the GenAIAttributes definition in this repo (it's imported from opentelemetry.semconv._incubating.attributes), so confirm that GEN_AI_COMPLETION is a string constant or a string-valued Enum member: the f-string f"{GenAIAttributes.GEN_AI_COMPLETION}.tool.name" requires a string (use .value or str() if it's a plain Enum). If it's not string-valued, update the instrumentation and tests to use the string value so the dotted attribute keys match.
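The distinction is easy to check in isolation; a sketch assuming the constant could be either a plain string or an Enum member:

```python
from enum import Enum

class Attrs(Enum):  # stand-in for a hypothetical enum-backed constant
    GEN_AI_COMPLETION = "gen_ai.completion"

# A plain Enum member interpolates as its qualified name, silently
# producing the wrong dotted key:
assert f"{Attrs.GEN_AI_COMPLETION}.tool.name" == "Attrs.GEN_AI_COMPLETION.tool.name"

# Using .value (or a module-level string constant, which is what the
# incubating semconv module appears to ship) yields the intended key:
assert f"{Attrs.GEN_AI_COMPLETION.value}.tool.name" == "gen_ai.completion.tool.name"
```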

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (3)

58-60: LGTM! GenAIAttributes import aligns with OTel semantic conventions.

The import correctly introduces the GenAIAttributes alias for accessing gen_ai_attributes from the incubating semantic conventions, supporting the migration from legacy LLM_* attributes to standardized GEN_AI_* attributes.


626-631: Good addition for backward compatibility.

Calling set_chat_response in both event emission and legacy attribute modes ensures span attributes are consistently populated regardless of configuration, maintaining compatibility for downstream consumers.


359-360: LGTM! Attribute key migration correctly implements OTel semantic conventions.

All attribute keys have been consistently migrated from legacy SpanAttributes.LLM_* to standardized GenAIAttributes.GEN_AI_* naming:

  • LLM_SYSTEMGEN_AI_SYSTEM
  • LLM_USAGE_PROMPT_TOKENSGEN_AI_USAGE_INPUT_TOKENS
  • LLM_USAGE_COMPLETION_TOKENSGEN_AI_USAGE_OUTPUT_TOKENS
  • Token and duration histogram attributes updated consistently

This aligns with the PR objective to adopt OpenTelemetry's GenAI semantic conventions; a minimal sketch of the new token-usage writes follows below.

Also applies to: 555-564, 593-622, 635-641
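For concreteness, a minimal sketch of the post-migration token-usage writes (the helper and its call site are illustrative, not the handler's actual code; the attribute constants are the ones named above):

```python
from opentelemetry.semconv._incubating.attributes import (
    gen_ai_attributes as GenAIAttributes,
)

def record_usage(span, input_tokens: int, output_tokens: int) -> None:
    # Token counts now land on the GenAI-namespaced keys
    # (gen_ai.usage.input_tokens / gen_ai.usage.output_tokens).
    span.set_attribute(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, input_tokens)
    span.set_attribute(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, output_tokens)
```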

packages/opentelemetry-instrumentation-langchain/tests/test_structured_output.py (2)

29-38: LGTM! Test updated for GenAI attribute conventions.

The test correctly:

  • Discards the unused result value
  • Uses GenAIAttributes.GEN_AI_PROMPT for prompt content assertions
  • Aligns with the migration to OpenTelemetry semantic conventions

54-75: Verify removal of choice event assertions in structured output tests.

Both test_structured_output_with_events_with_content and test_structured_output_with_events_with_no_content have commented-out assertions for AI choice events, reducing test coverage. Expected log counts dropped from 2 to 1.

Are choice events no longer emitted for structured outputs, or is this incomplete test migration? If this is expected behavior, consider documenting why choice events aren't validated for structured outputs.

Based on the PR context indicating "some tests are outdated and will not be further updated," please confirm whether:

  1. Structured output flows intentionally skip choice event emission
  2. This test coverage gap is acceptable for this PR scope

Also applies to: 79-105

packages/opentelemetry-instrumentation-langchain/tests/test_tool_calls.py (2)

217-270: LGTM! GenAI attribute migrations correctly applied in active tests.

All active span assertions correctly use GenAIAttributes.GEN_AI_PROMPT and GenAIAttributes.GEN_AI_COMPLETION with proper indexed paths for messages, roles, content, and tool calls. The migration is consistent with OpenTelemetry semantic conventions.

Also applies to: 505-529, 722-772, 921-923, 1070-1104


31-79: Align test_tool_calls with skip strategy or update assertions
Span-based checks in test_tool_calls (lines 37–74) are fully commented out, leaving only log validation, while all other test_tool_calls_* tests are skipped with reason="Direct model invocations do not create langchain spans". Either add the same @pytest.mark.skip decorator here or refactor this test to assert the intended current span/log behavior.
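If the skip route is chosen, it would be the one-line change below (sketch only; the reason string is copied from the sibling tests):

```python
import pytest

@pytest.mark.skip(reason="Direct model invocations do not create langchain spans")
def test_tool_calls():
    ...  # body unchanged; span assertions stay commented out until revisited
```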

packages/opentelemetry-instrumentation-langchain/tests/test_chains.py (1)

126-133: LGTM! Attribute key migrations correctly implement OTel conventions.

The test assertions correctly migrate from SpanAttributes.LLM_* to GenAIAttributes.GEN_AI_*:

  • LLM_REQUEST_MODELGEN_AI_REQUEST_MODEL
  • LLM_RESPONSE_MODELGEN_AI_RESPONSE_MODEL
  • LLM_PROMPTSGEN_AI_PROMPT

Consistent across synchronous and asynchronous test variants.

Also applies to: 216-222, 337-343

packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py (2)

7-10: LGTM! Import changes and attribute migrations align with OTel conventions.

The test correctly:

  • Removes unused SpanAttributes import
  • Introduces GenAIAttributes alias
  • Migrates assertions to use GEN_AI_TOKEN_TYPE and GEN_AI_SYSTEM

This addresses the past review comment requesting replacement of SpanAttributes.LLM_SYSTEM references.

Also applies to: 51-73, 103-125


196-272: LGTM! Langgraph test correctly validates metrics with GenAI attributes.

The test_langgraph_metrics test correctly:

  • Expects metrics to BE generated (unlike the chain tests)
  • Uses GenAIAttributes.GEN_AI_SYSTEM and GEN_AI_TOKEN_TYPE
  • Validates token usage, duration, and generation choices metrics

This test's success suggests metrics work for direct OpenAI client invocations but may have issues with LangChain chain integrations (see previous comment).

packages/opentelemetry-instrumentation-langchain/tests/test_llms.py (5)

209-209: Clarify commented assertion pattern.

Multiple tests have commented-out assertions checking for gen_ai.choice events with comments like "logs[2] does not exist" or "logs[1] may not exist". This pattern appears consistently throughout the file.

Are these commented assertions intentional? If choice events are no longer emitted in the new GenAI attribute model, consider removing these commented lines entirely rather than leaving them in the code. If they represent known test gaps or future work, document this with a TODO comment or tracking issue.

Also applies to: 253-253, 357-357, 406-406, 567-567, 627-627, 766-766, 818-818, 964-964, 1022-1022


289-301: Validate new workflow output approach.

The test now validates completion content through the workflow span's TRACELOOP_ENTITY_OUTPUT attribute rather than directly checking the completion span attributes. This is a change in validation strategy.

Ensure this new validation approach is consistent across all similar tests and that it correctly captures the AI response content in all scenarios.
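Roughly, the new strategy reads like this (sketch only: the span-name suffix and JSON handling are assumptions; the attribute key follows TRACELOOP_ENTITY_OUTPUT):

```python
import json

def test_completion_content_on_workflow(exporter):  # sketch, not the real test
    spans = exporter.get_finished_spans()
    workflow_span = next(s for s in spans if s.name.endswith(".workflow"))
    output = json.loads(workflow_span.attributes["traceloop.entity.output"])
    assert output  # AI response content is validated on the workflow span
```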


341-341: Verify log count reduction is correct.

Multiple tests have been updated to assert len(logs) == 2 instead of the previous count (likely 3). This suggests that one event type (likely the gen_ai.choice event) is no longer emitted.

Confirm that the reduction in log count is expected behavior with the new GenAI attribute model. The PR description mentions that tests are "outdated and will not be further updated" - if these log count changes reflect known behavioral changes, document this in the test or PR comments.

Also applies to: 392-392, 539-539, 612-612, 748-748, 804-804, 946-946, 1008-1008, 1113-1113, 1189-1189, 1422-1422, 1501-1501


1071-1071: LGTM: System attribute correctly migrated.

The assertion correctly validates that the GEN_AI_SYSTEM attribute is set to the expected vendor name based on the LLM type.


145-154: Attribute migration verified — the instrumentation in span_utils.py and callback_handler.py correctly sets GenAIAttributes.GEN_AI_REQUEST_MODEL, GEN_AI_SYSTEM, as well as prompt (.0.content) and completion (.0.content) attributes via _set_span_attribute, so the tests will pass as written.

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (6)

18-20: LGTM: GenAI attributes imported correctly.

The import of gen_ai_attributes as GenAIAttributes aligns with the PR's objective to adopt OpenTelemetry's standardized GenAI semantic conventions.


78-80: LGTM: Request and response model attributes migrated.

The migration from legacy attribute names to GEN_AI_REQUEST_MODEL and GEN_AI_RESPONSE_MODEL is correct and aligns with OpenTelemetry semantic conventions.


91-97: LGTM: Request parameter attributes migrated.

All request parameters (GEN_AI_REQUEST_MAX_TOKENS, GEN_AI_REQUEST_TEMPERATURE, GEN_AI_REQUEST_TOP_P) are correctly migrated to use GenAI attributes.


213-243: LGTM: Improved content extraction in response handling.

The refactored set_chat_response function now has more robust content extraction logic that tries multiple sources (generation.text, generation.message.content) and handles both string and structured content. The migration to GEN_AI_COMPLETION prefix is correct.
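A condensed sketch of that fallback chain (the helper name and structured-part handling are assumptions, not the exact span_utils code):

```python
def _extract_content(generation):
    # Prefer generation.text when present.
    text = getattr(generation, "text", None)
    if text:
        return text
    # Fall back to generation.message.content, which may be a plain string
    # or a list of structured parts.
    message = getattr(generation, "message", None)
    content = getattr(message, "content", None) if message else None
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return "".join(
            part.get("text", "") for part in content if isinstance(part, dict)
        )
    return None
```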


334-350: LGTM: Token usage attributes correctly migrated.

The migration from legacy token attribute names to GEN_AI_USAGE_INPUT_TOKENS, GEN_AI_USAGE_OUTPUT_TOKENS, and GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS is correct and aligns with OpenTelemetry semantic conventions.


353-373: LGTM: Token histogram attributes migrated.

The token histogram recording correctly uses the new GenAI attributes (GEN_AI_SYSTEM, GEN_AI_TOKEN_TYPE, GEN_AI_RESPONSE_MODEL) for both input and output token metrics.
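A compact sketch of the histogram-recording shape described here (the meter, histogram name, and vendor value are assumptions; the attribute keys are the ones named above):

```python
from opentelemetry import metrics
from opentelemetry.semconv._incubating.attributes import (
    gen_ai_attributes as GenAIAttributes,
)

meter = metrics.get_meter(__name__)
token_histogram = meter.create_histogram("gen_ai.client.token.usage")  # name assumed

def record_token_metrics(input_tokens, output_tokens, response_model):
    common = {
        GenAIAttributes.GEN_AI_SYSTEM: "Langchain",  # vendor value assumed
        GenAIAttributes.GEN_AI_RESPONSE_MODEL: response_model,
    }
    token_histogram.record(
        input_tokens, {**common, GenAIAttributes.GEN_AI_TOKEN_TYPE: "input"}
    )
    token_histogram.record(
        output_tokens, {**common, GenAIAttributes.GEN_AI_TOKEN_TYPE: "output"}
    )
```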

@rolandgvc

hey guys, what's the latest on this? would love to see it in main

@martimfasantos
Contributor Author

hey guys, what's the latest on this? would love to see it in main

@nirga any updates on this?

@nirga
Member

nirga commented Oct 16, 2025

sorry for the delay here folks @martimfasantos

  • we had 3 major holidays and most of the team were out

@martimfasantos
Contributor Author

any updates here? @nirga


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py (1)

143-155: Keep prompts on GenAI attribute keys

When the prompt is a plain string we still emit SpanAttributes.LLM_PROMPTS.0.user, so half the calls use the legacy field. That breaks the rename goal and makes downstream data inconsistent. Please route both branches through the GenAI prompt attribute.

Apply this diff to fix the issue:

     if isinstance(prompt, list):
         for index, input_text in enumerate(prompt):
             _set_span_attribute(
                 span,
-                f"{SpanAttributes.GEN_AI_PROMPT}.{index}.user",
+                f"{GenAIAttributes.GEN_AI_PROMPT}.{index}.user",
                 str(input_text).strip(),
             )
     else:
         _set_span_attribute(
             span,
-            f"{SpanAttributes.LLM_PROMPTS}.0.user",
+            f"{GenAIAttributes.GEN_AI_PROMPT}.0.user",
             str(prompt).strip(),
         )
♻️ Duplicate comments (3)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (2)

10-12: Critical: Incorrect import causes undefined name errors.

The import aliases the gen_ai_attributes module as GenAIAttributes, but the code uses it as a class with constants (e.g., GenAIAttributes.GEN_AI_AGENT_NAME). This will fail at runtime and in linting.

Apply this diff to import the correct class:

-from opentelemetry.semconv._incubating.attributes import (
-    gen_ai_attributes as GenAIAttributes
-)
+from opentelemetry.semconv_ai import GenAIAttributes

163-165: Remove non-spec gen_ai.completion.tool.* attributes

The GenAI semantic conventions don't define a gen_ai.completion.tool.* namespace—tools are captured via gen_ai.tool.* (line 161) or LLM_REQUEST_FUNCTIONS (lines 358-376). These attributes can fail semantic convention validation.

Based on learnings

Apply this diff to remove the non-spec attributes:

             tool_attributes = {
                 SpanAttributes.TRACELOOP_SPAN_KIND: TraceloopSpanKindValues.TOOL.value,
                 "gen_ai.tool.name": tool_name,
                 "gen_ai.system": "openai_agents",
-                f"{GenAIAttributes.GEN_AI_COMPLETION}.tool.name": tool_name,
-                f"{GenAIAttributes.GEN_AI_COMPLETION}.tool.type": "FunctionTool",
-                f"{GenAIAttributes.GEN_AI_COMPLETION}.tool.strict_json_schema": True
             }

             if hasattr(span_data, 'description') and span_data.description:
                 # Only use description if it's not a generic class description
                 desc = span_data.description
                 if desc and not desc.startswith("Represents a Function Span"):
-                    tool_attributes[f"{GenAIAttributes.GEN_AI_COMPLETION}.tool.description"] = desc
+                    tool_attributes["gen_ai.tool.description"] = desc

Also applies to: 172-172

packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py (1)

166-201: Map remaining params to GenAI request attributes

random_seed, max_new_tokens, and repetition_penalty are still exported under the old Watsonx-specific SpanAttributes. The GenAI spec now provides direct equivalents, so these should be emitted under the new keys to complete the migration. Only decoding_method and min_new_tokens need to stay on the legacy namespace for now. Based on learnings.

Apply this diff to fix the issue:

         _set_span_attribute(
             span,
             SpanAttributes.LLM_DECODING_METHOD,
             modelParameters.get("decoding_method", None),
         )
         _set_span_attribute(
             span,
-            SpanAttributes.LLM_RANDOM_SEED,
+            GenAIAttributes.GEN_AI_REQUEST_SEED,
             modelParameters.get("random_seed", None),
         )
         _set_span_attribute(
             span,
-            SpanAttributes.LLM_MAX_NEW_TOKENS,
+            GenAIAttributes.GEN_AI_REQUEST_MAX_TOKENS,
             modelParameters.get("max_new_tokens", None),
         )
         _set_span_attribute(
             span,
             SpanAttributes.LLM_MIN_NEW_TOKENS,
             modelParameters.get("min_new_tokens", None),
         )
         _set_span_attribute(
             span, GenAIAttributes.GEN_AI_REQUEST_TOP_K, modelParameters.get("top_k", None)
         )
         _set_span_attribute(
             span,
-            SpanAttributes.LLM_REPETITION_PENALTY,
+            GenAIAttributes.GEN_AI_REQUEST_FREQUENCY_PENALTY,
             modelParameters.get("repetition_penalty", None),
         )
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (1)

438-438: Consider using the existing GenAIAttributes alias for consistency.

The GEN_AI_AGENT_NAME constant is imported directly here, but it's already available via the GenAIAttributes alias imported at Line 13. For consistency with the rest of the file, consider using GenAIAttributes.GEN_AI_AGENT_NAME directly instead of this additional import.

Apply this diff to use the existing alias:

-    from opentelemetry.semconv._incubating.attributes.gen_ai_attributes import GEN_AI_AGENT_NAME

Then update references in the function to use GenAIAttributes.GEN_AI_AGENT_NAME.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 93f9d82 and 039dc6e.

📒 Files selected for processing (4)
  • packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/__init__.py (3 hunks)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (12 hunks)
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (5 hunks)
  • packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py (14 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/__init__.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
🧠 Learnings (1)
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
PR: traceloop/openllmetry#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Semantic conventions must follow the OpenTelemetry GenAI specification

Applied to files:

  • packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/__init__.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-watsonx/opentelemetry/instrumentation/watsonx/__init__.py (2)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-245)
packages/opentelemetry-instrumentation-cohere/opentelemetry/instrumentation/cohere/span_utils.py (1)
  • _set_span_attribute (18-22)
🔇 Additional comments (9)
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (7)

13-14: LGTM! Import aligns with the GenAI attribute migration.

The import correctly introduces the GenAIAttributes alias, providing access to the new OpenTelemetry GenAI semantic conventions as intended by this PR.


114-115: LGTM! Agent attributes correctly migrated.

The assertions now properly reference the GenAI semantic convention constants for agent name and system attributes.


137-138: LGTM! Prompt attributes correctly migrated.

The test assertions now properly use the GenAI semantic convention constants for prompt attributes.


141-146: LGTM! Usage token attributes successfully migrated.

The test assertions now correctly use the new GenAI semantic convention constants (GEN_AI_USAGE_INPUT_TOKENS and GEN_AI_USAGE_OUTPUT_TOKENS), replacing the legacy prompt/completion token naming. This addresses the previous review feedback and aligns with the OpenTelemetry specification.


149-151: LGTM! Completion attributes correctly migrated.

The test assertions properly use the GenAI semantic convention constants for completion attributes.


200-210: LGTM! Tool attributes correctly migrated.

The tool-related assertions now properly reference GenAI semantic convention constants, including tool name, type, description, and JSON schema attributes.


300-303: LGTM! Metrics token type correctly migrated.

The metrics assertion properly uses the GenAI semantic convention constant for token type, with values ("output", "input") that align with the new token naming conventions.

packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/__init__.py (2)

28-30: LGTM: Import aligns with OpenTelemetry GenAI semantic conventions.

The import of GenAIAttributes is correctly structured and follows the migration to OpenTelemetry's standardized GenAI attributes.

Based on learnings.


184-188: Original review comment flagged a valid concern but verification confirms no action is needed.

The partial migration in this file is intentional and consistent across the entire codebase. Here's what verification revealed:

  1. GenAI spec does NOT define a REQUEST_TYPE equivalent: The GenAI semantic conventions use gen_ai.operation.name for operation types (different naming pattern), not gen_ai.request.type.

  2. This pattern is systematic, not an oversight: Across 18+ instrumentation packages (vertexai, mistralai, ollama, groq, cohere, anthropic, watsonx, and others), the same pattern exists—GenAIAttributes.GEN_AI_SYSTEM is migrated while SpanAttributes.LLM_REQUEST_TYPE remains unchanged.

  3. LLM_REQUEST_TYPE is maintained intentionally: Since it has no direct GenAI counterpart, it's preserved during the transition from LLM to GenAI semantic conventions.

The code is correct as-is and requires no changes.
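A sketch of the coexistence pattern described above (the helper is illustrative; the enum member name follows the request types listed earlier in this review):

```python
from opentelemetry.semconv._incubating.attributes import (
    gen_ai_attributes as GenAIAttributes,
)
from opentelemetry.semconv_ai import LLMRequestTypeValues, SpanAttributes

def set_request_identity(span, vendor: str) -> None:
    # Migrated: the vendor now travels on the GenAI key...
    span.set_attribute(GenAIAttributes.GEN_AI_SYSTEM, vendor)
    # ...while the request type stays on the legacy key, since the GenAI
    # spec only offers the differently-shaped gen_ai.operation.name.
    span.set_attribute(
        SpanAttributes.LLM_REQUEST_TYPE, LLMRequestTypeValues.COMPLETION.value
    )
```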

Member

@nirga nirga left a comment


Sorry for the delay in approving this @martimfasantos! I was out for a couple of weeks and this has slipped between the cracks

@nirga nirga changed the title fix(instrumentation): updated GenAI attributes to use OTel's feat(instrumentation): updated GenAI attributes to use OTel's Oct 30, 2025
@nirga nirga merged commit 4c13c34 into traceloop:main Oct 30, 2025
9 checks passed
@martimfasantos martimfasantos deleted the task-update-genai-attributes branch October 31, 2025 17:26