
🐛 Bug Report: Number of tokens not reported when streaming OpenAI/AOAI #627

@maciejwie

Which component is this bug for?

OpenAI Instrumentation

📜 Description

llm.usage.prompt_tokens and llm.usage.total_tokens are not reported when streaming is used. Per a discussion in Slack, this may be because OpenAI/Azure OpenAI do not include usage information in streaming responses. These counts are important statistics, especially when TRACELOOP_TRACE_CONTENT is false, and they are required for derived metrics such as token throughput.

👟 Reproduction steps

The following code correctly reports the number of tokens:

import openai

prompt = "Say Hi!"

# Non-streaming: the response body includes a "usage" block.
response_non_streaming = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
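
For reference, the counts in the non-streaming case come from the usage block of the response (the printed values below are illustrative, not actual output):

usage = response_non_streaming["usage"]
print(usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"])
# e.g. 10 9 19  (illustrative values)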

The following code does not report the number of tokens:

prompt = "Say Hi!"

# Streaming: the same request with stream=True returns an iterator of
# chunks; none of the chunks carries a "usage" block.
response_streaming = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    stream=True,
)
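
To confirm, one can drain the stream; a minimal sketch against the 0.x openai SDK used above:

# Each chunk carries only a content delta; no chunk includes a "usage"
# block, so the instrumentation has nothing to read.
for chunk in response_streaming:
    delta = chunk["choices"][0]["delta"]
    print(delta.get("content", ""), end="")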

👍 Expected behavior

llm.usage.prompt_tokens and llm.usage.total_tokens should be reported regardless of the stream setting.

👎 Actual Behavior with Screenshots

Example with non-streamed ada: [screenshot: token usage attributes reported]

Example with streaming gpt-3.5: [screenshot: token usage attributes missing]

🤖 Python Version

No response

📃 Provide any additional context for the Bug.

If this value is not reported by OpenAI/Azure OpenAI directly, it becomes a design decision: is it worth counting tokens in this library itself, or should only values returned by the APIs be reported?
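
If counting in the library is the chosen route, one option is client-side counting with tiktoken. This is a minimal sketch of the idea, not this library's implementation; note that exact prompt counts also include per-message chat framing overhead, which the sketch ignores:

import tiktoken

# Encoder for the model used in the reproduction above.
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

# Prompt tokens: count the encoded prompt text.
prompt_tokens = len(enc.encode(prompt))

# Completion tokens: accumulate while draining the stream.
completion_tokens = 0
for chunk in response_streaming:
    content = chunk["choices"][0]["delta"].get("content", "")
    completion_tokens += len(enc.encode(content))

total_tokens = prompt_tokens + completion_tokens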

👀 Have you spent some time to check if this bug has been raised before?

  • I checked and didn't find a similar issue

Are you willing to submit PR?

None
