Which component is this bug for?
OpenAI Instrumentation
📜 Description
`llm.usage.prompt_tokens` and `llm.usage.total_tokens` are not reported when streaming is used. Per discussion in Slack, this may be because OpenAI/Azure OpenAI does not include usage information in streaming responses. Especially when `TRACELOOP_TRACE_CONTENT` is false, these are important statistics and would be required for derived fields such as token throughput.
👟 Reproduction steps
The following code correctly reports the number of tokens:
prompt = "Say Hi!"
response_non_streaming = openai.ChatCompletion.create(
model="gpt-3.5-turbo",
messages=[{"role": "user", "content": prompt}]
)
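In the non-streaming case the API response itself carries the counts, which is presumably what the instrumentation reads (a minimal sketch of inspecting it, assuming the pre-1.0 `openai` SDK's response shape):

```python
# The non-streaming response includes a "usage" object with the token counts.
usage = response_non_streaming["usage"]
print(usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"])
```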
The following code does not report the number of tokens:
prompt = "Say Hi!"
response_streaming = openai.ChatCompletion.create(
model="gpt-3.5-turbo",
messages=[{"role": "user", "content": prompt}],
stream=True
)
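Consuming the stream shows why: each chunk carries only an incremental delta and no `usage` object (a minimal sketch, assuming the pre-1.0 SDK's chunk shape):

```python
completion_text = ""
for chunk in response_streaming:
    # Streamed chunks contain deltas only; there is no "usage" field to read.
    delta = chunk["choices"][0].get("delta", {})
    completion_text += delta.get("content", "")
print(completion_text)
```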
👍 Expected behavior
`llm.usage.prompt_tokens` and `llm.usage.total_tokens` should be reported regardless of the `stream` setting.
👎 Actual Behavior with Screenshots
Example with non-streamed ada: [screenshot]

Example with streaming gpt-3.5: [screenshot]
🤖 Python Version
No response
📃 Provide any additional context for the Bug.
If these values are not reported by OpenAI/Azure OpenAI directly, it may be a design decision whether it is worth counting tokens in this library, or whether only values returned by the APIs should be reported.
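If client-side counting were adopted, a library like `tiktoken` could approximate the prompt count (a hedged sketch; the per-message overheads follow OpenAI's cookbook guidance for gpt-3.5-turbo and may drift across model versions):

```python
import tiktoken

def count_chat_tokens(messages, model="gpt-3.5-turbo"):
    """Approximate the prompt token count for a chat request.

    The +4/+2 framing overheads are assumptions taken from OpenAI's
    cookbook example and are not guaranteed for all model versions.
    """
    enc = tiktoken.encoding_for_model(model)
    num_tokens = 0
    for message in messages:
        num_tokens += 4  # per-message framing tokens (assumption)
        for value in message.values():
            num_tokens += len(enc.encode(value))
    num_tokens += 2  # reply is primed with assistant framing (assumption)
    return num_tokens

print(count_chat_tokens([{"role": "user", "content": "Say Hi!"}]))
```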
👀 Have you spent some time to check if this bug has been raised before?
- I checked and didn't find a similar issue
Are you willing to submit a PR?
None