extproc: add genai metrics to track token usage and latency #459

Merged: mathetake merged 10 commits into envoyproxy:main on Mar 5, 2025
Conversation
**Commit Message** Add Prometheus metrics to measure request count and latency, and token count by backend and model.

Signed-off-by: Huamin Chen <hchen@redhat.com>
Signed-off-by: Ignasi Barrera <ignasi@tetrate.io>
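To make the commit message concrete, here is a minimal, hedged sketch of the kind of per-backend, per-model token aggregation such metrics perform. The names (`tokenCounter`, `Add`, `Total`) are illustrative only and are not the PR's actual API, which records through Prometheus/OpenTelemetry instruments rather than a plain map.

```go
package main

import (
	"fmt"
	"sync"
)

// tokenCounter aggregates token counts per (backend, model) pair,
// mimicking a counter metric with backend/model labels.
type tokenCounter struct {
	mu     sync.Mutex
	totals map[string]uint64
}

func newTokenCounter() *tokenCounter {
	return &tokenCounter{totals: make(map[string]uint64)}
}

// Add records n tokens observed for the given backend and model.
func (c *tokenCounter) Add(backend, model string, n uint64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.totals[backend+"/"+model] += n
}

// Total returns the accumulated token count for a backend/model pair.
func (c *tokenCounter) Total(backend, model string) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.totals[backend+"/"+model]
}

func main() {
	c := newTokenCounter()
	c.Add("openai", "gpt-4o", 17)
	c.Add("openai", "gpt-4o", 3)
	fmt.Println(c.Total("openai", "gpt-4o")) // 20
}
```

In the real implementation these labeled totals are exposed as Prometheus metrics rather than read back in-process.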
mathetake reviewed on Mar 5, 2025:
    // startMetricsServer starts the HTTP server for Prometheus metrics.
    func startMetricsServer(addr string, logger *slog.Logger) (*http.Server, metric.Meter) {
mathetake (Member): can you add a unit test for this
    var _ extprocv3.ExternalProcessor_ProcessServer = &mockExternalProcessingStream{}

    type mockChatCompletionMetrics struct {
mathetake (Member): let's add comments like elsewhere

Suggested change:

    // mockChatCompletionMetrics implements ...
    type mockChatCompletionMetrics struct {
Signed-off-by: Ignasi Barrera <ignasi@tetrate.io>
mathetake approved these changes on Mar 5, 2025
This was referenced Mar 5, 2025
yuzisun pushed a commit that referenced this pull request on Mar 8, 2025:
**Commit Message** This changes the stat collection behavior so that token latency metrics are only recorded on stream=true requests. This was brought up in an offline discussion; otherwise the metrics don't make sense.

**Related Issues/PRs (if applicable)** #459

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
aabchoo pushed a commit that referenced this pull request on Mar 14, 2025:
**Commit Message** extproc: add GenAI metrics to track token usage and latency

Adds GenAI metrics according to the OpenTelemetry Semantic Conventions for Generative AI Metrics [1]. Note that these metrics are still in the experimental phase and may be subject to change.

1: https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-metrics/

**Related Issues/PRs (if applicable)** This is a follow-up of #432, implementing the remaining review comments.

Signed-off-by: Huamin Chen <hchen@redhat.com>
Signed-off-by: Ignasi Barrera <ignasi@tetrate.io>
aabchoo added a commit that referenced this pull request on Mar 14, 2025:
**Commit Message** PR to backport `mockChatCompletionMetrics`, the chat completion stream fix, and the OpenAI content type. Including:

- #459 (#468 uses mock components introduced here)
- #468
- #486

Signed-off-by: Huamin Chen <hchen@redhat.com>
Signed-off-by: Ignasi Barrera <ignasi@tetrate.io>
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
Co-authored-by: Ignasi Barrera <ignasi@tetrate.io>
Co-authored-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>
Commit Message
extproc: add GenAI metrics to track token usage and latency
Adds GenAI metrics according to the OpenTelemetry Semantic Conventions for Generative AI Metrics [1].
Note that these metrics are still in the experimental phase and may be subject to change.
1: https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-metrics/
Related Issues/PRs (if applicable)
This is a follow-up of #432, implementing the remaining review comments.
Special notes for reviewers (if applicable)
This PR contains all the commits in the original PR intact; the only added pieces are the last two commits: 8a09826 and 3f5dde2.
The first commit contains:
The second commit contains:
A refactoring of the above to use the OpenTelemetry SDK instead of the Prometheus one, to decouple the core from Prometheus. I left this in a separate commit because I'm not sure whether we really want it.
Example metrics:
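(The actual example output was not captured in this page snapshot. As a hedged illustration only, metrics following the OTel GenAI semantic conventions might render in Prometheus exposition format roughly like the following; the attribute values and sample numbers below are hypothetical, while the metric and attribute names come from the semantic conventions linked above.)

```text
# Illustrative only, not actual PR output.
gen_ai_client_token_usage_sum{gen_ai_operation_name="chat",gen_ai_request_model="gpt-4o",gen_ai_token_type="input"} 17
gen_ai_client_token_usage_sum{gen_ai_operation_name="chat",gen_ai_request_model="gpt-4o",gen_ai_token_type="output"} 42
gen_ai_client_operation_duration_sum{gen_ai_operation_name="chat",gen_ai_request_model="gpt-4o"} 0.42
```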