OTLP: label caching for OTLP-to-Prometheus conversion to reduce allocations and improve latency #17860
Merged
aknuds1 merged 14 commits into prometheus:main on Jan 16, 2026
ebd647c to c868373
a8dc0d8 to b4f73e5
Author (Contributor):
Putting this in draft while I compare existing benchmarks.
ArthurSens (Member) reviewed on Jan 14, 2026 and left a comment:
Those are some great improvements, nice job! I just have some comments.
storage/remote/otlptranslator/prometheusremotewrite/metrics_to_prw.go (outdated)
Add per-request caching to reduce redundant computation and allocations during OTLP metric conversion:

1. Per-request label sanitization cache: Cache sanitized label names within a request to avoid repeated string allocations for commonly repeated labels like __name__, job, instance.
2. Resource-level label caching: Precompute and cache job, instance, promoted resource attributes, and external labels once per ResourceMetrics boundary instead of for each datapoint.
3. Scope-level label caching: Precompute and cache scope metadata labels (otel_scope_name, otel_scope_version, etc.) once per ScopeMetrics boundary.
4. LabelNamer instance caching: Reuse the LabelNamer struct across datapoints within the same resource context.

These optimizations significantly reduce allocations and improve latency for OTLP ingestion workloads with many datapoints per resource/scope.

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
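The per-request sanitization cache described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `requestCache`, `labelName`, and `sanitizeLabelName` are hypothetical names, and the sanitizer is a simplified stand-in that only replaces characters outside `[a-zA-Z0-9_]` with `_`.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeLabelName is a hypothetical, simplified stand-in for the real
// label-name translation: every character outside [a-zA-Z0-9_] becomes '_'.
func sanitizeLabelName(name string) string {
	var b strings.Builder
	b.Grow(len(name))
	for _, r := range name {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9', r == '_':
			b.WriteRune(r)
		default:
			b.WriteByte('_')
		}
	}
	return b.String()
}

// requestCache (hypothetical) memoizes sanitized label names for one request.
type requestCache struct {
	sanitized map[string]string
	misses    int // how often the sanitizer actually ran
}

func newRequestCache() *requestCache {
	return &requestCache{sanitized: make(map[string]string)}
}

// labelName returns the cached sanitized name, computing it on first use.
func (c *requestCache) labelName(raw string) string {
	if s, ok := c.sanitized[raw]; ok {
		return s
	}
	c.misses++
	s := sanitizeLabelName(raw)
	c.sanitized[raw] = s
	return s
}

func main() {
	c := newRequestCache()
	// Simulate many datapoints that repeat the same attribute names.
	for i := 0; i < 1000; i++ {
		c.labelName("service.name")
		c.labelName("http.method")
	}
	fmt.Println(c.labelName("service.name"), c.misses) // prints "service_name 2"
}
```

With two distinct attribute names repeated across 1000 datapoints, the sanitizer runs only twice; every other lookup is a map hit that allocates no new string.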
Add regression tests to ensure:

1. Scope labels are not added to target_info when PromoteScopeMetadata is enabled. The fix ensures scope labels are only merged when both the cache exists AND promoteScope is true.
2. Promoted resource attributes are not added to target_info. Added getPromotedAttributeNames() method to get the list of promoted attributes, which are then added to ignoreAttrs in addResourceTargetInfo() to prevent them from appearing in target_info.

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Remove resource and scope parameters from createAttributes since these are now cached via setResourceContext and setScopeContext. Update all call sites and tests to properly initialize context before calling internal add* functions. Also fix target_info to not include scope labels by temporarily clearing c.scopeLabels during target_info generation, since target_info is a resource-level metric. Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
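The "temporarily clear scope labels while generating target_info" idea from this commit might look roughly like the sketch below. The `converter` type and its methods here are hypothetical and heavily simplified (labels are plain strings); only the save, clear, and restore pattern is the point.

```go
package main

import "fmt"

// converter is a hypothetical, simplified stand-in for the real converter.
type converter struct {
	resourceLabels []string
	scopeLabels    []string
}

// seriesLabels returns the labels a regular datapoint would receive:
// cached resource labels followed by cached scope labels.
func (c *converter) seriesLabels() []string {
	out := append([]string{}, c.resourceLabels...)
	return append(out, c.scopeLabels...)
}

// targetInfoLabels builds labels for the resource-level target_info metric.
// Scope labels are cleared for the duration of the call and restored after.
func (c *converter) targetInfoLabels() []string {
	saved := c.scopeLabels
	c.scopeLabels = nil
	defer func() { c.scopeLabels = saved }()
	return c.seriesLabels()
}

func main() {
	c := &converter{
		resourceLabels: []string{"job", "instance"},
		scopeLabels:    []string{"otel_scope_name"},
	}
	fmt.Println(len(c.targetInfoLabels()), len(c.seriesLabels())) // prints "2 3"
}
```

The deferred restore runs after the return value has been computed, so target_info sees no scope labels while later datapoints in the same scope still do.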
The sanitizedLabels cache stores label name sanitization results which depend only on the label name and settings. Since settings are constant within a FromMetrics call, clearing the cache at each resource boundary is unnecessary and reduces caching effectiveness. Label names like __name__, job, instance will now remain cached across all resources in a request instead of being re-sanitized for each one. Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
…arget_info Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
…ed methods

Remove the redundant clearResourceContext() call at the end of the ResourceMetrics loop since setResourceContext() unconditionally overwrites the cached labels at the start of each iteration.

Also remove two methods that became unused after the earlier change to include promoted attributes in target_info:

- addPromotedAttributes
- getPromotedAttributeNames

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Use labels.Builder instead of ScratchBuilder for building promoted resource attribute labels. Builder.Get()/Set() properly handles duplicate label names that can arise when different OTLP attribute names sanitize to the same Prometheus label name (e.g., "foo.bar" and "foo_bar" both become "foo_bar"). Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Use labels.Builder with Set() instead of ScratchBuilder.Add() for scope attributes caching. This handles duplicate label names that can arise when different OTLP attribute names sanitize to the same Prometheus label name. Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
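The duplicate-name problem that the two commits above address can be illustrated with a standard-library-only sketch. It does not use the Prometheus `labels` package: `appendAll` and `setAll` are hypothetical helpers that mimic append-only and set-style building, `sanitize` is a simplified stand-in, and the last-write-wins conflict resolution is an assumption made for illustration rather than a statement about how the PR resolves collisions.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

type label struct{ name, value string }

// sanitize is a hypothetical, simplified sanitizer: dots become underscores.
func sanitize(name string) string { return strings.ReplaceAll(name, ".", "_") }

// appendAll mimics an append-only builder: colliding names yield duplicates.
func appendAll(attrs []label) []label {
	out := make([]label, 0, len(attrs))
	for _, a := range attrs {
		out = append(out, label{sanitize(a.name), a.value})
	}
	return out
}

// setAll mimics set semantics: each sanitized name appears at most once.
// Assumption for this sketch: a later value overwrites an earlier one.
func setAll(attrs []label) []label {
	byName := make(map[string]string, len(attrs))
	for _, a := range attrs {
		byName[sanitize(a.name)] = a.value
	}
	out := make([]label, 0, len(byName))
	for n, v := range byName {
		out = append(out, label{n, v})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].name < out[j].name })
	return out
}

func main() {
	// "foo.bar" and "foo_bar" both sanitize to "foo_bar".
	attrs := []label{{"foo.bar", "a"}, {"foo_bar", "b"}}
	fmt.Println(len(appendAll(attrs)), len(setAll(attrs))) // prints "2 1"
}
```

A label set with two entries named `foo_bar` is invalid, which is why set-style building is the safer choice whenever sanitization can map distinct inputs to the same output.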
6ba9dcd to 18a502e
Since the caching refactor now stores resource and scope labels in the converter's cached state, these parameters are no longer needed in the datapoint-level functions. Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
d2a79f0 to 75638c3
ArthurSens (Member) approved these changes on Jan 15, 2026 and left a comment:
I realized that building the labelBuilder in NewPrometheusConverter is a bit harder than I initially expected, since settings aren't passed as an argument there.
LGTM
Add a nil check for resourceLabels at the start of createAttributes to return a clear error instead of panicking if the caller forgets to call setResourceContext first. Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
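A rough sketch of how cached resource context and the nil guard fit together is shown below. The method names `setResourceContext` and `createAttributes` come from the commit messages, but their signatures, the `label` type, the error value, and the unsorted label output are all assumptions made for this illustration.

```go
package main

import (
	"errors"
	"fmt"
)

type label struct{ name, value string }

// converter is a hypothetical, simplified stand-in for the real converter.
type converter struct {
	resourceLabels []label // nil until setResourceContext is called
}

// errNoResourceContext is an assumed error value for this sketch.
var errNoResourceContext = errors.New("createAttributes called before setResourceContext")

// setResourceContext computes resource-level labels once per resource.
func (c *converter) setResourceContext(job, instance string) {
	c.resourceLabels = []label{{"job", job}, {"instance", instance}}
}

// createAttributes combines the cached resource labels with per-datapoint
// labels. It returns an error instead of panicking when no context is set.
func (c *converter) createAttributes(metricName string, dp []label) ([]label, error) {
	if c.resourceLabels == nil {
		return nil, errNoResourceContext
	}
	out := make([]label, 0, len(c.resourceLabels)+len(dp)+1)
	out = append(out, label{"__name__", metricName})
	out = append(out, c.resourceLabels...)
	out = append(out, dp...)
	return out, nil
}

func main() {
	c := &converter{}
	if _, err := c.createAttributes("http_requests_total", nil); err != nil {
		fmt.Println("error:", err)
	}
	c.setResourceContext("checkout", "pod-1")
	lbls, _ := c.createAttributes("http_requests_total", []label{{"code", "200"}})
	fmt.Println(len(lbls)) // prints "4"
}
```

The resource labels are built once in `setResourceContext` and reused for every datapoint, while the guard turns a forgotten setup call into a clear error.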
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com> Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Which issue(s) does the PR fix:
Does this PR introduce a user-facing change?
Summary
This PR includes the following changes:
Benchmark Results
OTLP-to-Prometheus label caching benchmarks (Apple M4 Pro):
Summary: 34% faster, 17% less memory, 81% fewer allocations (geomean).