
OTLP: label caching for OTLP-to-Prometheus conversion to reduce allocations and improve latency #17860

Merged
aknuds1 merged 14 commits into prometheus:main from aknuds1:arve/optimizations
Jan 16, 2026

@aknuds1 aknuds1 commented Jan 14, 2026

Which issue(s) does the PR fix:

Does this PR introduce a user-facing change?

[PERF] otlptranslator: Add label caching for OTLP-to-Prometheus conversion to reduce allocations and improve latency

Summary

This PR includes the following changes:

  • otlptranslator: Add label caching for OTLP-to-Prometheus conversion
    • Per-request label sanitization cache to avoid repeated string allocations
    • Resource-level label caching (job, instance, promoted resource attributes, external labels)
    • Scope-level label caching (otel_scope_name, otel_scope_version, etc.)
    • LabelNamer instance caching across datapoints
    • Add benchmarks demonstrating the perf improvements
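
A minimal sketch of the first idea, the per-request sanitization cache, might look like the following. Everything here is illustrative: `sanitizeLabelName` is a simplified stand-in for the real sanitization logic, and `labelCache` is a hypothetical type, not the one used in the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeLabelName is a simplified stand-in for the real sanitization logic:
// it replaces every character outside [a-zA-Z0-9_] with an underscore.
func sanitizeLabelName(name string) string {
	var b strings.Builder
	b.Grow(len(name))
	for _, r := range name {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9', r == '_':
			b.WriteRune(r)
		default:
			b.WriteByte('_')
		}
	}
	return b.String()
}

// labelCache (hypothetical) memoizes sanitized names for one request.
type labelCache struct {
	sanitized map[string]string
	misses    int // counts how often sanitization actually ran
}

func newLabelCache() *labelCache {
	return &labelCache{sanitized: make(map[string]string)}
}

// get returns the sanitized form of name, computing it at most once.
func (c *labelCache) get(name string) string {
	if s, ok := c.sanitized[name]; ok {
		return s
	}
	c.misses++
	s := sanitizeLabelName(name)
	c.sanitized[name] = s
	return s
}

func main() {
	cache := newLabelCache()
	// 1000 datapoints that share the same two attribute names.
	for i := 0; i < 1000; i++ {
		cache.get("http.method")
		cache.get("service.name")
	}
	fmt.Println(cache.get("http.method"), cache.misses)
}
```

In this sketch the sanitizer runs once per distinct name rather than once per datapoint, which is where the allocation savings would come from.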

Benchmark Results

OTLP-to-Prometheus label caching benchmarks (Apple M4 Pro):

goos: darwin
goarch: arm64
pkg: github.com/prometheus/prometheus/storage/remote/otlptranslator/prometheusremotewrite
cpu: Apple M4 Pro
                                                                                             │   main.txt   │         optimizations.txt          │
                                                                                             │    sec/op    │   sec/op     vs base               │
FromMetrics_LabelCaching_MultipleDatapointsPerResource/res_attrs=5/scopes=1/metrics=10-14       19.07µ ± 2%   13.27µ ± 4%  -30.44% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleDatapointsPerResource/res_attrs=5/scopes=1/metrics=100-14      177.4µ ± 3%   108.1µ ± 8%  -39.06% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleDatapointsPerResource/res_attrs=5/scopes=10/metrics=10-14      180.0µ ± 4%   106.2µ ± 3%  -41.00% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleDatapointsPerResource/res_attrs=5/scopes=10/metrics=100-14     2.063m ± 5%   1.174m ± 2%  -43.06% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleDatapointsPerResource/res_attrs=50/scopes=1/metrics=10-14     122.66µ ± 6%   89.18µ ± 2%  -27.30% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleDatapointsPerResource/res_attrs=50/scopes=1/metrics=100-14    1040.5µ ± 2%   713.8µ ± 2%  -31.40% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleDatapointsPerResource/res_attrs=50/scopes=10/metrics=10-14    1054.3µ ± 2%   713.8µ ± 2%  -32.30% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleDatapointsPerResource/res_attrs=50/scopes=10/metrics=100-14   10.833m ± 1%   7.112m ± 3%  -34.35% (p=0.002 n=6)
FromMetrics_LabelCaching_RepeatedLabelNames/unique_labels=5/datapoints=100-14                   83.63µ ± 3%   62.62µ ± 3%  -25.12% (p=0.002 n=6)
FromMetrics_LabelCaching_RepeatedLabelNames/unique_labels=5/datapoints=1000-14                  939.4µ ± 3%   696.7µ ± 2%  -25.84% (p=0.002 n=6)
FromMetrics_LabelCaching_RepeatedLabelNames/unique_labels=50/datapoints=100-14                  304.5µ ± 3%   215.2µ ± 3%  -29.33% (p=0.002 n=6)
FromMetrics_LabelCaching_RepeatedLabelNames/unique_labels=50/datapoints=1000-14                 3.163m ± 3%   2.200m ± 2%  -30.44% (p=0.002 n=6)
FromMetrics_LabelCaching_ScopeMetadata/scope_attrs=0/metrics=10-14                             11.446µ ± 3%   9.322µ ± 4%  -18.56% (p=0.002 n=6)
FromMetrics_LabelCaching_ScopeMetadata/scope_attrs=0/metrics=100-14                            103.86µ ± 4%   72.75µ ± 4%  -29.96% (p=0.002 n=6)
FromMetrics_LabelCaching_ScopeMetadata/scope_attrs=10/metrics=10-14                             24.59µ ± 2%   15.53µ ± 1%  -36.86% (p=0.002 n=6)
FromMetrics_LabelCaching_ScopeMetadata/scope_attrs=10/metrics=100-14                            234.5µ ± 4%   117.6µ ± 4%  -49.85% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleResources/resources=1/metrics=10-14                            32.36µ ± 3%   21.82µ ± 4%  -32.57% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleResources/resources=1/metrics=100-14                           299.2µ ± 3%   173.5µ ± 3%  -42.02% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleResources/resources=10/metrics=10-14                           314.9µ ± 1%   191.5µ ± 1%  -39.20% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleResources/resources=10/metrics=100-14                          3.217m ± 2%   1.881m ± 3%  -41.51% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleResources/resources=50/metrics=10-14                           1.692m ± 4%   1.035m ± 4%  -38.81% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleResources/resources=50/metrics=100-14                         15.552m ± 3%   8.938m ± 2%  -42.53% (p=0.002 n=6)
geomean                                                                                         365.7µ        237.7µ       -35.02%

                                                                                             │   main.txt    │          optimizations.txt           │
                                                                                             │     B/op      │     B/op       vs base               │
FromMetrics_LabelCaching_MultipleDatapointsPerResource/res_attrs=5/scopes=1/metrics=10-14       13.57Ki ± 0%    16.01Ki ± 0%  +17.93% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleDatapointsPerResource/res_attrs=5/scopes=1/metrics=100-14     111.40Ki ± 0%    88.43Ki ± 0%  -20.62% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleDatapointsPerResource/res_attrs=5/scopes=10/metrics=10-14     111.40Ki ± 0%    88.43Ki ± 0%  -20.62% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleDatapointsPerResource/res_attrs=5/scopes=10/metrics=100-14    1239.5Ki ± 0%    961.9Ki ± 0%  -22.40% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleDatapointsPerResource/res_attrs=50/scopes=1/metrics=10-14      45.45Ki ± 0%    41.98Ki ± 0%   -7.64% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleDatapointsPerResource/res_attrs=50/scopes=1/metrics=100-14     327.6Ki ± 0%    235.3Ki ± 0%  -28.17% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleDatapointsPerResource/res_attrs=50/scopes=10/metrics=10-14     327.7Ki ± 0%    235.3Ki ± 0%  -28.17% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleDatapointsPerResource/res_attrs=50/scopes=10/metrics=100-14    3.222Mi ± 0%    2.264Mi ± 0%  -29.73% (p=0.002 n=6)
FromMetrics_LabelCaching_RepeatedLabelNames/unique_labels=5/datapoints=100-14                   69.16Ki ± 0%    66.35Ki ± 0%   -4.06% (p=0.002 n=6)
FromMetrics_LabelCaching_RepeatedLabelNames/unique_labels=5/datapoints=1000-14                  844.2Ki ± 0%    771.1Ki ± 0%   -8.66% (p=0.002 n=6)
FromMetrics_LabelCaching_RepeatedLabelNames/unique_labels=50/datapoints=100-14                  171.1Ki ± 0%    145.1Ki ± 0%  -15.20% (p=0.002 n=6)
FromMetrics_LabelCaching_RepeatedLabelNames/unique_labels=50/datapoints=1000-14                 1.830Mi ± 0%    1.530Mi ± 0%  -16.39% (p=0.002 n=6)
FromMetrics_LabelCaching_ScopeMetadata/scope_attrs=0/metrics=10-14                              9.759Ki ± 0%   13.383Ki ± 0%  +37.14% (p=0.002 n=6)
FromMetrics_LabelCaching_ScopeMetadata/scope_attrs=0/metrics=100-14                             79.43Ki ± 0%    70.34Ki ± 0%  -11.45% (p=0.002 n=6)
FromMetrics_LabelCaching_ScopeMetadata/scope_attrs=10/metrics=10-14                             18.42Ki ± 0%    18.16Ki ± 0%   -1.40% (p=0.002 n=6)
FromMetrics_LabelCaching_ScopeMetadata/scope_attrs=10/metrics=100-14                            159.9Ki ± 0%    104.6Ki ± 0%  -34.55% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleResources/resources=1/metrics=10-14                            20.53Ki ± 0%    21.67Ki ± 0%   +5.58% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleResources/resources=1/metrics=100-14                           161.3Ki ± 0%    123.6Ki ± 0%  -23.35% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleResources/resources=10/metrics=10-14                           168.7Ki ± 0%    133.0Ki ± 0%  -21.16% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleResources/resources=10/metrics=100-14                          1.685Mi ± 0%    1.271Mi ± 0%  -24.57% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleResources/resources=50/metrics=10-14                           906.2Ki ± 0%    703.2Ki ± 0%  -22.40% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleResources/resources=50/metrics=100-14                          8.063Mi ± 0%    5.948Mi ± 0%  -26.23% (p=0.002 n=6)
geomean                                                                                         208.4Ki         176.5Ki       -15.30%

                                                                                             │   main.txt   │         optimizations.txt          │
                                                                                             │  allocs/op   │  allocs/op   vs base               │
FromMetrics_LabelCaching_MultipleDatapointsPerResource/res_attrs=5/scopes=1/metrics=10-14       239.00 ± 0%    91.00 ± 0%  -61.92% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleDatapointsPerResource/res_attrs=5/scopes=1/metrics=100-14      2135.0 ± 0%    547.0 ± 0%  -74.38% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleDatapointsPerResource/res_attrs=5/scopes=10/metrics=10-14      2135.0 ± 0%    547.0 ± 0%  -74.38% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleDatapointsPerResource/res_attrs=5/scopes=10/metrics=100-14    21.051k ± 0%   5.058k ± 0%  -75.97% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleDatapointsPerResource/res_attrs=50/scopes=1/metrics=10-14       739.0 ± 0%    141.0 ± 0%  -80.92% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleDatapointsPerResource/res_attrs=50/scopes=1/metrics=100-14     6685.0 ± 0%    597.0 ± 0%  -91.07% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleDatapointsPerResource/res_attrs=50/scopes=10/metrics=10-14     6685.0 ± 0%    597.0 ± 0%  -91.07% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleDatapointsPerResource/res_attrs=50/scopes=10/metrics=100-14   66.105k ± 0%   5.108k ± 0%  -92.27% (p=0.002 n=6)
FromMetrics_LabelCaching_RepeatedLabelNames/unique_labels=5/datapoints=100-14                   1023.0 ± 0%    533.0 ± 0%  -47.90% (p=0.002 n=6)
FromMetrics_LabelCaching_RepeatedLabelNames/unique_labels=5/datapoints=1000-14                 10.034k ± 0%   5.044k ± 0%  -49.73% (p=0.002 n=6)
FromMetrics_LabelCaching_RepeatedLabelNames/unique_labels=50/datapoints=100-14                  2527.0 ± 0%    552.0 ± 0%  -78.16% (p=0.002 n=6)
FromMetrics_LabelCaching_RepeatedLabelNames/unique_labels=50/datapoints=1000-14                25.038k ± 0%   5.063k ± 0%  -79.78% (p=0.002 n=6)
FromMetrics_LabelCaching_ScopeMetadata/scope_attrs=0/metrics=10-14                              158.00 ± 0%    87.00 ± 0%  -44.94% (p=0.002 n=6)
FromMetrics_LabelCaching_ScopeMetadata/scope_attrs=0/metrics=100-14                             1334.0 ± 0%    543.0 ± 0%  -59.30% (p=0.002 n=6)
FromMetrics_LabelCaching_ScopeMetadata/scope_attrs=10/metrics=10-14                              359.0 ± 0%    109.0 ± 0%  -69.64% (p=0.002 n=6)
FromMetrics_LabelCaching_ScopeMetadata/scope_attrs=10/metrics=100-14                            3335.0 ± 0%    565.0 ± 0%  -83.06% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleResources/resources=1/metrics=10-14                             346.0 ± 0%    103.0 ± 0%  -70.23% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleResources/resources=1/metrics=100-14                           3142.0 ± 0%    559.0 ± 0%  -82.21% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleResources/resources=10/metrics=10-14                           3280.0 ± 0%    607.0 ± 0%  -81.49% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleResources/resources=10/metrics=100-14                         31.196k ± 0%   5.118k ± 0%  -83.59% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleResources/resources=50/metrics=10-14                          16.293k ± 0%   2.817k ± 0%  -82.71% (p=0.002 n=6)
FromMetrics_LabelCaching_MultipleResources/resources=50/metrics=100-14                         155.83k ± 0%   25.35k ± 0%  -83.73% (p=0.002 n=6)
geomean                                                                                         3.648k         810.6       -77.78%

Summary: 35% faster, 15% less memory, 78% fewer allocations (geomean).
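
For reference, the relative change in each geomean row of the tables above can be recomputed with a few lines of Go. The helper name `pctChange` is made up for this sketch; the input values are copied from the geomean rows.

```go
package main

import "fmt"

// pctChange returns the relative change from base to next, in percent.
func pctChange(base, next float64) float64 {
	return (next - base) / base * 100
}

func main() {
	// Geomean rows copied from the benchstat tables above.
	fmt.Printf("sec/op:    %.1f%%\n", pctChange(365.7, 237.7))
	fmt.Printf("B/op:      %.1f%%\n", pctChange(208.4, 176.5))
	fmt.Printf("allocs/op: %.1f%%\n", pctChange(3648, 810.6))
}
```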

@aknuds1 aknuds1 force-pushed the arve/optimizations branch from ebd647c to c868373 Compare January 14, 2026 12:42
@aknuds1 aknuds1 changed the title Various optimizations and improvements OTLP: label caching for OTLP-to-Prometheus conversion to reduce allocations and improve latency Jan 14, 2026
@aknuds1 aknuds1 requested a review from krajorama January 14, 2026 14:11
@aknuds1 aknuds1 force-pushed the arve/optimizations branch from a8dc0d8 to b4f73e5 Compare January 14, 2026 14:25
@aknuds1 aknuds1 marked this pull request as draft January 14, 2026 16:22

aknuds1 commented Jan 14, 2026

Putting this in draft while I compare existing benchmarks.

@aknuds1 aknuds1 marked this pull request as ready for review January 14, 2026 17:33

@ArthurSens ArthurSens left a comment


Those are some great improvements, nice job! I just have a few comments.

@aknuds1 aknuds1 requested a review from ArthurSens January 15, 2026 10:09
Add per-request caching to reduce redundant computation and allocations
during OTLP metric conversion:

1. Per-request label sanitization cache: Cache sanitized label names
   within a request to avoid repeated string allocations for commonly
   repeated labels like __name__, job, instance.

2. Resource-level label caching: Precompute and cache job, instance,
   promoted resource attributes, and external labels once per
   ResourceMetrics boundary instead of for each datapoint.

3. Scope-level label caching: Precompute and cache scope metadata labels
   (otel_scope_name, otel_scope_version, etc.) once per ScopeMetrics
   boundary.

4. LabelNamer instance caching: Reuse the LabelNamer struct across
   datapoints within the same resource context.

These optimizations significantly reduce allocations and improve latency
for OTLP ingestion workloads with many datapoints per resource/scope.

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
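
The resource-level caching described in this commit message can be pictured roughly as below. This is a sketch under assumptions: the `converter` and `label` types, the `seriesLabels` helper, and the choice of attributes are hypothetical; only the name `setResourceContext` appears elsewhere in this PR, and its body here is invented for illustration.

```go
package main

import "fmt"

// label is a minimal name/value pair standing in for Prometheus label types.
type label struct{ name, value string }

// converter (hypothetical) holds labels cached for the current resource.
type converter struct {
	resourceLabels []label
	builds         int // counts how often resource labels were computed
}

// setResourceContext computes resource-derived labels once per resource.
// The body is illustrative only and not the PR's implementation.
func (c *converter) setResourceContext(attrs map[string]string) {
	c.builds++
	c.resourceLabels = c.resourceLabels[:0]
	if v, ok := attrs["service.name"]; ok {
		c.resourceLabels = append(c.resourceLabels, label{"job", v})
	}
	if v, ok := attrs["service.instance.id"]; ok {
		c.resourceLabels = append(c.resourceLabels, label{"instance", v})
	}
}

// seriesLabels combines the metric name with the cached resource labels,
// so no resource attribute is re-processed per datapoint.
func (c *converter) seriesLabels(metric string) []label {
	out := make([]label, 0, len(c.resourceLabels)+1)
	out = append(out, label{"__name__", metric})
	return append(out, c.resourceLabels...)
}

func main() {
	c := &converter{}
	c.setResourceContext(map[string]string{
		"service.name":        "checkout",
		"service.instance.id": "pod-1",
	})
	for _, m := range []string{"http_requests_total", "http_errors_total"} {
		fmt.Println(c.seriesLabels(m))
	}
	fmt.Println("resource label builds:", c.builds)
}
```

The same pattern would apply one level down for scope labels, refreshed at each ScopeMetrics boundary.
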
Add regression tests to ensure:
1. Scope labels are not added to target_info when PromoteScopeMetadata
   is enabled. The fix ensures scope labels are only merged when both
   the cache exists AND promoteScope is true.

2. Promoted resource attributes are not added to target_info. Added
   getPromotedAttributeNames() method to get the list of promoted
   attributes, which are then added to ignoreAttrs in addResourceTargetInfo()
   to prevent them from appearing in target_info.

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Remove resource and scope parameters from createAttributes since these
are now cached via setResourceContext and setScopeContext. Update all
call sites and tests to properly initialize context before calling
internal add* functions.

Also fix target_info to not include scope labels by temporarily clearing
c.scopeLabels during target_info generation, since target_info is a
resource-level metric.

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
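
One way to picture the "temporarily clearing c.scopeLabels" step is the sketch below. The `converter` type, its string-based labels, and both method names are hypothetical; restoring the field with `defer` is an assumption of this sketch, not something the commit message states.

```go
package main

import "fmt"

// converter (hypothetical) caches labels as plain "name=value" strings.
type converter struct {
	resourceLabels []string
	scopeLabels    []string
}

// labelsFor returns the metric name, the cached resource labels, and any
// scope labels that are currently active.
func (c *converter) labelsFor(metric string) []string {
	out := []string{"__name__=" + metric}
	out = append(out, c.resourceLabels...)
	return append(out, c.scopeLabels...)
}

// targetInfoLabels builds the labels for target_info with scope labels
// suppressed, then restores them for subsequent datapoints.
func (c *converter) targetInfoLabels() []string {
	saved := c.scopeLabels
	c.scopeLabels = nil
	defer func() { c.scopeLabels = saved }()
	return c.labelsFor("target_info")
}

func main() {
	c := &converter{
		resourceLabels: []string{"job=checkout"},
		scopeLabels:    []string{"otel_scope_name=http"},
	}
	fmt.Println(c.labelsFor("http_requests_total"))
	fmt.Println(c.targetInfoLabels())
	fmt.Println(c.labelsFor("http_requests_total"))
}
```
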
The sanitizedLabels cache stores label name sanitization results which
depend only on the label name and settings. Since settings are constant
within a FromMetrics call, clearing the cache at each resource boundary
is unnecessary and reduces caching effectiveness.

Label names like __name__, job, instance will now remain cached across
all resources in a request instead of being re-sanitized for each one.

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
…arget_info

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
…ed methods

Remove the redundant clearResourceContext() call at the end of the
ResourceMetrics loop since setResourceContext() unconditionally
overwrites the cached labels at the start of each iteration.

Also remove two methods that became unused after the earlier change
to include promoted attributes in target_info:
- addPromotedAttributes
- getPromotedAttributeNames

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Use labels.Builder instead of ScratchBuilder for building promoted
resource attribute labels. Builder.Get()/Set() properly handles
duplicate label names that can arise when different OTLP attribute
names sanitize to the same Prometheus label name (e.g., "foo.bar"
and "foo_bar" both become "foo_bar").

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
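
The collision this commit guards against can be reproduced with a small stand-in. Note the assumptions: `setBuilder` is a hypothetical map-backed type that only imitates a Get/Set style builder, `sanitize` handles dots only, and the keep-first policy on collision is chosen for this sketch rather than taken from the PR.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sanitize is a simplified stand-in that maps dots to underscores.
func sanitize(name string) string { return strings.ReplaceAll(name, ".", "_") }

// setBuilder is a hypothetical map-backed stand-in for a Get/Set builder.
type setBuilder struct{ m map[string]string }

func (b *setBuilder) get(name string) string { return b.m[name] }
func (b *setBuilder) set(name, value string) { b.m[name] = value }

// names returns the distinct label names in sorted order.
func (b *setBuilder) names() []string {
	out := make([]string, 0, len(b.m))
	for n := range b.m {
		out = append(out, n)
	}
	sort.Strings(out)
	return out
}

type attr struct{ key, value string }

// collect sanitizes each attribute name and records it twice: once in an
// append-only slice (duplicates survive) and once via get/set (no duplicates).
func collect(attrs []attr) (appended []string, b *setBuilder) {
	b = &setBuilder{m: map[string]string{}}
	for _, a := range attrs {
		name := sanitize(a.key)
		appended = append(appended, name)
		if b.get(name) == "" { // assumed policy: keep the first value seen
			b.set(name, a.value)
		}
	}
	return appended, b
}

func main() {
	appended, b := collect([]attr{
		{"foo.bar", "first"},
		{"foo_bar", "second"}, // collides with foo.bar after sanitization
		{"region", "eu"},
	})
	fmt.Println(appended)
	fmt.Println(b.names(), b.get("foo_bar"))
}
```
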
Use labels.Builder with Set() instead of ScratchBuilder.Add() for
scope attributes caching. This handles duplicate label names that
can arise when different OTLP attribute names sanitize to the same
Prometheus label name.

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
@aknuds1 aknuds1 force-pushed the arve/optimizations branch from 6ba9dcd to 18a502e Compare January 15, 2026 10:27
Since the caching refactor now stores resource and scope labels in
the converter's cached state, these parameters are no longer needed
in the datapoint-level functions.

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
@aknuds1 aknuds1 force-pushed the arve/optimizations branch from d2a79f0 to 75638c3 Compare January 15, 2026 10:32

@ArthurSens ArthurSens left a comment


I realized that building the labelBuilder in NewPrometheusConverter is a bit harder than I initially expected, since settings aren't passed as an argument there.

LGTM


@jesusvazquez jesusvazquez left a comment


LGTM nice work arve

Add a nil check for resourceLabels at the start of createAttributes
to return a clear error instead of panicking if the caller forgets
to call setResourceContext first.

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
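
A guard of the kind described in this commit could look roughly like this. The names `createAttributes`, `setResourceContext`, and `resourceLabels` come from the commit messages, but the signatures, the map-based representation, and the error value are assumptions made for the sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoResourceContext is a hypothetical sentinel error for this sketch.
var errNoResourceContext = errors.New("createAttributes called before setResourceContext")

type converter struct {
	resourceLabels map[string]string // nil until setResourceContext runs
}

// setResourceContext always stores a non-nil map, even for empty attributes,
// so nil reliably means "context was never set".
func (c *converter) setResourceContext(attrs map[string]string) {
	c.resourceLabels = make(map[string]string, len(attrs))
	for k, v := range attrs {
		c.resourceLabels[k] = v
	}
}

// createAttributes returns an error instead of working on missing state.
func (c *converter) createAttributes(metric string) (map[string]string, error) {
	if c.resourceLabels == nil {
		return nil, errNoResourceContext
	}
	out := make(map[string]string, len(c.resourceLabels)+1)
	for k, v := range c.resourceLabels {
		out[k] = v
	}
	out["__name__"] = metric
	return out, nil
}

func main() {
	c := &converter{}
	if _, err := c.createAttributes("up"); err != nil {
		fmt.Println("error:", err)
	}
	c.setResourceContext(map[string]string{"job": "checkout"})
	lbls, err := c.createAttributes("up")
	fmt.Println(lbls, err)
}
```
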

@krajorama krajorama left a comment


LGTM, nice job

aknuds1 and others added 2 commits January 16, 2026 10:56
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
@aknuds1 aknuds1 enabled auto-merge (squash) January 16, 2026 10:09
@aknuds1 aknuds1 merged commit 4afa76d into prometheus:main Jan 16, 2026
32 checks passed