Skip to content

fix(textparse/protobuf): metric family name corrupted by NHCB parser#17156

Merged
krajorama merged 8 commits intomainfrom
krajo/native-nhcb-protobuf
Sep 8, 2025
Merged

fix(textparse/protobuf): metric family name corrupted by NHCB parser#17156
krajorama merged 8 commits intomainfrom
krajo/native-nhcb-protobuf

Conversation

@krajorama
Copy link
Member

@krajorama krajorama commented Sep 6, 2025

When PrometheusProto scrape protocol is used to scrape a target and there is an NHCB conversion, the wrong metadata is written into the WAL. This means the Remote-Write 2.0 will send empty metadata with NHCB samples.

The metric family name is returned by a reference by the NHCB parser , directly from the ProtoBuf parser. However after it is returned , the NHCB parser reads ahead and the slice is overwritten while still being referenced in scrape. See #17075 (comment)

We could copy the data, but we wanted to implement direct NHCB conversion in the ProtoBuf parser anyway, so
this PR does that.

NHCB generic parser is no longer needed on top of the ProtoBuf parser.

The implementation adds the conversion after classic series are returned by the parser to be consistent with how the generic NHCB parser works. The general parser emits classic series while collecting data for NHCB, hence the order is classic series first, NHCB next.

However this means that the code needs to be able to skip classic series if NHCB conversion is enabled, but classic
histograms are not needed. According to #13532.

Which issue(s) does the PR fix:

Related to: #17075

Does this PR introduce a user-facing change?

[BUGFIX] Fix metadata entries handling on `metadata-wal-records` feature for native histograms with custom buckets when scraped with `PrometheusProto` scrape protocol

The NHCB conversion does some validation, but we can only return error
from Parser.Next() not Parser.Histogram(). So the conversion needs to
happen in Next().

There are 2 cases:
1. "always_scrape_classic_histograms" is enabled, in which case we
convert after returning the classic series. This is to be consistent
with the PromParser text parser, which collects NHCB while spitting out
classic series; then returns the NHCB.
2. "always_scrape_classic_histograms" is disabled. In which case we never
return the classic series.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Do not return the first classic series from the EntryType state,
switch to EntrySeries. This means we need to start the histogram
field state from -3 , not -2.

In EntrySeries, skip classic series if needed.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
@krajorama krajorama force-pushed the krajo/native-nhcb-protobuf branch from 66edf6f to 9a047ee Compare September 6, 2025 19:09
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
@krajorama krajorama marked this pull request as ready for review September 6, 2025 21:44
NHCB parse doesn't always copy the metric name from the underlying
parser. When called via HELP, UNIT, the string is directly referenced
which means that the read-ahead of NHCB can corrupt it.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
@bwplotka
Copy link
Member

bwplotka commented Sep 8, 2025

Do you mind changing the changelog entry?

[BUGFIX] Remote Write 2.0 has metadata for native histograms with custom buckets when scraped with PrometheusProto scrape protocol.

This bug is affecting metadata records in general, not only remote write (it also affects RW 1.0 and metadata API). What about

[BUGFIX] Fix metadata entries handling on metadata-wal-recordsfeature for native histograms with custom buckets when scraped withPrometheusProto` scrape protocol.

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing! Generally LGTM, thanks for this. This will be way more efficient.

Some nits and one question, otherwise good to go

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!!!

@krajorama krajorama merged commit acd9aa0 into main Sep 8, 2025
46 checks passed
@krajorama krajorama deleted the krajo/native-nhcb-protobuf branch September 8, 2025 15:26
Naman-B-Parlecha added a commit to Naman-B-Parlecha/prometheus that referenced this pull request Sep 17, 2025
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

feat(histogram): Refactor ConvertNHCBToClassicHistogram and add unit tests

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

feat(histogram): Add ConvertNHCBToClassic flag and refactor related types and tests

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

Replace gopkg.in/yaml.v2 with go.yaml.in/yaml/v2 (prometheus#17151)

* Replace gopkg.in/yaml.v2 with go.yaml.in/yaml/v2
* Upgrade to client_golang@v1.23.2

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>

OTLP to directly write to an interface which can hide storage details (prometheus#16951)

* OTLP writer writes directly to appender

Do not convert to Remote-Write 1.0 protocol. Convert to TSDB Appender interface instead.

For downstream projects that still convert OTLP to something else (e.g. Mimir using
its own RW 1.0+2.0 compatible protocol), introduce a compatibility layer between
OTLP decoding and TSDB Appender. This is the CombinedAppender that hides the
implementation. Name is subject to change.

---------

Signed-off-by: David Ashpole <dashpole@google.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>
Co-authored-by: David Ashpole <dashpole@google.com>
Co-authored-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com>
Co-authored-by: Arve Knudsen <arve.knudsen@gmail.com>

fix(textparse/protobuf): metric family name corrupted by NHCB parser (prometheus#17156)

* fix(textparse): implement NHCB parsing in ProtoBuf parser directly

The NHCB conversion does some validation, but we can only return error
from Parser.Next() not Parser.Histogram(). So the conversion needs to
happen in Next().

There are 2 cases:
1. "always_scrape_classic_histograms" is enabled, in which case we
convert after returning the classic series. This is to be consistent
with the PromParser text parser, which collects NHCB while spitting out
classic series; then returns the NHCB.
2. "always_scrape_classic_histograms" is disabled. In which case we never
return the classic series.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

* refactor(textparse): skip classic series instead of adding NHCB around

Do not return the first classic series from the EntryType state,
switch to EntrySeries. This means we need to start the histogram
field state from -3 , not -2.

In EntrySeries, skip classic series if needed.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

* reuse nhcb converter

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

* test(textparse/nhcb): test corrupting metric family name

NHCB parse doesn't always copy the metric name from the underlying
parser. When called via HELP, UNIT, the string is directly referenced
which means that the read-ahead of NHCB can corrupt it.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

Handle error gracefully for the `desymbolizeLabels` function in prompb/io/prometheus/write/v2/symbols.go (prometheus#17160)

Signed-off-by: pipiland <user123@Minhs-Macbook.local>

---------

Signed-off-by: pipiland <user123@Minhs-Macbook.local>
Co-authored-by: pipiland <user123@Minhs-Macbook.local>

feat(histogram): Refactor NHCB to classic conversion with enhanced validation and update tests

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

feat(histogram): Update NHCB to classic conversion logic and remove tests

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

refactor(histogram): Update ConvertNHCBToClassicHistogram

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

test cases for new conversion

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

refactor(histogram): Simplify label building in ConvertNHCBToClassicHistogram

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants