Skip to content

Phase out native histogram feature flag, add scrape config option#17315

Merged
beorn7 merged 2 commits intomainfrom
beorn7/histogram2
Oct 15, 2025
Merged

Phase out native histogram feature flag, add scrape config option#17315
beorn7 merged 2 commits intomainfrom
beorn7/histogram2

Conversation

@beorn7
Copy link
Member

@beorn7 beorn7 commented Oct 9, 2025

Part of #16572.

Still TODO (in this PR):

  • Update feature flag documentation.
  • Update config documentation.
  • Verify that the various combination of options still work as expected.
  • Add more tests.

This commit adds a global and local scrape config option
scrape_native_histograms, which has to be set to true to ingest
native histograms.

To ease the transition, the feature flag is changed to simply set the
default of scrape_native_histograms to true.

Further implications:

  • The default scrape protocols now depend on the
    scrape_native_histograms setting.
  • Everywhere else, histograms are now "on by default".

Documentation beyond the one for the feature flag and the scrape
config are deliberately left out. See
#17232 for that.

Does this PR introduce a user-facing change?

[CHANGE] Turn native histograms into a regular feature, to be activated with `scrape_native_histograms` config setting.

@beorn7 beorn7 changed the title Phase out Phase out native histogram feature flag, add scrape config option Oct 9, 2025
@beorn7 beorn7 force-pushed the beorn7/histogram2 branch 5 times, most recently from 0ceecfd to f4f0fc5 Compare October 12, 2025 17:02
@beorn7 beorn7 marked this pull request as ready for review October 12, 2025 17:28
@beorn7 beorn7 force-pushed the beorn7/histogram2 branch from f4f0fc5 to 2e1b338 Compare October 12, 2025 22:43
@beorn7 beorn7 requested a review from juliusv as a code owner October 12, 2025 22:43
Copy link
Member

@krajorama krajorama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like the ground rules aren't clear here. Can we agree that explicitly setting something (either directly or indirectly) in the configuration file has precedence over the ? This is not stated explicitly in the docs , but I think so far we worked this way and maybe we should state it.

That would mean that if I use the created-timestamp-zero-ingestion feature flag it can only overwrite the scrape protocol if it is not set globally or in the local scrape config. And it doesn't matter if the scrape protocol is set directly or set by using the scrape_native_histograms parameter.

Hmm?

@beorn7
Copy link
Member Author

beorn7 commented Oct 13, 2025

Can we agree that explicitly setting something (either directly or indirectly) in the configuration file has precedence over the ?

Is there a word missing? What did you want to write after the "the"?

@beorn7
Copy link
Member Author

beorn7 commented Oct 13, 2025

That would mean that if I use the created-timestamp-zero-ingestion feature flag it can only overwrite the scrape protocol if it is not set globally or in the local scrape config. And it doesn't matter if the scrape protocol is set directly or set by using the scrape_native_histograms parameter.

I don't think that logic makes sense. created-timestamp-zero-ingestion has to have precedence over implicitly selecting a non-proto exposition format by saying scrape_native_histogram: false.

Most users should and will never touch scrape_protocols. In the easier future, we can always set it to "proto first" because everyone will be happy with that. However, it changes to many moving parts (in Prometheus, on the network, in the scraped target) that we can just do that without a major release (IMHO). So we have to make sure that nobody suddenly gets protobuf scraping without having requested it implicitly or explicitly. On the other hand, everyone requesting a feature that needs protobuf needs to also activate protobuf negotiation or will be very confused otherwise.

So still assuming a user will not and does not want to touch scrape_protocols, there are the following scenarios:

  1. User wants neither CT zero injection nor NH scraping → does not set created-timestamp-zero-ingestion and sets scrape_native_histogram: false → no protobuf negotiation.
  2. User wants no CT zero injection but wants NH scraping → does not set created-timestamp-zero-ingestion and sets scrape_native_histogram: true → needs protobuf negotiation.
  3. User wants CT zero injection but no NH scraping → sets created-timestamp-zero-ingestion and sets scrape_native_histogram: false → needs protobuf negotiation
  4. User wants both CT zero injection and NH scraping → sets created-timestamp-zero-ingestion and sets scrape_native_histogram: true → needs protobuf negotiation

If I understand your idea above correctly, you would say scrape_native_histogram: false means that we do not negotiate protobuf, even if the user has set the created-timestamp-zero-ingestion feature flag.

However, the logic as implemented in this PR handles all four cases as needed. The problem is the asymmetry between the feature flag (which is one global setting) and the scrape_native_histogram setting (which has two levels of hierarchy: the global setting and the "local", i.e. per scrape config, setting. So this is the model implemented here:

  1. created-timestamp-zero-ingestion feature flag sets the default of scrape_protocols to "protobuf first". Otherwise, the default is an empty list AKA "empty").
  2. The global scrape config level leaves the default of scrape_protocols as is ("protobuf first" or "empty") (but you could override the value explicitly if you wanted).
  3. The local scrape config level might have its own explicitly set value for scrape_protocols, which would take absolute preference. If it does not, the inherited default value is checked. If it is a non-"empty" value, it is used directly (and it doesn't matter if the reason is the feature flag in step (1) or an explicit setting in step (2)). However, if it is indeed still "empty", the documented rule kicks in: If the locally effective setting of scrape_histograms is true, it is "protobuf first", otherwise it is "no protobuf".

So this IMHO fulfills all "intuitive" demands of the user:

  • As requested above, if the user hasn't touched scrape_protocols at all, protobuf is negotiated if and only if any feature is requested that needs protobuf. And this even works per job in the scrape config.
  • If the user has explicitly set scrape_protocols on any level (global or local), it always has preference.

@beorn7 beorn7 force-pushed the beorn7/histogram2 branch from 2e1b338 to 6ebeb0f Compare October 13, 2025 21:30
@beorn7
Copy link
Member Author

beorn7 commented Oct 14, 2025

As a general note to give us some relief: In v4, we should really give the scrape_protocols setting a static default again (with "protobuf first") and thereby remove this whole semantic complication again. I'll file a corresponding issue with the "breaking" label once this is in.

@beorn7 beorn7 force-pushed the beorn7/histogram2 branch from 437eec4 to db9c71a Compare October 14, 2025 17:24
@beorn7
Copy link
Member Author

beorn7 commented Oct 14, 2025

With the recent push:

  • Accepted your suggestions.
  • Changed the documentation for the global scrape_protocols setting avoiding the notion of a default value altogether.
  • Changed the documentation for --enable-feature=created-timestamp-zero-ingestion to be consistent with those changes.

Let's maybe meet tomorrow and discuss the details in person based on this version.

Applied the analyzer "modernize" to the test files.

$ go run golang.org/x/tools/gopls/internal/analysis/modernize/cmd/modernize@latest -fix -test ./...

Signed-off-by: beorn7 <beorn@grafana.com>
@beorn7 beorn7 force-pushed the beorn7/histogram2 branch 2 times, most recently from ec0c01e to 917a038 Compare October 14, 2025 17:57
Copy link
Member

@krajorama krajorama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as a compromise/workaround. Some nit comments.

I've approved mostly because of

I mostly think claiming that the default is already set here at the global level is confusing because it doesn't show up as such in the web view of the config (and that's where I would look first if I want to understand what's going on):

To me this means that we need an internal API that keeps track of what feature requests are enabled/disabled to avoid having to encode it in the default configuration. Feature flags behave differently from configuration. This is explicitly stated at the top of our documentation. Implementing that would mean that we can show the default on the UI , filtered through the feature flag.

@beorn7 beorn7 force-pushed the beorn7/histogram2 branch from 917a038 to bc966fc Compare October 15, 2025 12:24
@beorn7
Copy link
Member Author

beorn7 commented Oct 15, 2025

FTR: I don't think the problem with the "empty default" is because of feature flags. This particular problem is because we have a "dynamic default" that has to be set in each local scrape config separately. It will go away with #17336 .

@beorn7
Copy link
Member Author

beorn7 commented Oct 15, 2025

In different news: In some parts of the documentation, the list of scrape protocols was missing PrometheusText1.0.0 (even before this change). I'll add this, and then merge on green.

The detailed plan for this is laid out in
#16572 .

This commit adds a global and local scrape config option
`scrape_native_histograms`, which has to be set to true to ingest
native histograms.

To ease the transition, the feature flag is changed to simply set the
default of `scrape_native_histograms` to true.

Further implications:

- The default scrape protocols now depend on the
  `scrape_native_histograms` setting.
- Everywhere else, histograms are now "on by default".

Documentation beyond the one for the feature flag and the scrape
config are deliberately left out. See
#17232 for that.

Signed-off-by: beorn7 <beorn@grafana.com>
@beorn7 beorn7 force-pushed the beorn7/histogram2 branch from bc966fc to ad7d1ae Compare October 15, 2025 12:50
@beorn7
Copy link
Member Author

beorn7 commented Oct 15, 2025

FTR: I don't think the problem with the "empty default" is because of feature flags. This particular problem is because we have a "dynamic default" that has to be set in each local scrape config separately. It will go away with #17336 .

Ironically, if you set --enable-feature=created-timestamp-zero-ingestion, we are back to a static default effectively. It also shows up in the web UI:
image

@beorn7 beorn7 merged commit 460d19c into main Oct 15, 2025
46 checks passed
@beorn7 beorn7 deleted the beorn7/histogram2 branch October 15, 2025 13:30
songy23 pushed a commit to open-telemetry/opentelemetry-collector-contrib that referenced this pull request Dec 12, 2025
## Summary

This PR upgrades the `github.com/prometheus/prometheus` dependency from
v0.307.3 to v0.308.0.

## Changes Made

#### Fixed `RemoteWriteProtoMsg` unrecognized

In prometheus/prometheus v0.308.0, the remote write protocol message
types were moved from `github.com/prometheus/prometheus/config` to
`github.com/prometheus/client_golang/exp/api/remote` as part of
[prometheus/prometheus#17197](prometheus/prometheus#17197).

#### Fixed `CreatedTimestamp` → `StartTimestamp`

In
[prometheus/prometheus#17411](prometheus/prometheus#17411),
the Remote Write 2.0 spec was updated to 2.0-rc.4. The
`CreatedTimestamp` field was renamed to `StartTimestamp` and moved from
`TimeSeries` to individual `Sample` and `Histogram` messages.

#### Fixed not enough arguments in`api_v1.NewAPI` 

The `api_v1.NewAPI` function signature changed due to
[prometheus/prometheus#17470](prometheus/prometheus#17470)
- Added `appendMetadata bool` parameter for remote write metadata
handling. Added the new parameter `appendMetadata = false` -
prometheusreceiver scrapes targets, doesn't receive remote write
requests

## Remaining Issues

### `EnableNativeHistogramsIngestion` removed from `scrape.Options`

There's one change in this upgrade that I'd love to get the community's
input on before finalizing.

In
[prometheus/prometheus#17315](prometheus/prometheus#17315),
Prometheus removed the `EnableNativeHistogramsIngestion` field from
`scrape.Options`. Native histogram ingestion is now controlled through
the Prometheus scrape config option `scrape_native_histograms` rather
than programmatically.

This creates an interesting situation for us. Currently, we have the
feature gate `receiver.prometheusreceiver.EnableNativeHistograms` that
controls whether native histograms are converted to OTel exponential
histograms. But with this Prometheus change, the *ingestion* of native
histograms during scraping is now a separate concern controlled by
Prometheus config.

I see two paths forward here. We could automatically inject
`scrape_native_histograms: true` into the scrape config whenever our
feature gate is enabled, preserving the current behavior where enabling
the feature gate is all users need to do. Alternatively, we could simply
remove the line and require users to configure
`scrape_native_histograms: true` in their Prometheus scrape config
themselves, which is more explicit but adds an extra configuration step.

What do you think? I'm also wondering if this constitutes a breaking
change that should be called out in the changelog. Would appreciate any
thoughts!

---------

Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants