Skip to content

Conversation

@bwplotka
Copy link
Collaborator

@bwplotka bwplotka commented Jun 27, 2025

After time we realized that a bug caused the operator config export filtering logic to be not used from the GMP 0.14 release onwards. We decided to keep this field NOT applicable (changing this field being noop) going forward, so this PR makes the deprecation/removal of this functionality clear with the operation and export code removed.

  • This feature since the beginning was extremely hard to use and confusing to users, users constantly assumed list of matchers are "AND-ed" while they are "OR-ed" or they are were filtering partial histograms (demo).
  • This feature is not stopping collectors (Prometheus) to scrape the filtered out series, it only NOT send them forward. This limits the use of them for collector efficiency scenarios.
  • We have evidences of users already starting to "misconfigure" their collection with this field, but they were "lucky" those filters are accidently turned off. "Fixing" them will immediately break those users in a hidden way.
  • Finally, better alternatives exists https://cloud.google.com/stackdriver/docs/managed-prometheus/setup-managed#filter-metrics

This feature, for dynamic config based setting in export pkg is already deprecated: GoogleCloudPlatform/prometheus#237

Post-mortem

For security hardening, when introducing reloadable config pieces (compression, credentials and matchers) in 0.14 we updated export package to load things from the correct place, but the matchers are initialized and copied into seriesCache and never updated later:

seriesCache: newSeriesCache(logger, reg, opts.MetricTypePrefix, opts.Matchers),

This affects both collector and rule-eval AFAIK.

We didn't have a good tests to test this e2e too.

AIs:

  • Update public docs
  • Remove this functionality (this PR and future)
  • Ensure proper e2e test coverage of compression option (credentialFile is tested).
  • Add this deprecation to the 0.14 changelog

NOTE: We don't plan to add deprecation warning to 0.15.

@bwplotka
Copy link
Collaborator Author

bwplotka commented Jun 27, 2025

@bwplotka bwplotka changed the title feat(operator): deprecate the operatorconfig.collection.filter.matchOneOf functionality feat(operator): deprecate operatorconfig.collection.filter.matchOneOf Jun 27, 2025
@bwplotka bwplotka force-pushed the filtering branch 2 times, most recently from 26d83ae to 4941d55 Compare June 27, 2025 11:41
@pintohutch
Copy link
Collaborator

func (m *Matchers) Matches(lset labels.Labels) bool {
if len(*m) == 0 {
return true
}
for _, sel := range *m {
if sel.Matches(lset) {
return true
}
}
return false
}
is a logical OR, no?

@lyanco
Copy link
Collaborator

lyanco commented Jul 7, 2025

I'm not sure anymore whether it was an OR or an AND... but also I very much remember people shooting themselves in the foot with this back in 2022 because it wasn't implemented the way people thought it would be implemented, to the point where we had to change the examples in the docs to be negative matchers on metrics that don't exist because people kept copy-pasting the older examples verbatim and accidentally filtering out all their metrics.

@bwplotka
Copy link
Collaborator Author

Yup, double checked and indeed it's OR-ed, which only proves user can get misleading. Another (production cx) example is with broken histograms (demo).

Anyway, PR still stands I think we should proceed PTAL @pintohutch @dashpole

@lyanco you are ok with this approach (deprecating and removing?)

@bwplotka
Copy link
Collaborator Author

While we wished we could reenabled this under some different, bit easier to use config API surface, the decision with @lyanco is to remove it for as other alternatives exists and focus on more important tasks.

Copy link
Collaborator

@dashpole dashpole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approving go code changes. I don't have context on whether this is a good idea or not :)

@bwplotka
Copy link
Collaborator Author

NOTE: Tests are failing on some struct unmarshalling. I fixed those errors on #1711 and I expect #1711 to be merged first and this conflict resolved and merged.

@pintohutch
Copy link
Collaborator

Given the footgun-ness of the feature as Lee described, I agree we should proceed with the PRs and be explicit in our kubebuilder tags to mark the CR field as deprecated.

@bwplotka bwplotka force-pushed the filtering branch 2 times, most recently from cda7708 to d0b8468 Compare August 28, 2025 10:21
@bwplotka bwplotka force-pushed the filtering branch 2 times, most recently from af99d2e to df6370a Compare August 28, 2025 12:42
@bwplotka
Copy link
Collaborator Author

Should be rdy for review PTAL!

@bwplotka bwplotka force-pushed the filtering branch 2 times, most recently from 792de57 to a685b1d Compare August 28, 2025 14:53
@bernot-dev
Copy link
Collaborator

Looks like tests are still failing.

@bwplotka
Copy link
Collaborator Author

Looks like tests are still failing.

Yea, I mentioned this on our standup. It's odd.. couldn't figure out why without local debugging session to do... will check tomorrow ):

@bwplotka bwplotka force-pushed the filtering branch 3 times, most recently from c2a6fae to f9557b5 Compare August 29, 2025 02:58
@bwplotka bwplotka changed the title feat(operator): deprecate operatorconfig.collection.filter.matchOneOf feat(operator): deprecate operatorconfig(..)matchOneO; fix collection compression propagation Aug 29, 2025
@bwplotka bwplotka changed the title feat(operator): deprecate operatorconfig(..)matchOneO; fix collection compression propagation feat(operator): deprecate operatorconfig.collection.filter.matchOneOf Aug 29, 2025
@bwplotka
Copy link
Collaborator Author

OK, I found the bug -- it's due to me extending the tests, not related to this change.

Fixed in another PR (#1751) and chained this PR on it.

@bwplotka bwplotka changed the base branch from main to compression-fix August 29, 2025 03:38
@bwplotka
Copy link
Collaborator Author

Don't merge before #1751 is merged pls (:

@bwplotka bwplotka requested a review from dashpole August 29, 2025 03:54
@bwplotka bwplotka force-pushed the compression-fix branch 2 times, most recently from 2f4f7c7 to 4bd847d Compare August 29, 2025 04:19
@bwplotka bwplotka changed the base branch from compression-fix to main August 29, 2025 15:04
After time we realized that a bug caused the operator config export
filtering logic to be not used from the GMP 0.14 release onwards. We
decided to keep this field NOT applicable (changing this field being
noop) going forward, so this PR makes the deprecation/removal of this
functionality clear with the operation and export code removed.

* This feature since the beginning was extremely hard to use and
confusing to users, users constantly assumed list of matchers are
"AND-ed" while they are "OR-ed" or they are were filtering partial
histograms
([demo](a74a72f)).
* This feature is not stopping collectors (Prometheus) to scrape the
filtered out series, it only NOT send them forward. This limits the use
of them for collector efficiency scenarios.
* We have evidences of users already starting to "misconfigure" their
collection with this field, but they were "lucky" those filters are
accidentally turned off. "Fixing" them will immediately break those
users in a hidden way.
* Finally, better alternatives exists
https://cloud.google.com/stackdriver/docs/managed-prometheus/setup-managed#filter-metrics

This feature, for dynamic config based setting in export pkg is already
deprecated: GoogleCloudPlatform/prometheus#237

For security hardening, when introducing reloadable config pieces
(compression, credentials and matchers) in 0.14 we updated export
package to load things from the correct place, but the matchers are
initialized and copied into seriesCache and never updated later:
https://github.com/GoogleCloudPlatform/prometheus-engine/blob/10c5314b02cc99bd0ee2e71f2588b7aac870c329/pkg/export/export.go#L419

This affects both collector and rule-eval AFAIK.

We didn't have a good tests to test this e2e too.

AIs:
* [X] Update public docs
* [x] Remove this functionality (this PR and future)
* [x] Ensure proper e2e test coverage of compression option
(credentialFile is tested).
* [X] Add this deprecation to the 0.14 changelog

NOTE: We don't plan to add deprecation warning to 0.15.

Signed-off-by: bwplotka <bwplotka@google.com>
@bwplotka
Copy link
Collaborator Author

Given new revelations, let's reconsider this (:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants