
Add secret provider interface under feature flag #13955

Open
TheSpiritXIII wants to merge 1 commit into prometheus:main from TheSpiritXIII:secret-providers

Conversation

@TheSpiritXIII

TheSpiritXIII commented Apr 18, 2024

Fixes: #8551
Partially fixes: prometheus/alertmanager#3108
Group Discussion: https://groups.google.com/g/prometheus-developers/c/WKOej_pnhXg
Initial Design Doc: https://docs.google.com/document/d/1EqHd2EwQxf9SYD8-gl3sgkwaU6A10GhiN7aw-2kx7NU/edit

Description

Interface

This PR adds a Provider interface, which is registered similarly to the service discovery interfaces. A corresponding RegisterProvider function lets implementations register a provider from an init function.

Implementers implement Apply(configs []Config[yaml.Node]) ([]Secret, error). A sample config looks like this:

secrets:
- name: secret_a
  kubernetes_sp_config:
    api_server: "localhost:8080"
    secret:
      namespace: ns1
      name: secret1
      key: password

Your scrape config would look as follows:

scrape_config:
  job_name: job1
  basic_auth:
    username: admin
    password_ref: secret_a # use secret_a which loads from a Kubernetes secret.

Providers output a Secret, which is just a func(ctx context.Context, node *yaml.Node) (string, error). This allows secrets to be fetched on demand or cached; it is up to the implementer.

Usage

Before using this functionality, users must enable the "secret-providers" feature flag.

Default Providers

This PR adds only two default providers out of the box: inline and file, which let users express their existing secrets in the new configuration, e.g. for basic_auth they replace password and password_file respectively.

secrets:
- name: secret_a
  inline: hunter2

Or:

secrets:
- name: secret_a
  file: password.txt

Maintainability

Test coverage here is ≥90% to ensure no breaking changes.

One note is that I've provided both yaml.v2 and yaml.v3 versions of the Configs object.

Pull Requests

  1. Use common interface to fetch secrets in HTTP client config common#538
  2. Add support for secret refs via a secret manager common#572
  3. This one!
  4. Kubernetes Secret Provider: [WIP] Add a native Kubernetes secret provider #13956

@beorn7
Member

beorn7 commented May 2, 2024


Generally, this feature looks good to me. For a serious review, I needed to get deeper into the subject, for which I won't have time anytime soon. I have a huge review backlog already, and we want to get native histograms stable for the Prom 3 release in September, so I'm afraid I cannot add more things to my plate right now.

Signed-off-by: Daniel Hrabovcak <thespiritxiii@gmail.com>
Member

@bwplotka bwplotka left a comment


Ok, I reviewed initially! Thanks for this work, looks like a lot of thought and work was put into it.

I started commenting on implementation details (see some comments), but tried to keep it high level for now, as I wanted to double-check whether we really want the YAML format proposed here. I think I preferred the format actually proposed in your design doc (BTW, can we add that doc to https://github.com/prometheus/proposals with the gist of it, for tracking purposes and, I guess, a final discussion around the format?)

The current idea is for users with e.g. multiple Kubernetes secrets to do as follows (I used your example to construct a more complex, realistic one):

secrets:
- name: secret_a # Can be referenced in various parts as `password_ref: secret_a`
  kubernetes_sp_config:
    # Realistically the user has to add more info in practice, similar to: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config
    api_server: "localhost:8080"
    secrets_to_access it....
    version...
    tls_ca: ....
    proxy_url...
    follow_redirects...
    namespaces...
    selectors....
    enable_http2
    secret:
      namespace: ns1
      name: secret1
      key: password
- name: secret_b # Can be referenced in various parts as `password_ref: secret_b`
  kubernetes_sp_config:
    # Realistically the user has to add more info in practice, similar to: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config
    api_server: "localhost:8080"
    secrets_to_access it....
    version...
    tls_ca: ....
    proxy_url...
    follow_redirects...
    namespaces...
    selectors....
    enable_http2
    secret:
      namespace: ns1
      name: secret2
      key: password
- name: secret_c # Can be referenced in various parts as `password_ref: secret_c`
  kubernetes_sp_config:
    # Realistically the user has to add more info in practice, similar to: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config
    api_server: "localhost:8080"
    secrets_to_access it....
    version...
    tls_ca: ....
    proxy_url...
    follow_redirects...
    namespaces...
    selectors....
    enable_http2
    secret:
      namespace: ns1
      name: secret3
      key: password
- ...and so on

This feels a bit verbose 🤔 plus we will hit various config size limits even sooner for large Prometheus setups.

I know we do something "kind of" similar for scrape configuration, but there it's totally doable (and recommended) to put ALL your Kubernetes jobs under a single job, with relabelling rules that do whatever grouping you need into smaller job labels (e.g. per pod).

The counter-argument would be that maybe we typically don't need so many secrets in practice? It would be nice to check on production Prometheuses. I bet we could find something like 2-4 against exactly the same provider, which would mean we could invest in some grouping format?
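One possible grouping format (purely a hypothetical sketch for discussion, not something proposed in the PR; the secrets and secret_name keys are invented here) could let several secrets share a single provider configuration:

```yaml
secrets:
- kubernetes_sp_config:       # provider settings written once...
    api_server: "localhost:8080"
  secrets:                    # ...shared by every secret in this group
  - name: secret_a            # referenced as `password_ref: secret_a`
    namespace: ns1
    secret_name: secret1
    key: password
  - name: secret_b            # referenced as `password_ref: secret_b`
    namespace: ns1
    secret_name: secret2
    key: password
```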

Let me know what's the best place to play with different alternatives (I guess https://github.com/prometheus/proposals) and if even changing format is on the table here. 🤗

Sorry for challenging format again!

reloader: func(cfg *config.Config) error {
	if secretManager == nil {
		if cfg.SecretProviders != nil && len(*cfg.SecretProviders) > 0 {
			return errors.New("secret providers are disabled")

Can we give a bit more context in this error for the config applier / Prometheus logs? What flag has to be enabled?

)

replace (
github.com/prometheus/common => github.com/TheSpiritXIII/prometheus-common v0.50.0-gmp.0

We could upgrade common and remove this line, no?


// Configs is a nested collection of Config. For custom marshalling and
// unmarshalling, use a pointer to this type.
type Configs []ProviderConfig

This is a bit confusing. I had to clone this code and navigate it to understand what is happening. Essentially: why is Configs not listing actual Configs but ProviderConfigs? If ProviderConfig is what callers of this package should use, why not rename it?

EDIT: After some time I realized your idea -- a type like Configs is NOT what is actually unmarshalled; this type is what we will use internally, and the unmarshal format is entirely different.

I am not sure about this, it's very confusing. It's like

type A struct{ /* ...bunch of fields... */ }

type BA struct{ /* ...bunch of different exported fields... */ }

type ArrayA []BA

func (c *ArrayA) UnmarshalYAML(...)

// later...
unmarshal(bytes, &ArrayA{})
// Surprise, why did you try to give me YAML with the BA structure?????

I guess, alternatively, this is because you are trying to group by provider? Even if they potentially target different APIs? Is it worth the readability penalty here 🤔

ScrapeConfigs []*ScrapeConfig `yaml:"scrape_configs,omitempty"`
StorageConfig StorageConfig `yaml:"storage,omitempty"`
TracingConfig TracingConfig `yaml:"tracing,omitempty"`
SecretProviders *secrets.Configs `yaml:"secrets,omitempty"`

Nice, but this requires a bit of documentation for users (and reviewers, and future code readers) to understand this.

Perhaps https://github.com/prometheus/prometheus/blob/67326d8e5cab4251341b4c76bc6e9f9668865614/docs/configuration/configuration.md is a proper file for this?

// unmarshalling, use a pointer to this type.
type Configs []ProviderConfig

// UnmarshalYAML implements yaml.v2.Unmarshaler.

Can we explain in a comment, briefly, what this custom unmarshal does? 🤗

// configuration from being loaded and must be avoided unless strictly
// necessary, such as a decoding issue. Errors are best reported
// during secret fetch-time.
Apply(configs []Config[T]) ([]Secret, error)

Hm... it feels like we will ALWAYS use []Config[yaml.Node] (or []MountConfig[yaml.Node]) here -- is this a leftover from the yaml.v3 and yaml.v2 support?


If you strongly insist on having two parsers supported here, we could at least add a clear comment. For Prometheus it feels like we don't need this, and we can simply switch to yaml.v3 here whenever we want?


// Close closes this provider, unregistering any provider-specific
// metrics if the metrics registry is provided.
Close(reg prometheus.Registerer)

hmm, nice pattern, but we used to simply pass reg in the constructor for this. Maybe, to keep consistency, remove this argument and keep the commentary about unregistering metrics?
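The constructor pattern referred to here might look like the following sketch (hypothetical; Registerer below is a minimal stand-in for prometheus.Registerer from client_golang, whose real methods take Collectors rather than names):

```go
package main

import "fmt"

// Registerer is a toy stand-in for prometheus.Registerer (hypothetical;
// the real interface registers and unregisters Collectors).
type Registerer interface {
	Register(name string)
	Unregister(name string)
}

// mapRegisterer records registrations so the example is observable.
type mapRegisterer struct{ registered map[string]bool }

func (m *mapRegisterer) Register(name string)   { m.registered[name] = true }
func (m *mapRegisterer) Unregister(name string) { delete(m.registered, name) }

// fileProvider takes the Registerer in its constructor and stores it,
// so Close needs no argument -- the consistency suggested above.
type fileProvider struct {
	reg Registerer
}

func newFileProvider(reg Registerer) *fileProvider {
	reg.Register("secret_fetches_total") // invented metric name, for illustration
	return &fileProvider{reg: reg}
}

// Close unregisters provider-specific metrics via the stored Registerer.
func (p *fileProvider) Close() {
	p.reg.Unregister("secret_fetches_total")
}

func main() {
	reg := &mapRegisterer{registered: map[string]bool{}}
	p := newFileProvider(reg)
	fmt.Println(reg.registered["secret_fetches_total"]) // prints "true"
	p.Close()
	fmt.Println(reg.registered["secret_fetches_total"]) // prints "false"
}
```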

)

func init() {
	secrets.RegisterProviderExact("file", func(_ context.Context, _ secrets.ProviderOptions, _ prometheus.Registerer) secrets.Provider[*yaml.Node] {

nit: Honest opinion is that, to me, NOT having those magic "funcs" and instead having a bit of boilerplate for defining a provider would read better in terms of how a provider is actually implemented.

Currently this requires a lot of context on how things are done: how this applies config, how this closes and unregisters metrics. No idea unless I clone this code in my IDE and navigate 🤔

I am not sure we should optimize for provider addition being a one-liner 🙈

I wonder if removing those Map* nodes would help this flow read better.

filename: "scrape_config_files_scrape_protocols2.bad.yml",
errMsg: `parsing YAML file testdata/scrape_config_files_scrape_protocols2.bad.yml: duplicated protocol in scrape_protocols, got [OpenMetricsText1.0.0 PrometheusProto OpenMetricsText1.0.0] for scrape config with job name "node"`,
},
{

Can we add a few examples of how inline/file would work (valid configs with inline and file)?

@@ -0,0 +1,2 @@
secrets:

Can we add one for a nonexistent provider?
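Such a negative test fixture might look like this (hypothetical; acme_sp_config is an invented provider name that is not registered anywhere):

```yaml
secrets:
- name: secret_a
  acme_sp_config:  # no provider registered under this name -> config should fail to load
    token: abc
```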

@bboreham
Member

bboreham commented Feb 4, 2025

Hello from the bug-scrub!

It looks like there was a lot of work done here, both in the code and the review by @bwplotka. Thanks!
Do you think you will come back to it @TheSpiritXIII ?

(while scanning, I noticed that go.sum has a huge number of changes - do they disappear with go mod tidy?)

@bwplotka
Member

bwplotka commented Feb 4, 2025

Yup, aware, we are on it!

@krajorama
Member

Hello from the bug scrub!

@bwplotka @TheSpiritXIII is this PR still under development? Thanks!

@siavashs
Contributor

siavashs commented Nov 12, 2025

Pinged the author (@TheSpiritXIII ) to revive the PR.
If that's not happening, we could take it over to update and merge it.
Alertmanager issue prometheus/alertmanager#3108

@bboreham
Member

Hello from the bug-scrub!

@siavashs as several months have gone by I think it would be fine for you to pick it up.
Please comment below if you do, so we are transparent.

@siavashs
Contributor

Hello @bboreham, I will look into it.


Development

Successfully merging this pull request may close these issues.

Proposal: allow alert manager to read from secret resources
Generally enable reading secrets from files
