Add secret provider interface under feature flag #13955
TheSpiritXIII wants to merge 1 commit into prometheus:main
Conversation
Generally, this feature looks good to me. For a serious review, I needed to get deeper into the subject, for which I won't have time anytime soon. I have a huge review backlog already, and we want to get native histograms stable for the Prom 3 release in September, so I'm afraid I cannot add more things to my plate right now.
Signed-off-by: Daniel Hrabovcak <thespiritxiii@gmail.com>
bwplotka left a comment
Ok, I reviewed initially! Thanks for this work, looks like a lot of thought and work was put into it.
I started commenting on implementation details (see some comments), but tried to keep it high level for now, as I wanted to double-check whether we really want the YAML format proposed here. I think I prefer the format actually proposed in your design doc (BTW, can we add this doc to https://github.com/prometheus/proposals with the gist of it, for tracking purposes and, I guess, the final discussion around the format?)
The current idea is for users with e.g. multiple Kubernetes secrets to do as follows (I used your example to construct a more complex, realistic one):
```yaml
secrets:
- name: secret_a  # Can be referenced in various parts as `password_ref: secret_a`
  kubernetes_sp_config:
    # Realistically the user has to add more info in practice, similar to:
    # https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config
    api_server: "localhost:8080"
    secrets_to_access: ...
    version: ...
    tls_ca: ...
    proxy_url: ...
    follow_redirects: ...
    namespaces: ...
    selectors: ...
    enable_http2: ...
    secret:
      namespace: ns1
      name: secret1
      key: password
- name: secret_b  # Can be referenced in various parts as `password_ref: secret_b`
  kubernetes_sp_config:
    # (same fields repeated as above)
    api_server: "localhost:8080"
    secrets_to_access: ...
    version: ...
    tls_ca: ...
    proxy_url: ...
    follow_redirects: ...
    namespaces: ...
    selectors: ...
    enable_http2: ...
    secret:
      namespace: ns1
      name: secret2
      key: password
- name: secret_c  # Can be referenced in various parts as `password_ref: secret_c`
  kubernetes_sp_config:
    # (same fields repeated as above)
    api_server: "localhost:8080"
    secrets_to_access: ...
    version: ...
    tls_ca: ...
    proxy_url: ...
    follow_redirects: ...
    namespaces: ...
    selectors: ...
    enable_http2: ...
    secret:
      namespace: ns1
      name: secret3
      key: password
# - and so on...
```
This feels a bit verbose 🤔, plus we will hit various config size limits even sooner for large Prometheus setups.
I know we do something "kind of" similar for scrape configuration, but there it's totally doable (and recommended) to put ALL your Kubernetes jobs under a single job, with relabelling rules that group them into smaller job labels as you need (e.g. per pod).
The counter-argument would be that maybe we typically don't need so many secrets in practice? Would be nice to check on production Prometheuses. I bet we could find something like 2-4 secrets against exactly the same provider, which would mean we could invest in some grouping format?
Let me know what's the best place to play with different alternatives (I guess https://github.com/prometheus/proposals) and whether changing the format is even on the table here. 🤗
Sorry for challenging format again!
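One purely hypothetical grouping shape (all field names here are invented for illustration, not proposed syntax) that would cut the repetition by defining the provider connection once and listing the individual secrets under it:

```yaml
# Hypothetical sketch only -- field names are invented.
secrets:
- kubernetes_sp_config:
    api_server: "localhost:8080"
    # ...shared TLS/proxy/namespace settings defined once...
  secrets:
  - name: secret_a
    secret: {namespace: ns1, name: secret1, key: password}
  - name: secret_b
    secret: {namespace: ns1, name: secret2, key: password}
  - name: secret_c
    secret: {namespace: ns1, name: secret3, key: password}
```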
```go
reloader: func(cfg *config.Config) error {
	if secretManager == nil {
		if cfg.SecretProviders != nil && len(*cfg.SecretProviders) > 0 {
			return errors.New("secret providers are disabled")
```
Can we give a bit more context in this error for the config applier/Prometheus logs? Which flag has to be enabled?
```
replace (
	github.com/prometheus/common => github.com/TheSpiritXIII/prometheus-common v0.50.0-gmp.0
```
We could upgrade common and remove this line, no?
```go
// Configs is a nested collection of Config. For custom marshalling and
// unmarshalling, use a pointer to this type.
type Configs []ProviderConfig
```
This is a bit confusing. I had to clone this code and navigate it to understand what is happening. Essentially, why is Configs not listing actual Configs but ProviderConfigs? If ProviderConfig is what callers of this package should use, why not rename this?
EDIT: After some time I realized your idea: a type like Configs is NOT what is actually unmarshalled; this type is what we use internally, and the unmarshal format is entirely different.
I am not sure about this, it's very confusing. It's like:

```go
type A struct{ /* ...bunch of fields... */ }
type BA struct{ /* ...bunch of different exported fields... */ }
type ArrayA []BA

func (c *ArrayA) UnmarshalYAML( /* ... */ )

// later...
unmarshal(bytes, &ArrayA)
// Surprise, why did you try to give me YAML with the BA structure?????
```

I guess this is because you try to group by provider? Even if they potentially target different APIs? Is it worth the readability penalty here 🤔
```go
ScrapeConfigs   []*ScrapeConfig  `yaml:"scrape_configs,omitempty"`
StorageConfig   StorageConfig    `yaml:"storage,omitempty"`
TracingConfig   TracingConfig    `yaml:"tracing,omitempty"`
SecretProviders *secrets.Configs `yaml:"secrets,omitempty"`
```
Nice, but this requires a bit of documentation for users (and reviewers, and future code readers) to understand this.
Perhaps https://github.com/prometheus/prometheus/blob/67326d8e5cab4251341b4c76bc6e9f9668865614/docs/configuration/configuration.md is a proper file for this?
```go
// unmarshalling, use a pointer to this type.
type Configs []ProviderConfig

// UnmarshalYAML implements yaml.v2.Unmarshaler.
```
Can we briefly explain in a comment what this custom unmarshal does? 🤗
```go
// configuration from being loaded and must be avoided unless strictly
// necessary, such as a decoding issue. Errors are best reported
// during secret fetch-time.
Apply(configs []Config[T]) ([]Secret, error)
```
Hm, it feels like here we will ALWAYS use `[]Config[yaml.Node]` (or `[]MountConfig[yaml.Node]`); is this left over from supporting both yaml v3 and v2?
If you strongly insist on supporting two parsers here, we could at least add a clear comment. For Prometheus it feels like we don't need this, and we can simply switch to yaml3 here once we want to.
```go
// Close closes this provider, unregistering any provider-specific
// metrics if the metrics registry is provided.
Close(reg prometheus.Registerer)
```
hmm, nice pattern, but we used to simply pass reg in the constructor for this. Maybe, to keep consistency, remove this argument and keep the commentary about unregistering metrics?
```go
func init() {
	secrets.RegisterProviderExact("file", func(_ context.Context, _ secrets.ProviderOptions, _ prometheus.Registerer) secrets.Provider[*yaml.Node] {
```
nit: my honest opinion is that NOT having those magic "funcs", and instead having a bit of boilerplate for defining a provider, would read better as to how the provider is actually implemented.
Currently this requires a lot of context on how things are done: how this applies config, how this closes and unregisters metrics. No idea unless I clone this code into my IDE and navigate 🤔
I am not sure we should optimize for provider addition being a one-liner 🙈
I wonder if removing these Map* nodes would help this flow read better.
```go
filename: "scrape_config_files_scrape_protocols2.bad.yml",
errMsg:   `parsing YAML file testdata/scrape_config_files_scrape_protocols2.bad.yml: duplicated protocol in scrape_protocols, got [OpenMetricsText1.0.0 PrometheusProto OpenMetricsText1.0.0] for scrape config with job name "node"`,
},
{
```
Can we add a few examples of how inline/file would work (valid configs with inline and file)?
```
@@ -0,0 +1,2 @@
secrets:
```
Can we add one for a nonexistent provider?
Hello from the bug-scrub! It looks like there was a lot of work done here, both in the code and the review by @bwplotka. Thanks! (while scanning, I noticed that …)

Yup, aware, we are on it!

Hello from the bug scrub! @bwplotka @TheSpiritXIII is this PR still under development? Thanks!

Pinged the author (@TheSpiritXIII) to revive the PR.

Hello from the bug-scrub! @siavashs, as several months have gone by, I think it would be fine for you to pick it up.

Hello @bboreham, I will look into it.
Fixes: #8551
Partially fixes: prometheus/alertmanager#3108
Group Discussion: https://groups.google.com/g/prometheus-developers/c/WKOej_pnhXg
Initial Design Doc: https://docs.google.com/document/d/1EqHd2EwQxf9SYD8-gl3sgkwaU6A10GhiN7aw-2kx7NU/edit
Description
Interface
This PR adds a `Provider` interface, which is registered similarly to service discovery interfaces. There is a respective `RegisterProvider` which lets us register a provider within an `init` method.

Implementers implement `Apply(configs []Config[yaml.Node]) ([]Secret, error)`. A sample config looks like this:

Your scrape config would look as follows:
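The original sample blocks did not survive extraction. A hedged reconstruction of what a secrets config plus a scrape config referencing it might look like (`password_ref` and `kubernetes_sp_config` appear in the review discussion above; the remaining field names are assumptions):

```yaml
# Hypothetical reconstruction -- not the PR's exact example.
secrets:
- name: secret_a
  kubernetes_sp_config:
    api_server: "localhost:8080"
    secret:
      namespace: ns1
      name: secret1
      key: password

scrape_configs:
- job_name: example
  basic_auth:
    username: admin
    password_ref: secret_a  # resolved through the provider above
```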
Providers output a `Secret`, which is just a `func(ctx context.Context, node *yaml.Node) (string, error)`. This allows secrets to be fetched on demand, or cached; it is up to the implementer.
Usage
Before using this functionality, users must enable the `secret-providers` feature flag.
Default Providers
This PR only adds two default providers out of the box: `inline` and `file`, which allow users to configure their existing secrets with the new configuration, e.g. for BasicAuth they replace `password` and `password_file` respectively.
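The example configs for the two providers were lost in extraction; a plausible sketch of what `inline` and `file` usage could look like (the exact field names are assumptions, not the PR's confirmed syntax):

```yaml
# Hypothetical sketch -- field names are assumed.
secrets:
- name: my_inline_secret
  inline: "hunter2"
- name: my_file_secret
  file:
    path: /etc/prometheus/secrets/password

scrape_configs:
- job_name: example
  basic_auth:
    username: admin
    password_ref: my_inline_secret
```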
Maintainability
Test coverage here is ≥90% to ensure no breaking changes.
One note is that I've provided both `yaml.v2` and `yaml.v3` versions of the `Configs` object.
Pull Requests