Proposal: allow alert manager to read from secret resources #3108

@qinxx108

Problem

The open-source Prometheus Alertmanager currently supports reading a password from a file for only some of its receivers; see prometheus/prometheus#8551. Besides reading from a file, we would also like to read secrets from other providers, for example AWS Secrets Manager or Azure Key Vault.

We do have the Secrets Store CSI Driver (https://secrets-store-csi-driver.sigs.k8s.io/getting-started/installation.html) to mount secrets into the pod, so that a receiver's {KEY_NAME}_file option can read the password. However, that requires users to maintain an additional CSI driver inside the same cluster.

The ideal solution would be for Alertmanager itself to resolve the secrets, cache them for a configurable time, and re-fetch them once the timeout expires.

Proposed configuration

The existing options in <pagerduty_config>:

# The following two options are mutually exclusive.
# The PagerDuty integration key (when using PagerDuty integration type `Events API v2`).
routing_key: [<tmpl_secret>]
# The PagerDuty integration key (when using PagerDuty integration type `Prometheus`).
service_key: [<tmpl_secret>]

<pagerduty_config> will also accept two new configuration options:

# The following two options are mutually exclusive.
# The PagerDuty integration key (when using PagerDuty integration type `Events API v2`).
routing_key_config: [<secret_config>]
# The PagerDuty integration key (when using PagerDuty integration type `Prometheus`).
service_key_config: [<secret_config>]
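The proposal does not spell out how the new `*_config` options interact with the existing plain keys. One plausible reading, shown as a hedged Go sketch below, is that exactly one of the four fields may be set. The `pagerdutyKeys` and `secretRef` types are hypothetical placeholders, not Alertmanager's real config structs.

```go
package main

import "fmt"

// secretRef is a stand-in for the proposed <secret_config> block.
type secretRef struct{ Type string }

// pagerdutyKeys holds the two existing and the two proposed key fields.
type pagerdutyKeys struct {
	RoutingKey       string
	ServiceKey       string
	RoutingKeyConfig *secretRef
	ServiceKeyConfig *secretRef
}

// validate enforces the assumed rule that exactly one key source is set.
func (k pagerdutyKeys) validate() error {
	var set []string
	if k.RoutingKey != "" {
		set = append(set, "routing_key")
	}
	if k.ServiceKey != "" {
		set = append(set, "service_key")
	}
	if k.RoutingKeyConfig != nil {
		set = append(set, "routing_key_config")
	}
	if k.ServiceKeyConfig != nil {
		set = append(set, "service_key_config")
	}
	switch len(set) {
	case 0:
		return fmt.Errorf("one of routing_key, service_key, routing_key_config, service_key_config must be set")
	case 1:
		return nil
	default:
		return fmt.Errorf("%v are mutually exclusive", set)
	}
}
```

Running a check like this at config load time would surface a conflicting setup immediately instead of at the first notification.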

A new configuration type, <secret_config>, will be introduced. It has the following fields, which let users configure the type and ID of the secret to fetch.

# "file", "aws_secret_manager", "azure_key_vault", etc.
type: <string>

aws_secret_manager:

  # The ARN of the secret to fetch.
  secret_arn: <tmpl_string>

  # How long the secret stays in memory. After the timeout, the secret is re-fetched from the source.
  [ timeout: <duration> | default = 5s ]

  # Configures AWS's Signature Version 4 signing process to sign requests.
  sigv4:
    [ <sigv4_config> ]

file:
  path: <filepath>

Alertmanager examples

aws_secret_manager type example

global:
  resolve_timeout: 5s
route:
  receiver: 'pager-duty-notifications'
  group_by: [alertname, datacenter, app]

receivers:
- name: 'pager-duty-notifications'
  pagerduty_configs:
    - service_key_config:
        type: aws_secret_manager
        aws_secret_manager: 
          secret_arn: 'arn:aws:secretsmanager:us-west-2:111:secret:test-123'
          sigv4:
            region: us-west-2

Prometheus example

basic_auth:
  [ username: <string> ]
  [ password_config: <secret_config> ]

For example:

basic_auth:
  username: test
  password_config: 
    type: aws_secret_manager
    aws_secret_manager: 
      secret_arn: 'arn:aws:secretsmanager:us-west-2:111:secret:test-123'
      sigv4:
        region: us-west-2

Please let me know what you think.
