Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

monitoring: configure alert notifications from site config#11427

Merged
bobheadxi merged 60 commits into
masterfrom
distribution/alert-notifications
Jun 13, 2020
Merged

monitoring: configure alert notifications from site config#11427
bobheadxi merged 60 commits into
masterfrom
distribution/alert-notifications

Conversation

@bobheadxi

@bobheadxi bobheadxi commented Jun 11, 2020

Copy link
Copy Markdown
Member

closes #10641

This PR introduces a new workflow:

  1. admin goes to site-admin/configuration and adds:
 "observability.alerts": [
    {
      "id": "email-notifier-1",
      "level": "warning",
      "notifier": {
        "type": "email",
        "addresses": "robert@sourcegraph.com"
      }
    },
    {
      "id": "email-notifier-2",
      "level": "critical",
      "notifier": {
        "type": "email",
        "addresses": "stephen@sourcegraph.com"
      }
    },
    {
      "id": "email-notifier-3",
      "level": "critical",
      "notifier": {
        "type": "email",
        "addresses": "robert@sourcegraph.com"
      }
    }
  ]
  1. the Overview dashboard (duplicate of the home) is updated to fire alerts to the above channels when x alerts by service dashboard is >0

This is done by a new grafana-wrapper that subscribes to site configuration updates and makes API calls to grafana to manipulate the alerts overview dashboard and notification channels.

TODO

@bobheadxi bobheadxi requested review from a team and emidoots June 11, 2020 09:07
@bobheadxi bobheadxi requested a review from a team as a code owner June 11, 2020 09:07
@keegancsmith keegancsmith removed request for a team June 11, 2020 11:47
@uwedeportivo

Copy link
Copy Markdown
Contributor

wow, super nice. i'll look at it in a bit

@uwedeportivo uwedeportivo left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what you can do about the grafana credentials

Comment thread docker-images/grafana/Dockerfile
Comment thread docker-images/grafana/cmd/grafana-wrapper/main.go
Comment thread docker-images/grafana/cmd/grafana-wrapper/main.go Outdated
Comment thread docker-images/grafana/cmd/grafana-wrapper/site.go
// update board
_, err = c.grafana.SetDashboard(ctx, *homeBoard, sdk.SetDashboardParams{Overwrite: true})
if err != nil {
problems = append(problems, newObservabilityAlertsProblem(fmt.Errorf("failed to update dashboard: %w", err)))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have a logic error the linter caught here and in the return cases above. After appending to problems, c.problems isn't updated so this line does nothing.

http_port = 3370
# The (internal) http port to use
# In `sourcegraph/grafana`, 3370 is reverse-proxied to 3371
http_port = 3371

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I just realized that in the single-container Docker image sourcegraph/server we bundle just the Grafana server binary and use a different config file than this one. This may be a blocker to merging this PR, unless we state in the CHANGELOG / docs that observability.alerts does not work in single-container deployments (which would be a little odd)

https://sourcegraph.com/github.com/sourcegraph/sourcegraph@07ed1a324866e8d5f14262f9a9670cee10ca385d/-/blob/cmd/server/shared/monitoring.go#L11

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The release branch will be created Mon, so if you want to land this change in the release (I think we should) will either need to address this before-hand or document that it doesn't work in sourcegraph/server yet and link to a github issue (that sounds fine to me & better than not having this in the release).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've created https://github.com/sourcegraph/sourcegraph/issues/11473 as follow-up and will include it in the changelog!

@emidoots emidoots left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left more feedback inline, but this is great! Feel free to address my comments as you see fit and then merge (no need to wait for me to review again).

Comment thread docker-images/grafana/cmd/grafana-wrapper/site.go Outdated
bobheadxi and others added 12 commits June 13, 2020 09:14
Co-authored-by: Stephen Gutekanst <stephen.gutekanst@gmail.com>
Co-authored-by: Stephen Gutekanst <stephen.gutekanst@gmail.com>
Co-authored-by: Stephen Gutekanst <stephen.gutekanst@gmail.com>
Co-authored-by: Stephen Gutekanst <stephen.gutekanst@gmail.com>
@bobheadxi

Copy link
Copy Markdown
Member Author

@slimsag thank you! sorry to do this so late but I went ahead and wrapped it up for branch cut (completely forgot about that!)

@emidoots

Copy link
Copy Markdown
Member

Great! Awesome work! (And get offline it’s the weekend :P)

@bobheadxi bobheadxi merged commit 8451bb7 into master Jun 13, 2020
@bobheadxi bobheadxi deleted the distribution/alert-notifications branch June 13, 2020 01:52
bobheadxi added a commit that referenced this pull request Jun 13, 2020
bobheadxi added a commit that referenced this pull request Jun 13, 2020
This reverts commit 8451bb7.
bobheadxi added a commit that referenced this pull request Jun 15, 2020
* monitoring: configure alert notifications from site config (#11427)

* never contribute warning if no notifiers are set

* check for grafana-wrapper status code

* return specific problem for observability alerts on sourcegraph/server

* rename GRAFANA_ to GRAFANA_INTERNAL_, add more docs

* indicate 3370 is a useful port with EXPOSE

* fix notifier schema

* improve changelog wording

Co-authored-by: Stephen Gutekanst <stephen.gutekanst@gmail.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

observability: Alerting should be easily configured through a file or site configuration

3 participants