Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

monitoring: automatically deploy grafana notifiers (round 2)#11483

Merged
bobheadxi merged 12 commits into
masterfrom
distribution/grafana-notifiers-2
Jun 15, 2020
Merged

monitoring: automatically deploy grafana notifiers (round 2)#11483
bobheadxi merged 12 commits into
masterfrom
distribution/grafana-notifiers-2

Conversation

@bobheadxi

@bobheadxi bobheadxi commented Jun 14, 2020

Copy link
Copy Markdown
Member

Why https://github.com/sourcegraph/sourcegraph/pull/11479 was reverted: https://github.com/sourcegraph/sourcegraph/pull/11479

This PR:

  • updated conf check:
    • disables conf check when no alerts are configured
    • explicitly checks for sourcegraph/server
  • still unable to work out where GRAFANA_PORT might be incorrectly set, so rename it to GRAFANA_INTERNAL_PORT (more descriptive anyway)

relevant changes here, this branch starts from a cherry-picked #11427 (already reviewed)

testing in:

@bobheadxi bobheadxi requested review from a team and emidoots and removed request for a team June 14, 2020 23:25
@codecov

codecov Bot commented Jun 14, 2020

Copy link
Copy Markdown

Codecov Report

Merging #11483 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #11483   +/-   ##
=======================================
  Coverage   46.85%   46.85%           
=======================================
  Files        1392     1392           
  Lines       78777    78777           
  Branches     6762     6762           
=======================================
  Hits        36913    36913           
  Misses      38359    38359           
  Partials     3505     3505           
Flag Coverage Δ
#go 51.17% <0.00%> (ø)
#storybook 6.91% <0.00%> (ø)
#typescript 35.53% <0.00%> (ø)
#unit 46.71% <0.00%> (ø)

Comment thread CHANGELOG.md Outdated
Comment thread cmd/frontend/internal/app/debug.go Outdated
Co-authored-by: Stephen Gutekanst <stephen.gutekanst@gmail.com>
@emidoots

Copy link
Copy Markdown
Member

"t=2020-06-13T04:34:54+0000 lvl=crit msg="failed to initialize configuration" cmd=grafana-wrapper error="grafana not reachable: Get "http://admin:***@127.0.0.1:tcp/10.55.253.135:30070/api/health\": context deadline exceeded (last error: Get "http://admin:***@127.0.0.1:tcp/10.55.253.135:30070/api/health\": dial tcp: lookup 127.0.0.1:tcp: no such host)"

10.55.253.135:30070

This is the cluster IP of Grafana and its external service port, configured here: https://github.com/sourcegraph/deploy-sourcegraph-dot-com/blob/release/base/grafana/grafana.Service.yaml#L13

What is stranger is the fact that it ended up in the URL at all, that URL should be:

http://admin:***@127.0.0.1:30070/api/health

Not:

http://admin:***@127.0.0.1:tcp/10.55.253.135:30070/api/health

@bobheadxi

bobheadxi commented Jun 15, 2020

Copy link
Copy Markdown
Member Author

Confirmed working in:

@emidoots

Copy link
Copy Markdown
Member

I think it means that GRAFANA_PORT was indeed set to 10.55.253.135:30070 which is SUPER weird

Comment thread docker-images/grafana/cmd/grafana-wrapper/main.go
@emidoots

Copy link
Copy Markdown
Member

Feel free to deploy your test images on k8s.sgdev.org via a PR to https://github.com/sourcegraph/deploy-sourcegraph-dogfood-k8s (no need to wait for review, just make sure you undo it after you're finished testing)

# The (internal) http port to use
# In `sourcegraph/grafana`, 3370 is reverse-proxied to 3371 - if you change this,
# make sure you set GRAFANA_INTERNAL_PORT on your Grafana instance.
http_port = 3371

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The config file we use in single-container sourcegraph/server deployments like sourcegraph.sgdev.org was not updated to be 3371:

https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/docker-images/grafana/config/grafana-single-container.ini#L37-38

Maybe that was the cause?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you mentioned in the previous PR that the single-container deployments use the grafana binary directly? also, I'm planning to keep single-server disabled for the time being (https://github.com/sourcegraph/sourcegraph/issues/11473)

@bobheadxi bobheadxi Jun 15, 2020

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what i mean is to avoid having to add another configuration to every deployment, I've kept the container "interface" the same by putting a reverse proxy on 3370 and having grafana be served internally on 3371, so this configuration does not need to change for single-container at the moment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, as long as you have this disabled for server it does not matter and this PR as-is LGTM - but if you are looking for the cause of the issue this may have been it.

Effectively, Grafana was serving on 3370 not 3371 as you intended - I have no idea what the exact behavior would’ve been for either Grafana itself or Grafana-wrapper in this case (I.e Grafana 3370 was already bound)

@bobheadxi

Copy link
Copy Markdown
Member Author

alright... I'm pretty sure this won't behave badly anymore - https://github.com/sourcegraph/sourcegraph/pull/11483#issuecomment-643844736

@bobheadxi bobheadxi merged commit 4266960 into master Jun 15, 2020
@bobheadxi bobheadxi deleted the distribution/grafana-notifiers-2 branch June 15, 2020 02:45
LawnGnome added a commit that referenced this pull request Jun 16, 2020
The changes in #11483 made it difficult to start Grafana within a Linux
development environment, and #11486 obscured the error in the goreman output.
This restores the previous behaviour around the UID the container will run as,
and ensures that logs will be dumped if the container exits unexpectedly.
LawnGnome added a commit that referenced this pull request Jun 16, 2020
* dev: fix Grafana startup on Linux

The changes in #11483 made it difficult to start Grafana within a Linux
development environment, and #11486 obscured the error in the goreman output.
This restores the previous behaviour around the UID the container will run as,
and ensures that logs will be dumped if the container exits unexpectedly.

* Always overwrite grafana.log
@bobheadxi bobheadxi mentioned this pull request Jun 18, 2020
37 tasks
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

observability: Alerting should be easily configured through a file or site configuration

2 participants