Make the Postgres healthchecks more lenient by chrismwendt · Pull Request #824 · sourcegraph/deploy-sourcegraph-docker

chrismwendt · 2022-06-01T19:23:35Z

Checklist

Sister deploy-sourcegraph change: "Add a startup probe to codeintel-db" (sourcegraph/deploy-sourcegraph#4136)
All images have a valid tag and SHA256 sum

Test plan

N/A

caugustus-sourcegraph · 2022-06-01T19:46:00Z

      interval: 10s
      timeout: 1s
-      retries: 10
+      retries: 360


If the health of a running container can fail for a full hour, is this healthcheck really even doing anything valuable? Perhaps modifying the start_period is a better option here, if this is intended to address the same slow startup issue as in sourcegraph/deploy-sourcegraph#4136.

https://docs.docker.com/engine/reference/builder/#healthcheck

start period provides initialization time for containers that need time to bootstrap. Probe failure during that period will not be counted towards the maximum number of retries. However, if a health check succeeds during the start period, the container is considered started and all consecutive failures will be counted towards the maximum number of retries.

An hour feels too long here (and on the Kubernetes startup probe), but I don't have any hard data to gauge typical recovery startup times. It might not make any difference: most of the non-OOM failures we see aren't recoverable by a restart (example: bad file system permissions).

chrismwendt (Jun 2, 2022): 👍 Switched to start_period.
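
For reference, a minimal sketch of how the start_period approach discussed above could look in a Compose service definition. The service name, image, `pg_isready` probe command, and the `start_period` value are assumptions for illustration and are not taken from this PR; only `interval: 10s`, `timeout: 1s`, and `retries: 10` appear in the diff above.

```yaml
# Hypothetical docker-compose fragment. Anything marked "assumed" is not from the PR.
services:
  pgsql:                       # assumed service name
    image: postgres:12         # assumed image
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]  # assumed probe command
      interval: 10s            # from the diff
      timeout: 1s              # from the diff
      retries: 10              # original value, left unchanged
      start_period: 1h         # assumed value; failures in this window do not count toward retries
```

With settings like these, a slow startup is tolerated for the length of the start period, but once a check has succeeded the container is marked unhealthy after 10 consecutive failures (roughly 100 seconds at a 10s interval). By contrast, `retries: 360` would tolerate about an hour of consecutive failures (360 × 10s) at any point in the container's life, which is the concern raised in the review comment.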

caugustus-sourcegraph · 2022-06-01T19:46:49Z

      interval: 10s
      timeout: 1s
-      retries: 10
+      retries: 360


Is it intentional that codeinsights-db isn't included in this change?

chrismwendt (Jun 2, 2022): Updated codeinsights-db, too.

more lenient postgres healthcheck

272deb6

chrismwendt requested a review from efritz June 1, 2022 19:23

chrismwendt mentioned this pull request Jun 1, 2022

Add a startup probe to codeintel-db sourcegraph/deploy-sourcegraph#4136

Merged

4 tasks

efritz approved these changes Jun 1, 2022


caugustus-sourcegraph reviewed Jun 1, 2022


chrismwendt added 2 commits June 1, 2022 19:37

use start_period instead

0ef9a82

codeinsights-db, too

cb09f8e

chrismwendt merged commit 4bcdf13 into master Jun 2, 2022

chrismwendt deleted the more-lenient-postgres-healthcheck branch June 2, 2022 02:28

chrismwendt merged 3 commits into master from more-lenient-postgres-healthcheck


3 participants
