feat(appliance): roll frontend deployment when pgsql secret changes by craigfurman · Pull Request #63845 · sourcegraph/sourcegraph-public-snapshot

craigfurman · 2024-07-16T08:44:11Z

Idea for https://linear.app/sourcegraph/issue/REL-14/ensure-pods-roll-when-referenced-secret-changes, does not close it though. Please see that issue for more context and alternative approaches considered.

This PR is designed to illustrate the idea more clearly than typing a description of it - do feel free to push back, just because it's already PR'ed doesn't mean it's "too late!". If the general approach is 👍, I can make similar PRs for the other services and their dependencies (basically everything that depends on codeintel/codeinsights DBs and Redis) to close this.

This approach is a bit of a departure from how our helm chart works. That hashes hashes values blocks to determine these annotations. The issue is, that these values blocks often contain references to secrets, and therefore don't change when the content of the secret changes. Example:

https://github.com/sourcegraph/deploy-sourcegraph-helm/blob/84ea4b213d8c08390f3b3335c3f226016a0408b5/charts/sourcegraph/templates/frontend/sourcegraph-frontend.Deployment.yaml#L32 - this annotation, when changed, causes pods to roll on deployment.

https://github.com/sourcegraph/deploy-sourcegraph-helm/blob/84ea4b213d8c08390f3b3335c3f226016a0408b5/charts/sourcegraph/templates/_helpers.tpl#L259-L264 - the template helper hashes some values blocks

https://github.com/sourcegraph/deploy-sourcegraph-helm/blob/84ea4b213d8c08390f3b3335c3f226016a0408b5/charts/sourcegraph/values.yaml#L722-L736 - if the auth block references an existing secret instead of being set directly, this won't change when the secret content changes.

This PR takes a more general approach of looking for that secret, which must always eventually exist - whether admin-provisioned or provisioned by the pgsql service definition - and hashes that.

Draft because I'm looking for approach review, also because this depends on an unmerged branch. This can be reviewed independently, but that bug fix (https://github.com/sourcegraph/sourcegraph/pull/63826) is also necessary for this to actually work, otherwise the admin-provisioned secret is blown away by the reconcile loop.

Test plan

envtest tests included.

Changelog

craigfurman · 2024-07-16T08:47:32Z

+	ifChanged := struct {
+		config.FrontendSpec
+		PG *config.DatabaseConnectionSpec `json:"pg,omitempty"`
+	}{
+		FrontendSpec: cfg,
+		PG:           dbConnSpec,
+	}
+
+	return reconcileObject(ctx, r, ifChanged, &dep, &appsv1.Deployment{}, sg, owner)


Example of how individual resources in individual services can add data to the "ifChanged" parameter of reconcileObject. Usually, just the config element itself is passed. Please follow the calls in https://github.com/sourcegraph/sourcegraph/blob/main/internal/appliance/reconciler/kubernetes.go through for more context - reconcileObject() throws the version into the mix as well. So in general we update kubernetes resources when the service config or the version has changed, and now we see a facility to throw more arbitrary data into there too.

craigfurman · 2024-07-16T08:48:19Z

@@ -0,0 +1,561 @@
+resources:


We can see the config hash annotation changing compared to default.yaml, which would cause a rolling deploy.

❯ diff internal/appliance/reconciler/testdata/golden-fixtures/frontend/default.yaml internal/appliance/reconciler/testdata/golden-fixtures/frontend/after-create-pg-secret.yaml 6c6 < appliance.sourcegraph.com/configHash: b5dce290e22d1afb4c9102ac4c245490ab01dd3be13653de391536cfe0e323b0 --- > appliance.sourcegraph.com/configHash: e32d95c60ed3e814ce53f11507d1c894ae3b417b653a9f307379b9252d6d5785 8c8 < generation: 1 --- > generation: 2

Dang, I just realised, this won't cause a rolling deploy! We change the configHash annotation on the deployment object, which causes our reconciler's createOrUpdateObject() to update the object, but without updating Deployment.spec.template no rolling deploy will actually occur! That was really dumb of me. Will fix.

craigfurman · 2024-07-16T08:48:44Z

+	// We have to make a config change to trigger the reconcile loop
+	suite.awaitReconciliation(namespace, func() {
+		cfgMap := suite.newConfigMap(namespace, "frontend/default")
+		cfgMap.GetAnnotations()["force-reconcile"] = "1"


This might make a good appliance web app feature - a "force re-reconcile" button. It could be implemented the same way this test does, incrementing a number on a configmap annotation, or just putting something random in there.

I ended up having to do something similar (but not quite as elegant) in the POC with just forcing a reconcile every 5 seconds or so no matter what. Not ideal though, and a button to trigger it is likely a far better idea 😄 .

I haven't checked what the PoC does, but controller-runtime actually lets you do something like that. Maybe we can make the duration configurable. e.g. at the end of Reconcile() in reconcile.go:

return ctrl.Result{ RequeueAfter: time.Hour, }, nil

Separate issue I think, if we want to do this

Ensure configHash changes by adding connection secret to the ifChanged object, which causes createOrUpdateObject() in kubernetes.go to update the object. Set the "checksum/auth" annotation on the pod template in frontend's deployment to a hash of this connection object, to cause a rolling deployment. This annotation exists in helm, but this mechanism is slightly more general as it always hashes secret values, whereas helm will sometimes be solely hashing the name of a secret whose values may have changed.

craigfurman · 2024-07-18T15:50:56Z

+		if err != nil {
+			return err
+		}
+		template.Template.ObjectMeta.Annotations["checksum/auth"] = dbConnHash


this is what actually causes the rolling deploy, after the appliance updates the deployment

craigfurman · 2024-07-18T15:51:18Z

@jdpleiness this is ready for review again now, sorry for the churn

…es (#63954) Builds on https://github.com/sourcegraph/sourcegraph/pull/63845, and closes https://linear.app/sourcegraph/issue/REL-14/ensure-pods-roll-when-referenced-secret-changes. When one of the 3 DB configs changes, an annotation "checksum/auth" on frontend's deployment's spec.template.metadata should change on next reconcile. This will cause pods to roll, picking up the new secret values. It should also cause the top-level annotation configHash to change, which indicates to the appliance that the kubernetes resource should be updated. A similar mechanism is implemented for "checksum/redis", on every service that uses redis (obtained by grepping the helm chart for the same annotation).

…63845) This approach is a bit of a departure from how our helm chart works. That hashes hashes values blocks to determine these annotations. The issue is, that these values blocks often contain references to secrets, and therefore don't change when the _content_ of the secret changes. This PR takes a more general approach of looking for that secret, which must always eventually exist - whether admin-provisioned or provisioned by the pgsql service definition - and hashes that.

…es (#63954) Builds on https://github.com/sourcegraph/sourcegraph/pull/63845, and closes https://linear.app/sourcegraph/issue/REL-14/ensure-pods-roll-when-referenced-secret-changes. When one of the 3 DB configs changes, an annotation "checksum/auth" on frontend's deployment's spec.template.metadata should change on next reconcile. This will cause pods to roll, picking up the new secret values. It should also cause the top-level annotation configHash to change, which indicates to the appliance that the kubernetes resource should be updated. A similar mechanism is implemented for "checksum/redis", on every service that uses redis (obtained by grepping the helm chart for the same annotation).

craigfurman added the no-changelog Exclude this PR from the next changelog. label Jul 16, 2024

cla-bot Bot added the cla-signed label Jul 16, 2024

craigfurman commented Jul 16, 2024

View reviewed changes

craigfurman requested review from a team and DaedalusG and removed request for a team July 16, 2024 08:49

Base automatically changed from appliance-dont-delete-unowned-objects to main July 16, 2024 16:18

craigfurman force-pushed the appliance-roll-deployment-when-secrets-change branch from e8ef1bb to 75897da Compare July 16, 2024 16:28

craigfurman requested a review from jdpleiness July 17, 2024 09:50

craigfurman marked this pull request as ready for review July 18, 2024 11:32

craigfurman marked this pull request as draft July 18, 2024 13:53

craigfurman force-pushed the appliance-roll-deployment-when-secrets-change branch from 75897da to 4912b99 Compare July 18, 2024 15:49

craigfurman commented Jul 18, 2024

View reviewed changes

craigfurman marked this pull request as ready for review July 18, 2024 15:51

jdpleiness approved these changes Jul 19, 2024

View reviewed changes

craigfurman merged commit 619fc57 into main Jul 19, 2024

craigfurman deleted the appliance-roll-deployment-when-secrets-change branch July 19, 2024 14:00

craigfurman mentioned this pull request Jul 19, 2024

feat(appliance): redeploy various services when pg/redis config changes #63954

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(appliance): roll frontend deployment when pgsql secret changes#63845

feat(appliance): roll frontend deployment when pgsql secret changes#63845
craigfurman merged 1 commit into
mainfrom
appliance-roll-deployment-when-secrets-change

craigfurman commented Jul 16, 2024 •

edited

Loading

Uh oh!

craigfurman Jul 16, 2024

Uh oh!

craigfurman Jul 16, 2024

Uh oh!

craigfurman Jul 18, 2024

Uh oh!

craigfurman Jul 18, 2024

Uh oh!

craigfurman Jul 16, 2024

Uh oh!

jdpleiness Jul 17, 2024

Uh oh!

craigfurman Jul 17, 2024 •

edited

Loading

Uh oh!

craigfurman Jul 18, 2024

Uh oh!

craigfurman commented Jul 18, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

craigfurman commented Jul 16, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test plan

Changelog

Uh oh!

craigfurman Jul 16, 2024

Choose a reason for hiding this comment

Uh oh!

craigfurman Jul 16, 2024

Choose a reason for hiding this comment

Uh oh!

craigfurman Jul 18, 2024

Choose a reason for hiding this comment

Uh oh!

craigfurman Jul 18, 2024

Choose a reason for hiding this comment

Uh oh!

craigfurman Jul 16, 2024

Choose a reason for hiding this comment

Uh oh!

jdpleiness Jul 17, 2024

Choose a reason for hiding this comment

Uh oh!

craigfurman Jul 17, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

craigfurman Jul 18, 2024

Choose a reason for hiding this comment

Uh oh!

craigfurman commented Jul 18, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

craigfurman commented Jul 16, 2024 •

edited

Loading

craigfurman Jul 17, 2024 •

edited

Loading