Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

feat(appliance): roll frontend deployment when pgsql secret changes#63845

Merged
craigfurman merged 1 commit into
mainfrom
appliance-roll-deployment-when-secrets-change
Jul 19, 2024
Merged

feat(appliance): roll frontend deployment when pgsql secret changes#63845
craigfurman merged 1 commit into
mainfrom
appliance-roll-deployment-when-secrets-change

Conversation

@craigfurman

@craigfurman craigfurman commented Jul 16, 2024

Copy link
Copy Markdown
Contributor

Idea for https://linear.app/sourcegraph/issue/REL-14/ensure-pods-roll-when-referenced-secret-changes, does not close it though. Please see that issue for more context and alternative approaches considered.

This PR is designed to illustrate the idea more clearly than typing a description of it - do feel free to push back, just because it's already PR'ed doesn't mean it's "too late!". If the general approach is 👍, I can make similar PRs for the other services and their dependencies (basically everything that depends on codeintel/codeinsights DBs and Redis) to close this.

This approach is a bit of a departure from how our helm chart works. That hashes hashes values blocks to determine these annotations. The issue is, that these values blocks often contain references to secrets, and therefore don't change when the content of the secret changes. Example:

https://github.com/sourcegraph/deploy-sourcegraph-helm/blob/84ea4b213d8c08390f3b3335c3f226016a0408b5/charts/sourcegraph/templates/frontend/sourcegraph-frontend.Deployment.yaml#L32 - this annotation, when changed, causes pods to roll on deployment.

https://github.com/sourcegraph/deploy-sourcegraph-helm/blob/84ea4b213d8c08390f3b3335c3f226016a0408b5/charts/sourcegraph/templates/_helpers.tpl#L259-L264 - the template helper hashes some values blocks

https://github.com/sourcegraph/deploy-sourcegraph-helm/blob/84ea4b213d8c08390f3b3335c3f226016a0408b5/charts/sourcegraph/values.yaml#L722-L736 - if the auth block references an existing secret instead of being set directly, this won't change when the secret content changes.

This PR takes a more general approach of looking for that secret, which must always eventually exist - whether admin-provisioned or provisioned by the pgsql service definition - and hashes that.

Draft because I'm looking for approach review, also because this depends on an unmerged branch. This can be reviewed independently, but that bug fix (https://github.com/sourcegraph/sourcegraph/pull/63826) is also necessary for this to actually work, otherwise the admin-provisioned secret is blown away by the reconcile loop.

Test plan

envtest tests included.

Changelog

@craigfurman craigfurman added the no-changelog Exclude this PR from the next changelog. label Jul 16, 2024
@cla-bot cla-bot Bot added the cla-signed label Jul 16, 2024
Comment on lines +158 to +174
ifChanged := struct {
config.FrontendSpec
PG *config.DatabaseConnectionSpec `json:"pg,omitempty"`
}{
FrontendSpec: cfg,
PG: dbConnSpec,
}

return reconcileObject(ctx, r, ifChanged, &dep, &appsv1.Deployment{}, sg, owner)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example of how individual resources in individual services can add data to the "ifChanged" parameter of reconcileObject. Usually, just the config element itself is passed. Please follow the calls in https://github.com/sourcegraph/sourcegraph/blob/main/internal/appliance/reconciler/kubernetes.go through for more context - reconcileObject() throws the version into the mix as well. So in general we update kubernetes resources when the service config or the version has changed, and now we see a facility to throw more arbitrary data into there too.

@@ -0,0 +1,561 @@
resources:

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can see the config hash annotation changing compared to default.yaml, which would cause a rolling deploy.

❯ diff internal/appliance/reconciler/testdata/golden-fixtures/frontend/default.yaml internal/appliance/reconciler/testdata/golden-fixtures/frontend/after-create-pg-secret.yaml

6c6
<         appliance.sourcegraph.com/configHash: b5dce290e22d1afb4c9102ac4c245490ab01dd3be13653de391536cfe0e323b0
---
>         appliance.sourcegraph.com/configHash: e32d95c60ed3e814ce53f11507d1c894ae3b417b653a9f307379b9252d6d5785
8c8
<       generation: 1
---
>       generation: 2

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dang, I just realised, this won't cause a rolling deploy! We change the configHash annotation on the deployment object, which causes our reconciler's createOrUpdateObject() to update the object, but without updating Deployment.spec.template no rolling deploy will actually occur! That was really dumb of me. Will fix.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

// We have to make a config change to trigger the reconcile loop
suite.awaitReconciliation(namespace, func() {
cfgMap := suite.newConfigMap(namespace, "frontend/default")
cfgMap.GetAnnotations()["force-reconcile"] = "1"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might make a good appliance web app feature - a "force re-reconcile" button. It could be implemented the same way this test does, incrementing a number on a configmap annotation, or just putting something random in there.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up having to do something similar (but not quite as elegant) in the POC with just forcing a reconcile every 5 seconds or so no matter what. Not ideal though, and a button to trigger it is likely a far better idea 😄 .

@craigfurman craigfurman Jul 17, 2024

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't checked what the PoC does, but controller-runtime actually lets you do something like that. Maybe we can make the duration configurable. e.g. at the end of Reconcile() in reconcile.go:

	return ctrl.Result{
		RequeueAfter: time.Hour,
	}, nil

Separate issue I think, if we want to do this

@craigfurman craigfurman requested review from a team and DaedalusG and removed request for a team July 16, 2024 08:49
Base automatically changed from appliance-dont-delete-unowned-objects to main July 16, 2024 16:18
@craigfurman craigfurman force-pushed the appliance-roll-deployment-when-secrets-change branch from e8ef1bb to 75897da Compare July 16, 2024 16:28
@craigfurman craigfurman requested a review from jdpleiness July 17, 2024 09:50
@craigfurman craigfurman marked this pull request as ready for review July 18, 2024 11:32
@craigfurman craigfurman marked this pull request as draft July 18, 2024 13:53
Ensure configHash changes by adding connection secret to the ifChanged
object, which causes createOrUpdateObject() in kubernetes.go to update
the object. Set the "checksum/auth" annotation on the pod template
in frontend's deployment to a hash of this connection object, to cause a
rolling deployment.

This annotation exists in helm, but this mechanism is slightly more
general as it always hashes secret values, whereas helm will sometimes
be solely hashing the name of a secret whose values may have changed.
@craigfurman craigfurman force-pushed the appliance-roll-deployment-when-secrets-change branch from 75897da to 4912b99 Compare July 18, 2024 15:49
if err != nil {
return err
}
template.Template.ObjectMeta.Annotations["checksum/auth"] = dbConnHash

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is what actually causes the rolling deploy, after the appliance updates the deployment

@craigfurman craigfurman marked this pull request as ready for review July 18, 2024 15:51
@craigfurman

Copy link
Copy Markdown
Contributor Author

@jdpleiness this is ready for review again now, sorry for the churn

@craigfurman craigfurman merged commit 619fc57 into main Jul 19, 2024
@craigfurman craigfurman deleted the appliance-roll-deployment-when-secrets-change branch July 19, 2024 14:00
craigfurman referenced this pull request Jul 22, 2024
…es (#63954)

Builds on
https://github.com/sourcegraph/sourcegraph/pull/63845, and closes
https://linear.app/sourcegraph/issue/REL-14/ensure-pods-roll-when-referenced-secret-changes.

When one of the 3 DB configs changes, an annotation "checksum/auth" on
frontend's deployment's spec.template.metadata should change on next
reconcile. This will cause pods to roll, picking up the new secret
values. It should also cause the top-level annotation configHash to
change, which indicates to the appliance that the kubernetes resource
should be updated.

A similar mechanism is implemented for "checksum/redis", on every
service that uses redis (obtained by grepping the helm chart for the
same annotation).
craigfurman pushed a commit that referenced this pull request Jul 31, 2024
…63845)

This approach is a bit of a departure from how our helm chart works.
That hashes hashes values blocks to determine these annotations. The
issue is, that these values blocks often contain references to secrets,
and therefore don't change when the _content_ of the secret changes.

This PR takes a more general approach of looking for that secret, which
must always eventually exist - whether admin-provisioned or provisioned
by the pgsql service definition - and hashes that.
craigfurman referenced this pull request Jul 31, 2024
…es (#63954)

Builds on
https://github.com/sourcegraph/sourcegraph/pull/63845, and closes
https://linear.app/sourcegraph/issue/REL-14/ensure-pods-roll-when-referenced-secret-changes.

When one of the 3 DB configs changes, an annotation "checksum/auth" on
frontend's deployment's spec.template.metadata should change on next
reconcile. This will cause pods to roll, picking up the new secret
values. It should also cause the top-level annotation configHash to
change, which indicates to the appliance that the kubernetes resource
should be updated.

A similar mechanism is implemented for "checksum/redis", on every
service that uses redis (obtained by grepping the helm chart for the
same annotation).
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

cla-signed no-changelog Exclude this PR from the next changelog.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants