feat(appliance): self-update by craigfurman · Pull Request #63780 · sourcegraph/sourcegraph-public-snapshot

craigfurman · 2024-07-11T10:11:31Z

https://linear.app/sourcegraph/issue/REL-212/appliance-can-self-upgrade

I've presented this as a few commits because a surprising amount of the code complexity is in the "only allow 2 minor versions" constraint. This part is in its own commit, after the core self-update feature. If we want to make changes there, hopefully this splitting makes it easier, and we can even split the PR if wanted.

An integration test subpackage is introduced, which uses the recently-extracted k8senvtest. The final commit introduces a 2nd test case which required a bit of refactoring to actually be able to run >1 envtest-using test in a package without using testify/suite, as reconciler does. The isolation required to do this also gives us parallel test support for free, which is something that testify/suite doesn't support. I've verified that the tests in this PR work in parallel. If we want, we can port the same pattern to the reconciler package which might speed things up a bit, but I don't think that's a huge priority. This part is in its own commit because it does introduce a little complexity around TestMain(). Again, we can split it out and discuss is separately if we wish.

Commit log

chore(appliance): extract constant for configmap name

To the reconciler, this is just a value, but to higher-level packages
like appliance, there is a single configmap that is an entity. Let's
make sure all high-level orchestration packages can reference our name
for it. This could itself be extracted to injected config if there was a
motivation for it.

chore(appliance): extract NewRandomNamespace() in k8senvtest

From reconciler tests, so that we can reuse it in self-update tests.

feat(appliance): self-update

Add a worker thread to the appliance that periodically polls release
registry for newer versions, and updates its own Kubernetes deployment.

If the APPLIANCE_DEPLOYMENT_NAME environment variable is not set, this
feature is disabled. This PR will be accompanied by one to the
appliance's helm chart to add this variable by default.

fix(appliance): only self-update 2 minor versions above deployed SG

chore(appliance): self-update integration test extra case

Check that self-update doesn't run when SG is not yet deployed.

Test plan

Automated tests included. https://linear.app/sourcegraph/issue/REL-255/validate-appliance-self-update-end-to-end is split out so that we can perform further testing when Docker images for the appliance have been released.

Changelog

Appliance can automatically self-update up to 2 minor revisions beyond the currently-deployed sourcegraph version.

To the reconciler, this is just a value, but to higher-level packages like appliance, there is a single configmap that is an entity. Let's make sure all high-level orchestration packages can reference our name for it. This could itself be extracted to injected config if there was a motivation for it.

From reconciler tests, so that we can reuse it in self-update tests.

Add a worker thread to the appliance that periodically polls release registry for newer versions, and updates its own Kubernetes deployment. If the APPLIANCE_DEPLOYMENT_NAME environment variable is not set, this feature is disabled. This PR will be accompanied by one to the appliance's helm chart to add this variable by default.

Check that self-update doesn't run when SG is not yet deployed.

craigfurman

Big chunky PR (sorry) so I left some hopefully-helpful breadcrumbs. Of course feel free to raise more issues, and can zoom if it gets complicated.

craigfurman · 2024-07-11T10:15:47Z

See companion sourcegraph/deploy-sourcegraph-helm#501, which enables this by default in the helm chart

I'm thinking about if disabled by default is what we want, if we enable it in the Helm charts and the potential yaml files anyway. WDYT?

Do you mean that we might want to enable it by default at this layer too? There'd certainly be some sense in that, but it would involve proposing a sensible default for the deployment name. When doing local tire-kicking, outside of kube, there won't be a deployment for the appliance and the process will crash.

craigfurman · 2024-07-11T10:27:18Z

+
+type noopReconicler struct{}
+
+func (noopReconicler) SetupWithManager(_ ctrl.Manager) error { return nil }


We don't need a reconciler for these tests, we're just using the k8s control plane as a metadata store.

craigfurman · 2024-07-11T10:29:44Z

@@ -0,0 +1,17 @@
+# k8senvtest
+
+A wrapper package for sigs.k8s.io/controller-runtime/pkg/envtest. Has


Better late than never! (this README that is, not that envtest needs a general wrapper)

craigfurman · 2024-07-11T10:31:53Z

+	k8sClient client.Client
+)
+
+func TestMain(m *testing.M) {


Using TestMain() allows us to control once-per-package setup and teardown. Some downsides are:

we have to manage error handling without test niceties

we can't use logtest since we don't have a T yet

we have to manage process exit code so can't truly make use of defer

Still, I think this is a little lighter-weight than testify/suite, which I did use for reconciler. We don't need all of its features, and this way, we get test parallelisation for free!

@jdpleiness WDYT? Arguably we could just use a mock k8sClient here and skip all the k8s control plane stuff, but it's kind of nice to be lazy and get validation that we're using k8s correctly for free.

This pattern might be handy for appliance API tests too, since we're crafting a configmap.

but it's kind of nice to be lazy and get validation that we're using k8s correctly for free.

Yeah agreed. I like this much better.

craigfurman · 2024-07-11T10:33:43Z

+		var dep appsv1.Deployment
+		depName := types.NamespacedName{Name: "appliance", Namespace: nsName}
+		require.NoError(t, k8sClient.Get(ctx, depName, &dep))
+		return strings.HasSuffix(dep.Spec.Template.Spec.Containers[0].Image, "4.5.7")


I didn't feel the need to use golden test fixtures here, I'm just asserting on a few fields.

craigfurman · 2024-07-11T10:35:59Z

+		// Wait for SG to be deployed before alloweing self-update
+		if kerrors.IsNotFound(err) {
+			u.Logger.Info("Sourcegraph ConfigMap not found, exiting appliance self-update")
+			return nil


note - if we haven't got an SG version yet, it is probably simpler to just not self-update until we do.

craigfurman · 2024-07-11T10:37:01Z

+	if len(versionStrs) == 0 {
+		return "", errors.New("no versions found")
+	}
+	latestVersion, err := appliance.HighestVersionNoMoreThanNMinorFromBase(versionStrs, currentSGVersion, 2)


I know it's a bit cheeky to import the appliance package here. If this ever becomes a problem that creates a circular dependency, we can always split appliance/versions.go into another package, maybe even the appliance/config package.

Yeah, should be fine. Easy to move later if not like you said.

craigfurman · 2024-07-11T10:39:01Z

+func (u *SelfUpdate) Loop(ctx context.Context) error {
+	u.Logger.Info("starting self-update loop")
+
+	ticker := time.NewTicker(u.Interval)


note: the first tick is after the interval, so it's painful to kick the tires on this without hacking the hardcoded value in cmd/appliance. Maybe we can extract that to a flag with a sensible default.

I haven't truly run this e2e since there were no appliance docker images available at the time of writing, so I split out https://linear.app/sourcegraph/issue/REL-255/validate-appliance-self-update-end-to-end for that final validation.

craigfurman · 2024-07-11T10:39:20Z

+	Namespace      string
+}
+
+func (u *SelfUpdate) Loop(ctx context.Context) error {


All the logic is in here. The rest is main-plumbing and tests.

craigfurman · 2024-07-11T10:40:08Z

 		}
 		return nil
 	})
+	if config.selfDeploymentName != "" {


note: this is empty by default so self-update is disabled by default when just running the exe. But, sourcegraph/deploy-sourcegraph-helm#501 enables this by default for helm users, as all real users will be.

We might have a method where we provide just a yaml file for users and have them run kubectl apply -f as well. But this is also trivial to add there as well 👍

craigfurman · 2024-07-11T11:07:57Z

+	}
+
+	dep.Spec.Template.Spec.Containers[0].Image = replaceTag(dep.Spec.Template.Spec.Containers[0].Image, newTag)
+	if err := u.K8sClient.Update(ctx, &dep); err != nil {


This will cause a rolling deployment of the appliance. @jdpleiness we were talking the other day about whether we need to wait for the reconcile loop to finish when the context is cmd/appliance is canceled by the signal handler - good news, the controller-runtime already handles this 🎊

I tested this by interrupting the server immediately after triggering a reconcile. The logs show the signal getting handled, with a note about this exact scenario, and the reconciliation completing before shutdown.

So we don't need to do anything.

So we don't need to do anything.

Nice! 💯

jdpleiness

Overall looks good! A few questions but nothing blocking.

jdpleiness · 2024-07-11T15:24:42Z

 		}
 		return nil
 	})
+	if config.selfDeploymentName != "" {


We might have a method where we provide just a yaml file for users and have them run kubectl apply -f as well. But this is also trivial to add there as well 👍

jdpleiness · 2024-07-11T15:28:38Z

+	k8sClient client.Client
+)
+
+func TestMain(m *testing.M) {


but it's kind of nice to be lazy and get validation that we're using k8s correctly for free.

Yeah agreed. I like this much better.

jdpleiness · 2024-07-11T15:29:26Z

+
+	rc := m.Run()
+
+	// Our earlier defer won't run after we call os.Exit() below


jdpleiness · 2024-07-11T15:35:24Z

+	}
+
+	dep.Spec.Template.Spec.Containers[0].Image = replaceTag(dep.Spec.Template.Spec.Containers[0].Image, newTag)
+	if err := u.K8sClient.Update(ctx, &dep); err != nil {


So we don't need to do anything.

Nice! 💯

jdpleiness · 2024-07-11T15:37:12Z

+	if len(versionStrs) == 0 {
+		return "", errors.New("no versions found")
+	}
+	latestVersion, err := appliance.HighestVersionNoMoreThanNMinorFromBase(versionStrs, currentSGVersion, 2)


Yeah, should be fine. Easy to move later if not like you said.

jdpleiness · 2024-07-11T15:40:29Z

+
+	// Otherwise, allow the crossing of one major boundary (if one is available)
+	var highestMinorToleratedInNextMajor *uint64
+	for _, version := range versions {


Agreed with that. This isn't like a super hot path or anything anyhow.

jdpleiness · 2024-07-11T15:45:43Z

+			expectedVersion: "4.1.1",
+		},
+		{
+			name:            "tolerates crossing a major version boundary, but then no further minor versions",


This is probably a good call. I think we have few cases where we advise to upgrade to the major like then first, then proceed with upgrading from there.

jdpleiness · 2024-07-11T15:46:54Z

+// If you're wondering why we define an interface here, Java-style, instead of
+// using the more Go-ish idiom of encouraging consumers define interfaces, it's
+// because there are a couple of packages we want to use mocks for this
+// interface in, and its simpler to generate them in one place and import them
+// everywhere.


jdpleiness · 2024-07-11T16:20:41Z

I'm thinking about if disabled by default is what we want, if we enable it in the Helm charts and the potential yaml files anyway. WDYT?

craigfurman · 2024-07-11T16:59:20Z

Thanks for looking @jdpleiness! Since you approved I'm going to merge this before it gets stale, please feel free to follow-up on any remaining threads and we can revert bits / do follow-ons.

**chore(appliance): extract constant for configmap name** To the reconciler, this is just a value, but to higher-level packages like appliance, there is a single configmap that is an entity. Let's make sure all high-level orchestration packages can reference our name for it. This could itself be extracted to injected config if there was a motivation for it. **chore(appliance): extract NewRandomNamespace() in k8senvtest** From reconciler tests, so that we can reuse it in self-update tests. **feat(appliance): self-update** Add a worker thread to the appliance that periodically polls release registry for newer versions, and updates its own Kubernetes deployment. If the APPLIANCE_DEPLOYMENT_NAME environment variable is not set, this feature is disabled. This PR will be accompanied by one to the appliance's helm chart to add this variable by default. **fix(appliance): only self-update 2 minor versions above deployed SG** **chore(appliance): self-update integration test extra case** Check that self-update doesn't run when SG is not yet deployed. https://linear.app/sourcegraph/issue/REL-212/appliance-can-self-upgrade

cla-bot Bot added the cla-signed label Jul 11, 2024

craigfurman force-pushed the appliance-self-upgrade branch from 52b6095 to d03aee1 Compare July 11, 2024 10:15

Craig Furman added 4 commits July 11, 2024 11:17

chore(appliance): extract NewRandomNamespace() in k8senvtest

21ca8c5

From reconciler tests, so that we can reuse it in self-update tests.

fix(appliance): only self-update 2 minor versions above deployed SG

2ab10f3

chore(appliance): self-update integration test extra case

e3306a4

Check that self-update doesn't run when SG is not yet deployed.

craigfurman force-pushed the appliance-self-upgrade branch from d03aee1 to e3306a4 Compare July 11, 2024 10:21

chore(k8senvtest): add readme

004c6f7

craigfurman commented Jul 11, 2024

View reviewed changes

craigfurman requested review from a team and jdpleiness July 11, 2024 10:41

craigfurman marked this pull request as ready for review July 11, 2024 10:41

craigfurman commented Jul 11, 2024

View reviewed changes

jdpleiness approved these changes Jul 11, 2024

View reviewed changes

fix comment

8f27a80

craigfurman merged commit a20b065 into main Jul 11, 2024

craigfurman deleted the appliance-self-upgrade branch July 11, 2024 16:59


		type noopReconicler struct{}

		func (noopReconicler) SetupWithManager(_ ctrl.Manager) error { return nil }

		@@ -0,0 +1,17 @@
		# k8senvtest

		A wrapper package for sigs.k8s.io/controller-runtime/pkg/envtest. Has


		rc := m.Run()

		// Our earlier defer won't run after we call os.Exit() below

Conversation

craigfurman commented Jul 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Commit log

Test plan

Changelog

Uh oh!

craigfurman left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

craigfurman Jul 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jdpleiness left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

craigfurman commented Jul 11, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

craigfurman commented Jul 11, 2024 •

edited

Loading

craigfurman Jul 11, 2024 •

edited

Loading