feat(appliance): self-update#63780
Conversation
To the reconciler, this is just a value, but to higher-level packages like appliance, there is a single configmap that is an entity. Let's make sure all high-level orchestration packages can reference our name for it. This could itself be extracted to injected config if there was a motivation for it.
52b6095 to
d03aee1
Compare
From reconciler tests, so that we can reuse it in self-update tests.
Add a worker thread to the appliance that periodically polls release registry for newer versions, and updates its own Kubernetes deployment. If the APPLIANCE_DEPLOYMENT_NAME environment variable is not set, this feature is disabled. This PR will be accompanied by one to the appliance's helm chart to add this variable by default.
Check that self-update doesn't run when SG is not yet deployed.
d03aee1 to
e3306a4
Compare
craigfurman
left a comment
There was a problem hiding this comment.
Big chunky PR (sorry) so I left some hopefully-helpful breadcrumbs. Of course feel free to raise more issues, and can zoom if it gets complicated.
There was a problem hiding this comment.
See companion sourcegraph/deploy-sourcegraph-helm#501, which enables this by default in the helm chart
There was a problem hiding this comment.
I'm thinking about if disabled by default is what we want, if we enable it in the Helm charts and the potential yaml files anyway. WDYT?
There was a problem hiding this comment.
Do you mean that we might want to enable it by default at this layer too? There'd certainly be some sense in that, but it would involve proposing a sensible default for the deployment name. When doing local tire-kicking, outside of kube, there won't be a deployment for the appliance and the process will crash.
|
|
||
| type noopReconicler struct{} | ||
|
|
||
| func (noopReconicler) SetupWithManager(_ ctrl.Manager) error { return nil } |
There was a problem hiding this comment.
We don't need a reconciler for these tests, we're just using the k8s control plane as a metadata store.
| @@ -0,0 +1,17 @@ | |||
| # k8senvtest | |||
|
|
|||
| A wrapper package for sigs.k8s.io/controller-runtime/pkg/envtest. Has | |||
There was a problem hiding this comment.
Better late than never! (this README that is, not that envtest needs a general wrapper)
| k8sClient client.Client | ||
| ) | ||
|
|
||
| func TestMain(m *testing.M) { |
There was a problem hiding this comment.
Using TestMain() allows us to control once-per-package setup and teardown. Some downsides are:
- we have to manage error handling without test niceties
- we can't use logtest since we don't have a T yet
- we have to manage process exit code so can't truly make use of
defer
Still, I think this is a little lighter-weight than testify/suite, which I did use for reconciler. We don't need all of its features, and this way, we get test parallelisation for free!
@jdpleiness WDYT? Arguably we could just use a mock k8sClient here and skip all the k8s control plane stuff, but it's kind of nice to be lazy and get validation that we're using k8s correctly for free.
This pattern might be handy for appliance API tests too, since we're crafting a configmap.
There was a problem hiding this comment.
but it's kind of nice to be lazy and get validation that we're using k8s correctly for free.
Yeah agreed. I like this much better.
| var dep appsv1.Deployment | ||
| depName := types.NamespacedName{Name: "appliance", Namespace: nsName} | ||
| require.NoError(t, k8sClient.Get(ctx, depName, &dep)) | ||
| return strings.HasSuffix(dep.Spec.Template.Spec.Containers[0].Image, "4.5.7") |
There was a problem hiding this comment.
I didn't feel the need to use golden test fixtures here, I'm just asserting on a few fields.
| // Wait for SG to be deployed before alloweing self-update | ||
| if kerrors.IsNotFound(err) { | ||
| u.Logger.Info("Sourcegraph ConfigMap not found, exiting appliance self-update") | ||
| return nil |
There was a problem hiding this comment.
note - if we haven't got an SG version yet, it is probably simpler to just not self-update until we do.
| if len(versionStrs) == 0 { | ||
| return "", errors.New("no versions found") | ||
| } | ||
| latestVersion, err := appliance.HighestVersionNoMoreThanNMinorFromBase(versionStrs, currentSGVersion, 2) |
There was a problem hiding this comment.
I know it's a bit cheeky to import the appliance package here. If this ever becomes a problem that creates a circular dependency, we can always split appliance/versions.go into another package, maybe even the appliance/config package.
There was a problem hiding this comment.
Yeah, should be fine. Easy to move later if not like you said.
| func (u *SelfUpdate) Loop(ctx context.Context) error { | ||
| u.Logger.Info("starting self-update loop") | ||
|
|
||
| ticker := time.NewTicker(u.Interval) |
There was a problem hiding this comment.
note: the first tick is after the interval, so it's painful to kick the tires on this without hacking the hardcoded value in cmd/appliance. Maybe we can extract that to a flag with a sensible default.
I haven't truly run this e2e since there were no appliance docker images available at the time of writing, so I split out https://linear.app/sourcegraph/issue/REL-255/validate-appliance-self-update-end-to-end for that final validation.
| Namespace string | ||
| } | ||
|
|
||
| func (u *SelfUpdate) Loop(ctx context.Context) error { |
There was a problem hiding this comment.
All the logic is in here. The rest is main-plumbing and tests.
| } | ||
| return nil | ||
| }) | ||
| if config.selfDeploymentName != "" { |
There was a problem hiding this comment.
note: this is empty by default so self-update is disabled by default when just running the exe. But, sourcegraph/deploy-sourcegraph-helm#501 enables this by default for helm users, as all real users will be.
There was a problem hiding this comment.
We might have a method where we provide just a yaml file for users and have them run kubectl apply -f as well. But this is also trivial to add there as well 👍
| } | ||
|
|
||
| dep.Spec.Template.Spec.Containers[0].Image = replaceTag(dep.Spec.Template.Spec.Containers[0].Image, newTag) | ||
| if err := u.K8sClient.Update(ctx, &dep); err != nil { |
There was a problem hiding this comment.
This will cause a rolling deployment of the appliance. @jdpleiness we were talking the other day about whether we need to wait for the reconcile loop to finish when the context is cmd/appliance is canceled by the signal handler - good news, the controller-runtime already handles this 🎊
I tested this by interrupting the server immediately after triggering a reconcile. The logs show the signal getting handled, with a note about this exact scenario, and the reconciliation completing before shutdown.
So we don't need to do anything.
There was a problem hiding this comment.
So we don't need to do anything.
Nice! 💯
jdpleiness
left a comment
There was a problem hiding this comment.
Overall looks good! A few questions but nothing blocking.
| } | ||
| return nil | ||
| }) | ||
| if config.selfDeploymentName != "" { |
There was a problem hiding this comment.
We might have a method where we provide just a yaml file for users and have them run kubectl apply -f as well. But this is also trivial to add there as well 👍
| k8sClient client.Client | ||
| ) | ||
|
|
||
| func TestMain(m *testing.M) { |
There was a problem hiding this comment.
but it's kind of nice to be lazy and get validation that we're using k8s correctly for free.
Yeah agreed. I like this much better.
|
|
||
| rc := m.Run() | ||
|
|
||
| // Our earlier defer won't run after we call os.Exit() below |
| } | ||
|
|
||
| dep.Spec.Template.Spec.Containers[0].Image = replaceTag(dep.Spec.Template.Spec.Containers[0].Image, newTag) | ||
| if err := u.K8sClient.Update(ctx, &dep); err != nil { |
There was a problem hiding this comment.
So we don't need to do anything.
Nice! 💯
| if len(versionStrs) == 0 { | ||
| return "", errors.New("no versions found") | ||
| } | ||
| latestVersion, err := appliance.HighestVersionNoMoreThanNMinorFromBase(versionStrs, currentSGVersion, 2) |
There was a problem hiding this comment.
Yeah, should be fine. Easy to move later if not like you said.
|
|
||
| // Otherwise, allow the crossing of one major boundary (if one is available) | ||
| var highestMinorToleratedInNextMajor *uint64 | ||
| for _, version := range versions { |
There was a problem hiding this comment.
Agreed with that. This isn't like a super hot path or anything anyhow.
| expectedVersion: "4.1.1", | ||
| }, | ||
| { | ||
| name: "tolerates crossing a major version boundary, but then no further minor versions", |
There was a problem hiding this comment.
This is probably a good call. I think we have few cases where we advise to upgrade to the major like then first, then proceed with upgrading from there.
| // If you're wondering why we define an interface here, Java-style, instead of | ||
| // using the more Go-ish idiom of encouraging consumers define interfaces, it's | ||
| // because there are a couple of packages we want to use mocks for this | ||
| // interface in, and its simpler to generate them in one place and import them | ||
| // everywhere. |
There was a problem hiding this comment.
I'm thinking about if disabled by default is what we want, if we enable it in the Helm charts and the potential yaml files anyway. WDYT?
|
Thanks for looking @jdpleiness! Since you approved I'm going to merge this before it gets stale, please feel free to follow-up on any remaining threads and we can revert bits / do follow-ons. |
**chore(appliance): extract constant for configmap name** To the reconciler, this is just a value, but to higher-level packages like appliance, there is a single configmap that is an entity. Let's make sure all high-level orchestration packages can reference our name for it. This could itself be extracted to injected config if there was a motivation for it. **chore(appliance): extract NewRandomNamespace() in k8senvtest** From reconciler tests, so that we can reuse it in self-update tests. **feat(appliance): self-update** Add a worker thread to the appliance that periodically polls release registry for newer versions, and updates its own Kubernetes deployment. If the APPLIANCE_DEPLOYMENT_NAME environment variable is not set, this feature is disabled. This PR will be accompanied by one to the appliance's helm chart to add this variable by default. **fix(appliance): only self-update 2 minor versions above deployed SG** **chore(appliance): self-update integration test extra case** Check that self-update doesn't run when SG is not yet deployed. https://linear.app/sourcegraph/issue/REL-212/appliance-can-self-upgrade
https://linear.app/sourcegraph/issue/REL-212/appliance-can-self-upgrade
I've presented this as a few commits because a surprising amount of the code complexity is in the "only allow 2 minor versions" constraint. This part is in its own commit, after the core self-update feature. If we want to make changes there, hopefully this splitting makes it easier, and we can even split the PR if wanted.
An integration test subpackage is introduced, which uses the recently-extracted k8senvtest. The final commit introduces a 2nd test case which required a bit of refactoring to actually be able to run >1 envtest-using test in a package without using testify/suite, as reconciler does. The isolation required to do this also gives us parallel test support for free, which is something that testify/suite doesn't support. I've verified that the tests in this PR work in parallel. If we want, we can port the same pattern to the reconciler package which might speed things up a bit, but I don't think that's a huge priority. This part is in its own commit because it does introduce a little complexity around TestMain(). Again, we can split it out and discuss is separately if we wish.
Commit log
chore(appliance): extract constant for configmap name
To the reconciler, this is just a value, but to higher-level packages
like appliance, there is a single configmap that is an entity. Let's
make sure all high-level orchestration packages can reference our name
for it. This could itself be extracted to injected config if there was a
motivation for it.
chore(appliance): extract NewRandomNamespace() in k8senvtest
From reconciler tests, so that we can reuse it in self-update tests.
feat(appliance): self-update
Add a worker thread to the appliance that periodically polls release
registry for newer versions, and updates its own Kubernetes deployment.
If the APPLIANCE_DEPLOYMENT_NAME environment variable is not set, this
feature is disabled. This PR will be accompanied by one to the
appliance's helm chart to add this variable by default.
fix(appliance): only self-update 2 minor versions above deployed SG
chore(appliance): self-update integration test extra case
Check that self-update doesn't run when SG is not yet deployed.
Test plan
Automated tests included. https://linear.app/sourcegraph/issue/REL-255/validate-appliance-self-update-end-to-end is split out so that we can perform further testing when Docker images for the appliance have been released.
Changelog