Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

feat(appliance): self-update#63780

Merged
craigfurman merged 7 commits into
mainfrom
appliance-self-upgrade
Jul 11, 2024
Merged

feat(appliance): self-update#63780
craigfurman merged 7 commits into
mainfrom
appliance-self-upgrade

Conversation

@craigfurman

@craigfurman craigfurman commented Jul 11, 2024

Copy link
Copy Markdown
Contributor

https://linear.app/sourcegraph/issue/REL-212/appliance-can-self-upgrade

I've presented this as a few commits because a surprising amount of the code complexity is in the "only allow 2 minor versions" constraint. This part is in its own commit, after the core self-update feature. If we want to make changes there, hopefully this splitting makes it easier, and we can even split the PR if wanted.

An integration test subpackage is introduced, which uses the recently-extracted k8senvtest. The final commit introduces a 2nd test case which required a bit of refactoring to actually be able to run >1 envtest-using test in a package without using testify/suite, as reconciler does. The isolation required to do this also gives us parallel test support for free, which is something that testify/suite doesn't support. I've verified that the tests in this PR work in parallel. If we want, we can port the same pattern to the reconciler package which might speed things up a bit, but I don't think that's a huge priority. This part is in its own commit because it does introduce a little complexity around TestMain(). Again, we can split it out and discuss is separately if we wish.

Commit log

chore(appliance): extract constant for configmap name

To the reconciler, this is just a value, but to higher-level packages
like appliance, there is a single configmap that is an entity. Let's
make sure all high-level orchestration packages can reference our name
for it. This could itself be extracted to injected config if there was a
motivation for it.

chore(appliance): extract NewRandomNamespace() in k8senvtest

From reconciler tests, so that we can reuse it in self-update tests.

feat(appliance): self-update

Add a worker thread to the appliance that periodically polls release
registry for newer versions, and updates its own Kubernetes deployment.

If the APPLIANCE_DEPLOYMENT_NAME environment variable is not set, this
feature is disabled. This PR will be accompanied by one to the
appliance's helm chart to add this variable by default.

fix(appliance): only self-update 2 minor versions above deployed SG

chore(appliance): self-update integration test extra case

Check that self-update doesn't run when SG is not yet deployed.

Test plan

Automated tests included. https://linear.app/sourcegraph/issue/REL-255/validate-appliance-self-update-end-to-end is split out so that we can perform further testing when Docker images for the appliance have been released.

Changelog

  • Appliance can automatically self-update up to 2 minor revisions beyond the currently-deployed sourcegraph version.

To the reconciler, this is just a value, but to higher-level packages
like appliance, there is a single configmap that is an entity. Let's
make sure all high-level orchestration packages can reference our name
for it. This could itself be extracted to injected config if there was a
motivation for it.
@cla-bot cla-bot Bot added the cla-signed label Jul 11, 2024
@craigfurman craigfurman force-pushed the appliance-self-upgrade branch from 52b6095 to d03aee1 Compare July 11, 2024 10:15
Craig Furman added 4 commits July 11, 2024 11:17
From reconciler tests, so that we can reuse it in self-update tests.
Add a worker thread to the appliance that periodically polls release
registry for newer versions, and updates its own Kubernetes deployment.

If the APPLIANCE_DEPLOYMENT_NAME environment variable is not set, this
feature is disabled. This PR will be accompanied by one to the
appliance's helm chart to add this variable by default.
Check that self-update doesn't run when SG is not yet deployed.
@craigfurman craigfurman force-pushed the appliance-self-upgrade branch from d03aee1 to e3306a4 Compare July 11, 2024 10:21

@craigfurman craigfurman left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big chunky PR (sorry) so I left some hopefully-helpful breadcrumbs. Of course feel free to raise more issues, and can zoom if it gets complicated.

Comment thread cmd/appliance/shared/config.go Outdated

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See companion sourcegraph/deploy-sourcegraph-helm#501, which enables this by default in the helm chart

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about if disabled by default is what we want, if we enable it in the Helm charts and the potential yaml files anyway. WDYT?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that we might want to enable it by default at this layer too? There'd certainly be some sense in that, but it would involve proposing a sensible default for the deployment name. When doing local tire-kicking, outside of kube, there won't be a deployment for the appliance and the process will crash.


type noopReconicler struct{}

func (noopReconicler) SetupWithManager(_ ctrl.Manager) error { return nil }

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need a reconciler for these tests, we're just using the k8s control plane as a metadata store.

@@ -0,0 +1,17 @@
# k8senvtest

A wrapper package for sigs.k8s.io/controller-runtime/pkg/envtest. Has

@craigfurman craigfurman Jul 11, 2024

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better late than never! (this README that is, not that envtest needs a general wrapper)

k8sClient client.Client
)

func TestMain(m *testing.M) {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using TestMain() allows us to control once-per-package setup and teardown. Some downsides are:

  • we have to manage error handling without test niceties
  • we can't use logtest since we don't have a T yet
  • we have to manage process exit code so can't truly make use of defer

Still, I think this is a little lighter-weight than testify/suite, which I did use for reconciler. We don't need all of its features, and this way, we get test parallelisation for free!

@jdpleiness WDYT? Arguably we could just use a mock k8sClient here and skip all the k8s control plane stuff, but it's kind of nice to be lazy and get validation that we're using k8s correctly for free.

This pattern might be handy for appliance API tests too, since we're crafting a configmap.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but it's kind of nice to be lazy and get validation that we're using k8s correctly for free.

Yeah agreed. I like this much better.

var dep appsv1.Deployment
depName := types.NamespacedName{Name: "appliance", Namespace: nsName}
require.NoError(t, k8sClient.Get(ctx, depName, &dep))
return strings.HasSuffix(dep.Spec.Template.Spec.Containers[0].Image, "4.5.7")

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't feel the need to use golden test fixtures here, I'm just asserting on a few fields.

// Wait for SG to be deployed before alloweing self-update
if kerrors.IsNotFound(err) {
u.Logger.Info("Sourcegraph ConfigMap not found, exiting appliance self-update")
return nil

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note - if we haven't got an SG version yet, it is probably simpler to just not self-update until we do.

if len(versionStrs) == 0 {
return "", errors.New("no versions found")
}
latestVersion, err := appliance.HighestVersionNoMoreThanNMinorFromBase(versionStrs, currentSGVersion, 2)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's a bit cheeky to import the appliance package here. If this ever becomes a problem that creates a circular dependency, we can always split appliance/versions.go into another package, maybe even the appliance/config package.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, should be fine. Easy to move later if not like you said.

func (u *SelfUpdate) Loop(ctx context.Context) error {
u.Logger.Info("starting self-update loop")

ticker := time.NewTicker(u.Interval)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: the first tick is after the interval, so it's painful to kick the tires on this without hacking the hardcoded value in cmd/appliance. Maybe we can extract that to a flag with a sensible default.

I haven't truly run this e2e since there were no appliance docker images available at the time of writing, so I split out https://linear.app/sourcegraph/issue/REL-255/validate-appliance-self-update-end-to-end for that final validation.

Namespace string
}

func (u *SelfUpdate) Loop(ctx context.Context) error {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the logic is in here. The rest is main-plumbing and tests.

}
return nil
})
if config.selfDeploymentName != "" {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: this is empty by default so self-update is disabled by default when just running the exe. But, sourcegraph/deploy-sourcegraph-helm#501 enables this by default for helm users, as all real users will be.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might have a method where we provide just a yaml file for users and have them run kubectl apply -f as well. But this is also trivial to add there as well 👍

@craigfurman craigfurman requested review from a team and jdpleiness July 11, 2024 10:41
@craigfurman craigfurman marked this pull request as ready for review July 11, 2024 10:41
}

dep.Spec.Template.Spec.Containers[0].Image = replaceTag(dep.Spec.Template.Spec.Containers[0].Image, newTag)
if err := u.K8sClient.Update(ctx, &dep); err != nil {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will cause a rolling deployment of the appliance. @jdpleiness we were talking the other day about whether we need to wait for the reconcile loop to finish when the context is cmd/appliance is canceled by the signal handler - good news, the controller-runtime already handles this 🎊

I tested this by interrupting the server immediately after triggering a reconcile. The logs show the signal getting handled, with a note about this exact scenario, and the reconciliation completing before shutdown.

So we don't need to do anything.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we don't need to do anything.

Nice! 💯

@jdpleiness jdpleiness left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good! A few questions but nothing blocking.

}
return nil
})
if config.selfDeploymentName != "" {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might have a method where we provide just a yaml file for users and have them run kubectl apply -f as well. But this is also trivial to add there as well 👍

k8sClient client.Client
)

func TestMain(m *testing.M) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but it's kind of nice to be lazy and get validation that we're using k8s correctly for free.

Yeah agreed. I like this much better.


rc := m.Run()

// Our earlier defer won't run after we call os.Exit() below

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL

}

dep.Spec.Template.Spec.Containers[0].Image = replaceTag(dep.Spec.Template.Spec.Containers[0].Image, newTag)
if err := u.K8sClient.Update(ctx, &dep); err != nil {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we don't need to do anything.

Nice! 💯

if len(versionStrs) == 0 {
return "", errors.New("no versions found")
}
latestVersion, err := appliance.HighestVersionNoMoreThanNMinorFromBase(versionStrs, currentSGVersion, 2)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, should be fine. Easy to move later if not like you said.


// Otherwise, allow the crossing of one major boundary (if one is available)
var highestMinorToleratedInNextMajor *uint64
for _, version := range versions {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with that. This isn't like a super hot path or anything anyhow.

expectedVersion: "4.1.1",
},
{
name: "tolerates crossing a major version boundary, but then no further minor versions",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably a good call. I think we have few cases where we advise to upgrade to the major like then first, then proceed with upgrading from there.

Comment on lines +24 to +28
// If you're wondering why we define an interface here, Java-style, instead of
// using the more Go-ish idiom of encouraging consumers define interfaces, it's
// because there are a couple of packages we want to use mocks for this
// interface in, and its simpler to generate them in one place and import them
// everywhere.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment thread internal/appliance/versions.go Outdated
Comment thread cmd/appliance/shared/config.go Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about if disabled by default is what we want, if we enable it in the Helm charts and the potential yaml files anyway. WDYT?

@craigfurman

Copy link
Copy Markdown
Contributor Author

Thanks for looking @jdpleiness! Since you approved I'm going to merge this before it gets stale, please feel free to follow-up on any remaining threads and we can revert bits / do follow-ons.

@craigfurman craigfurman merged commit a20b065 into main Jul 11, 2024
@craigfurman craigfurman deleted the appliance-self-upgrade branch July 11, 2024 16:59
craigfurman pushed a commit that referenced this pull request Jul 31, 2024
**chore(appliance): extract constant for configmap name**

To the reconciler, this is just a value, but to higher-level packages
like appliance, there is a single configmap that is an entity. Let's
make sure all high-level orchestration packages can reference our name
for it. This could itself be extracted to injected config if there was a
motivation for it.



**chore(appliance): extract NewRandomNamespace() in k8senvtest**

From reconciler tests, so that we can reuse it in self-update tests.



**feat(appliance): self-update**

Add a worker thread to the appliance that periodically polls release
registry for newer versions, and updates its own Kubernetes deployment.

If the APPLIANCE_DEPLOYMENT_NAME environment variable is not set, this
feature is disabled. This PR will be accompanied by one to the
appliance's helm chart to add this variable by default.



**fix(appliance): only self-update 2 minor versions above deployed SG**




**chore(appliance): self-update integration test extra case**

Check that self-update doesn't run when SG is not yet deployed.

https://linear.app/sourcegraph/issue/REL-212/appliance-can-self-upgrade
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants