Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

chore/executors: Native Kubernetes Executors default to use single job pod#64088

Merged
peterguy merged 8 commits into
mainfrom
peterguy/executors-native-kubernetes-single-job-pod-true
Jul 31, 2024
Merged

chore/executors: Native Kubernetes Executors default to use single job pod#64088
peterguy merged 8 commits into
mainfrom
peterguy/executors-native-kubernetes-single-job-pod-true

Conversation

@peterguy

Copy link
Copy Markdown
Contributor

For Executors on Native Kubernetes deployments, the option to run jobs in a single pod has been available since Native Kubernetes has been around.

The purpose of running jobs in a single pod is:

  1. Efficiency. Jobs require three steps at least, and without specifying a single pod, that requires spinning up three pods.
  2. Security. For Batch Changes, when jobs are run across several pods, git's safe.directory must be set to avoid untrusted users or processes injecting code or an attack. Running the job in one pod removes the need for safe.directory.
  3. Usability. Because of the need to set safe.directory, root access to write to git's global config is required, which means that many times special configurations and sign-offs from security teams must be used for Batch Change setups.

This PR takes a step toward using single pod jobs only in enabling them by default instead of requiring an environment variable to enable them.

The same environment variable that was used to enable them - KUBERNETES_SINGLE_JOB_POD - is still available to disable them by setting it to false.

Test plan

Bazel and CI for now

Changelog

@cla-bot cla-bot Bot added the cla-signed label Jul 25, 2024
@peterguy peterguy self-assigned this Jul 25, 2024
@peterguy peterguy requested review from camdencheek and eseliger July 25, 2024 21:45
peterguy added 2 commits July 25, 2024 15:45
There are two tests: the default test, which tests the default values, and the config test, which sets up a mock config and checks that values are propagated from it correctly.
Change the value in the mock config to be the non-default value so that the test continues to test grabbing the non-default value.
peterguy referenced this pull request Jul 26, 2024
Committed all of the changes in one fell swoop because there was a lot of research, experimentation, and backtracking.

What this does:
- removes the `KubernetesSingleJobPod` configuration, along with it reading the environment variable and propogating to other variables and options.
- removes non-single-job code paths. There were several code paths identified by TODO comments that allowed choosing between single job and not. Removed all of the not branches.
- Modified all tests to be successful in the new world of single-job pods.
  - in the case of `workspace`, this meant removing all tests because the single job pod approach uses a placeholder workspace.
  - for `runner`, there are now at least five actions because now secrets are included.
- tweaks as the effects of the changes rippled out
  - `runner.Spec.Image` means nothing anymore - it's all `command.Spec.Image`. This is split out in [a separate PR](https://github.com/sourcegraph/sourcegraph/pull/64048), but there's some of that in this one also. I'll have to merge them at some point.
- modified some deployment definitions (yaml files). There's [another PR](https://github.com/sourcegraph/sourcegraph/pull/64088) for making single job pods the default that is supposed to be a precursor to this one, which has similar edits - another merging opportunity.
- Change the name of the env var `KUBERNETES_SINGLE_JOB_STEP_IMAGE` to `KUBERNETES_JOB_STEP_IMAGE` because "single" is now assumed.
  - made sure to read both from the env vars into the config to maintain backward compatibility.
@peterguy peterguy requested a review from loujar July 26, 2024 19:22
Comment on lines -24 to +30
# - apiGroups:
# - ""
# resources:
# - secrets
# verbs:
# - create
# - delete
- apiGroups:
- ""
resources:
- secrets
verbs:
- create
- delete

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@loujar should I have done this? I'm not clear on the purpose and use of these yaml files - are they simply examples?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as far as im aware, these manifests are just examples and are not actually used by customers. Customers are instructed to use either the helm chart or kustomize overlays for deploying the kubernetes executors.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I had to guess, this part of the Role definition would be for managing executor secrets but im not clear why this would only be required for the single job variant and not necessary for the current design.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also not sure why single job pods would need the secrets and the alternate approach doesn’t seem to. When running tests, it’s clear that single job pods take five steps instead of three: there’s a setting up of secrets and a deleting of them as the two extra steps.

Comment on lines -24 to +30
# - apiGroups:
# - ""
# resources:
# - secrets
# verbs:
# - create
# - delete
- apiGroups:
- ""
resources:
- secrets
verbs:
- create
- delete

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@loujar should I have done this? I'm not clear on the purpose and use of these yaml files - are they simply examples?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants