Executors Native SSBC#50079
Conversation
# Conflicts: # enterprise/cmd/executor/internal/worker/runtime/docker.go # enterprise/cmd/executor/internal/worker/runtime/runtime.go
# Conflicts: # doc/admin/executors/deploy_executors_kubernetes.md # enterprise/cmd/executor/internal/worker/runtime/runtime.go
| skipped, err := batcheslib.SkippedStepsForRepo(batchSpec.Spec, string(repo.Name), workspace.FileMatches) | ||
| if err != nil { | ||
| return apiclient.Job{}, err | ||
| } | ||
| executionInput.SkippedSteps = skipped |
There was a problem hiding this comment.
since this simply uses a library method, would it be easier to just let the batcheshelper recompute this vs putting it in the payload? once we go with dynamic skipping, too, we have to recompute it anyways on the client side.
There was a problem hiding this comment.
Actually, this is quite the lift to move into batcheshelper. In order for batcheshelper to call this lib function, it needs
BatchSpecWorkspace- Repo name (I believe batcheshelper gets this in the
WorkspacesExecutionInput) BatchSpec
We could update the WorkspacesExecutionInput to have these fields (then have transform.go put these fields in it since it already has all the data), but that's quite a change from how it looks today.
Also, SkippedSteps is static. So it only needs to be computed once. It wont change between steps.
There was a problem hiding this comment.
ahh I see 🤔 those are not required for dynamic step skipping?
There was a problem hiding this comment.
That's correct. This is just pure static skipping. For dynamic skipping, we use the stepContext with the if to determine if the step should be skipped - this is eval in batcheshelper
stepContext is based on the previous result and WorkspacesExecutionInput.
| if err := runtimeRunner.Run(ctx, spec); err != nil { | ||
| return errors.Wrapf(err, "running command %q", spec.CommandSpec.Key) | ||
| } | ||
| if strings.HasSuffix(spec.CommandSpec.Key, "pre") { |
There was a problem hiding this comment.
this again is coupling the batcheshelper and the executor implementation, can we simply skip to the read next step name from the skipfile and jump there? A bit scared that the "pre" word is in here - also because we already talked about that we might be able to consolidate pre and post into one and .. might rename it then :)
There was a problem hiding this comment.
The problem I am trying to solve with the pre check is I only want to check for a skip file when a pre step has been ran. I don't want to wait the cycles checking for the skip file on every step - especially when we are not doing a batches job.
So the following is the flow (for batches),
- Run the step (
step.docker.step.0.pre) - Check for skip file
- No skip file
- Run step (
step.docker.step.0.run) - Run step (
step.docker.step.0.post) - Run step (
step.docker.step.1.pre) - Check for skip file
- There is a skip file - get the "next" key (
step.docker.step.3.pre) - Check the next step (
step.docker.step.1.run) against the skip key - does not match - Check next step (
step.docker.step.1.post) against the skip key - does not match - Check next step (
step.docker.step.2.pre) against the skip key - does not match - Check next step (
step.docker.step.2.run) against the skip key - does not match - Check next step (
step.docker.step.2.post) against the skip key - does not match - Check next step (
step.docker.step.3.pre) against the skip key - does match - Run step (
step.docker.step.3.pre) - Check for skip file
- No skip file
- Rinse and repeat
There was a problem hiding this comment.
I can't really think of a nice way to not decouple batcheshelper and executor. The skip file is contextual to batcheshelper. I could at least move this check to a lib so all this pre, post, run stuff are all consolidated in a single place versus scatter all over the code base
…xtract out building of the pre, post, and run step keys. Extract the Skip file struct and name
eseliger
left a comment
There was a problem hiding this comment.
Great work. I think I'm fairly comfortable with the state of this. I'd say let's just get this out and see what works and what doesn't and get integration tests written against it :) We can always follow up more and fixup things, or reduce the coupling if we get better ideas.
| ) | ||
|
|
||
| // CommandKey returns the fully formatted key for the command. | ||
| func CommandKey(name Name, rawStepKey string, index int) string { |
There was a problem hiding this comment.
should we prevent jumping back? Otherwise, it might mean that one can create a loop.
There was a problem hiding this comment.
Could you tell me more on what you mean by jumping back?
There was a problem hiding this comment.
step0
step1
step2
are meant to be executed sequentially - if step1 now says "go to step 0", should we allow that? That could cause loops
There was a problem hiding this comment.
I don't think we should allow that. However going based on the Key value, there isn't a way to know if we are going backwards without parsing out the index from the Key string.
But we wont run into loops since we are not implementing a "go to" system, but a "skip to" system.
So say step1 says "go to step0", it will skip step2 (since it is not step0) and the Job will complete.
| skipped, err := batcheslib.SkippedStepsForRepo(batchSpec.Spec, string(repo.Name), workspace.FileMatches) | ||
| if err != nil { | ||
| return apiclient.Job{}, err | ||
| } | ||
| executionInput.SkippedSteps = skipped |
There was a problem hiding this comment.
ahh I see 🤔 those are not required for dynamic step skipping?
sanderginn
left a comment
There was a problem hiding this comment.
Another high quality PR 🙌
- A lot of unit tests write to or read from disk. It seems to me that disk operations should be mocked out, but I dunno if that's a feasible thing.
- There's a bunch of unnecessary line breaks in the markdown files, probably from your IDE formatter. Not sure if those are still rendered incorrectly but probably worth fixing.
| Native Server-Side Batch Changes is an image that runs Batch Changes without | ||
| requiring [`src-cli`](https://github.com/sourcegraph/src-cli) to be installed on the Executor machine. |
There was a problem hiding this comment.
Let's include a short motivation for why a customer would benefit from this
There was a problem hiding this comment.
I wonder if we should even brand this as "native execution", or simply call it the "new execution infrastructure" or execution model or something. This feels like unnecessary terms for a customer to learn, especially if the migration works well, 99.9% of users should never worry about it. Just my 2ct though.
There was a problem hiding this comment.
I like Native Execution
| The Docker Images that execute the actual Batch Change step require `tee` to be available on the image. Without `tee`, | ||
| the output of the step cannot be captured properly for template variable rendering. |
There was a problem hiding this comment.
Is it feasible to validate that tee is present on the image somewhere? Maybe in run.Pre()?
| ## Build | ||
|
|
||
| When running on Kubernetes, the image can be built with the following command. | ||
|
|
||
| ```shell | ||
| IMAGE=sourcegraph/batcheshelper:insiders ./build.sh | ||
| ``` | ||
|
|
||
| When running the following `sg` commands, `batcheshelper` will automatically be built on any changes to the binary. | ||
|
|
||
| - `batcheshelper-builder` | ||
| - `batches` |
There was a problem hiding this comment.
Why does it need to be built manually?
There was a problem hiding this comment.
We don't yet have a profile/command yet in the sg run. So any changes to the source code wont automatically build the images.
| // KubernetesMountDir is the path where the Kubernetes volume is mounted in the container. | ||
| KubernetesMountDir = "/data" | ||
| // KubernetesMountPath is the path where the Kubernetes volume is mounted in the container. | ||
| KubernetesMountPath = "/data/" |
There was a problem hiding this comment.
These are named in a confusing way: I would reflect the mountPath and subPath names in these constants, especially because the KubernetesMountDir is now used as the MountPath in the volume mount.
| MountPath: KubernetesMountDir, | ||
| SubPath: strings.TrimPrefix(path, KubernetesMountPath), |
There was a problem hiding this comment.
Looking at what you're doing here, I'm assuming that path contains the workspace (and maybe step) directory, and the files that the job should have access to live in the /data/ directory in this location?
I would suggest to maybe rename KubernetesMountDir to something like ContainerMountPath. KubernetesMountPath could be HostVolumeSubPath or something.
There was a problem hiding this comment.
We do want the Kubernetes prefix on these since we have docker and firecracker in the same package. How about KubernetesMountPath for KubernetesMountDir (since it is in additional places than just the volume mount), and KubernetesVolumeMountSubPath for KubernetesMountPath?
| if err = os.Remove(path); err != nil { | ||
| return "", errors.Wrap(err, "removing skip file") | ||
| } |
There was a problem hiding this comment.
What happens if os.Remove(path) errors due to something like an I/O failure in the middle of a series of steps? Is the entire job marked as failed?
There was a problem hiding this comment.
Yea, you're right. And it is not end of the world if it can't be removed
| - `miniube image load executor-kubernetes:latest` | ||
| - `miniube image load sourcegraph/batcheshelper:insiders` |
There was a problem hiding this comment.
| - `miniube image load executor-kubernetes:latest` | |
| - `miniube image load sourcegraph/batcheshelper:insiders` | |
| - `minikube image load executor-kubernetes:latest` | |
| - `minikube image load sourcegraph/batcheshelper:insiders` |
| si.ExitCode = entry.ExitCode | ||
| si.FinishedAt = entry.StartTime.Add(time.Duration(*entry.DurationMs) * time.Millisecond) | ||
| // SetSafeFunc is a function that can be used to set a value on a StepInfo. | ||
| type SetSafeFunc func(step int, cb func(*StepInfo)) |
There was a problem hiding this comment.
Why is it called a SafeFunc?
There was a problem hiding this comment.
Good question.. Seems like it prevents any panics from happening 🤷 - by having checks... ?
I'll rename to just SetFunc
Closes #44183.
This PR updates the
batcheshelperimage (Native SSBC) to be feature parity withsrc. This will allow us to remove the dependency executors have onsrc.Changes
README.mdto thebatcheshelperpremodepostmodecommand.goto check the logs if the step is skippedpoststepIndexto the step to understand what "group" of commands belong togetherbatchSpecWorkspaceStepV2Resolverto handle env resolution by parsing the logs and adding astepInfoto itCreated Issues
Test plan
Added unit tests.
Scenarios
Skipped Steps
Hardcoded
Batch spec
Docker Clip
Screen.Recording.2023-04-13.at.09.22.58.mov
K8s Clip
Screen.Recording.2023-04-13.at.10.06.20.mov
Dynamic
Batch spec
Docker Clip
Screen.Recording.2023-04-13.at.09.23.30.mov
K8s Clip
Screen.Recording.2023-04-13.at.10.22.00.mov
Workspace Files
File
Batch spec
Docker Clip
Screen.Recording.2023-04-13.at.09.32.01.mov
K8s Clip
Screen.Recording.2023-04-13.at.10.24.45.mov
Directory
Batch spec
Docker Clip
Screen.Recording.2023-04-13.at.09.33.24.mov
K8s Clip
Screen.Recording.2023-04-13.at.10.25.50.mov
File Mounts
Batch spec
Docker Clip
Screen.Recording.2023-04-13.at.09.48.05.mov
K8s Clip
Screen.Recording.2023-04-13.at.10.26.56.mov
Env Var Rendering
Batch spec
Docker Clip
Screen.Recording.2023-04-13.at.09.51.06.mov
K8s Clip
Screen.Recording.2023-04-13.at.10.28.05.mov
Caching
Batch spec
Docker Clip
Screen.Recording.2023-04-13.at.09.54.21.mov
K8s Clip
Screen.Recording.2023-04-13.at.10.29.25.mov