update containerd and port executor over to the container client library#2275
update containerd and port executor over to the container client library#2275aaronlehmann merged 35 commits intomoby:masterfrom
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2275 +/- ##
==========================================
- Coverage 60.96% 60.42% -0.54%
==========================================
Files 126 125 -1
Lines 20391 20394 +3
==========================================
- Hits 12431 12323 -108
- Misses 6589 6688 +99
- Partials 1371 1383 +12 |
agent/exec/containerd/adapter.go
Outdated
| } | ||
|
|
||
| func withNamespace(ctx context.Context) context.Context { | ||
| return namespaces.WithNamespace(ctx, "default") |
There was a problem hiding this comment.
Is there value to using a swarm-specific namespace instead of the default namespace?
cc @stevvooe
There was a problem hiding this comment.
My plan was to make it a swarmd command line option (sooner rather than later, I just didn't get round to it). I'm not sure what the best default is.
There was a problem hiding this comment.
Done in the commit I just pushed.
agent/exec/containerd/adapter.go
Outdated
| // directly attaches stdin to /dev/null. | ||
| if devNull == nil { | ||
| if devNull, err = os.Open(os.DevNull); err != nil { | ||
| return errors.Wrap(err, "opening dev null") |
There was a problem hiding this comment.
"opening null device"?
agent/exec/containerd/adapter.go
Outdated
| if p, ok := mountPropagationReverseMap[m.BindOptions.Propagation]; ok { | ||
| propagation = p | ||
| } else { | ||
| log.G(ctx).Warningf("unknown bind mount propagation, using %q", propagation) |
There was a problem hiding this comment.
extra space after the ,
agent/exec/containerd/controller.go
Outdated
| return err | ||
| } | ||
| log.G(ctx).Errorf("EXIT STATUS %v", exitStatus) | ||
| if err := r.adapter.remove(ctx); err != nil { |
There was a problem hiding this comment.
It's not correct to remove the container automatically on shutdown. We only remove it when the task is removed from the assignment set. This way old containers are kept around as long as the corresponding historic task is retained.
Remove should be called by reconcileTaskState once the task is no longer in the assignment set. With default settings, you may have to cycle through 5 tasks for a particular slot (or scale down) to see a removal happen.
There was a problem hiding this comment.
Yes, this was still here (nb: it's the same behaviour as current master, since this series also moves the remove from r.adapter.shutdown into r.adapter.remove) while I figured out the strangeness I mentioned in the PR comment, I worked those out yesterday so I'll clean up those commits and append them here today.
There was a problem hiding this comment.
This is gone with the batch of changes which reworks Wait() and Shutdow().
|
Force push to fix Aaron's two nits, added "containerd: log tag which fits in with the others" (was using "ID", which stood out like a sore thumb, now uses "container.id" which is a better fit) |
|
Added a pile more patches which rework |
|
lint error: |
Needed to add namespacing and to use the new runtime name. Signed-off-by: Ian Campbell <ian.campbell@docker.com>
This is v1.0.0-rc6-12-g372ad78 and corresponds to what the vendored containerd is using. Signed-off-by: Ian Campbell <ian.campbell@docker.com>
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
The containerd client library contains a containerd.Container which will be a more appropriate use of the name. The existing spec function is renamed makeSpec() but after the switch to the client library the function becomes so simple it can be inlined into create. Signed-off-by: Ian Campbell <ian.campbell@docker.com>
…e instead The containerd client library contains a containerd.Task which will be a more appropriate use of the name. Signed-off-by: Ian Campbell <ian.campbell@docker.com>
This simplifies a bunch of stuff, with more potential improvements yet to come. Hardcodes the default namespace for now. Signed-off-by: Ian Campbell <ian.campbell@docker.com>
Rootfs and Mounts handling is not converted yet and will follow. Signed-off-by: Ian Campbell <ian.campbell@docker.com>
This means we no longer need to unpack the rootfs manually. Signed-off-by: Ian Campbell <ian.campbell@docker.com>
Changes/simplifications: - no anon volume support, they don't really make sense in a cluster environment; - mount over the image provided mounts instead of trying to dedup; - don't worry about dedup of the requested mounts either; With this there is no longer any need to obtain the image config on the client side. Signed-off-by: Ian Campbell <ian.campbell@docker.com>
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
It is so simple now a separate function is not that useful. Signed-off-by: Ian Campbell <ian.campbell@docker.com>
Now that rootfs and stdio are handled by containerd there is no need for this. Signed-off-by: Ian Campbell <ian.campbell@docker.com>
It is called from controller.Prepare and this seems to better reflect its purpose. Signed-off-by: Ian Campbell <ian.campbell@docker.com>
Here "prepared" means the prepared() method has been called and therefore both c.container and c.task can be assumed to be non-nil. Signed-off-by: Ian Campbell <ian.campbell@docker.com>
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
This potentially allows callers to try and cleanup partial state. Signed-off-by: Ian Campbell <ian.campbell@docker.com>
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
Existing tags as "node.id" and "task.id" etc. Signed-off-by: Ian Campbell <ian.campbell@docker.com>
It's just an event, not a fatal one (at least not at this level) Signed-off-by: Ian Campbell <ian.campbell@docker.com>
Dropping the exit code. The exitError type implements the ExitCoder interface so callers can extract it if they want. Signed-off-by: Ian Campbell <ian.campbell@docker.com>
The containerd Task struct contains a lot of info which callers are not interested in and which is not provided by the Task.Status() method, which we will shortly switch to in preference to an open coded InfoRequest gRPC. Belt-and-braces checks that ID and Pid match will go away as part of Task.Status switch. Signed-off-by: Ian Campbell <ian.campbell@docker.com>
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
It's unexpected and unnecessary, the orchestrator will call the Shutdown() or Remove() methods as needed. Signed-off-by: Ian Campbell <ian.campbell@docker.com>
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
Idempotency is not a requirement of the interface, just let it return failed (presumably some sort of "already dead" or "not found" type error). Signed-off-by: Ian Campbell <ian.campbell@docker.com>
agent/exec/containerd/executor.go
Outdated
| func NewExecutor(sock string, genericResources []*api.GenericResource) (exec.Executor, error) { | ||
| // TODO(ijc), configurable namespace. | ||
| // TODO(ijc), default to swarmd? | ||
| client, err := containerd.New(sock, containerd.WithDefaultNamespace("default")) |
There was a problem hiding this comment.
We should probably have something nice here for the namespace. k8s is using k8s.io. Perhaps, swarmkit.docker.com, but I am not sure if we are spinning off this project. swarmkit is probably sufficient.
There was a problem hiding this comment.
I tend to think of swarmd the standalone daemon process as somewhat separate to the swarmkit toolkit. So I think I might take a two layered approach here and have cmd/swarmd/main.go define a --containerd-namespace option, defaulting to "swarmd" which is passed as an argument to NewExecutor here.
In NewExecutor if the requested namespace is "" then it'll default to swarmkit. Or maybe I'll make it an error.
There was a problem hiding this comment.
Done in the commit I just added, I decided to make NewExecutor insist on an application provided namespace and had swarmd default to swarmd.
agent/exec/containerd/adapter.go
Outdated
| c.exitStatus = makeExitError(rsp.ExitStatus, "") | ||
| } else { | ||
| c.log(ctx).Debug("Task already deleted, error=%s", c.exitStatus) | ||
| var err error |
There was a problem hiding this comment.
This declaration seems unnecessary.
There was a problem hiding this comment.
It does, looks like it is there in master and this is just moving it around, it actually disappears entirely later in the series, in containerd: Use Task.Delete() in adapter.shutdown().
A downside which I hadn't anticipated of reviewing commit by commit is that GH marks comments such as this as "Outdated" right as they are made, so they are collapsed in the webui and I only notice them via the email notification.
| } | ||
|
|
||
| if sig == syscall.SIGTERM { | ||
| // We've tried as hard as we can. |
There was a problem hiding this comment.
We don't retry with SIGKILL?
There was a problem hiding this comment.
That was my intention but I mixed up my TERMs and my KILLs. Will fix.
This means we need to send the signals ourselves, since Delete waits for the IO to complete. Previously we were just lucky (and probably leaking some goroutines). Signed-off-by: Ian Campbell <ian.campbell@docker.com>
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
This function checks for valid options and for duplicates, but we have just constructed opts from a static set above and can trivially verify that everything is ok. Signed-off-by: Ian Campbell <ian.campbell@docker.com>
Requires a newer github.com/pkg/errors due to:
undefined: "github.com/docker/swarmkit/vendor/github.com/pkg/errors".WithStack
So update to 645ef00459ed84a119197bfb8d8205042c6df63d which is v0.8.0 as used
in current containerd.
Despite substantial changes to the containerd grpc interface the client library
isolated us from all that so no code changes required!
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
|
Force push is |
When inspecting a system with `ctr` by default you will now need to export `CONTAINERD_NAMESPACE=swarmd`. Signed-off-by: Ian Campbell <ian.campbell@docker.com>
|
CI failure is: Which hopefully can't be my fault! |
|
It passed on the second attempt. |
agent/exec/containerd/adapter.go
Outdated
| return c.exitStatus | ||
| case <-ctx.Done(): | ||
| return ctx.Err() | ||
| case <-time.After(timeout): |
There was a problem hiding this comment.
This would bet set to StopGracePeriod, but I don't think that value should apply here.
// StopGracePeriod the grace period for stopping the container before
// forcefully killing the container.
If it was set to 0, or a very small value, this would return false "task is unkillable" errors.
There was a problem hiding this comment.
I assume you are talking specifically about this timeout once we have brought out the big guns, for the initial attempt using StopGracePeriod is correct and proper I think (and I think you think so too, just checking).
Looking at the equivalent code on the docker side it blocks indefinitely once it has resorted to SIGKILL. In general that is probably fine, very little will survive a SIGKILL for very long but blocking in things like uninterruptible sleep on I/O can take a considerable length of time to die. I think this is something swarmkit can (and apparently does when running against docker) tolerate so the correct answer is to wait here with no timeout, other than the ctx.Done().
…own. There is no good choice of timeout here and SIGKILL is always eventually fatal, although the death can be deferred in some cases by long uninterruptible sleeps. Signed-off-by: Ian Campbell <ian.campbell@docker.com>
|
LGTM |
1 similar comment
|
LGTM |
- moby/swarmkit#2266 (support for templating Node.Hostname in docker executor) - moby/swarmkit#2281 (change restore action on objects to be update, not delete/create) - moby/swarmkit#2285 (extend watch queue with timeout and size limit) - moby/swarmkit#2253 (version-aware failure tracking in the scheduler) - moby/swarmkit#2275 (update containerd and port executor to container client library) - moby/swarmkit#2292 (rename some generic resources) - moby/swarmkit#2300 (limit the size of the external CA response) - moby/swarmkit#2301 (delete global tasks when the node running them is deleted) Minor cleanups, dependency bumps, and vendoring: - moby/swarmkit#2271 - moby/swarmkit#2279 - moby/swarmkit#2283 - moby/swarmkit#2282 - moby/swarmkit#2274 - moby/swarmkit#2296 (dependency bump of etcd, go-winio) Signed-off-by: Ying Li <ying.li@docker.com> Upstream-commit: 4509a00 Component: engine
- moby/swarmkit#2266 (support for templating Node.Hostname in docker executor) - moby/swarmkit#2281 (change restore action on objects to be update, not delete/create) - moby/swarmkit#2285 (extend watch queue with timeout and size limit) - moby/swarmkit#2253 (version-aware failure tracking in the scheduler) - moby/swarmkit#2275 (update containerd and port executor to container client library) - moby/swarmkit#2292 (rename some generic resources) - moby/swarmkit#2300 (limit the size of the external CA response) - moby/swarmkit#2301 (delete global tasks when the node running them is deleted) Minor cleanups, dependency bumps, and vendoring: - moby/swarmkit#2271 - moby/swarmkit#2279 - moby/swarmkit#2283 - moby/swarmkit#2282 - moby/swarmkit#2274 - moby/swarmkit#2296 (dependency bump of etcd, go-winio) Signed-off-by: Ying Li <ying.li@docker.com> Upstream-commit: 4509a00 Component: engine
This switches to the client library interfaces for pulling images, producing the spec and stdio as well as some of the lifecycle ops.
Unported are events/wait and tear down, I'm seeing some strangeness with those which I am still trying to diagnose, so for now these are usingPort is now complete.Client.TaskService(which is still an improvement) but still making direct gRPC calls.This is a pretty big change, I expect the overall diff will be pretty unreviewable, I've tried to break the commits down into individual logical changes with decent commit messages which ought to be a bit more tractable (although not always by much).