There have been lots of discussions about optimizing the rebuild flow when building a project with Docker. We have seen workarounds where volumes are used instead to get access to the source data. But that approach is risky, because the input data is mutable and can't be properly cached.
I'm proposing adding an endpoint to the remote API that could be used to transfer data between the client and the builder using a negotiated protocol. For the incremental build use case, one of these protocols could be one that skips the transfer of files that have not changed.
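To illustrate what "negotiated protocol" means here, below is a minimal sketch of how the client and daemon could agree on a transfer protocol. The protocol names (`"tarstream"`, `"diffcopy"`) and the function are hypothetical placeholders, not a finalized API:

```go
package main

import "fmt"

// negotiateProtocol picks the first transfer protocol from the
// client's preference list that the daemon also supports. The
// names used here are illustrative only.
func negotiateProtocol(client, daemon []string) (string, error) {
	supported := make(map[string]bool, len(daemon))
	for _, p := range daemon {
		supported[p] = true
	}
	for _, p := range client {
		if supported[p] {
			return p, nil
		}
	}
	return "", fmt.Errorf("no common transfer protocol")
}

func main() {
	// Client prefers incremental transfer; daemon supports both.
	p, err := negotiateProtocol(
		[]string{"diffcopy", "tarstream"},
		[]string{"tarstream", "diffcopy"},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(p) // prints "diffcopy"
}
```

An older client that only knows `"tarstream"` would still negotiate successfully, which keeps the scheme backward compatible.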
@dmcgowan and I have been prototyping a solution for this using the changes stream from containerd https://godoc.org/github.com/docker/containerd/fs#Changes . Usually, this is used for finding the difference between a read-write layer and an image layer when committing a container. But the same concept also works for detecting the difference between the current and a previous context transfer from a client. This is essentially the same approach rsync uses for recursive directory sync (rsync's delta-update algorithm could be added later if it helps performance).
Having a dedicated endpoint for context transfer also means we don't need any hacks around Dockerfile/.dockerignore, where we currently combine the build description and the build inputs into the same archive and then extract and separate them on the daemon side.
Other benefits include the daemon detecting which files are actually needed for the build and skipping the transfer of the rest. Later this could be used for providing multiple sources for a build operation, skipping the context transfer completely when parallel builds require the same context, syncing files back to the client, etc.
This would not remove any existing features from the builder. Sending context with a tar archive in a POST request would still be supported.
I hope to share code soon; in the meantime I'd like to get some initial thoughts/feedback.
The transfer would go over a hijacked connection, similar to the one used by the attach endpoint. For the framing, the prototype currently uses grpc; we could probably switch to websockets, as grpc doesn't provide much extra value in this case. All transferred data is encapsulated in protobuf messages. Initial benchmarks have been encouraging.