
Proposal: embed build sources in image config #2269

@tonistiigi


When an image has been built, it is useful to know what the dependencies of that specific build were. This makes it possible to figure out whether any of the dependencies have been updated and the build should be run again. In the future, this could perhaps also be used to pin the dependencies to specific digests for reproducibility.

LLB has a Source operation for these cases: a container image, a git commit, an HTTP URL, or a local directory. Everything except the local directory can be tracked with an immutable digest based only on the LLB definition.
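As a rough illustration of which sources can be pinned, here is a minimal Go sketch that classifies source identifiers by scheme. The scheme prefixes, the `sourceKind` type and the `classify` helper are hypothetical; `container-image://` simply follows the example key used later in this proposal and is not necessarily the identifier BuildKit uses.

```go
package main

import (
	"fmt"
	"strings"
)

// sourceKind is a hypothetical classification of LLB source identifiers,
// used only for illustration.
type sourceKind struct {
	prefix   string
	name     string
	pinnable bool // true if an immutable digest can be derived for this source
}

// Assumed scheme prefixes for this sketch only.
var kinds = []sourceKind{
	{"container-image://", "image", true},
	{"git://", "git", true},
	{"https://", "http", true},
	{"http://", "http", true},
	{"local://", "local", false}, // local directory contents are not content-addressed in LLB
}

// classify returns the kind name and whether the source can be pinned.
func classify(id string) (string, bool) {
	for _, k := range kinds {
		if strings.HasPrefix(id, k.prefix) {
			return k.name, k.pinnable
		}
	}
	return "unknown", false
}

func main() {
	for _, id := range []string{
		"container-image://docker.io/library/alpine:3.13",
		"git://github.com/docker/buildx#master",
		"local://context",
	} {
		name, ok := classify(id)
		fmt.Printf("%s kind=%s pinnable=%v\n", id, name, ok)
	}
}
```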

When this immutable digest is computed in CacheMap():

CacheMap(context.Context, session.Group, int) (*CacheMap, bool, error)

we can extend the returned CacheMap struct with extra information that is later added to the image config. Because the solver package is generic and doesn't know about LLB or snapshots, I think this should just be a string map. I don't think it makes sense to reuse the existing CacheOpts field for this (@sipsma).

ResolveResponse map[string]string

For example:

{ "container-image://docker.io/library/alpine:3.13": "sha256:deadbeef" }
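A minimal sketch of what the extended struct and an image source populating it might look like. This `CacheMap` is a heavily simplified, hypothetical stand-in: the real solver struct has other fields (cache digest, deps, CacheOpts) that are omitted here, and `imageSourceCacheMap` is an invented helper.

```go
package main

import "fmt"

// CacheMap is a simplified, hypothetical stand-in for the solver's CacheMap.
// Only the proposed ResolveResponse field is shown; existing fields are omitted.
type CacheMap struct {
	// ResolveResponse maps a source identifier to its pinned digest.
	// It is a plain string map because the solver does not know about LLB.
	ResolveResponse map[string]string
}

// imageSourceCacheMap sketches what an image source op could return after
// resolving a tag to a digest. The function name is hypothetical.
func imageSourceCacheMap(ref, resolvedDigest string) *CacheMap {
	return &CacheMap{
		ResolveResponse: map[string]string{
			"container-image://" + ref: resolvedDigest,
		},
	}
}

func main() {
	cm := imageSourceCacheMap("docker.io/library/alpine:3.13", "sha256:deadbeef")
	fmt.Println(cm.ResolveResponse)
}
```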

When the solver runs the build, it already stores the CacheMap value for every vertex that is part of the build. Before returning the CachedResult from

return j.list.s.build(ctx, e)

it can walk back through all parent vertices, gather their ResolveResponse values, and combine them into a single structure that is returned from the Build() function. The extra return value is needed because Metadata in CachedResult is not typed. Maybe it should be, but that is for a different proposal.
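The walk-and-merge step could be sketched roughly as follows. The `vertex` type is a hypothetical, minimal stand-in for a solver vertex; it only records the ResolveResponse from its CacheMap and its inputs. Because the build graph is a DAG where parents can be shared, each vertex is visited once.

```go
package main

import (
	"fmt"
	"sort"
)

// vertex is a hypothetical stand-in for a solver vertex.
type vertex struct {
	name            string
	resolveResponse map[string]string // from this vertex's CacheMap
	inputs          []*vertex
}

// collectResolveResponses walks from the result vertex back through all
// parents and merges every ResolveResponse into a single map.
func collectResolveResponses(root *vertex) map[string]string {
	out := map[string]string{}
	seen := map[*vertex]bool{}
	var walk func(v *vertex)
	walk = func(v *vertex) {
		if v == nil || seen[v] {
			return
		}
		seen[v] = true
		for k, val := range v.resolveResponse {
			// Assumption: the same key resolves to the same pin within one build.
			out[k] = val
		}
		for _, in := range v.inputs {
			walk(in)
		}
	}
	walk(root)
	return out
}

func main() {
	alpine := &vertex{name: "alpine", resolveResponse: map[string]string{
		"container-image://docker.io/library/alpine:3.13": "sha256:deadbeef",
	}}
	src := &vertex{name: "git", resolveResponse: map[string]string{
		"git://github.com/docker/buildx#master": "sha1:deadbeef",
	}}
	run := &vertex{name: "run", inputs: []*vertex{alpine, src}}
	final := &vertex{name: "copy", inputs: []*vertex{run, alpine}}

	merged := collectResolveResponses(final)
	keys := make([]string, 0, len(merged))
	for k := range merged {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic output order
	for _, k := range keys {
		fmt.Println(k, "=>", merged[k])
	}
}
```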

This structure can then be passed to the exporter. The image exporter would add it as an extra field in the image config. As this is BuildKit-specific, I think it makes sense to do something similar to what we do for the inline build cache: use a single base64-encoded string under a BuildKit-specific key.

"moby.buildkit.buildinfo.v0": <base64>

The base64 value decodes to:

{
  "sources": [
    {
      "type": "image",
      "ref": "docker.io/library/alpine:3.13",
      "pin": "sha256:"
    },
    {
      "type": "git",
      "ref": "github.com/docker/buildx#master",
      "pin": "sha1:deadbeef"
    }
  ]
}
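A minimal sketch of encoding and decoding this value, assuming standard base64 and treating the image config as a generic map (a simplification; a real exporter works on the typed config JSON). The `Source`/`BuildInfo` types and the `addBuildInfo`/`readBuildInfo` helpers are hypothetical; only the key name comes from this proposal.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// buildInfoKey is the image config key proposed above.
const buildInfoKey = "moby.buildkit.buildinfo.v0"

// Source mirrors one entry of the proposed "sources" array.
type Source struct {
	Type  string `json:"type"`
	Ref   string `json:"ref"`
	Alias string `json:"alias,omitempty"` // only set by frontends
	Pin   string `json:"pin"`
}

// BuildInfo is the decoded form of the base64 value.
type BuildInfo struct {
	Sources []Source `json:"sources"`
}

// addBuildInfo JSON-encodes bi, base64-encodes the result and stores it
// in the config under buildInfoKey.
func addBuildInfo(config map[string]interface{}, bi BuildInfo) error {
	dt, err := json.Marshal(bi)
	if err != nil {
		return err
	}
	config[buildInfoKey] = base64.StdEncoding.EncodeToString(dt)
	return nil
}

// readBuildInfo reverses addBuildInfo.
func readBuildInfo(config map[string]interface{}) (BuildInfo, error) {
	var bi BuildInfo
	s, ok := config[buildInfoKey].(string)
	if !ok {
		return bi, fmt.Errorf("%s not found", buildInfoKey)
	}
	dt, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return bi, err
	}
	err = json.Unmarshal(dt, &bi)
	return bi, err
}

func main() {
	config := map[string]interface{}{"architecture": "amd64"}
	bi := BuildInfo{Sources: []Source{
		{Type: "image", Ref: "docker.io/library/alpine:3.13", Pin: "sha256:deadbeef"},
		{Type: "git", Ref: "github.com/docker/buildx#master", Pin: "sha1:deadbeef"},
	}}
	if err := addBuildInfo(config, bi); err != nil {
		panic(err)
	}
	decoded, err := readBuildInfo(config)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", decoded.Sources)
}
```

Keeping the whole record as one opaque string means tools that do not know about BuildKit see a single extra field rather than a new nested structure.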

There is one special case to take into account. A frontend might have already transformed a string the user typed before generating LLB. For example, in Dockerfile this happens for FROM images, because the Dockerfile frontend needs to load their image config in order to access ENV, ONBUILD, etc. While doing that, it always adds the digest to the image ref so that the LLB solve always points to the same image. So LLB already contains the digest-pinned ref, but in the embedded buildinfo it would be better to show the original value.

The solution for this is that the Dockerfile frontend can create its own moby.buildkit.buildinfo.v0 key in the image config for the values it sees, and the image exporter can then fix it up after the full solve. This is similar to how the history array currently works: Dockerfile adds the command strings and the exporter fills in dates etc. later in patchImageConfig(). Dockerfile can add a record like:

{
  "type": "image",
  "ref": "docker.io/library/alpine:3.13",
  "alias": "docker.io/library/alpine:3.13@sha256:",
  "pin": "sha256:"
}

When LLB then reports a source for "docker.io/library/alpine:3.13@sha256:", the exporter matches it against the alias and uses the original ref, alpine:3.13, instead.
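The exporter-side fix-up might look roughly like this. `mergeSources` is a hypothetical helper. Two behaviors are assumptions not specified in the proposal: the alias is dropped from the final record, and frontend records that match no LLB source are ignored.

```go
package main

import "fmt"

// Source mirrors one entry of the proposed buildinfo "sources" array.
type Source struct {
	Type  string
	Ref   string
	Alias string
	Pin   string
}

// mergeSources replaces every LLB source whose Ref equals a frontend Alias
// with the frontend's original Ref, keeping the pin reported by the solver.
func mergeSources(frontendSources, llbSources []Source) []Source {
	byAlias := map[string]Source{}
	for _, fs := range frontendSources {
		if fs.Alias != "" {
			byAlias[fs.Alias] = fs
		}
	}
	out := make([]Source, 0, len(llbSources))
	for _, s := range llbSources {
		if fs, ok := byAlias[s.Ref]; ok {
			// Assumption: the alias is not kept in the exported record.
			out = append(out, Source{Type: fs.Type, Ref: fs.Ref, Pin: s.Pin})
			continue
		}
		out = append(out, s)
	}
	return out
}

func main() {
	frontend := []Source{{
		Type:  "image",
		Ref:   "docker.io/library/alpine:3.13",
		Alias: "docker.io/library/alpine:3.13@sha256:deadbeef",
		Pin:   "sha256:deadbeef",
	}}
	llb := []Source{
		{Type: "image", Ref: "docker.io/library/alpine:3.13@sha256:deadbeef", Pin: "sha256:deadbeef"},
		{Type: "git", Ref: "github.com/docker/buildx#master", Pin: "sha1:deadbeef"},
	}
	for _, s := range mergeSources(frontend, llb) {
		fmt.Println(s.Type, s.Ref, s.Pin)
	}
}
```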

We can start by adding this frontend component to the Dockerfile frontend and then extend it to support full LLB.

I think this can be enabled by default. There shouldn't be any security concern with exposing the source images; most of this information is already present in textual form in the history array. But we should provide a way to opt out with a special key in -o.
