
Add support for lazily-pulled blobs in cache manager. #1475

Merged
tonistiigi merged 4 commits into moby:master from sipsma:lazy-ref
Aug 6, 2020
Conversation

@sipsma
Collaborator

@sipsma sipsma commented May 7, 2020

Closes #870

This update allows the cache manager's GetByBlob to optionally be called with a descriptor that does not yet exist in the local content store. When such a descriptor is loaded, the corresponding cache.Manager method needs to be called with a cache.DescHandlers object as a RefOption. DescHandlers specifies the provider/progress/image-ref to use for a given ref in the event it needs to be unlazied (i.e. during an extract or for preparation of a mount).

DescHandlers get associated with a vertex via a new field Opts of type solver.CacheOpts in the solver.CacheMap object, which can optionally be set by vertex implementations. CacheOpts allows associating arbitrary keys (wrapped in a generic interface{}) with arbitrary values. These values can be accessed through solver.CacheOptGetterOf(ctx), which is a function that accepts a list of keys and returns a map of those keys to their associated values. This function will search "upwards" in the solver graph (traversing parents and ancestors via a DFS) until it finds a vertex that has set the associated key.

Currently, DescHandlers is the only use case for these CacheOpts (it gets set in source/containerimage/pull.go), but the intention is to potentially be generally useful in case it's needed more in the future. When a lazy ref needs to be loaded, the caller gets a typed error (cache.NeedsRemoteProvidersError) and then can use solver.CacheOptGetterOf(ctx) to find the DescHandler for each of the lazy blobs.

The change also includes updates to take advantage of laziness during pushes, so that remote blobs are only pulled locally when strictly necessary, and not during, for example, a solve that simply updates the tag of a remote image or a push that can take advantage of a registry's support for cross-repo pushing. This is accomplished by moving the GetRemote method from the worker directly to ImmutableRef. If GetRemote is called on a lazy ref, the Remote's provider will be set up such that if it's opened for reading, the ref will get unlazied (that is, pulled into the local content store). This results in the blobs staying lazy unless they actually need to be read by the push handlers during export.

Collaborator

@hinshun hinshun left a comment

My first pass at this PR.

At a high level, it seems like this differs from the proposal in #870, and this PR chose an approach that minimized changes. I think this is fine, but I'm not familiar with this part of the code so I'll leave the higher level discussion to @tonistiigi.

I'll write a summary here so I can quickly understand when I look at this again.

  • GetByBlob gets called in three use cases:
    • During the scheduler loop, when it executes an edge, causing it to load the cached result from a cache manager. That in turn calls out to the worker to convert a Remote into an ImmutableRef.
    • During cache import, when the cache manager is called to load the cached result from the cache manager constructed from the cache import.
    • During cache export, when the solver calls out to the exporter's ExportTo, which in turn calls out to load the solver.Result.
  • RefOption is an interface{}, so consumers of these opts cast before applying them to their options struct.
  • The content.Provider, progress.Writer and image ref are plumbed down using perSessionState, which is mapped to the session ID.
  • The meat of this PR is in getLazyRef added to the *puller and EnsureContentExists added to ImmutableRef.
# Solve case
(*scheduler).dispatch
  -> (*edge).unpark
    -> (*edge).execIfPossible
      -> (*edge).loadCache
        -> (*sharedOp).LoadCache
          -> (*combinedCacheManager).Load
            -> (*cacheManager).LoadWithParents
              -> (*cacheResultStorage).LoadWithParents
                -> (Worker).FromRemote
                  -> (Accessor).GetByBlob
            -> (*cacheResultStorage).Load
              -> (Worker).FromRemote
                -> (Accessor).GetByBlob

# Cache import case
(*lazyCacheManager).Load
  -> (*cacheManager).Load
    -> (*cacheResultStorage).LoadWithParent
      -> (Worker).FromRemote
        -> (Accessor).GetByBlob

# Cache export case
(*Solver).Solve
  -> (*exporter).ExportTo
    -> (*cacheResultStorage).Load

@sipsma
Collaborator Author

sipsma commented May 12, 2020

@hinshun Thanks for the comments! I will push the updates ASAP, within the next day; I just can't make them quite yet because I haven't set my desktop back up after moving to a new place.

At a high level, it seems like this differs from the proposal in #870, and this PR chose an approach that minimized changes.

Yeah, the implementation is different from what the original issue proposed, but I believe it's closer to what @tonistiigi suggested more recently on Slack when discussing this. He can confirm, though.

GetByBlob gets called in three use cases

Your summary is good, I think it's just missing the cases where the solver has a SourceOp which is a containerImage type and needs to determine the cache key and/or get a cache ref for it (when mounting the container image somewhere, for example). In those cases, I don't believe FromRemote is in the call stack.

The meat of this PR is in getLazyRef added to the *puller and the EnsureContentExists added to ImmutableRef.

That's correct, I'd just add that the changes to ImmutableRef.Extract (which now un-lazies content when needed) are also part of the crux and that EnsureContentExists may only need to be public in the short term (see my TODO comment here).

Member

@tonistiigi tonistiigi left a comment

As discussed in slack, one of the issues that came up with this was the case where a lazy ref is looked up from the cache by another build request. This is the part currently solved by every request creating a lazy-ref in the CacheKey method.

The current approach seems risky and doesn't really follow the ops/solver logic. I think there are 2 possible solutions to avoid this that involve some tweaks to the cache logic.

The first approach would make the cache lookups not work while a ref is lazy. With this approach, when we call cache.Save() for a lazy ref, we would instead store this position as a callback, and if the ref gets un-lazied these callbacks would get executed and add the cache records. As lazy refs are only in memory, cleanup of these callbacks would be easy as well.

The problem, though, is that it doesn't solve the same problem for the mergeop and stargz cases. In these cases, we definitely would want to have the cache working even if stargz has not pulled the ref or if part of the merged ref has not been pulled.

The second approach is that when a lazy ref is accessed by the generated ID (like the cache does), it can fail with a typed error requiring that you pass extra information (like a provider for a descriptor and progress info) with the method.

Now the question is how a cache lookup can provide this extra information. If the lazy ref already stores the image-ref(s), then it is not that complicated, as the cache lookup already has access to the credentials through the session. But I'm not sure if it is a good idea to allow the cache manager to just ask for credentials for any previously cached image. An alternative would be to use something quite similar to the current solution, where the CacheMap query would save possible "extra load info" that the cache lookup could use. But ideally this is not done through a global map indexed by session but saved back to the job, so another vertex from the same job can access it. Maybe using the session as an index is fine as well if this is done outside refs. (The main issue with the current logic is that cache functions shouldn't create refs.) Not sure how the cleanup would work without storing it in the job, though. LMK if this is too confusing... fyi @ktock

RefOption is an interface{}, so consumers of these opts cast before applying them to their options struct.

@hinshun This is pretty bad. But let's not expand the scope of this PR with it. Feel free to create an issue for adding a more typed API for this (or just fix it :))

@sipsma
Collaborator Author

sipsma commented May 20, 2020

@tonistiigi

Alternative would be to use something quite similar to the current solution where CacheMap query would save possible "extra load info" that cache lookup could use. But ideally, this is not done through a global map indexed by session but saved back to the job so another vertex from the same job can access it. Maybe using the session as an index is fine as well if this is done outside refs.

Okay, interesting. I'm focusing on your second approach for now because it would be nice to have something that works for the stargz/mergeop cases too. Let me double-check my understanding of your suggestion here.

In a previous iteration of this, instead of calling GetByBlob in CacheKey, I had added a separate entity called DescriptorManager that gets created by Worker and passed to both *imageSource and cache.Manager, which basically allowed CacheKey to associate providers/progress with a given session+OCI descriptor and cache.Manager to query what provider/progress it should use for a given OCI descriptor needing to be un-lazied. Essentially, I used a separate per-worker map to communicate provider/progress between the image source and cache manager instead of GetByBlob calls.

That seems slightly closer to what you're suggesting relative to the current PR's approach but still different in a few crucial ways:

  1. Instead of having CacheKey communicate with cache.Manager through a new entity like DescriptorManager, your suggestion is to have CacheKey and/or one of its callers like SourceOp communicate which provider to use back up to the solver level (one way or another), which will store it per-job and then pass the relevant information to solver.CacheManager in case it's needed for unlazying.
  2. Instead of separating providers/progress by session, separate by solver job

Is that correct? Or am I misunderstanding?

If so, makes total sense in general, but I think one tough part might be how to keep Solver abstracted over different types of vertexes. That is, while Solver today already knows about progress for each vertex (allowing it to pass that information through to CacheManager when it's needed for unlazies), Solver doesn't know about providers today, and providers are highly specific to image source vertexes.

What do you think about these possibilities? (I don't love any of them, but want to get your thoughts on them as a starting point):

  1. The CacheMap struct could have a new CacheOptions []CacheOption field which gets set by the specific op implementation. CacheOption would be some typed (not interface{} :-)) opt that can get passed through solver.CacheManager.Load and turned into a RefOption before getting passed to cache.Manager. For example, *imageSource can include an option for setting a session.Provider which would be used in case an unlazy is needed.
  2. Or instead CacheMap could have a new map[string]string field, where ops can fill in arbitrary metadata they want passed to the cache on load. *imageSource would then use this to specify the image ref string. In theory, all you need today to create a provider is a session.Manager and an image ref string, and Solver should already know about session.Manager, so if it just also provided cache.Manager the image ref string through an attribute map like that, cache.Manager could create the provider object itself when an unlazy needs to occur.
    • However, map[string]string is almost as bad as interface{}, and having cache.Manager be responsible for creating a provider feels like it might be leaking concerns across the code?

@tonistiigi
Member

tonistiigi commented May 21, 2020

What do you think about these possibilities? (I don't love any of them, but want to get your thoughts on them as a starting point):

I think something like https://github.com/moby/buildkit/compare/master...tonistiigi:cache-helpers-example?expand=1 isn't that bad. No new imports are added to the solver package. If we want to avoid changing the Load() method signature, we could hide this inside the context as well.

One thing I got wrong in that example branch is that helpers shouldn't just come directly from the current sharedOp instance but from any vertex in the same job. For example, when we load cache and a parent (read: one of the parents) needs a "load-helper", not the ref being loaded itself, it should still work. This case will not happen atm because lazy refs can't have un-lazy children, but it will with stargz and mergeop. Not quite sure of the best way to merge these "load-helpers" together; it might make sense to pass it as a callback instead of a map between the solver and the cache loader, and when loading fails and needs helpers, we can call the callback that will traverse the build graph looking for the helper with matching key(s).

Collaborator Author

@sipsma sipsma left a comment

I think something like https://github.com/moby/buildkit/compare/master...tonistiigi:cache-helpers-example?expand=1 isn't that bad. No new imports are added to the solver package. If we want to avoid changing the Load() method signature, we could hide this inside the context as well.

Awesome, makes total sense, including the part about probably needing more of a callback that can traverse ancestor vertexes in the build graph. Did a rough pass of this approach; the only minor question mark left in my head is what to do in the corner case where the callback is looking for a load-helper key (such as content-provider:...) and it's present in multiple ancestors in the build graph (i.e. the case we discussed previously where two image source vertexes resolve to the same image in different repos). For now, I'll probably just have it use whichever helper it happens to find first.

Will push the updates in the next few days.

@tonistiigi
Member

For now, I'll probably just have it use whichever helper it happens to find first.

Yes, that should be fine. Digests should provide stability here. E.g. if you today have 2 different image references in the same build that point to the same digest, the pull also happens once and there is no predefined order defining which ref was used.

@sipsma sipsma force-pushed the lazy-ref branch 4 times, most recently from fe195be to bd7f19f Compare June 4, 2020 20:06
@sipsma
Collaborator Author

sipsma commented Jun 4, 2020

Updated from previous comments and w/ a new integration test. Travis is failing for this PR, but I believe it's unrelated; the only failure is failed dialing: /tmp/bktest_buildkitd857002840/buildkitd.sock in an unrelated test and at the time of this writing the master branch seems to have a similar issue. When I run locally with a low --parallel value I don't see this issue.

I tried an empty force-push to re-run the tests for this PR but got the same error on a different test in that suite, so I'll just leave it as is for now.

@sipsma sipsma requested a review from tonistiigi June 9, 2020 16:19
Member

@tonistiigi tonistiigi left a comment

First pass. No big issues atm.

The biggest issue I have atm is with the "cache-opt" API and how it is split between packages. Should not need a cacheutil package. The generic types can be part of solver (it can even use Digest as a key there), and the cache package should define the strongly typed implementation side (Provider, Progress definition, etc). Using the function to walk ancestors is good, but I'm not a fan of using the context for passing anything to the cache package. If we pass something with context, then it should end where the cache load happens. I think the correct API for the cache package is to return a typed error with descriptions of what load opts are needed; then the cache loader invokes the ancestor walking to find the opts matching the requested keys and calls again. If context is used to connect cache loading with the ancestor walking, I think it is better if the context doesn't reimplement Value() but adds the typed walker logic as a single value to the context.

Things I didn't fully investigate yet:

  • Progress of layer pulls does not complete
  • Load metadata and resolve step seem to run twice for the same image (e.g. the resolver is not reused). Don't think this was the case before. Opentracing output of the HTTP requests should help here.
  • Unpacking happens under the progress of the next vertex.
  • Ideally the next vertex is not in a running state while the unlazy happens (you can leave this one to me as it probably means changing the solver a bit).

@sipsma
Collaborator Author

sipsma commented Jun 23, 2020

The biggest issue I have atm is with the "cache-opt" API and how it is split between packages. Should not need a cacheutil package.

Done, now there's just solver/cacheopts.go and cache/opts.go.

Using the function to walk ancestors is good but I'm not a fan of using the context for passing anything to cache package.

Done, I changed it to just make the descriptor handler be a RefOption arg and have the APIs return a specific error type, MissingDescHandler, when one needs to be provided by the caller. The caller can use solver.CacheOptGetterOf(ctx) to get a func that will return the cache opt for a given key, which will be set up by the solver to do the ancestor walking.

Progress of layer pulls does not complete

Fixed, there was a bug in providerWithProgress where it would exit early on the last iteration of the loop checking content progress.

Load metadata and resolve step seem to run twice for the same image(eg. the resolver is not reused).

Believe I fixed this. IIUC, basically the issue was that CacheKey needed to call PullManifests to get the manifest/layer digests and Snapshot also needed to call PullManifests because it needed the manifests to exist locally (so it could associate leases on them with the ImmutableRef it returns). However, even though pull.Puller used a sync.Once for the initial resolve step, the fetcher implementation used in its handlers would always go and check the remote before realizing the content exists locally and didn't need to be pulled.

I solved part of it by just having Snapshot re-use the digests obtained during CacheKey rather than always re-call PullManifests. It will check the local content store for each metadata blob and, if it finds any are missing, will repull them only then.

Additionally, because the lease on the manifests pulled during CacheKey expired at the end of that method's execution, those manifests were getting gc'd, and when Snapshot got called, the manifests would have to get re-pulled. I realized a similar issue was already being solved in the imageutil package, so I just exposed the slice of lease callbacks it uses to clean up leases during a Prune to be usable by CacheKey for its lease and then had CacheKey's lease expire after 5 minutes (value taken from imageutil).

Based on the debug logs during test runs, I think this is working now. Let me know if you are okay with this solution and/or if I was misunderstanding the concern.

Unpacking happens under the progress of the next vertex.

Fixed

A couple additional notes:

  • I'm temporarily including the same commits present in this separate PR so that I can run the tests involving containerd. I'll rebase this PR once those are merged
  • After making the changes suggested to the way progress is handled (or maybe after a rebase?), it seems like the issue where lots of empty vertex updates got printed went away, so I no longer have that commit here; it's just the lazy-ref related changes alone.

@sipsma
Collaborator Author

sipsma commented Jun 23, 2020

The only failure in Travis appears to just be one of the ephemeral DeadlineExceeded desc = context deadline exceeded errors. Those tests have been passing consistently when run locally, so I'll just leave it as is to be re-run during the next updates.

Member

@tonistiigi tonistiigi left a comment

Load metadata and resolve step seem to run twice for the same image(eg. the resolver is not reused).

I think this is still the case based on what I see in progressbar. I can add some debug to figure out what is going on.

Also, sorry for the merge conflicts I caused 😳

cache/opts.go Outdated
started *time.Time
Provider content.Provider
ImageRef string
ProgressWriter progress.Writer
Member

I'm still a bit annoyed that we give the ref access to the generic progress.Writer and the VertexDigest, VertexName, which are solver-specific concepts. Also, not sure how it works with FromRemote, as I don't see it passed there. I think we should only pass the narrow interface for the actions the cache pkg needs.

Collaborator Author

Okay, so what do you think about CacheKey just passing some callback funcs used to start/complete the progress?

Member

I think you should define an interface for the methods you need in the cache pkg and then pass in an implementation for it when needed.

cache/opts.go Outdated
VertexName string
}

type MissingDescHandler DescHandlerKey
Member

NeedsRemoteProviderError ?

cache/opts.go Outdated
return fmt.Sprintf("missing descriptor handler for lazy blob %q", digest.Digest(m))
}

func (m MissingDescHandler) FindIn(getter func(interface{}) interface{}) *DescHandler {
Member

I think this should not use interface{}. Worker is an OK level at which to do the type cast.

}
ctx = opentracing.ContextWithSpan(progress.WithProgress(ctx, s.st.mpw), s.st.mspan)
ctx = session.NewContext(ctx, s.st.getSessionID())
ctx = withAncestorCacheOpts(ctx, s.st)
Member

I recently took sessionID out of context in #1551, so it's weird to add it back in here. Maybe you can even combine it with the Group interface I added for sessions. This can be a follow-up as well.

Member

Actually, I think this PR reintroduces one of the things fixed in #1551. E.g. the fetcher created in pull is already connected to a session, but that session might change later. It would be better if the real session.Group were added to the request here, so that when the Provider is accessed it is already connected to the updated session. Let me know if this gets too messy for you. We can merge without it and I can carry/fix this later.

Collaborator Author

Okay, I'll take a closer look at #1551 and give integrating with the Group a shot

Collaborator Author

Alright, took a look; my understanding is the issue comes down to the fact that we construct a fetcher here from the resolver, at which time credentials will be called with whatever groups are present at the time. That's bad because the actual de-lazy could happen much later and under a different vtx, but the provider will still use that possibly stale fetcher.

If that's correct then I guess the fix is, in part, to recreate the fetcher from the resolver when the actual pull occurs rather than use the one created earlier, which is easy enough. My only question is if you think it's okay at that time (i.e. this line) to use the same resolver as before (which should be updated w/ sessions in calls to CacheMap and Snapshot such as here), or do you think we need to make a new resolver using the session.Group from the vtx where the actual de-lazy is occurring (i.e. here or in the exporter)?

Basically, should the de-lazy use the session.Groups from the original vtx where the lazy ref got created, or the session.Group from the vtx where the de-lazy is occurring? Or is it irrelevant (I haven't traced through #1551 carefully enough yet to tell)?

Member

My only question is if you think it's okay at that time (i.e. this line) to use the same resolver as before

Yes, ideally we should reuse as much as possible. A new resolver means new TCP connections and new authentication roundtrips that should be avoided.

Basically, should the de-lazy use the session.Groups from the original vtx where the lazy ref got created, or the session.Group from the vtx where the de-lazy is occurring?

From where the de-lazy is occurring. Old may not be valid anymore.

I added this SessionAuthenticator https://github.com/moby/buildkit/blob/master/source/containerimage/pull.go#L127 concept that allows adding new sessions while reusing the resolver https://github.com/moby/buildkit/blob/master/source/containerimage/pull.go#L167 . Was actually thinking to take this further and write a custom containerd authenticator and resolver pool (a limited resolver pool already exists for pull) that can remember tokens and reuse connections across builds (after checking auth). Something like resolverpool.Get(ref, scope ("pull/push"), g session.Group). (Maybe also a resolver.WithSession(session.Group) resolver if that helps.) That could help simplify this as well. Maybe I have time to work on this tomorrow.

Collaborator Author

Okay, for now all I did was make it so the fetcher gets reconstructed at the time of de-lazy.

But it is still using the resolver that is only ever updated w/ session.Groups from calls to CacheKey/Snapshot. So it will not be updated with the session.Groups from the vertex where actual de-lazy is occurring. I'm not sure what the best way to implement that would be as it seems like it would require plumbing the session.Groups for the current vtx all the way through to the provider implementation, where it can be used to update the resolver. I guess we could put the session group in the ctx, but it seems like that was something you were trying to avoid in the previous PR? Let me know what you think or if I'm misunderstanding.

solver/types.go Outdated
"time"

"github.com/containerd/containerd/content"
"github.com/moby/buildkit/cache"
Member

Let's try to avoid this dependency in the solver package.

Collaborator Author

Okay, I didn't like this either but felt a little better when I realized solver already had a transitive dependency on cache :)

The import is only needed for the Remote struct. I originally wanted to leave Remote in its original place in solver, but Remote is now needed in cache due to the GetRemote method moving there and trying to import solver into cache created an import loop. Moving Remote to cache worked, but what else would you suggest? I guess cache/remotecache seems like an obvious choice, but are you okay with solver having a direct import from that?

Member

I realized solver already had a transitive dependency on cache :)

go list -json ./solver | grep cache on master doesn't show a dependency on cache

Collaborator Author

I... must have lost my mind at some point in all of this or am misremembering the issue, but yeah, you're right. I'll try to switch Remote back to solver.

cache/refs.go Outdated
}

func (sr *immutableRef) getProvider(ctx context.Context, dh *DescHandler) (content.Provider, error) {
if isLazy, err := sr.isLazy(ctx); !isLazy {
Member

doesn't this function need synchronization against parallel calls?

Collaborator Author

I previously looked through the underlying implementation of the content store methods it's calling and concluded the synchronization wasn't necessary here, but I'll go through it all again to double-check and get back

Collaborator Author

Okay yeah, I still believe this basically works as is. There are a few relevant pieces of code:

  1. We always use contentutil.Copy when de-lazying, which in turn uses a FetchHandler
  2. The FetchHandler internally opens a content writer with a writer ref consistent for a given OCI descriptor. It also uses content.OpenWriter, which internally will check to see if there's already a writer open for that ref (and thus that descriptor) and, if so, wait until that writer is closed before opening it again.
  3. fetch will also always check to see if opening the writer or committing an already completed writer returns an AlreadyExists err, in which case it knows the content was already successfully committed by something else and returns a nil err. Same goes for the content.Copy func it calls internally.

So, given the above, if we check isLazy and it returns true, but by the time we actually try to de-lazy it's already been de-lazied by something else, then we should just get a nil error returned by contentutil.Copy and can move on successfully. If something else is in the middle of de-lazying, we'll block until that thing is done de-lazying and then, if it succeeded, continue on, or, if it failed, try de-lazying it again.

The only slightly weird corner case I can think of is if you loaded the same immutable ref twice but with different desc handlers and those two were both getting de-lazied at the exact same time; then it's possible for you to technically pull it from one remote but have the other cache ref win the race to set the Description in the metadata, so it will be marked there as having been pulled from a different remote image ref. It seems like the description is mostly just informational, and if two instances of the same cache ref are being pulled at the same time it is not really significant which one's image ref ends up in the Description, but let me know if that matters or if there are other cases I'm not thinking of.

Member

We should avoid the case where two parallel Unlazy requests make an extra HTTP request to the registry, even if some data is ignored later. Also, the locking in content.Writer isn't very well thought out: it doesn't handle cancellation of one request gracefully, and I believe it doesn't even release the lock until a time-based trigger completes. Usually flightcontrol.Do, which is used extensively in this repo, is a much better solution.

Collaborator Author

Ah sure, it should be easy to wrap in a flightcontrol.Do too, will do.

```go
eg, egctx := errgroup.WithContext(ctx)
for _, desc := range remote.Descriptors {
	desc := desc // capture loop variable for the closure
	eg.Go(func() error {
		return contentutil.Copy(egctx, contentutil.HashOnlyIngester(), remote.Provider, desc)
	})
}
```
Member

I don't really get this HashOnlyIngester logic. As your comment says, just wrap the Applier so that it calls ReaderAt on Provider (or calls Unlazy(), etc.). This double hashing seems unnecessary, and this construction is very hard to understand.

Collaborator Author

Just to be clear, the point wasn't just to take the hash; it was to read the content in full from remote.Provider, which had the (magic) side effect of de-lazying it into the local content store (due to the implementation in the cache package). However, now that you mention "double hashing", I realize that when we do the magic de-lazy into the local content store we're already doing hash verification, so copying to a HashOnlyIngester is actually unnecessary. I can at minimum change this to copy to an ingester that straight up discards the data it reads without even taking the hash. I'll do that if nothing else.

But I also agree this all kind of sucks, and the approach I suggested in the comment of just having the Applier use remote.Provider directly would be cleaner. However, I didn't see any way of doing that which didn't involve even more large-scale refactoring on top of the already large change here (happy to be wrong though). The problem is that the actual diff.Applier interface only has this method:

```go
Apply(ctx context.Context, desc ocispec.Descriptor, mount []mount.Mount, opts ...ApplyOpt) (ocispec.Descriptor, error)
```

which doesn't include a content.Provider. The Applier implementation appears to internally use the content.Store it was hooked up to much earlier (i.e. here and here) as its content.Provider.

So the options I'm seeing besides what I already have are:

  1. Wrap just the Apply method with something that takes the desc argument, de-lazies it (using a similar technique as now, just w/out the double-hashing you mentioned) and then calls the wrapped Apply method.
    • I have no opposition to this, I just didn't do it originally because it seemed like basically the same as what I'm currently doing but hidden under an Apply wrapper. Totally fine with updating to that if you think it's clearer though.
  2. Change the original Applier to be constructed in the worker implementations with a special content store that can somehow be updated on the fly to use providers for given descriptors as set from the remote.Provider we obtain in the exporter here.
    • I'm not entirely sure how this would work, but the idea is appealing on some level. My main concern is that even if it's possible it would expand this PR way too much, but I could of course be overestimating and/or not seeing a simple path to implementing it.

Let me know if this makes any sense and what you think.

Member

Wrap just the Apply method with something that takes the desc argument, de-lazies it (using a similar technique as now, just w/out the double-hashing you mentioned) and then calls the wrapped Apply method.

I think that is the same that I had in mind.

A wrappedApplier(applier, remote/provider) Applier would implement Apply such that, when called, it first copies the blob into the content store and then calls the regular applier, which does the extraction from the content store to the snapshotter.

Member

If this turns out problematic for some reason (e.g. the current parallelization issue), then let's add an explicit method to ref/remote for this and avoid magic constructions with side effects.

Collaborator Author
@sipsma sipsma Jul 10, 2020

Had some more thoughts. Just to be super explicit, since the code has gone back and forth between different approaches throughout the PR, here's what I'm thinking.

There are three places where a de-lazy may need to occur:

  1. During ImmutableRef.Extract()
  2. During a local export/unpack (such as the Applier case we were discussing)
  3. Sometimes during a push; specifically when a lazy ref is going to be pushed to a registry that doesn't contain the blob yet (but not when the registry already contains the blob in one or more of its repos)

There are also two general approaches for how the provider in the Remote returned by ImmutableRef.GetRemote() can be implemented:

  1. The current approach with magic side effects, where the provider is implemented such that opening a reader first pulls the entire blob for the descriptor into the local content store and then returns a reader backed by the local content store.
  2. An alternative, less magical approach, where it simply checks whether the ref is lazy and, if so, sets the provider to the one from the remote repo; if not lazy, it sets the local content store as the provider. This means that all callers of GetRemote() have to check whether the descriptors in the returned remote are available locally before trying to use them.
    • Callers can be helped with this via an Unlazy (or what I previously called EnsureContentExists) method on the Remote or ImmutableRef object.

In theory, approach 2 is preferable as less magical and side-effecty, but there's a fairly significant downside when you consider the de-lazies that occur during a push (place 3 in the list above). Namely, all of the logic that determines during a push whether the content needs to exist locally or whether the pull can be skipped lives fairly deep in external code (e.g. one place is here, where it's determined whether you can do a cross-repo push). I'm guessing we can't modify that code to do the checks/calls required by approach 2, so the only other option would be to put all the logic for deciding whether we can skip the local pull directly in the push util itself. That would most likely mean a fairly large amount of copy/paste and hardcoding of behavior that should be handled by the library we're calling out to.

So given that, I'm still inclined toward approach 1 (though, like I said, I'll fix the double-hashing issue). However, I don't see any reason we couldn't do approach 1 and also add a util method on Remote called Unlazy that at least makes the code less confusing to read. I'll use remote.Unlazy in the wrapper around the Applier like we were talking about, and also in the extract() code path. Internally, Unlazy will just do a contentutil.Copy from the remote provider to an ingester that discards everything (without even checking the hash) to trigger the magic de-lazy side effect, but at least it will be wrapped in a more explicit method.

Let me know how that all sounds.

Collaborator Author

Internally Unlazy will call ref.GetRemote().Provider.ReaderAt(ctx,ref.Desc) that will copy from remote provider to the local contentstore?

Almost, but not exactly. Due to the issues I mentioned before, I think we have to leave GetRemote and the provider it returns as they currently are: when the provider gets opened, it first copies from the remote provider to the local content store before returning a reader for the local content store (de-lazying as a side effect of opening the reader).

If it weren't for the de-lazies occurring during pushes that I mentioned in the previous comment, I think we could do what you suggest (ref.GetRemote().Provider.ReaderAt(ctx, ref.Desc)), where the Remote returned by GetRemote is just the actual remote provider without any de-lazying side effect. But like I said before, I don't know how to make that work with the push handler code that exists outside this repo (and thus doesn't know how to call Unlazy).

So, given that, if Unlazy is going to be a helper function, it would have to do something like contentutil.Copy(ctx, discardIngester, ref.GetRemote().Provider), where discardIngester is just the equivalent of ioutil.Discard but for ingesters.

However, if we make Unlazy a method on Remote or ImmutableRef, we could probably avoid the separate contentutil.Copy; it could be coded to copy directly from the DescHandler's remote provider to the local content store. I think that's what I'd prefer, since the contentutil.Copy is obviously somewhat wasteful even when it's just writing to a discard ingester.

Member

If it weren't for the case of de-lazies occurring during pushes that I mentioned in the previous comment, I think we could do what you suggest (ref.GetRemote().Provider.ReaderAt(ctx,ref.Desc)) where the Remote returned by GetRemote is just the actual remote provider w/out any de-lazying as a side-effect,

I'm a bit confused about what causes the conflict between push and de-lazy on ref.GetRemote().Provider.ReaderAt. The example you linked https://github.com/containerd/containerd/blob/0f2b15b7af41896190fcb6e7b597be789c11756b/remotes/docker/pusher.go#L129 works directly with the descriptor, and I don't see it requiring local data. So I don't think it would ever call ReaderAt if the descriptor exists or can be cross-pushed. It only calls it when it needs local data, and a de-lazy is required then anyway.

Collaborator Author

Right, what I'm saying is that there are times when a de-lazy is required for push and times when it isn't. However, the determination of whether it's needed happens deep in external code. So, for buildkit's code, if we don't have the "de-lazy as a side effect" approach, we'd be forced to always de-lazy before a push, because we don't know ahead of time whether the content needs to be present locally. But if we did that, we'd be breaking some of the crucial features of lazy refs, namely that you don't have to de-lazy when just re-tagging or doing a cross-repo push.

Of course, the other option would be to have buildkit's code determine whether a de-lazy is needed. I was ruling that out for now because that functionality (such as the cross-repo logic I linked to) isn't public in containerd's push handler code and is far from trivial. Let me know if there's some easier way of determining it, though; that would obviously change this consideration.

Collaborator Author

We discussed this more on Slack; we were actually in agreement, I was just confused about one point. We'll go with the side-effect reader, and Unlazy will use it internally, but it will just open the reader to trigger the side effect rather than copy to a discard ingester.

Collaborator Author

Updated with this change. I made one superficial tweak: instead of having Unlazy be a util func, I made an Unlazier interface; callers can check whether a remote's provider implements that interface and call it if so. The only small improvement over Unlazy as a util func is that you avoid having to open and immediately close a reader to the local content store; otherwise it's basically the same in my mind. It's a fairly minor change, so if you disagree with the approach just let me know and I have zero issue switching back to Unlazy as a util func.

```go
}

if len(p.manifest.Remote.Descriptors) > 0 {
	topDesc := p.manifest.Remote.Descriptors[len(p.manifest.Remote.Descriptors)-1]
```
Member

afaics, this should happen for all layers. Also, a blob digest doesn't need to have the same parents, so it doesn't look correct to assume a ref's parents come from the same remote (if only the top digest is tracked).

Collaborator Author

My thinking here (which I should probably add a comment for, if nothing else) is that while it's true a parent desc could be in the chain of multiple refs and thus come from multiple remotes, it actually won't matter.

Say you have a chain of descriptors from parent to child of A->B->C (topDesc is thus C), but in a different image you end up with just the chain A->B (where topDesc is thus B) and use an entirely different remote (and progress writer, etc.). If you end up creating lazy refs for both these desc chains, say the A->B chain gets de-lazied first.

When you later get around to de-lazying the A->B->C chain, the implementation in cache realizes that A->B was already de-lazied by something else and skips those descriptors, de-lazying only what it needs to (C in this case). It thus shouldn't matter that some descriptors had different remotes in different refs.

In addition, for a single ref, all the descriptors in the chain for that ref are going to have the same DescHandler values, so setting the DescHandler for each one in the chain seemed unnecessary.

That's the theory anyways; you asked in some other comments about synchronization issues which I need to go look deeper at as obviously that's important to get right for all the above to work. Let me know if this makes sense or if I'm missing something.

Member

But what if you have A->B->C and A->D->C competing?

Collaborator Author

I forgot that the digest of a layer descriptor doesn't include the digest of its parent (this is all still settling in my brain), but I actually still think it would work. If A->B->C and A->D->C are getting de-lazied at the same time, each of the individual descriptors still gets de-lazied just once (again, assuming the locking/synchronization is done correctly). It's possible to end up with a mix where A and C got de-lazied using remote 1 and B got de-lazied using remote 2, but at the end of the day they're the same bytes no matter the remote and end up in the local content store, which is what matters, right? What am I still missing?

Member

A and C got de-lazied using remote 1 and B got de-lazied using remote 2, but at the end of the day they are the same bytes

I'm not following the "but at the end of the day they are the same bytes". The case I'm thinking of is when there are two separate lazy refs (different bytes) that have the same top layer. Afaics, when they both get an unlazy request, only one deschandler is used. So for a ref that was based on A->B->C but is pulled with a deschandler for A->D->C, there could be a problem.

Collaborator Author

Right, makes total sense. I was still somehow getting hung up on the layer digest not including its parent digest, but I get it now. I'll change this to set the DescHandler for each desc in the chain; thanks for bearing with me.

@tonistiigi tonistiigi requested a review from hinshun July 9, 2020 05:18
@tonistiigi
Member

Another thing I noticed is that layers are pulled one-by-one now, not in parallel like before.

@sipsma
Collaborator Author

sipsma commented Jul 9, 2020

I think this is still the case based on what I see in progressbar. I can add some debug to figure out what is going on.
Also, sorry for the merge conflicts I caused 😳

Cool, let me know what you're seeing with the progressbar and I'll see if I can reproduce it. And no worries, this is a large enough PR that it's kinda hard to miss :)

Another thing I noticed is that layers are pulled one-by-one now, not in parallel like before.

Interesting, besides the integ tests I've been running a simple test case to try out the progress bars and it looks like a two-layer image shows the layers at least starting in parallel:

```text
#1 sha256:23c364211cc2e57080667a64c534d7879beb04870ff0f418035dbd1c189c6cd2
#1 sha256:b110383df6daaed214b2c02afd8ac92f998bdd738c9245c9e9b49be32217fcef 0B / 351B 0.2s
#1 sha256:8f21022a30c69f37857ac14ce5ee519b1017e8446af941a6899a88f368238b2c 0B / 135.84MB 0.2s
#1 sha256:b110383df6daaed214b2c02afd8ac92f998bdd738c9245c9e9b49be32217fcef 351B / 351B 0.6s done
#1 sha256:8f21022a30c69f37857ac14ce5ee519b1017e8446af941a6899a88f368238b2c 8.39MB / 135.84MB 1.1s
#1 sha256:8f21022a30c69f37857ac14ce5ee519b1017e8446af941a6899a88f368238b2c 16.78MB / 135.84MB 1.8s
...
```

but maybe that's what you're seeing and they actually always download one-by-one? I'll try with a more normal image with more layers to see if I can reproduce. In theory the layer pulls should be in parallel due to the waitgroup here, but something could be going wrong.

@tonistiigi
Member

Interesting, besides the integ tests I've been running a simple test case to try out the progress bars and it looks like a two-layer image shows the layers at least starting in parallel:

I think the case I was testing was when it was pulling while exporting. Didn't realize it was a different code path.

Also, cross-repo push didn't seem to work for me (where it can move layers between repositories without pulling). Do you know if annotations get lost for the descriptors? I can look more into this myself. Not a blocker for the PR.

@sipsma
Collaborator Author

sipsma commented Jul 9, 2020

Also, cross repo push didn't seem to work for me(where it can move layers between repositories without pulling). Do you know if annotations get lost for the descriptors? I can look more into this myself. Not a blocker for PR.

Looked into this; I think the issue is that the push util uses the local content store to find any distribution source annotations. Obviously, if the ref is lazy this won't work.

What we can do is, somewhere around this line, use the desc handler's ImageRef field to manually set the distribution source annotation on the descriptor when it is lazy (and use the local content store when it's not lazy), and then change the push util to accept a map of desc digest -> distribution source list, which it will use instead of the local content store. The only problem I'm seeing right now is that the functions containerd uses to construct the value of the distribution source key are private, except for AppendDistributionSourceLabel, which only takes a content.Manager and a ref string as args. So the options I'm seeing are:

  1. copy/paste the private code from containerd for constructing the distribution source value string
  2. create a mocked-out content.Manager that lets us capture the distribution source set by the image handler returned from AppendDistributionSourceLabel
  3. modify containerd upstream to expose the functionality we need without forcing us to update a real content.Manager

3 above feels like the only good option, but let me know if I'm missing anything else obvious. Either way, I'll concentrate first on fixing the rest of the issues in this PR and leave this one for a follow-up if there's no straightforward immediate fix.

@tonistiigi
Member

3 above feels like the only good option, but let me know

Actually, not much of AppendDistributionSourceLabel seems reusable for this, as it only works on Labels, which is a local contentstore concept. In the lazy case I think we just need to write the source annotation using the https://github.com/containerd/containerd/blob/40b17e97f6fdc8e8f158d25aaa7192bceff27264/remotes/docker/handler.go#L111 formatting. AppendDistributionSourceLabel might be reusable when unlazy happens and the descriptor is pulled to the contentstore; I guess then we need to convert the annotation to a label, and we might need this logic that can keep a list of sources in the label. Once we get to the push we should only use the annotation again, and in GetRemote() it can create the annotation from the label if the data is actually in the contentstore (and the value doesn't already exist in memory).

@sipsma
Collaborator Author

sipsma commented Jul 22, 2020

Most recent update was just a rebase on main, no actual changes.

@tonistiigi
Member

tonistiigi commented Jul 29, 2020

Comparing the output against the old one it looks like:

  • Schema1 images do not show progress on pulling layers (it only shows on extraction)
  • Extraction progress is now shown per layer, while before it was per extraction (as this step is sequential). If it's complicated, it's not a big deal, but if it's simple then I prefer the old output.

Edit: btw, it's not a big issue if schema1 images are just not lazy. Schema1 is not a priority, so whatever is easiest works here.

@sipsma
Collaborator Author

sipsma commented Jul 31, 2020

Schema1 images do not show progress on pulling layers(only show on extraction)

Good catch. After looking at the schema1 converter code in more depth, I saw that it's basically hardcoded internally to pull the layers and relies on them being present in order to do the conversion. So I went with your suggestion of just letting schema1 not be lazy for now. I did, however, update the code to show pull progress for that case.

The tty progress for schema1 looks good, but I did notice that occasionally in the plain progress for schema1 there will be final layer updates after resolve shows as done, i.e.:

```text
#4 [1/2] FROM gcr.io/google_containers/pause:3.0@sha256:0d093c962a6c2dd8bb8...
#4 resolve gcr.io/google_containers/pause:3.0@sha256:0d093c962a6c2dd8bb8727b661e2b5f13e9df884af9945b4cc7088d9350cd3ee
#4 sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 32B / 32B 0.0s done
#4 resolve gcr.io/google_containers/pause:3.0@sha256:0d093c962a6c2dd8bb8727b661e2b5f13e9df884af9945b4cc7088d9350cd3ee 0.5s done
#4 sha256:f112334343777b75be77ec1f835e3bbbe7d7bd46e27b6a2ae35c6b3cfea0987c 311.20kB / 311.20kB 0.1s done
#4 DONE 0.0s
```

I verified that in that case the layer status is actually written to the progress stream before the resolve status, so it's not a synchronization issue at that level. My only guess is that the status updates are getting batched together and then written to the progress output in an arbitrary order? Let me know if you think this matters.

Extraction progress is now shown per layer while before it was per extraction (as this step is sequential). If it is complicated it is not a big deal but if it is simple then I prefer the old output.

Updated to per-extraction; it's a very easy thing to switch. The only downside I see with per-extraction relative to per-layer is that, because the de-lazy now happens within the context of extraction, the time the extract takes includes the time de-lazying the whole image takes, i.e.:

```text
[+] Building 40.6s (5/5) FINISHED
 => [internal] load build definition from Dockerfile                                                                       0.1s
 => => transferring dockerfile: 72B                                                                                        0.0s
 => [internal] load .dockerignore                                                                                          0.1s
 => => transferring context: 2B                                                                                            0.0s
 => [internal] load metadata for docker.io/library/python:3.7-buster                                                       2.6s
 => [1/2] FROM docker.io/library/python:3.7-buster@sha256:291dbf30d0466dc6c6cd9d246b12b87ddf9df953d1b7616e156aa0c42db740  35.8s
 => => resolve docker.io/library/python:3.7-buster@sha256:291dbf30d0466dc6c6cd9d246b12b87ddf9df953d1b7616e156aa0c42db740f  1.0s
 => => unpacking docker.io/library/python:3.7-buster@sha256:291dbf30d0466dc6c6cd9d246b12b87ddf9df953d1b7616e156aa0c42db7  35.8s
 => => sha256:af0a6ea54c94e0a2b966b6836a18654b25eef8c06a13296a88a515ca8194814e 2.11MB / 2.11MB                             1.2s
 => => sha256:3410ca7853e637c55c6095b6e6ee4816c97abe23415c6c94326179e731d2c904 233B / 233B                                 1.0s
 => => sha256:56c9d8f3837df21441644b185fa051fd32c878f00fd417e05947a2e6bbae682b 26.70MB / 26.70MB                          12.4s
 => => sha256:3b0a7e6f20fbc38a7f19b3d07aa64262789970e984f390815f867ec783e4952d 192.22MB / 192.22MB                        31.1s
 => => sha256:75d93399dc4ef3cef040a53ed8a75e1108a6953b5c0b3549906141cb1dcaad05 6.15MB / 6.15MB                             4.5s
 => => sha256:ac34a4d7c330794ec24a76e7e58b50d4c8f6a2fc77ca958d83092c8962e7d6d7 51.83MB / 51.83MB                          21.2s
 => => sha256:bcd57146431eae70720cf24877e818256ae1a30b9c1c9e7d0ad093c945ca0af2 10.00MB / 10.00MB                           6.4s
 => => sha256:3ed641c4ae9821ca3c399071ea82ae667237acbcaad1e367c3e1e87fc967834c 7.81MB / 7.81MB                             5.4s
 => => sha256:31dd5ebca5efc5e96a425402fa85e492b02c8fe757dfd3edfdea2a7c67322909 50.39MB / 50.39MB                          20.8s
```

I don't have strong feelings either way though.

Member

@tonistiigi tonistiigi left a comment

The only downside I see with per-extraction relative to per-layer is that now because the de-lazy happens within the context of extraction, the time the extract takes includes the time de-lazying the whole image takes, i.e

I think this might be confusing, so better to revert it then. Or split the unlazy pull and extract, although I guess the new one is slightly more performant. We can make a change on the client side to sum the durations of separate items in the same array and reintroduce the single-line unpack; then we could start, stop, and continue the unpack progress.

LGTM

@thaJeztah Not sure how to merge this so it doesn't cause trouble for the moby release. I think porting this will take somewhat significant work and additional changes. We might need to make a special (temp) branch for 20.x if we need some fixes because I don't want to cut v0.8 without this PR as well.

```go
})
Digest: desc.Digest,
}
if platform != nil {
```
Member

This is a bit confusing to me. I don't think there is a case where the platform is allowed to be nil.

Collaborator Author

Updated to have platform not be a pointer.

@tonistiigi
Member

ping @hinshun

@tonistiigi
Member

I also tested this in combination with #1622; the extra requests I saw before were gone after that, so those issues were not related to this PR.

@sipsma
Collaborator Author

sipsma commented Aug 3, 2020

I think this might be confusing. I think better to revert it then. Or split unlazy pull and extract although I guess new one is slightly more performant. We can make a change in client side to sum the durations of separate items with same array and reintroduce the single line unpack then. Then we could start, stop and continue the unpack progress.

Sounds good, for now I just reverted to what I had before with per-layer extraction status.

LGTM

Thanks for your guidance throughout this whole PR, I appreciate it! Let me know if there are any other follow-ups I can create issues for and/or look into. Right now the ones I'm aware of are 1) improved integration with session.Group and 2) the previously mentioned changes to progress.

@thaJeztah
Member

@thaJeztah Not sure how to merge this so it doesn't cause trouble for the moby release. I think porting this will take somewhat significant work and additional changes. We might need to make a special (temp) branch for 20.x if we need some fixes because I don't want to cut v0.8 without this PR as well.

Is this the part that was reverted, or is this still a concern?

Collaborator

@hinshun hinshun left a comment

Overall, looks great! I just have some nits.

cache/blobs.go Outdated

```go
var currentDescr ocispec.Descriptor
if sr.parent != nil {
	eg.Go(func() error {
		err := computeBlobChain(ctx, sr.parent, createIfNeeded, compressionType)
```
Collaborator

nit, this can just be return computeBlobChain right?

Collaborator Author

done

cache/blobs.go Outdated

```go
}

var descr ocispec.Descriptor
var err error
```
Collaborator

nit, can you move these two closer to CompareWithParent?

Collaborator Author

done

cache/blobs.go Outdated

```go
if descr.Digest == "" {
	// reference needs to be committed
	var lower []mount.Mount
	var release func() error
```
Collaborator

Seems to be overwritten by line 110.

Collaborator Author

Yeah, it's a bit confusing; I moved the declaration of release as a var inside the if parent != nil below to make it clearer.

```go
}

// setBlob associates a blob with the cache record.
// A lease must be held for the blob when calling this function.
```
Collaborator

Do we need to do a leases.FromContext(ctx) check in this function?

Collaborator Author:

Sure, may as well; added.

blobChainID = imagespecidentity.ChainID([]digest.Digest{pInfo.BlobChainID, blobChainID})
p2, err := cm.Get(ctx, parent.ID(), NoUpdateLastUsed)

p2, err := cm.Get(ctx, parent.ID(), NoUpdateLastUsed, descHandlers)
Collaborator:

Could descHandlers be nil at this point? Looking at line 113, it can be nil if desc.Digest is found in the content store. Is a nil option okay here?


cache/refs.go Outdated
} else if err != nil {
return false, err
}
return false, nil
Collaborator:

nit, this can be simplified to return false, err

Collaborator Author:

done

cache/refs.go Outdated
createdAt := GetCreatedAt(sr.md)
if !createdAt.IsZero() {
if createdAt, err := createdAt.MarshalText(); err == nil {
desc.Annotations["buildkit/createdat"] = string(createdAt)
Collaborator:

nit, seems more intuitive to check err != nil and return ocispec.Descriptor{}, err, and then remove the else block that sets the annotation.

Collaborator Author:

agree, updated

if err != nil {
return nil, err
}
dh := dhs[desc.Digest]
Collaborator:

Can dhs be nil here?

Collaborator Author:

Yeah, but similar to before, you can read from a nil map (you just always get a zeroed value), which gives us the expected behavior in this case.

}

func (p *puller) CacheKey(ctx context.Context, g session.Group, index int) (string, bool, error) {
func (p *puller) CacheKey(ctx context.Context, g session.Group, index int) (_ string, cacheOpts solver.CacheOpts, _ bool, err error) {
Collaborator:

I think you should name these just so we can tell what the return values mean.

Collaborator Author:

done

}
if desc.Digest != "" {

if desc.Digest != "" && func() error {
Collaborator:

This is pretty rough to read; can we extract this into a function?

Collaborator Author:

done


descHandler := &cache.DescHandler{
Provider: p.manifest.Remote.Provider,
ImageRef: p.manifest.Ref,
Member:

Was looking at this for some resolver changes with session.Group and wondered if we really need ImageRef at all. AFAICS this is used in 2 places: the cross-repo annotation and the unlazy description message. For both, I think we should just pass them with GetByBlob and keep them in metadata?

Collaborator Author:

My original concern with that was the case where there are multiple images that are identical in content but use different image refs. In theory, we could store a list of ImageRefs for the blob in the metadata, which I think would work fine with the cross-repo stuff since you can add multiple distribution source annotations.

The only part I'm not sure about is the unlazy description message; if we have a list of ImageRefs we wouldn't know which one is actually associated with the provider, so if we just chose one it might not actually be the repo we're really pulling it from. Do you think that matters? AFAIK that description message isn't important for functionality, though it would be odd for it to say it pulled the ref from a place it didn't actually do the pull from.

Member:

Do you think that matters?

I don't think it does. The same behavior already exists today: if you pull the same image from 2 refs, one of them will win and the other will be skipped as already existing. When you are pulling, you can't control which side your current pull will land on.

Collaborator Author:

Okay, cool. I made the update to remove ImageRef as a separate commit because it was a little more involved than I originally thought; let me know what you think.

sipsma added 4 commits August 5, 2020 17:18
This allows the layers of images to only be pulled if/once they are actually
required.

Signed-off-by: Erik Sipsma <erik@sipsma.dev>
Signed-off-by: Erik Sipsma <erik@sipsma.dev>
Signed-off-by: Erik Sipsma <erik@sipsma.dev>
Signed-off-by: Erik Sipsma <erik@sipsma.dev>
@hinshun (Collaborator) left a comment:

Looks good to me. Well done 👏

@tonistiigi tonistiigi merged commit 545532a into moby:master Aug 6, 2020
@tonistiigi (Member):

🎉

@sipsma (Collaborator Author) commented Aug 6, 2020:

🎉

Looks good to me. Well done 👏

Thanks again! Appreciate all the reviews and glad I could help out with it. Gonna start taking a look at #1431 now...

@thaJeztah (Member):

Thanks @sipsma!

@tonistiigi demoed this feature yesterday, and it's really cool

@abdennour:

Where are the docs for this feature?

Successfully merging this pull request may close these issues.

llbsolver: change references to evaluate lazily

5 participants