compute image's shared size by ndeloof · Pull Request #17 · rumpl/moby

ndeloof · 2022-07-11T15:44:53Z

- What I did
Compute the "shared size" by counting snapshot usage while listing images.
To be addressed: I'm not sure how this should interact with filters: should we count as "shared" layers that belong to images excluded by the selected filters, or are those metrics "within the scope of the filter"?

daemon/containerd/service.go

thaJeztah · 2022-07-12T14:15:15Z

daemon/containerd/service.go

+			}
+		}
+
 		size, err := image.Size(ctx)


Question (as I'm not 100% sure): how do virtualSize and image.Size() differ?

I know in the past (before docker 1.10), virtualSize was needed to calculate the total size of the image based on image + parent image + parent image (etc), but since 1.10, it's just the sum of layers. And (IIRC) in moby we just set both virtualSize and size to the same value.

What does image.Size() return? Isn't that already the same as virtualSize?

Based on https://github.com/containerd/nerdctl/blob/b140bc1c7ace3f7c60fbbeae9dc16ec95687f4a1/cmd/nerdctl/images.go#L47 (the API docs don't really help here):
Size returns the size of the (compressed) blobs, i.e. the image's data transfer size.
The VirtualSize we compute here is the size of the uncompressed data (without accounting for layers that are shared between images).

Ah, gotcha. Hmm.. yes, so we don't have a concept of compressed size in "old" moby, so yup.

(Perhaps something we could add in containerd as well, to have options for both compressed and uncompressed size of an image)

Thanks for explaining!

thaJeztah · 2022-07-12T14:17:12Z

I'm not sure how this should interact with filters: should we count as "shared" layers that belong to images excluded by the selected filters, or are those metrics "within the scope of the filter"?

I think the filters should only affect "what's shown". If an image shares layers with another image that's not shown, it's still a shared layer (?), so docker image ls --filter reference=ubuntu should still show the same size for ubuntu images as without the filter.
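To make the proposed semantics concrete, here is a minimal sketch in which layer reference counts are taken over every image and the filter only decides which results are returned. The `image` type, `sharedSizes`, and `listImages` are hypothetical names for illustration, not code from this PR, and the sketch assumes each chain ID appears at most once per image.

```go
package main

import "fmt"

// image is a hypothetical stand-in for an image record: a name plus the
// chain IDs of its layers (assumed unique within one image).
type image struct {
	name   string
	layers []string
}

// sharedSizes returns, per image, the summed size of the layers that are
// referenced by more than one image in the full, unfiltered list.
func sharedSizes(all []image, layerSize map[string]int64) map[string]int64 {
	refs := make(map[string]int)
	for _, img := range all {
		for _, l := range img.layers {
			refs[l]++
		}
	}
	shared := make(map[string]int64, len(all))
	for _, img := range all {
		var total int64
		for _, l := range img.layers {
			if refs[l] > 1 { // layers used by a single image are not shared
				total += layerSize[l]
			}
		}
		shared[img.name] = total
	}
	return shared
}

// listImages computes shared sizes over all images first and applies the
// filter afterwards, so the filter affects only what is shown.
func listImages(all []image, layerSize map[string]int64, keep func(image) bool) map[string]int64 {
	shared := sharedSizes(all, layerSize)
	out := make(map[string]int64)
	for _, img := range all {
		if keep(img) {
			out[img.name] = shared[img.name]
		}
	}
	return out
}

func main() {
	all := []image{
		{name: "ubuntu", layers: []string{"base", "ubuntu-top"}},
		{name: "app", layers: []string{"base", "app-top"}},
	}
	sizes := map[string]int64{"base": 100, "ubuntu-top": 10, "app-top": 5}

	onlyUbuntu := func(i image) bool { return i.name == "ubuntu" }
	fmt.Println(listImages(all, sizes, onlyUbuntu))
}
```

With this ordering, the `ubuntu` entry reports the same shared size whether or not `app` passes the filter, which matches the behaviour described for `docker image ls --filter reference=ubuntu`.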

thaJeztah · 2022-07-12T15:40:49Z

daemon/containerd/service.go

+		if layers[chainID] == 1 {
+			continue
+		}
+		usage, err := snapshotter.Usage(ctx, chainID.String())


Last question (probably ok for a follow-up); is this a heavy operation? Asking because it looks like we potentially call this multiple times for the same chainID (i.e., especially if some layers are shared many times).

If that's the case, we could look at (temporarily) storing the results in a map and/or using singleflight (golang.org/x/sync/singleflight).

I have no idea; the containerd API is a black box to me.

From the docs of this function:

The running time of this call for active snapshots is dependent on implementation, but may be proportional to the size of the resource. Callers should take this into consideration.

I also found a fairly recent blog post mentioning that the btrfs snapshotter causes high CPU usage due to the regular collection of disk usage, which probably goes through the same code path as snapshotter.Usage: https://blog.cubieserver.de/2022/dont-use-containerd-with-the-btrfs-snapshotter/

So it probably would make sense to cache this because it can be slow.

Thanks! I didn't dig deep yet before I wrote that comment, but saw that this function is part of the interface that's implemented by all snapshotters.

And based on @vvoland's comment, I fear that means "traversing the filesystem for all files in a snapshot", which can (depending on what's in each layer) be a lot.

So, yes, we should at least cache it temporarily (docker image ls --size), to prevent calculating the size multiple times in a single invocation (that one should be relatively easy with a map[id]size), but possibly even consider storing it permanently (as layers should be immutable, so their size should never change). Not sure where to store it, and how to make sure that storage is purged when the layer is removed though.

I'm a bit fuzzy on how we handled this in the existing (non-snapshotter) image/layer store, but perhaps we can have a look at how we did it there.
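The per-invocation cache suggested above could look roughly like the following sketch. It does not use the containerd API: `sizeFn` is a hypothetical stand-in for a call such as `snapshotter.Usage`, and `cachedSizer`, `newCachedSizer`, and `countLookups` are illustrative names that do not appear in the PR.

```go
package main

import (
	"context"
	"fmt"
)

// sizeFn is a hypothetical stand-in for "ask the snapshotter how large
// this layer is"; in the PR that role is played by snapshotter.Usage.
type sizeFn func(ctx context.Context, chainID string) (int64, error)

// cachedSizer memoizes layer sizes for the duration of a single listing.
// It is not safe for concurrent use.
type cachedSizer struct {
	lookup sizeFn
	cache  map[string]int64
}

func newCachedSizer(lookup sizeFn) *cachedSizer {
	return &cachedSizer{lookup: lookup, cache: make(map[string]int64)}
}

// size returns the cached value when present and otherwise performs the
// (potentially slow) lookup exactly once for that chain ID.
func (c *cachedSizer) size(ctx context.Context, chainID string) (int64, error) {
	if cached, ok := c.cache[chainID]; ok {
		return cached, nil
	}
	n, err := c.lookup(ctx, chainID)
	if err != nil {
		return 0, err // errors are not cached, so a later call retries
	}
	c.cache[chainID] = n
	return n, nil
}

// countLookups asks for the same layer n times and reports how many
// times the underlying lookup actually ran.
func countLookups(n int) int {
	calls := 0
	slow := func(ctx context.Context, chainID string) (int64, error) {
		calls++
		return 42, nil
	}
	s := newCachedSizer(slow)
	for i := 0; i < n; i++ {
		if _, err := s.size(context.Background(), "sha256:base"); err != nil {
			panic(err)
		}
	}
	return calls
}

func main() {
	fmt.Println("lookups performed:", countLookups(3))
}
```

Creating a fresh `cachedSizer` per request sidesteps the invalidation question raised above, at the cost of recomputing sizes on every listing.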

thaJeztah · 2022-07-13T13:01:38Z

daemon/containerd/service.go

+type snapshotSizeFn func(ctx context.Context, d digest.Digest) (int64, error)
+
+type snapshotSizer struct {
+	snapshotter snapshots.Snapshotter
+	cache       map[digest.Digest]int64
+}
+
+func (s *snapshotSizer) size(ctx context.Context, d digest.Digest) (int64, error) {
+	if s, ok := s.cache[d]; ok {
+		return s, nil
+	}
+	usage, err := snapshotter.Usage(ctx, d.String())
+	if err != nil {
+		return 0, err
+	}
+	sizeCache[d] = usage.Size
+	return usage.Size, nil
+}
+


I think this is unused

yes, pushed by mistake while refactoring :P

I thought that was the case; I started looking at that code, wondering if / when we would reset the cache, and then realised "oh! but I don't think it's used" 😂

Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com>

thaJeztah · 2022-07-13T13:26:30Z

daemon/containerd/service.go

+		if err != nil {
+			return 0, err
+		}
+		sizeCache[d] = usage.Size


👍 I think this should do "for now". We may want to create a ticket to optimise this more in-depth (concurrent use, perhaps contribute to containerd to persist this information as metadata, and make it generally available etc), but I don't think that's a blocker for the PoC.
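As one possible shape for the "concurrent use" follow-up, the sketch below deduplicates lookups across goroutines using only the standard library (a mutex-guarded map of `sync.Once` entries) rather than the singleflight package mentioned earlier in the review. All names here (`entry`, `concurrentSizer`, `countConcurrentLookups`) are hypothetical, and unlike a retrying cache this version also remembers a failed lookup.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// entry holds the result of one lookup; once guarantees it runs one time.
type entry struct {
	once sync.Once
	size int64
	err  error
}

// concurrentSizer is a hypothetical cache that is safe to share between
// goroutines. lookup stands in for a slow call such as snapshotter.Usage.
type concurrentSizer struct {
	mu      sync.Mutex
	entries map[string]*entry
	lookup  func(chainID string) (int64, error)
}

// size returns the size for chainID, running lookup at most once per key
// even when many goroutines ask at the same time. Errors are cached too.
func (c *concurrentSizer) size(chainID string) (int64, error) {
	c.mu.Lock()
	e, ok := c.entries[chainID]
	if !ok {
		e = &entry{}
		c.entries[chainID] = e
	}
	c.mu.Unlock()

	// The lock is released before the slow call, so lookups for other
	// keys are not blocked while this one is in progress.
	e.once.Do(func() { e.size, e.err = c.lookup(chainID) })
	return e.size, e.err
}

// countConcurrentLookups requests one key from several goroutines and
// reports how many times the underlying lookup ran.
func countConcurrentLookups(goroutines int) int64 {
	var calls int64
	s := &concurrentSizer{
		entries: make(map[string]*entry),
		lookup: func(string) (int64, error) {
			atomic.AddInt64(&calls, 1)
			return 42, nil
		},
	}
	var wg sync.WaitGroup
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.size("sha256:base")
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&calls)
}

func main() {
	fmt.Println("lookups performed:", countConcurrentLookups(8))
}
```

Persisting sizes as metadata, as suggested in the comment, would additionally need a purge path for removed layers; this sketch does not attempt that.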

thaJeztah

LGTM (assuming green), thanks!

ndeloof requested review from rumpl, thaJeztah and vvoland July 11, 2022 15:44

rumpl reviewed Jul 12, 2022

rumpl reviewed Jul 12, 2022

ndeloof force-pushed the sharedSize branch from a9de465 to 9ab61d1 Compare July 12, 2022 08:56

rumpl approved these changes Jul 12, 2022

vvoland approved these changes Jul 12, 2022

thaJeztah reviewed Jul 12, 2022

ndeloof force-pushed the sharedSize branch from 9ab61d1 to 6896f87 Compare July 12, 2022 13:55

thaJeztah reviewed Jul 12, 2022

ndeloof force-pushed the sharedSize branch from 6896f87 to 82726f8 Compare July 13, 2022 12:54

thaJeztah reviewed Jul 13, 2022

compute image's shared size

ba19bfd

Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com>

ndeloof force-pushed the sharedSize branch from 82726f8 to ba19bfd Compare July 13, 2022 13:05

thaJeztah reviewed Jul 13, 2022

thaJeztah approved these changes Jul 13, 2022

rumpl merged commit 17794da into rumpl:master Jul 18, 2022

This was referenced Jul 18, 2022

containerd integration: compute virtualsize moby/moby#43815

Merged

containerd integration: compute image's shared size moby/moby#43833

Merged

rumpl added the to-be-upstreamed label Aug 30, 2022

rumpl added upstreamed and removed to-be-upstreamed labels Jan 12, 2023

4 participants