Skip to content
This repository was archived by the owner on Sep 12, 2018. It is now read-only.

Refcount images and cleanup orphans#409

Closed
wking wants to merge 9 commits intodocker-archive:masterfrom
wking:refcount-images
Closed

Refcount images and cleanup orphans#409
wking wants to merge 9 commits intodocker-archive:masterfrom
wking:refcount-images

Conversation

@wking
Copy link
Copy Markdown
Contributor

@wking wking commented Jun 4, 2014

Also add versioning to the storage backend, and lay some groundwork
for a more opaque storage driver. It shouldn't take too much more
work to factor out the file-based assumptions out of driver.Base, but
I've left that for later work. See the commit messages for details on
specific changes. And let Travis complete before merging, since I
haven't even run this locally yet ;). I'll try and get on tomorrow
and link this to some earlier discussion, but I don't have time to
look up the references now ;).

@samalba
Copy link
Copy Markdown
Contributor

samalba commented Jun 4, 2014

I think we should have that completely optional. The registry should work without it. On a large dataset, if a refcount is not updated and the refcount index gets corrupted, what are the options to rebuild it? I think it's suitable only for small dataset by design.

@wking
Copy link
Copy Markdown
Contributor Author

wking commented Jun 4, 2014

On Wed, Jun 04, 2014 at 08:27:01AM -0700, Sam Alba wrote:

I think we should have that completely optional. The registry should
work without it.

Can do. However, the only use cases I can think of where you'd want to
turn it off are:

  • If you have external tags pointing to images, which would not be
    accounted for in the registry's internal refcounting.
  • If you wanted to archive all the images you'd ever seen, but not the
    tags that referenced them.

Neither is particularly convincing to me, but there's no accounting
for taste ;).

On a large dataset, if a refcount is not updated and the refcount
index gets corrupted, what are the options to rebuild it?

As in “docker-registry crashes while/before writing the references
files”? I see two cases there:

  • Crashed during a write. When restarted, this can be detected by
    invalid JSON in the references file, and we'd have to walk the whole
    repositories/tags tree to see if any of those files depended on our
    corrupted image. Slow, but that's ok for recovering from such an
    unlikely situation.
  • Crashed before descending to a particular image in the ancestry
    stack. I think this is also unlikely, and it would be marked by the
    complete lack of a references file. Same recovery as for “crashed
    during a write”.

Neither case is currently handled in this PR, but I can add the code
to do so. Are there other cases I should consider?

I think it's suitable only for small dataset by design.

Why is that? Is the problem the long list of references for
commonly-used base images. For example, I imagine index.docker.io has
quite a number of references to the various ‘ubuntu’ images.

In any case, how you implement refcounting is a storage-side decision.
So long as the add_references and and remove_references endpoints are
there (even if they're null-ops), the core code doesn't have to care
about how you want to handle this.

@samalba
Copy link
Copy Markdown
Contributor

samalba commented Jun 4, 2014

  • Crashed during a write. When restarted, this can be detected by
    invalid JSON in the references file, and we'd have to walk the whole
    repositories/tags tree to see if any of those files depended on our
    corrupted image. Slow, but that's ok for recovering from such an
    unlikely situation.

To rephrase what you explain, the operation is not atomic. Walking the whole dataset means it's not suitable for big dataset.

@wking
Copy link
Copy Markdown
Contributor Author

wking commented Jun 4, 2014

On Wed, Jun 04, 2014 at 10:07:53AM -0700, Sam Alba wrote:

  • Crashed during a write. When restarted, this can be detected by
    invalid JSON in the references file, and we'd have to walk the whole
    repositories/tags tree to see if any of those files depended on our
    corrupted image. Slow, but that's ok for recovering from such an
    unlikely situation.

To rephrase what you explain, the operation is not atomic. Walking
the whole dataset means it's not suitable for big dataset.

To make things atomic, I'd suggest initially staging images (and their
refcounts) in a tmp/ directory and moving them into their true
locations after they were full (assuming your backing store supports
atomic moves). I'd do this for all of the registry's file
image/repository alterations, not just refcounting. What happens
currently if you crash with a half-written image ID in a tag file?

Even with atomic file-writes, you'd need some way to distinguish
between provisional refcounts (i.e. claimed by a tag that we're in the
process of writing) that were still provisional because they were
still in progress, and those that were still provisional because we
crashed while writing them (so they were aborted). In that case, I
think the logic would be:

def put_tag(namespace, repository, tag):
… extract data from request …
… write marker for tag-put-in-progress …
store.add_provisional_references(…)
… put all repository-side tag content …
store.add_references(
image_id=data, namespace=namespace, repository=repository, tag=tag)
… remove tag-put-in-progress marker …
return toolkit.response()

When touching the references files, we'd check for the provisional
references file. If we found the provisional referencing tags, we'd
add them to the permanent list. If we found an active
tag-put-in-progress marker, we'd leave them in the provisional
references list, but treat them as true references (to avoid garbage
collecting images while the put was in progress). If we found an
inactive tag-put-in-progress marker, we'd start a background job
cleaning up the ancestry chain of the failed provisional reference.
For local storage, the tag-put-in-progress marker storage could be
implemented with flock. For other backends, I'm not sure what the
best approach would be.

How does that sound?

Having a transactional storage backend would make this much easier ;).

wking added 9 commits June 4, 2014 17:59
This way we can access FileNotFoundError too, without the flake8 "H301
one import per line" raised by something like:

  from .exceptions import FileNotFoundError, NotImplementedError

I'm not sure why we implement our own NotImplementedError when Python
already has one built in, but that's probably off topic :p.
From Index._walk_storage.  It's useful for creating the initial index,
and also for the new storage upgrades.
Mirroring Base.get_registries.  Both methods are useful for
abstracting away the current nested filesystem implementation, which
will hopefully not be baked into Base for much longer.
From docker_registry.tags.get_tags.  Centralizing another abstraction
method helps hide the current nested filesystem implementation from
the coming upgrade methods.
Besides factoring out a bunch of repetitive

  {self.images}/{image_id}

construction, this gives us a path to recursively remove when we want
to delete an image.  Modeled after ab3805a (storage: Add
Storage.repository_path, 2014-02-12).
This is more explicit, and avoids getting clobbered by the later:

  data = create_tag_json(user_agent=ua)
Before this commit there was no way to remove images that were no
longer needed by any tags.  This commit adds a persistent list of
references for each image, with both the depending image id and the
namespace, repository, and tag.  When we delete a tag, we traverse
it's ancestors removing any obsolete references and any images that
are no longer referenced.

This is more complicated than just having a refcount number, but also
more resilient.  For example, someone will eventually have a power
outage after removing a tag while the refcounts were still being
adjusted.  By keeping the references themselves in a list, it's easy
to check for stale references and remove them.  If it turns out that
the removal checks are too slow, it would be easy to make them more
optimistic (e.g. passing the descendant image and tags that we expect
to remove down the _remove_references stack and removing them instead
of checking all listed references for all ancestors).

The JSON serialization for the [namespace, repository, tag] key allows
us to use JSON to serialize the resulting dictionary (JSON only
supports string keys).  It's a bit awkward, but I wanted to make sure
a single tag was only represented once (with whatever its current
image was) in the references list.
The upgrade from storage v0.0 to storage v0.1 just adds tag references
and cleans up unreferenced images.  I think all the storage subclasses
are too tightly bound to the file-based structure layed out in
driver.Base.  It would be nice is we exposed an API closer to the web
API (add a tag, get a tag, remove a tag, add a layer, get a layer,
remove a layer, ...), and left the internal details completely up to
the driver.  Of course, the file-like handling the the current Base
class could be a FileStorage utility for easy subclassing, but it
feels wrong to bake it into Base.  If we do pull it out, we'd want to
version both the Base API used by storage consumers.  We'd leave it up
to the implementations (like FileStorage) to version the backing data
store they used to provide the Base API:

  registry core <--(Base API)--> storage driver <--> backing store

Base.version and the version checks in Base._check_version allow the
storage driver to manage the backing store, and are independent of the
Base API used by the core.

In any case, the data in the backing store persists between software
upgrades, so we need *some* way to manage it as we polish the
persistent format.
This is part of the internal backing-store interface, and isn't part
of the public Base API.
@wking
Copy link
Copy Markdown
Contributor Author

wking commented Jun 5, 2014

After cleaning up some silly typos, the unit tests are passing again
:p.

We still need to reach a consensus on where this should live (I can
pull it out into a refcounting class (CautiousRefcounter?) that
subclasses Base and is in turn subclassed by the in-tree storage
backends. That way other storage backends have a clean slate by
subclassing Base, or can use my implementation by subclassing
CautiousRefcounter, or can skip ref-counting entirely by subclassing a
NoopRefcounter. How does that sound?

@sirupsen
Copy link
Copy Markdown

Would love to see this in. What's holding it back?

@dmp42
Copy link
Copy Markdown
Contributor

dmp42 commented Jun 25, 2014

Mainly:

  • It's a complex change not fitting well with the short term milestones
  • this is not going to be used by the official docker registry itself (see discussion about the existing dataset) unless there is a practical solution to handle the existing (growing) volume of images
  • or it should be made optional

Either way, if you want to try it, I guess nothing prevents you from giving it a spin from @wking branch.

@wking
Copy link
Copy Markdown
Contributor Author

wking commented Jun 25, 2014

On Wed, Jun 25, 2014 at 08:57:25AM -0700, Olivier Gambier wrote:

  • or it should be made optional

Optional shouldn't be too hard, especially now that option parsing is
a bit better. Any feedback on this structure 1, or should I stay
closer to what I already have in this branch?

@wking wking mentioned this pull request Jul 30, 2014
@wking
Copy link
Copy Markdown
Contributor Author

wking commented Aug 12, 2014

On Wed, Jun 25, 2014 at 11:26:14AM -0700, W. Trevor King wrote:

Wed, Jun 25, 2014 at 08:57:25AM -0700, Olivier Gambier:

  • or it should be made optional

Optional shouldn't be too hard, especially now that option parsing is
a bit better. Any feedback on this structure 1, or should I stay
closer to what I already have in this branch?

Ping.

@wking
Copy link
Copy Markdown
Contributor Author

wking commented Sep 6, 2014

On Tue, Aug 12, 2014 at 09:28:21AM -0700, W. Trevor King wrote:

Wed, Jun 25, 2014 at 11:26:14AM -0700, W. Trevor King:

Wed, Jun 25, 2014 at 08:57:25AM -0700, Olivier Gambier:

  • or it should be made optional

Optional shouldn't be too hard, especially now that option parsing is
a bit better. Any feedback on this structure 1, or should I stay
closer to what I already have in this branch?

Ping.

I just saw #533, so I thought I'd ping this again. I'm still looking
for feedback on my proposed structure for swapable refcounting
algorithms…

@dmp42
Copy link
Copy Markdown
Contributor

dmp42 commented Sep 6, 2014

@wking sorry about the lag on this - will look

@wking
Copy link
Copy Markdown
Contributor Author

wking commented Sep 6, 2014

On Tue, Aug 12, 2014 at 09:28:21AM -0700, W. Trevor King wrote:

Wed, Jun 25, 2014 at 11:26:14AM -0700, W. Trevor King:

Wed, Jun 25, 2014 at 08:57:25AM -0700, Olivier Gambier:

  • or it should be made optional

Optional shouldn't be too hard, especially now that option parsing is
a bit better. Any feedback on this structure 1, or should I stay
closer to what I already have in this branch?

Ping.

I just saw #533, so I thought I'd ping this again. I'm still looking
for feedback on my proposed structure for swapable refcounting
algorithms…

@dmp42
Copy link
Copy Markdown
Contributor

dmp42 commented Sep 8, 2014

About the open questions:

I would like this to live in its own subclass, and leave Base as-is.

Also, I would be happy with separate drivers (s3-rf, file-rf - where rf=ref-counting - but any other, better, suffix can do) that would subclass both their parent (s3, file) and the refcounting base if that's possible.

I really want this as standalone and "separate" as possible - and to minimize the effort on the drivers developers - and make this an explicit (simple) choice for the users.

Sorry for the long wait on this, and for the bikeshedding - and thank you for your work on this @wking

Best.

@ncdc
Copy link
Copy Markdown
Contributor

ncdc commented Sep 16, 2014

@wking would you like me to try to convert this to an extension?

@wking
Copy link
Copy Markdown
Contributor Author

wking commented Sep 16, 2014

On Tue, Sep 16, 2014 at 01:08:00PM -0700, Andy Goldstein wrote:

would you like me to try to convert this to an extension?

Sure. If we're not touching Base, what's your plan for injecting this
into the Storage class structure? Personally, I think the current
storage API is a confusing mix of registry-facing APIs and
backend-facing helpers that leads to wonky behavior like whatever was
fixed by #553 (see also my earlier comments [1,2]. I've been working
towards what I'd like the storage APIs to look like 3, but it's
very WIP (and I'll wait until #570 lands before working on it much
more). If we get there, it the registry-side API would be a nice
place to hang some logging 4 where it's independent of any
particular storage backend.

I think this sort of thing would be much cleaner once we had the
filesystem stuff abstracted out of the registry-side storage API.

@ncdc
Copy link
Copy Markdown
Contributor

ncdc commented Sep 16, 2014

@wking I wasn't actually going to try to modify Storage at all. I was going to treat all the reference counting code (and the files it creates) as something solely managed by the extension (even though _references goes in .../images/[id]/).

@wking
Copy link
Copy Markdown
Contributor Author

wking commented Sep 16, 2014

On Tue, Sep 16, 2014 at 01:48:00PM -0700, Andy Goldstein wrote:

I wasn't actually going to try to modify Storage at all.

Ah, I suppose you could tie it to the signals instead. Where would
you store tag references (I do it with db05eef8)?

@ncdc
Copy link
Copy Markdown
Contributor

ncdc commented Sep 16, 2014

If we use your existing code in this PR, just like you've been doing it via image_references_path.

@wking
Copy link
Copy Markdown
Contributor Author

wking commented Sep 16, 2014

On Tue, Sep 16, 2014 at 01:59:20PM -0700, Andy Goldstein wrote:

If we use your existing code in this PR, just like you've been doing
it via image_references_path.

Which I added to Base in 3603066. If you're not touching Base, where
do you put it?

@ncdc
Copy link
Copy Markdown
Contributor

ncdc commented Sep 16, 2014

I'd put image_references_path in the extension code, and use signals to react to tags being created and deleted.

@wking
Copy link
Copy Markdown
Contributor Author

wking commented Sep 16, 2014

On Tue, Sep 16, 2014 at 02:08:49PM -0700, Andy Goldstein wrote:

I'd put image_references_path in the extension code, and use signals
to react to tags being created and deleted.

I think it's going to be hard to convince me to like anything that's
going to end up binding to the current Base API ;). Since fixing that
API is a long shot at the moment, feel free to do whatever it takes to
get this PR translated into something @dmp42 will accept sooner. If
it ends up keeping a lot of filesystem-backend assumptions, I can
always cut those strings and pretty things up later if/when we get a
cleaner registry-side storage API.

@ncdc
Copy link
Copy Markdown
Contributor

ncdc commented Sep 17, 2014

@wking do you think we need to worry about file locking and/or mutexes when updating image references?

@wking
Copy link
Copy Markdown
Contributor Author

wking commented Sep 17, 2014

On Wed, Sep 17, 2014 at 12:56:01PM -0700, Andy Goldstein wrote:

@wking do you think we need to worry about file locking and/or
mutexes when updating image references?

Here are my previous thoughts for these issues [1,2].

ncdc pushed a commit to ncdc/docker-registry that referenced this pull request Oct 3, 2014
Add refcount extension to store image references and cleanup orphans.
Based heavily on @wking's work in
docker-archive#409 - this is just an
adaptation to extension format.

Signed-off-by: Andy Goldstein <agoldste@redhat.com>
@ncdc ncdc mentioned this pull request Oct 3, 2014
ncdc pushed a commit to ncdc/docker-registry that referenced this pull request Oct 3, 2014
Add refcount extension to store image references and cleanup orphans.
Based heavily on @wking's work in
docker-archive#409 - this is just an
adaptation to extension format.

Signed-off-by: Andy Goldstein <agoldste@redhat.com>
ncdc pushed a commit to ncdc/docker-registry that referenced this pull request Oct 3, 2014
Add refcount extension to store image references and cleanup orphans.
Based heavily on @wking's work in
docker-archive#409 - this is just an
adaptation to extension format.

Signed-off-by: Andy Goldstein <agoldste@redhat.com>
ncdc pushed a commit to ncdc/docker-registry that referenced this pull request Oct 6, 2014
Add refcount extension to store image references and cleanup orphans.
Based heavily on @wking's work in
docker-archive#409 - this is just an
adaptation to extension format.

Signed-off-by: Andy Goldstein <agoldste@redhat.com>
ncdc pushed a commit to ncdc/docker-registry that referenced this pull request Oct 6, 2014
Add refcount extension to store image references and cleanup orphans.
Based heavily on @wking's work in
docker-archive#409 - this is just an
adaptation to extension format.

Signed-off-by: Andy Goldstein <agoldste@redhat.com>
@dbason
Copy link
Copy Markdown

dbason commented Oct 9, 2014

@wking Sorry if this has been covered off (I'm having problems following the various discussions on deletion of images). Will the code as is clean up images on an s3 compatible back end (I ask because most of the scripts/cleanups I have seen in other threads look to be local file system only)?

@wking
Copy link
Copy Markdown
Contributor Author

wking commented Oct 9, 2014

On Thu, Oct 09, 2014 at 02:52:35PM -0700, dbason wrote:

Sorry if this has been covered off (I'm having problems following
the various discussions on deletion of images). Will the code as is
clean up images on an s3 compatible back end (I ask because most of
the scripts/cleanups I have seen in other threads look to be local
file system only)?

I tweaked Base a bit, but I don't think I added any new
storage-implementation-facing methods, so I expect it will. This
particular implementation is unlikely to land though, so you might
want to checkout #606 instead.

@wking wking mentioned this pull request Oct 10, 2014
ncdc pushed a commit to ncdc/docker-registry that referenced this pull request Oct 13, 2014
Add refcount extension to store image references and cleanup orphans.
Based heavily on @wking's work in
docker-archive#409 - this is just an
adaptation to extension format.

Signed-off-by: Andy Goldstein <agoldste@redhat.com>
@dmp42
Copy link
Copy Markdown
Contributor

dmp42 commented Oct 29, 2014

@wking @ncdc do you guys intend to work on this for v1?

@wking
Copy link
Copy Markdown
Contributor Author

wking commented Oct 29, 2014

On Wed, Oct 29, 2014 at 10:53:28AM -0700, Olivier Gambier wrote:

@wking @ncdc do you guys intend to work on this for v1?

Not me, and @ncdc has #606.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants