Skip to content

feat: implement deleter#470

Closed
carabasdaniel wants to merge 4 commits into
oras-project:mainfrom
opcr-io:delete
Closed

feat: implement deleter#470
carabasdaniel wants to merge 4 commits into
oras-project:mainfrom
opcr-io:delete

Conversation

@carabasdaniel

Copy link
Copy Markdown

This should close: #454

@carabasdaniel carabasdaniel marked this pull request as ready for review March 24, 2023 13:26
@codecov-commenter

codecov-commenter commented Mar 27, 2023

Copy link
Copy Markdown

Codecov Report

Attention: 17 lines in your changes are missing coverage. Please review.

Comparison is base (7dd0378) 73.22% compared to head (69a1851) 73.09%.
Report is 47 commits behind head on main.

Files Patch % Lines
content/memory/memory.go 33.33% 4 Missing and 2 partials ⚠️
content/oci/oci.go 60.00% 4 Missing and 2 partials ⚠️
content/oci/storage.go 57.14% 2 Missing and 1 partial ⚠️
internal/resolver/memory.go 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #470      +/-   ##
==========================================
- Coverage   73.22%   73.09%   -0.14%     
==========================================
  Files          49       49              
  Lines        4602     4642      +40     
==========================================
+ Hits         3370     3393      +23     
- Misses        923      935      +12     
- Partials      309      314       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@shizhMSFT shizhMSFT changed the title feature: Implement deleter feat: implement deleter Mar 27, 2023
@carabasdaniel

Copy link
Copy Markdown
Author

Hello @shizhMSFT @Wwwsylvia,

This is the start for the delete implementation, I would definitely appreciate a bit of feedback on this PR.

@shizhMSFT

Copy link
Copy Markdown
Contributor

@carabasdaniel Thanks for your contribution. We recognize and are reviewing this PR. Since it is related to design, it takes more time to be reviewed but should finish in 1 day or two.

Comment thread content/storage.go
Comment thread content/oci/storage.go Outdated
Comment thread content/oci/oci.go
return desc, nil
}

func (s *Store) Delete(ctx context.Context, target ocispec.Descriptor) error {

@Wwwsylvia Wwwsylvia Mar 29, 2023

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One tricky thing about deleting is that we need to maintain the predecessors relationship as well. See Store.Push and Store.Predecessors.
This also applies to other GraphStorage that implement PredecessorsFinder.

Comment thread content/file/file.go Outdated
Comment thread content/storage.go
Comment thread internal/cas/proxy.go Outdated
Comment thread internal/resolver/memory.go Outdated
Comment thread internal/cas/memory.go Outdated
Comment thread internal/cas/memory.go Outdated
Comment thread content/file/file.go Outdated
Comment thread content/oci/storage.go Outdated
Comment thread content/oci/storage.go Outdated
Comment thread content/oci/storage.go Outdated
Comment thread content/oci/oci.go
Comment on lines +196 to +204
resolvers := s.tagResolver.Map()
for reference, desc := range resolvers {
if content.Equal(desc, target) {
s.tagResolver.Remove(reference)
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if some is pushing and deleting at the same time? There is a race condition.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The targetResolver.Map() represent an inconsistent snapshot of the storage content as far as I understand, therefore I'm not sure if a delete of a pushing content would be possible as the tagResolver is actually a syncMap.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carabasdaniel Since it is possible that someone is trying to push and delete at the same time, we have to trade-off some performance for the deletion.

Similar to what I have mentioned in other comments, the current oci.Store is append-only and thus it has the benefits that multiple push and fetch can happen at the same time without race conditions.

However, adding the functionality of Delete() makes the store mutable, and we have to change the underlying implementation from sync maps to a RWMutex to ensure consistency. The drawback is that we no longer can push and delete simultaneously.

Overall, I suggest

  1. Drop Delete() for oci.Store.
  2. Develop oci.MutableStore with Delete() support.

/cc @Wwwsylvia

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I suggest
Drop Delete() for oci.Store.
Develop oci.MutableStore with Delete() support.

It makes sense.

@carabasdaniel I would also like to learn your use cases of deleting content in OCI layout. Is it for GC purpose? Could you share a little bit?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Wwwsylvia, @shizhMSFT,

We have a tool used for building OCI images from rego policies: policy CLI.

We have been using the previous major version of oras-go to build our containers and interact with upstream OCI compliant registries. In our case we want the user to be able to build and easily remove containers from the local OCI storage, thus the need to have a delete mechanism in place, keeping the same workflow feel as the docker CLI.

At the moment we use a fork that includes these changes in our v0.2.0 release candidate versions of the policy CLI and it seems to work with our current needs.

I think having a different store might make sense.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thank you for sharing this!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Wwwsylvia for additional use case info... I am working on https://github.com/awslabs/soci-snapshotter which uses https://pkg.go.dev/oras.land/oras-go/content to keep content in the local filesystem. Our artifacts are conceptually similar to image and layer metadata; we store one artifact per image we handle, and one artifact per layer in those images, with one-way references (via a list of digests equivalent to image layers) from artifacts of the first type to artifacts of the second type.

@carabasdaniel

Copy link
Copy Markdown
Author

Hi @shizhMSFT @Wwwsylvia,

Thanks for the feedback, I'll start working on the required changes.

Comment thread internal/cas/memory.go Outdated
Comment thread content/storage.go Outdated
Comment on lines +48 to +49
// DeleteStorage represents an extension of the Storage interface that includes the Deleter.
type DeleteStorage interface {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Storage interface abstracts append-only storages. Adding Deleter makes the storage a MutableStorage.

@Wwwsylvia Do you have a better name suggestion on DeleteStorage?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MutableStorage sounds like that the existing data can be modified or deleted. But in our case, the data can be deleted but cannot be modified.
How about StorageDeleter or DeletableStorage?
BTW, do we really need to define this interface?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type StorageWithDelete interface {
    Storage
    Deleter
}

This is from my current efforts in the same direction as this PR. If you don't define this interface, folks like me will just have to do it themselves, and our naming divergence will make it harder to find usage examples in the wild.

Comment thread content/oci/storage.go Outdated
Comment thread content/oci/storage.go Outdated
Comment thread content/oci/oci.go Outdated
Comment thread content/oci/oci.go
Comment thread internal/cas/memory_test.go
Comment thread internal/cas/memory_test.go
Comment thread content/storage.go Outdated
Comment on lines +48 to +49
// DeleteStorage represents an extension of the Storage interface that includes the Deleter.
type DeleteStorage interface {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MutableStorage sounds like that the existing data can be modified or deleted. But in our case, the data can be deleted but cannot be modified.
How about StorageDeleter or DeletableStorage?
BTW, do we really need to define this interface?

Comment thread content/oci/storage_test.go
@sparr

sparr commented May 26, 2023

Copy link
Copy Markdown
Contributor

I find myself preparing to implement similar functionality as part of https://github.com/awslabs/soci-snapshotter and want to check in on this PR. I would much prefer to contribute upstream and then consume that. Is this still under development? Would additional effort be welcome?

Comment thread content/oci/oci_test.go
return true
}

func TestStore_Delete(t *testing.T) {

@sparr sparr May 26, 2023

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can some/most of this be deduplicated with the memory_test.go TestStoreDelete with similar contents? Ditto in storage_test.go

Comment thread content/oci/storage.go Outdated

@TerryHowe TerryHowe left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like maybe two other changes needed here as well as the deletes

Comment thread content/oci/storage.go Outdated
Comment thread content/oci/oci_test.go Outdated
Comment thread content/oci/oci.go Outdated
carabasdaniel and others added 4 commits June 2, 2023 10:26
Signed-off-by: carabasdaniel <dani@aserto.com>
Co-authored-by: Terry Howe <terrylhowe@gmail.com>
Signed-off-by: carabasdaniel <dani@aserto.com>
Co-authored-by: Terry Howe <terrylhowe@gmail.com>
Signed-off-by: carabasdaniel <dani@aserto.com>
Co-authored-by: Terry Howe <terrylhowe@gmail.com>
Signed-off-by: carabasdaniel <dani@aserto.com>
@carabasdaniel

Copy link
Copy Markdown
Author

Merged suggested changes and rebased on top of latest upstream.
If we merge this I can open another PR that adds the untag feature.

Thanks for the help @TerryHowe.

@sparr

sparr commented Jun 2, 2023

Copy link
Copy Markdown
Contributor

@carabasdaniel There appear to be two outstanding comments referring to needed functionality for updating graph and resolver when content is deleted.

My limited understanding is that graph is used for predecessor/successor (aka parent/child) lookups and traversal of content and is populated in Store.Push where it calls graph.Index and during loadIndex when it loads index.json and then calls graph.IndexAll, and resolver is used for looking up content by tags and is populated by Tag (which may require something like your Untag to undo although I haven't looked far enough to determine if it would meet that need)

@Wwwsylvia

Copy link
Copy Markdown
Member

My limited understanding is that graph is used for predecessor/successor (aka parent/child) lookups and traversal of content and is populated in Store.Push where it calls graph.Index and during loadIndex when it loads index.json and then calls graph.IndexAll, and resolver is used for looking up content by tags and is populated by Tag (which may require something like your Untag to undo although I haven't looked far enough to determine if it would meet that need)

Yes it's right.

@carabasdaniel

Copy link
Copy Markdown
Author

Hi @sparr @Wwwsylvia,

Currently my bandwidth is very limited so I'm not sure when I'll be able to come back to this. For our current needs we are using a fork that has this change and the UnTag change merged.

In addition to the recent changes I made here I also prepared an untag branch that would build on top of this PR for the resolver.

I would appreciate any help to get this set implemented/merged.

@shizhMSFT shizhMSFT left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it is technically challenging to add Delete() to the existing *oci.Store, as mentioned in #470 (comment), I would suggest developing a new store named *oci.DeletableStore where the functionality of Predecessors() can be dropped so that the difficulty of the development should be dramatically reduced.

It should also meet the requirements of https://github.com/opcr-io/policy as it does not leverage the functionality of Predecessors().

Comment thread content/memory/memory.go
}

// Delete a target descriptor for storage.
func (s *Store) Delete(ctx context.Context, target ocispec.Descriptor) error {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s.resolver is for the tagging service, i.e. for Resolve(), as Resolve() should not resolve a reference to a deleted descriptor. Similarly, s.graph is for Predecessors().

Since this PR mainly focuses on OCI store, we can keep the changes to the Memory store in another PR.

Comment thread internal/graph/memory.go
}

// RemoveFromIndex removes the node and all its predecessors from the index and predecessor maps.
func (m *Memory) RemoveFromIndex(ctx context.Context, fetcher content.Fetcher, node ocispec.Descriptor) error {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems fetcher is not used anywhere.

Comment thread content/oci/oci.go
Comment on lines +197 to +198
// Delete removed a target descriptor from index and storage.
func (s *Store) Delete(ctx context.Context, target ocispec.Descriptor) error {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in #470 (comment), the Delete() method might be called concurrently with other methods like Push() and Resolve(). Race condition is a concern here.

@sparr

sparr commented Jun 9, 2023

Copy link
Copy Markdown
Contributor

Speaking only for myself, an implementation of a new Store type with Delete functionality but without Predecessors would not fit my current use case. My current solution is to delete files then recreate index.json then recreate the content store so that the graph will be reindexed from scratch.

@shizhMSFT

Copy link
Copy Markdown
Contributor

The main technical challenge here is that the current *oci.Store implementation is optimized for the concurrent use of append-only store in terms of performance. It is non-trivial to modify it to have the delete functionality.

To unblock this PR attempt, @wangxiaoxuan273 could you prototype a *oci.DeletableStore with both the functionality of Delete() and Predecessors()? We can have performance penalty, but we have to make sure that all public methods of *oci.DeletableStore MUST be go-routine safe since it is by design for all Targets.

@TerryHowe

Copy link
Copy Markdown
Member

Where is this going. Any progress on the oci.DeletableStore ?

@wangxiaoxuan273

wangxiaoxuan273 commented Jul 8, 2023

Copy link
Copy Markdown
Contributor

Where is this going. Any progress on the oci.DeletableStore ?

No progress so far. I'm still in the stage of investigating. But it's likely that this PR will not be continued, and we will open a new one to fix this issue. @TerryHowe

@Wwwsylvia

Copy link
Copy Markdown
Member

Where is this going. Any progress on the oci.DeletableStore ?

FYI, here is the PR authored by @wangxiaoxuan273: #614

@shizhMSFT shizhMSFT added the stale Inactive issues or pull requests label Dec 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stale Inactive issues or pull requests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

oci.Store and oci.Storage should implement content.Deleter

8 participants