-
Notifications
You must be signed in to change notification settings - Fork 594
feat(repo)gcs: add support for versioning and retention #3564
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #3564 +/- ##
==========================================
+ Coverage 75.81% 75.83% +0.01%
==========================================
Files 465 465
Lines 37171 37183 +12
==========================================
+ Hits 28183 28199 +16
+ Misses 7059 7056 -3
+ Partials 1929 1928 -1 ☔ View full report in Codecov by Sentry. |
|
Can someone help a bit ? |
repo/blob/gcs/gcs_storage.go
Outdated
| if opts.RetentionPeriod != 0 { | ||
| retainUntilDate := clock.Now().Add(opts.RetentionPeriod).UTC() | ||
| writer.ObjectAttrs.Retention = &storage.ObjectRetention{ | ||
| Mode: "Unlocked", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Mode: "Unlocked", | |
| Mode: blob.Locked, |
repo/blob/gcs/gcs_storage.go
Outdated
| } | ||
|
|
||
| func (gcs *gcsStorage) ExtendBlobRetention(ctx context.Context, b blob.ID, opts blob.ExtendOptions) error { | ||
| retentionMode := "Unlocked" // TODO: properly handle lock mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| retentionMode := "Unlocked" // TODO: properly handle lock mode | |
| retentionMode := blob.Locked |
| bucket: cli.Bucket(opt.BucketName), | ||
| } | ||
|
|
||
| fmt.Println("Provo il PIT") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in general the Println's can go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i know, it's a WIP . i've opened the PR asking for help because im stuck in object sorting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look into it Today, see if I notice anything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
main issue i had is that GCS doesn't allow sorting object versions by creation time so i had to list all versions, sort manually by creation time, cycle the result and keep the newestthan (or older then). but im not a Go programmer, i had huge issues trying to sort and debug this thing. i have an apparently working version on my laptop but i'm not confident on why It works 🤣
(probably on Discord i can explain better)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So a couple notes from looking at it.
- blob versions are automatically sorted from oldest to most recently created so no need to add sorting anywhere
IsDeleteMarkerneeds to consider not only ifDeletedis set but also if it was deleted after the PIT e.g.!oi.Deleted.IsZero() && (gcs.PointInTime == nil || oi.Deleted.Before(*gcs.PointInTime)). This is because GCS decided to mark every previous version as being deleted. For convenience you can changeinfoToVersionMetadatato be a method instead of funcnewestAtUnlessDeletedneeds to return!v.IsDeleteMarkeras mentioned in another commentgetOlderThanneedsreturn vs[:i]like Azure since they're ascending
If you make these changes then PIT will work as expected
Not really to PIT in particular:
I guess better if Timestamp is .Created rather than .Updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They might not have custom sorting, but we don't need it. Or manual sorting. You can delete your sort calls and create 10 versions of a blob and every list command will have them in ascending order at the PRE SORT log you added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can see the changes I mentioned here
return v, trueis not correct since it would return deleted blobs e.g. PIT=15:00, blob deleted 14:00 even when it is the latest/only blob version
I've seen your branch. If it works, i can safely drop this PR and fork yours as mine was just a wip and not a working one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably they just added sorting to blobs in the time since you created this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They might not have custom sorting, but we don't need it. Or manual sorting. You can delete your sort calls and create 10 versions of a blob and every list command will have them in ascending order at the
PRE SORTlog you added
When i did this PR it wasn't stable, that's why i've added a sort function, trying to make that stable. Probably they have changed something .....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably they just added sorting to blobs in the time since you created this PR
Honestly, looking carefully at the versioning list, probably they don't have a sorting option for object NAMES, but the versions are named with the timestamp and even if you sort them lexicographically (https://pkg.go.dev/cloud.google.com/go/storage#BucketHandle.Objects) a timestamp is always sorted properly (as long as you don't add digits). So, probably, it always worked in this way, i've added a buggy code for a non-existant issue. Shame on me.
repo/blob/gcs/gcs_pit.go
Outdated
| fmt.Printf("newestAtUnlessDeleted del pit %s deleted=%t\n", v.Timestamp, v.IsDeleteMarker) | ||
|
|
||
| //return v, !v.IsDeleteMarker | ||
| return v, true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here you assume a blob is never deleted
| func getOlderThan(vs []versionMetadata, t time.Time) []versionMetadata { | ||
| fmt.Printf("getOlderThan %s \n", t) | ||
|
|
||
| sort.Slice(vs, func(i, j int) bool { return vs[i].Timestamp.Before(vs[j].Timestamp) }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're sorting both here and in newestAtUnlessDeleted
repo/blob/gcs/gcs_pit.go
Outdated
| return nil, errors.Wrapf(err, "could not get determine if bucket '%s' supports versioning", gcs.BucketName) | ||
| } | ||
|
|
||
| if !attrs.VersioningEnabled { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also should error if attrs.ObjectRetentionMode != "Enabled"
Otherwise connecting/creating to a repo fails: ERROR cannot initialize repository: unable to write blobcfg blob: PutBlob() failed for "kopia.blobcfg": unable to complete PutBlob(kopia.blobcfg) despite 10 retries: unexpected GCS error: googleapi: Error 400: Bucket does not have object retention enabled., invalid
|
|
||
| if !attrs.VersioningEnabled { | ||
| if !attrs.VersioningEnabled || attrs.ObjectRetentionMode != "Enabled" { | ||
| return nil, errors.Errorf("cannot create point-in-time view for non-versioned bucket '%s'", gcs.BucketName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A separate check would be better. That way you can highlight that ObjectRetention is the problem in the err
| return versionMetadata{ | ||
| Metadata: bm, | ||
| IsDeleteMarker: !oi.Deleted.IsZero(), | ||
| IsDeleteMarker: !oi.Deleted.IsZero() && (gcs.PointInTime == nil || oi.Deleted.Before(*gcs.PointInTime)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be func (gcs *gcsStorage) infoToVersionMetadata for this to work
| if opts.RetentionPeriod != 0 { | ||
| retainUntilDate := clock.Now().Add(opts.RetentionPeriod).UTC() | ||
| writer.ObjectAttrs.Retention = &storage.ObjectRetention{ | ||
| Mode: blob.Locked, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Mode: blob.Locked, | |
| Mode: string(blob.Locked), |
| retainUntilDate := clock.Now().Add(opts.RetentionPeriod).UTC().Truncate(time.Second) | ||
|
|
||
| ObjectRetention := &storage.ObjectRetention{ | ||
| Mode: retentionMode, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Mode: retentionMode, | |
| Mode: string(blob.Locked), |
| // It returns true even if versioning is enabled but currently suspended for the | ||
| // bucket. Notice that when object locking is enabled in a bucket, object | ||
| // versioning is enabled and cannot be suspended. | ||
| func (gcs *gcsStorage) IsVersioned(ctx context.Context) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this isn't used and can be removed
| } | ||
|
|
||
| // listBlobVersions lists all versions for all the blobs with the given blob ID prefix. | ||
| func (gcs *gcsStorage) listBlobVersions(ctx context.Context, prefix blob.ID, callback versionMetadataCallback) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| func (gcs *gcsStorage) listBlobVersions(ctx context.Context, prefix blob.ID, callback versionMetadataCallback) error { | |
| func (gcs *gcsPointInTimeStorage) listBlobVersions(ctx context.Context, prefix blob.ID, callback versionMetadataCallback) error { |
I guess everything in here should be gcsPointInTimeStorage rather than gcsStorage
| } | ||
|
|
||
| func toBlobID(blobName, prefix string) blob.ID { | ||
| return blob.ID(prefix + strings.TrimPrefix(blobName, prefix)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| return blob.ID(prefix + strings.TrimPrefix(blobName, prefix)) | |
| return blob.ID(strings.TrimPrefix(blobName, prefix)) |
| return blob.ID(prefix + strings.TrimPrefix(blobName, prefix)) | ||
| } | ||
|
|
||
| func infoToVersionMetadata(prefix string, oi *storage.ObjectAttrs) versionMetadata { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imo it would be better to do the Azure style where getVersionedBlobMeta calls az.getBlobMeta(it). This way there is only 1 place in the code where versionMetadata is created & (there should be, but I missed a bit) 1 place where blob.Metadata is created.
Also it would mean the non-versioned code doesn't touch the versioned side which imo is better. e.g. this calls GetMetadata rather than GetMetadata calling this & discarding things
getVersionedBlobMeta also sounds better than infoToVersionMetadata imo
|
Some follow up comments based on the current state of the PR, I also left a few more on this PR:
Otherwise it looks ok 👍 Finally could you add some test coverage and a documentation section for this PR as 2 separate PRs? The test coverage can be a new branch based off this one (PR train~) to avoid the PR getting to big |
currently i'm a little bit busy at work and my current version rebuilt locally (a couple of days ago) with the new laptop and newer Go version doesn't work anymore, it crash and i don't have time to troubleshoot due to lack of time i can do the above suggested fixes but i don't have time to test or write docs, in a short time |
|
|
||
| retainUntilDate := clock.Now().Add(opts.RetentionPeriod).UTC().Truncate(time.Second) | ||
|
|
||
| ObjectRetention := &storage.ObjectRetention{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ObjectRetention := &storage.ObjectRetention{ | |
| objectRetention := &storage.ObjectRetention{ |
|
Implemented. |
still a WIP but this PR aim to add support for versioning and retention for malware protection to the native Google Cloud Storage backend
backup and retention extension are working as expected
point in time still doesn't work, probably i need some help in this regard.