Skip to content

Conversation

@guestisp
Copy link

still a WIP but this PR aim to add support for versioning and retention for malware protection to the native Google Cloud Storage backend

backup and retention extension are working as expected
point in time still doesn't work, probably i need some help in this regard.

@codecov
Copy link

codecov bot commented Feb 2, 2024

Codecov Report

Attention: Patch coverage is 33.33333% with 8 lines in your changes are missing coverage. Please review.

Project coverage is 75.83%. Comparing base (e389d92) to head (22ab4e7).
Report is 124 commits behind head on master.

❗ Current head 22ab4e7 differs from pull request most recent head 2cb602f. Consider uploading reports for the commit 2cb602f to get more accurate results

Files Patch % Lines
cli/storage_gcs.go 33.33% 7 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3564      +/-   ##
==========================================
+ Coverage   75.81%   75.83%   +0.01%     
==========================================
  Files         465      465              
  Lines       37171    37183      +12     
==========================================
+ Hits        28183    28199      +16     
+ Misses       7059     7056       -3     
+ Partials     1929     1928       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@guestisp
Copy link
Author

Can someone help a bit ?

if opts.RetentionPeriod != 0 {
retainUntilDate := clock.Now().Add(opts.RetentionPeriod).UTC()
writer.ObjectAttrs.Retention = &storage.ObjectRetention{
Mode: "Unlocked",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Mode: "Unlocked",
Mode: blob.Locked,

}

func (gcs *gcsStorage) ExtendBlobRetention(ctx context.Context, b blob.ID, opts blob.ExtendOptions) error {
retentionMode := "Unlocked" // TODO: properly handle lock mode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
retentionMode := "Unlocked" // TODO: properly handle lock mode
retentionMode := blob.Locked

bucket: cli.Bucket(opt.BucketName),
}

fmt.Println("Provo il PIT")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in general the Println's can go

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i know, it's a WIP . i've opened the PR asking for help because im stuck in object sorting

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into it Today, see if I notice anything

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

main issue i had is that GCS doesn't allow sorting object versions by creation time so i had to list all versions, sort manually by creation time, cycle the result and keep the newestthan (or older then). but im not a Go programmer, i had huge issues trying to sort and debug this thing. i have an apparently working version on my laptop but i'm not confident on why It works 🤣

(probably on Discord i can explain better)

Copy link
Contributor

@KastenMike KastenMike Aug 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So a couple notes from looking at it.

  • blob versions are automatically sorted from oldest to most recently created so no need to add sorting anywhere
  • IsDeleteMarker needs to consider not only if Deleted is set but also if it was deleted after the PIT e.g. !oi.Deleted.IsZero() && (gcs.PointInTime == nil || oi.Deleted.Before(*gcs.PointInTime)). This is because GCS decided to mark every previous version as being deleted. For convenience you can change infoToVersionMetadata to be a method instead of func
  • newestAtUnlessDeleted needs to return !v.IsDeleteMarker as mentioned in another comment
  • getOlderThan needs return vs[:i] like Azure since they're ascending

If you make these changes then PIT will work as expected

Not really to PIT in particular:
I guess better if Timestamp is .Created rather than .Updated

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They might not have custom sorting, but we don't need it. Or manual sorting. You can delete your sort calls and create 10 versions of a blob and every list command will have them in ascending order at the PRE SORT log you added

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can see the changes I mentioned here

return v, true is not correct since it would return deleted blobs e.g. PIT=15:00, blob deleted 14:00 even when it is the latest/only blob version

I've seen your branch. If it works, i can safely drop this PR and fork yours as mine was just a wip and not a working one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably they just added sorting to blobs in the time since you created this PR

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They might not have custom sorting, but we don't need it. Or manual sorting. You can delete your sort calls and create 10 versions of a blob and every list command will have them in ascending order at the PRE SORT log you added

When i did this PR it wasn't stable, that's why i've added a sort function, trying to make that stable. Probably they have changed something .....

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably they just added sorting to blobs in the time since you created this PR

Honestly, looking carefully at the versioning list, probably they don't have a sorting option for object NAMES, but the versions are named with the timestamp and even if you sort them lexicographically (https://pkg.go.dev/cloud.google.com/go/storage#BucketHandle.Objects) a timestamp is always sorted properly (as long as you don't add digits). So, probably, it always worked in this way, i've added a buggy code for a non-existant issue. Shame on me.

fmt.Printf("newestAtUnlessDeleted del pit %s deleted=%t\n", v.Timestamp, v.IsDeleteMarker)

//return v, !v.IsDeleteMarker
return v, true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here you assume a blob is never deleted

func getOlderThan(vs []versionMetadata, t time.Time) []versionMetadata {
fmt.Printf("getOlderThan %s \n", t)

sort.Slice(vs, func(i, j int) bool { return vs[i].Timestamp.Before(vs[j].Timestamp) })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're sorting both here and in newestAtUnlessDeleted

return nil, errors.Wrapf(err, "could not get determine if bucket '%s' supports versioning", gcs.BucketName)
}

if !attrs.VersioningEnabled {
Copy link
Contributor

@KastenMike KastenMike Aug 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also should error if attrs.ObjectRetentionMode != "Enabled"
Otherwise connecting/creating to a repo fails: ERROR cannot initialize repository: unable to write blobcfg blob: PutBlob() failed for "kopia.blobcfg": unable to complete PutBlob(kopia.blobcfg) despite 10 retries: unexpected GCS error: googleapi: Error 400: Bucket does not have object retention enabled., invalid


if !attrs.VersioningEnabled {
if !attrs.VersioningEnabled || attrs.ObjectRetentionMode != "Enabled" {
return nil, errors.Errorf("cannot create point-in-time view for non-versioned bucket '%s'", gcs.BucketName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A separate check would be better. That way you can highlight that ObjectRetention is the problem in the err

return versionMetadata{
Metadata: bm,
IsDeleteMarker: !oi.Deleted.IsZero(),
IsDeleteMarker: !oi.Deleted.IsZero() && (gcs.PointInTime == nil || oi.Deleted.Before(*gcs.PointInTime)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be func (gcs *gcsStorage) infoToVersionMetadata for this to work

if opts.RetentionPeriod != 0 {
retainUntilDate := clock.Now().Add(opts.RetentionPeriod).UTC()
writer.ObjectAttrs.Retention = &storage.ObjectRetention{
Mode: blob.Locked,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Mode: blob.Locked,
Mode: string(blob.Locked),

retainUntilDate := clock.Now().Add(opts.RetentionPeriod).UTC().Truncate(time.Second)

ObjectRetention := &storage.ObjectRetention{
Mode: retentionMode,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Mode: retentionMode,
Mode: string(blob.Locked),

// It returns true even if versioning is enabled but currently suspended for the
// bucket. Notice that when object locking is enabled in a bucket, object
// versioning is enabled and cannot be suspended.
func (gcs *gcsStorage) IsVersioned(ctx context.Context) (bool, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this isn't used and can be removed

}

// listBlobVersions lists all versions for all the blobs with the given blob ID prefix.
func (gcs *gcsStorage) listBlobVersions(ctx context.Context, prefix blob.ID, callback versionMetadataCallback) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (gcs *gcsStorage) listBlobVersions(ctx context.Context, prefix blob.ID, callback versionMetadataCallback) error {
func (gcs *gcsPointInTimeStorage) listBlobVersions(ctx context.Context, prefix blob.ID, callback versionMetadataCallback) error {

I guess everything in here should be gcsPointInTimeStorage rather than gcsStorage

}

func toBlobID(blobName, prefix string) blob.ID {
return blob.ID(prefix + strings.TrimPrefix(blobName, prefix))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return blob.ID(prefix + strings.TrimPrefix(blobName, prefix))
return blob.ID(strings.TrimPrefix(blobName, prefix))

return blob.ID(prefix + strings.TrimPrefix(blobName, prefix))
}

func infoToVersionMetadata(prefix string, oi *storage.ObjectAttrs) versionMetadata {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo it would be better to do the Azure style where getVersionedBlobMeta calls az.getBlobMeta(it). This way there is only 1 place in the code where versionMetadata is created & (there should be, but I missed a bit) 1 place where blob.Metadata is created.

Also it would mean the non-versioned code doesn't touch the versioned side which imo is better. e.g. this calls GetMetadata rather than GetMetadata calling this & discarding things

getVersionedBlobMeta also sounds better than infoToVersionMetadata imo

@KastenMike
Copy link
Contributor

Some follow up comments based on the current state of the PR, I also left a few more on this PR:

  • Can you create an isCreate check on Connect like Azure/AWS has
  • separate out !attrs.VersioningEnabled || attrs.ObjectRetentionMode != "Enabled" checks
  • string(blob.Locked) as mentioned in a comment to get around compilation issues-
  • remove the sorts
  • remove the print statements

Otherwise it looks ok 👍

Finally could you add some test coverage and a documentation section for this PR as 2 separate PRs? The test coverage can be a new branch based off this one (PR train~) to avoid the PR getting to big
Docs can just be a small section added like this for Azure
For tests you could mimic azure_versioned_test.go and azure_immu_test.go

@kreare
Copy link

kreare commented Sep 2, 2024

Some follow up comments based on the current state of the PR, I also left a few more on this PR:

  • Can you create an isCreate check on Connect like Azure/AWS has
  • separate out !attrs.VersioningEnabled || attrs.ObjectRetentionMode != "Enabled" checks
  • string(blob.Locked) as mentioned in a comment to get around compilation issues-
  • remove the sorts
  • remove the print statements

Otherwise it looks ok 👍

Finally could you add some test coverage and a documentation section for this PR as 2 separate PRs? The test coverage can be a new branch based off this one (PR train~) to avoid the PR getting to big Docs can just be a small section added like this for Azure For tests you could mimic azure_versioned_test.go and azure_immu_test.go

currently i'm a little bit busy at work and my current version rebuilt locally (a couple of days ago) with the new laptop and newer Go version doesn't work anymore, it crash and i don't have time to troubleshoot due to lack of time

i can do the above suggested fixes but i don't have time to test or write docs, in a short time


retainUntilDate := clock.Now().Add(opts.RetentionPeriod).UTC().Truncate(time.Second)

ObjectRetention := &storage.ObjectRetention{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ObjectRetention := &storage.ObjectRetention{
objectRetention := &storage.ObjectRetention{

@julio-lopez
Copy link
Collaborator

Implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants