-
Notifications
You must be signed in to change notification settings - Fork 2.7k
S3 driver: Attempt HeadObject on Stat first, fail over to List #4401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
S3 driver: Attempt HeadObject on Stat first, fail over to List #4401
Conversation
17f0e1e to
7617627
Compare
registry/storage/driver/s3-aws/s3.go
Outdated
| Bucket: aws.String(d.Bucket), | ||
| Key: aws.String(s3Path), | ||
| }) | ||
| if err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://go.dev/wiki/CodeReviewComments#indent-error-flow
I would suggest refactoring this into three functions:
func (d *driver) Stat(ctx context.Context, path string) (storagedriver.FileInfo, error) {
s3Path := d.s3Path(path)
fi, err := d.statHead(ctx, s3Path)
if err != nil {
if _, ok := err.(awserr.Error); ok {
var err error
fi, err = d.statList(ctx, s3Path)
if err != nil {
return nil, err
}
}
}
fi.Path = path
return storagedriver.FileInfoInternal{FileInfoFields: fi}, nil
}|
PTAL @thaJeztah @Jamstah @squizzi |
registry/storage/driver/s3-aws/s3.go
Outdated
| fi, err := d.statList(ctx, s3Path) | ||
| if err != nil { | ||
| if errors.Is(err, storagedriver.PathNotFoundError{}) { | ||
| return nil, storagedriver.PathNotFoundError{Path: path} | ||
| } | ||
| return nil, parseError(path, err) | ||
| } | ||
| fi.Path = path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a fan of mutating the returned type and setting the path here.
Could we instead make statList and statHead accept the unprocessed path argument, and make them set the Path in the FileInfo they return? (internalising the d.s3Path into those functions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why we're not doing that (see Corey's suggestion) is there is no need to do strings.Trimming [in worst case] d.S3Path does, multiple times. We do it once in Stat and pass it down.
registry/storage/driver/s3-aws/s3.go
Outdated
| // are slightly outdated, the HeadObject actually returns NotFound error | ||
| // if querying a key which doesn't exist or a key which has nested keys | ||
| // and Forbidden if IAM/ACL permissions do not allow Head but allow List. | ||
| if _, ok := err.(awserr.Error); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this should be an errors.Is() / errors.As ?
registry/storage/driver/s3-aws/s3.go
Outdated
| var fi storagedriver.FileInfoFields | ||
| if len(resp.Contents) == 1 { | ||
| if *resp.Contents[0].Key != s3Path { | ||
| fi.IsDir = true | ||
| return storagedriver.FileInfoInternal{FileInfoFields: fi}, nil | ||
| return &fi, nil | ||
| } | ||
|
|
||
| return nil, storagedriver.PathNotFoundError{Path: path} | ||
| fi.IsDir = false | ||
| fi.Size = *resp.Contents[0].Size | ||
| fi.ModTime = *resp.Contents[0].LastModified | ||
| return &fi, nil | ||
| } | ||
| if len(resp.CommonPrefixes) == 1 { | ||
| fi.IsDir = true | ||
| return &fi, nil | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd remove the var fi storagedriver.FileInfoFields, and just inline constructing the value; it reduces cognitive load (did fi already have fields set elsewhere?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It (inline constructed values) also unnecessarily bloats the code in this specific func. The original code I wrote did exactly that. Not sure if it's worth it, tbh.
|
@milosgajdos opened a quick PR against your PR branch with my suggestions; milosgajdos#1 (if you agree with the changes, you can merge them, and the'll get into this PR, but it probably needs some squashing 😄) |
|
I saw that and mmm, I'm unconvinced most of those changes are needed 🤔 But also, at the same time, whatever gets anything going in this repo. |
|
PTAL @thaJeztah |
corhere
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're going to squash down to one commit, I assume?
I am waiting for Seb to see if he wants his commits to be in this PR or if he doesn't mind my squashing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I am waiting for Seb to see if he wants his commits to be in this PR or if he doesn't mind my squashing
Oh, definitely squash 'em and cleanup the commit message accordingly
You can make Cory and I partner in crime with Co-authored-by: comments (GitHub takes those into account), e.g.;
Co-authored-by: Cory Snider <corhere@gmail.com>
Co-authored-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Milos Gajdos <milosgajdos83@gmail.com>
1370015 to
68169e9
Compare
Stat always calls ListObjects when stat-ing S3 key. Unfortauntely ListObjects is not a free call - both in terms of egress and actual AWS costs (likely because of the egress). This changes the behaviour of Stat such that we always attempt the HeadObject call first and only ever fall through to ListObjects if the HeadObject returns an AWS API error. Note, that the official docs mention that the only error returned by HEAD is NoSuchKey; experiments show that this is demonstrably wrong and the AWS docs are simply outdated at the time of this commit. HeadObject actually returns the following errors: * NotFound: if the queried key does not exist * NotFound: if the queried key contains subkeys i.e. it's a prefix * BucketRegionError: if the bucket does not exist * Forbidden: if Head operation is not allows via IAM/ACLs Co-authored-by: Cory Snider <corhere@gmail.com> Co-authored-by: Sebastiaan van Stijn <github@gone.nl> Signed-off-by: Sebastiaan van Stijn <github@gone.nl> Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
68169e9 to
a18cc8a
Compare
S3.StorageDriver.Statalways callsListObjectswhen stat-ing an S3 key to determine if it's a "directory" or not.Unfortunately, the
ListObjectsis not a free call: both in terms of egress and actual AWS costs (likely because of the egress).This changes the behaviour of
Statsuch that we always attempt theHeadObjectcall first and only ever fall through toListObjectsif theHeadObjectcall returns an AWS API error.Note, that the official docs mention that the only error returned by HEAD is
NoSuchKey(https://docs.aws.amazon.com/AmazonS3/latest/API/API_HeadObject.html#API_HeadObject_Errors); experiments show that this is demonstrably wrong and the AWS docs are simply outdated at the time of this commit.HeadObject actually returns the following errors:
NotFound: if the queried key does not exist or if the queried key contains subkeys i.e. it's a prefixBucketRegionError: if the bucket does not existForbidden: if Head operation is not allowed via IAM/ACLsCloses: #4326 (yes I know that's a PR, but the OG author doesn't respond and the PR is in a half-broken state)