Skip to content

Avoid stat round-trips when fetching a blob#1226

Merged
stevvooe merged 1 commit intodistribution:masterfrom
aaronlehmann:avoid-stat-roundtrips
Dec 3, 2015
Merged

Avoid stat round-trips when fetching a blob#1226
stevvooe merged 1 commit intodistribution:masterfrom
aaronlehmann:avoid-stat-roundtrips

Conversation

@aaronlehmann
Copy link
Copy Markdown

Without this commit, three round-trips are required to fetch a blob with
a progress bar. The first is a call to Stat (HEAD request), to determine
the size. Then Open is called, which also calls Stat, and finally
performs a GET request.

Only the GET request is actually needed. The size of the blob can be
sniffed from Content-Length in the GET response.

This commit changes HTTPReadSeeker to automatically detect the size from
Content-Length instead of requiring it to be passed in. The Stat call is
removed from Open because it is no longer necessary.

HTTPReadSeeker now takes an additional errorHandler callback argument which
translates an unsuccessful HTTP response into an appropriate API-level
error. Using a callback for this makes it possible to avoid leaking the
repsonse body to Read's caller, which would make lifecycle management
problematic.

Fixes #1223

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like during the close as a condition of calling Error. Is there any reason we can't tie the body lifecycle to the lifecycle of the readseeker?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like it much either.

The problem is that there may be many HTTP requests created by each HTTPReadSeeker. Each call to Read can potentially cause a new HTTP request, whose response body needs to be closed. That's why I was associating them with the error object.

How about keeping a slice in the httpReadSeeker struct that tracks every response body that was associated with an error condition, and closing all of those when the Close method is called?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aaronlehmann This is how it was before. Actually, there was only one request open at any time. If it was still open on the Close, it would have the body closed, as well.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the question here is whether it's reasonable to close the body on the next Read call. For example, should we allow code that does this:

for ... {
        _, err := readSeekCloser.Read(p)
        if err != nil {
                if httpError, ok := err.(*transport.HTTPError); ok {
                        go doSomethingAsyncWithTheBody(httpError.Resp.Body)
                }
        }
}

This would be unsafe in the model where we only allow one open response body a time. But it seems like a weird case, and thinking about it some more, breaking this kind of code is probably better than potentially keeping many sockets open until the HTTPReadSeekCloser is closed, potentially hitting resource limits in the mean time.

So I'll move the body closing from Error to Close/reader and keep at most one response body open at a time, if you think that's reasonable.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to close the Body of a response to an unsuccessful request on the next call to Read or Seek (before issuing a new request), or Close.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated again to avoid exposing the response through an error return. Instead, the HTTPReadSeeker takes a callback function that generates errors.

@aaronlehmann aaronlehmann force-pushed the avoid-stat-roundtrips branch 2 times, most recently from cc03f58 to 8a12c63 Compare December 2, 2015 21:45
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this statement be in (*httpReadSeeker).Seek? I think we can avoid having both seekOffset and offset.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a comment on the importance of this statement.

Without this commit, three round-trips are required to fetch a blob with
a progress bar. The first is a call to Stat (HEAD request), to determine
the size. Then Open is called, which also calls Stat, and finally
performs a GET request.

Only the GET request is actually needed. The size of the blob can be
sniffed from Content-Length in the GET response.

This commit changes HTTPReadSeeker to automatically detect the size from
Content-Length instead of requiring it to be passed in. The Stat call is
removed from Open because it is no longer necessary.

HTTPReadSeeker now takes an additional errorHandler callback argument which
translates an unsuccessful HTTP response into an appropriate API-level
error. Using a callback for this makes it possible to avoid leaking the
repsonse body to Read's caller, which would make lifecycle management
problematic.

Fixes distribution#1223

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
@aaronlehmann aaronlehmann force-pushed the avoid-stat-roundtrips branch from 8a12c63 to 6beeb93 Compare December 2, 2015 22:21
@stevvooe
Copy link
Copy Markdown
Collaborator

stevvooe commented Dec 2, 2015

LGTM

1 similar comment
@dmcgowan
Copy link
Copy Markdown
Collaborator

dmcgowan commented Dec 2, 2015

LGTM

stevvooe added a commit that referenced this pull request Dec 3, 2015
Avoid stat round-trips when fetching a blob
@stevvooe stevvooe merged commit 2b63f65 into distribution:master Dec 3, 2015
aaronlehmann pushed a commit to aaronlehmann/docker that referenced this pull request Dec 4, 2015
We were calling Stat for each layer to get the size so we could indicate
progress, but distribution/distribution#1226 made it
possible to get the length from the GET request that Open initiates.

Saving one round-trip per layer should make pull operations slightly
faster and more robust.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
aaronlehmann pushed a commit to aaronlehmann/distribution that referenced this pull request Dec 11, 2015
This calls Stat before Open, which should be unnecessary because Open
can handle the case of a nonexistent blob. Removing the Stat saves a
round trip.

This is similar to the removal of stat in Open in distribution#1226.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
thaJeztah pushed a commit to thaJeztah/distribution that referenced this pull request Apr 22, 2021
This calls Stat before Open, which should be unnecessary because Open
can handle the case of a nonexistent blob. Removing the Stat saves a
round trip.

This is similar to the removal of stat in Open in distribution#1226.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
thaJeztah pushed a commit to thaJeztah/distribution that referenced this pull request Jan 19, 2022
This calls Stat before Open, which should be unnecessary because Open
can handle the case of a nonexistent blob. Removing the Stat saves a
round trip.

This is similar to the removal of stat in Open in distribution#1226.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
dylanrhysscott pushed a commit to digitalocean/docker-distribution that referenced this pull request Jan 5, 2023
This calls Stat before Open, which should be unnecessary because Open
can handle the case of a nonexistent blob. Removing the Stat saves a
round trip.

This is similar to the removal of stat in Open in distribution#1226.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants