Avoid stat round-trips when fetching a blob#1226
Avoid stat round-trips when fetching a blob#1226stevvooe merged 1 commit intodistribution:masterfrom
Conversation
There was a problem hiding this comment.
I really don't like during the close as a condition of calling Error. Is there any reason we can't tie the body lifecycle to the lifecycle of the readseeker?
There was a problem hiding this comment.
I don't like it much either.
The problem is that there may be many HTTP requests created by each HTTPReadSeeker. Each call to Read can potentially cause a new HTTP request, whose response body needs to be closed. That's why I was associating them with the error object.
How about keeping a slice in the httpReadSeeker struct that tracks every response body that was associated with an error condition, and closing all of those when the Close method is called?
There was a problem hiding this comment.
@aaronlehmann This is how it was before. Actually, there was only one request open at any time. If it was still open on the Close, it would have the body closed, as well.
There was a problem hiding this comment.
I guess the question here is whether it's reasonable to close the body on the next Read call. For example, should we allow code that does this:
for ... {
_, err := readSeekCloser.Read(p)
if err != nil {
if httpError, ok := err.(*transport.HTTPError); ok {
go doSomethingAsyncWithTheBody(httpError.Resp.Body)
}
}
}This would be unsafe in the model where we only allow one open response body a time. But it seems like a weird case, and thinking about it some more, breaking this kind of code is probably better than potentially keeping many sockets open until the HTTPReadSeekCloser is closed, potentially hitting resource limits in the mean time.
So I'll move the body closing from Error to Close/reader and keep at most one response body open at a time, if you think that's reasonable.
There was a problem hiding this comment.
Updated to close the Body of a response to an unsuccessful request on the next call to Read or Seek (before issuing a new request), or Close.
There was a problem hiding this comment.
Updated again to avoid exposing the response through an error return. Instead, the HTTPReadSeeker takes a callback function that generates errors.
cc03f58 to
8a12c63
Compare
There was a problem hiding this comment.
Can this statement be in (*httpReadSeeker).Seek? I think we can avoid having both seekOffset and offset.
There was a problem hiding this comment.
Let's add a comment on the importance of this statement.
Without this commit, three round-trips are required to fetch a blob with a progress bar. The first is a call to Stat (HEAD request), to determine the size. Then Open is called, which also calls Stat, and finally performs a GET request. Only the GET request is actually needed. The size of the blob can be sniffed from Content-Length in the GET response. This commit changes HTTPReadSeeker to automatically detect the size from Content-Length instead of requiring it to be passed in. The Stat call is removed from Open because it is no longer necessary. HTTPReadSeeker now takes an additional errorHandler callback argument which translates an unsuccessful HTTP response into an appropriate API-level error. Using a callback for this makes it possible to avoid leaking the repsonse body to Read's caller, which would make lifecycle management problematic. Fixes distribution#1223 Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
8a12c63 to
6beeb93
Compare
|
LGTM |
1 similar comment
|
LGTM |
Avoid stat round-trips when fetching a blob
We were calling Stat for each layer to get the size so we could indicate progress, but distribution/distribution#1226 made it possible to get the length from the GET request that Open initiates. Saving one round-trip per layer should make pull operations slightly faster and more robust. Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
This calls Stat before Open, which should be unnecessary because Open can handle the case of a nonexistent blob. Removing the Stat saves a round trip. This is similar to the removal of stat in Open in distribution#1226. Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
This calls Stat before Open, which should be unnecessary because Open can handle the case of a nonexistent blob. Removing the Stat saves a round trip. This is similar to the removal of stat in Open in distribution#1226. Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
This calls Stat before Open, which should be unnecessary because Open can handle the case of a nonexistent blob. Removing the Stat saves a round trip. This is similar to the removal of stat in Open in distribution#1226. Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
This calls Stat before Open, which should be unnecessary because Open can handle the case of a nonexistent blob. Removing the Stat saves a round trip. This is similar to the removal of stat in Open in distribution#1226. Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Without this commit, three round-trips are required to fetch a blob with
a progress bar. The first is a call to Stat (HEAD request), to determine
the size. Then Open is called, which also calls Stat, and finally
performs a GET request.
Only the GET request is actually needed. The size of the blob can be
sniffed from Content-Length in the GET response.
This commit changes HTTPReadSeeker to automatically detect the size from
Content-Length instead of requiring it to be passed in. The Stat call is
removed from Open because it is no longer necessary.
HTTPReadSeeker now takes an additional errorHandler callback argument which
translates an unsuccessful HTTP response into an appropriate API-level
error. Using a callback for this makes it possible to avoid leaking the
repsonse body to Read's caller, which would make lifecycle management
problematic.
Fixes #1223