Feat/sync retry download#23
Conversation
| let mut reader = reader::ResumableReader::new(reader, current); | ||
| std::io::copy(&mut reader, file)?; |
There was a problem hiding this comment.
Given this is only linearly increasing, couldn't we just Seek into the file ?
Making ResumableReader not necessary (less code is almost always better) ?
There was a problem hiding this comment.
Yes, you are right. file is already at the right position, but we could SeekFrom::End(0) to get the length and use that as current. I will implement the change later today
There was a problem hiding this comment.
You don't need to SeekFrom::End I think the response Content-Length should already give that information.
And even then you don't need it either since you would be using increasing Seek.
There was a problem hiding this comment.
Ohh I just realized what you meant. Seek::End would be equal to the partial-length, I see, yes that works too (actually wouldn't the cursor already be there naturally given io::copy 's nature?
There was a problem hiding this comment.
The cursor is already there, but we need to get its byte position to plug it into the RANGE header. We could also use SeekFrom::Current(0) (or the equivalent .stream_position()). SeekFrom::End(0) would have the advantage, that we could also use it with .incomplete files directly once we implement that, however, that would also be easily achieved by just doing it once in the beginning.
SeekFrom::Current(0) might have slightly better performance, I would have to check how it is implemented, so we might prefer it.
Or am I missing another method of retrieving the necessary byte offset?
There was a problem hiding this comment.
Sorry for the delay. Maybe file.metadata() then ? I don't know which is the simplest.
Also is there a way to fuse the retry loop with the outer "regular" logic?
There was a problem hiding this comment.
Please also excuse the late response. file.metadata() would require us to first call file.sync_all() if I am not mistaken, so I do not think that it is desirable. I have used stream_position for now (see next commit).
I'll look into combining the logic this evening.
without simulating a failure
and remove ResumableReader
6b7d7e9 to
733600d
Compare
Narsil
left a comment
There was a problem hiding this comment.
After waaaayy too long, sorry about that, LGTM.
|
Hey, thanks for coming back to it. |
|
No, totally my bad, better late than never in any case. Thanks a lot |
What this PR does
This PR is related to #18. It does not support resuming downloads. Rather it adds retry capabilities to the sync API. It adds the max_retries field to
ApiBuilderandApithat is then used indownload.We stream the download to a tempfile, and, if we encounter an error retry at most
max_retriestimes, using range requests to only request the data we do not have already.We use a new struct
ResumableReaderto keep track of the number of bytes we downloaded successfully, it is basically just a reduced version ofProgressbar.We assume all bytes that we did receive to be correct.
Left ToDo
ProgressbarwithResumableReader(easy)Testing
Testing this functionality properly is a bit harder, since we need to ensure the failure of the request. Further, it would be nice to test that we indeed do not continue from the start of the file.
There are different ways we could go about achieving this, here are some I evaluated:
Using a mock server would be simplest, however many existing frameworks, like
wiremockorhttpmockdo not support us returning an ok, in the range case 206, request but then streaming and failing the body if I understood that correctly. So I think we would be forced to implement a proper serve on our own.Second would be a bit annoying, since we not only use the client, but then also the response (although we just turn that into a Reader), so it would be doable. We of course would not test if the Reader we get from the request truly behaves like we expect it (and like our mock does), but I think this is acceptable.
Third would be much like second although reducing what we need to mock, but also changing some things about the architecture which may or may not be acceptable.
So, this is the main area I am looking to for some feedback, since I think whatever choice is taken, it will have an effect on the rest of the project.
Additionally if we also want to test that it does indeed not just always resume from the start we would also potentially get some issues with std::io::copy since we do not know its buffer size. But that should be rather easy to resolve, so I will look into it once a basic testing strategy is outlined.