Description
What is your current rclone version?
1.54.1
What problem are you trying to solve?
This ticket requests the Resume feature in the chunker backend.
from #87 (comment) by @mcalman:
I'm interested in addressing the case when an upload is interrupted for a large file, and must be restarted. It would be nice if the user was able to resume uploading that file from where they left off [...]
I have been looking into using the chunker backend to support an upload resume feature. I noticed that when an upload is done with chunker and is quit during a file upload, the chunks that have already been uploaded are left on the remote, but then ignored. I have been working on modifying rclone chunker to check for these existing chunks, and if present, use them rather than re-upload those chunks.
Note that here we ask only for sequential resume, which is unrelated to the multi-thread upload feature covered by requests #5041 (for chunker) and #4798 (general discussion).
How do you think rclone should be changed to solve that?
from the 1st #4547 (comment):
We can't just chain to the lower backend in the general case. If a file is chunked, its remote will chain to a small metadata object (or to nothing if metadata is disabled). If it's not chunked, it can become chunked after resume, but we can't predict that [in the general case].
from the 2nd #4547 (comment):
Chunker can tolerate objects uploaded from multiple clients thanks to transactions [and save partially uploaded chunks per transaction].
Later, upon a resume request, it can select the "best" incomplete transaction given the rolling hash state and the size of the already uploaded chunks.
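Before choosing a transaction, a resume request has to collect the leftover chunks of one file and group them by transaction ID. The Go sketch below assumes, purely for illustration, that temporary chunk names look like `<file>.rclone_chunk.<NNN>_<txn>` (modeled on the default `*.rclone_chunk.###` name format with a short base36 transaction suffix); the real naming scheme in chunker may differ, and `groupByTxn` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
)

// tempChunkRe matches the ASSUMED temporary chunk layout
// "<base>.rclone_chunk.<NNN>_<txn>" used only for this sketch.
var tempChunkRe = regexp.MustCompile(`^(.+)\.rclone_chunk\.(\d{3,})_([0-9a-z]{4,9})$`)

// groupByTxn returns, for one base file name, the chunk numbers found
// under each transaction ID, sorted in ascending order.
func groupByTxn(base string, names []string) map[string][]int {
	groups := make(map[string][]int)
	for _, name := range names {
		m := tempChunkRe.FindStringSubmatch(name)
		if m == nil || m[1] != base {
			continue // not a temporary chunk of this file
		}
		n, err := strconv.Atoi(m[2])
		if err != nil {
			continue
		}
		groups[m[3]] = append(groups[m[3]], n)
	}
	for _, nums := range groups {
		sort.Ints(nums) // sorts in place; the map holds the same slice
	}
	return groups
}

func main() {
	names := []string{
		"big.iso.rclone_chunk.001_ab12",
		"big.iso.rclone_chunk.002_ab12",
		"big.iso.rclone_chunk.001_zz99", // another client's attempt
		"other.bin.rclone_chunk.001_ab12",
		"big.iso",
	}
	groups := groupByTxn("big.iso", names)
	fmt.Println(groups["ab12"], groups["zz99"], len(groups)) // [1 2] [1] 2
}
```

Each group is then a candidate for the selection step discussed further down in this thread.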
from the 3rd #4547 (comment):
Go's standard hash implementations support saving/restoring the intermediate hash state (via `encoding.BinaryMarshaler` / `encoding.BinaryUnmarshaler`) for any (TBC) type of hash.
[The common Resume handler will] keep it in the resume metadata JSON together with the hash name,
[and will] negotiate with [chunker] whether the operation should be continued from the last point or [retried] from the start
from the 4th #4547 (comment):
The use of intermediate (aka rolling or accrued) hashsums will prevent the following scenario:
- user uploads a large file
- network broken, upload canceled
- the source file changes, or another attempt modifies the partial upload on the target
- user asks to resume the file
- rclone resumes (here we could have checked the validity of the partial upload and rewound to the start)
- after some hours rclone finds that the fingerprint is wrong
from the 5th #4547 (comment):
[Let's] add a new per-transaction control chunk to save info about partial hash and [probably] hashes of uploaded chunks.
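A per-transaction control chunk could hold both pieces of information proposed here: the running hash state over everything uploaded so far and a hash per uploaded chunk, so each remote chunk can be re-checked individually. In the Go sketch below the `controlChunk` fields, JSON keys and helpers are hypothetical; the choice of MD5 per chunk and SHA-1 for the running state is an arbitrary assumption for illustration.

```go
package main

import (
	"crypto/md5"
	"crypto/sha1"
	"encoding"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// controlChunk is a HYPOTHETICAL per-transaction control record.
type controlChunk struct {
	TxnID        string   `json:"txn"`
	HashName     string   `json:"hash"`
	PartialState []byte   `json:"partial_state"` // running SHA-1 state
	ChunkMD5s    []string `json:"chunk_md5"`     // one hex MD5 per chunk
	Uploaded     int64    `json:"uploaded"`      // total bytes uploaded
}

// buildControl records per-chunk MD5s and the running SHA-1 state of
// all chunks uploaded so far in transaction txn.
func buildControl(txn string, chunks [][]byte) (*controlChunk, error) {
	running := sha1.New()
	cc := &controlChunk{TxnID: txn, HashName: "sha1"}
	for _, c := range chunks {
		sum := md5.Sum(c)
		cc.ChunkMD5s = append(cc.ChunkMD5s, hex.EncodeToString(sum[:]))
		running.Write(c)
		cc.Uploaded += int64(len(c))
	}
	m, ok := running.(encoding.BinaryMarshaler)
	if !ok {
		return nil, fmt.Errorf("sha1 state cannot be saved")
	}
	state, err := m.MarshalBinary()
	if err != nil {
		return nil, err
	}
	cc.PartialState = state
	return cc, nil
}

// chunkIntact reports whether chunk i (0-based) matches its recorded MD5.
func chunkIntact(cc *controlChunk, i int, data []byte) bool {
	if i < 0 || i >= len(cc.ChunkMD5s) {
		return false
	}
	sum := md5.Sum(data)
	return hex.EncodeToString(sum[:]) == cc.ChunkMD5s[i]
}

func main() {
	chunks := [][]byte{[]byte("first chunk"), []byte("second chunk")}
	cc, err := buildControl("ab12", chunks)
	if err != nil {
		panic(err)
	}
	blob, _ := json.Marshal(cc) // what would be written to the remote
	var back controlChunk
	if err := json.Unmarshal(blob, &back); err != nil {
		panic(err)
	}
	fmt.Println(back.Uploaded, chunkIntact(&back, 1, chunks[1]), chunkIntact(&back, 1, []byte("tampered")))
	// 23 true false
}
```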
[Let's also] add code that selects the transaction to resume given a partial hash and the total size uploaded so far. Maybe select the "best" partial transaction (when rename is fast) or just pick a single partial transaction ID (when it's slow).
The implementation will obey the Resumer interface developed by PR #4547.
In the case of chunker, resumer cache usage can be somewhat reduced, because already uploaded chunks are isolated on the remote and marked by a "transaction ID". The resumer proper will just re-check them based on negotiations with chunker.
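After re-checking, the resume position follows from the leading run of chunks that are present, intact and full-sized: everything after the first gap, failed check or short chunk has to be uploaded again. The Go sketch assumes chunk numbering starts at 1 and that every reusable chunk has exactly `chunkSize` bytes; `remoteChunk` and `resumePoint` are hypothetical.

```go
package main

import "fmt"

// remoteChunk is a HYPOTHETICAL view of one chunk found on the remote.
type remoteChunk struct {
	No     int   // chunk number, assumed to start at 1
	Size   int64 // size of the stored chunk object
	Intact bool  // outcome of re-checking, e.g. against a stored hash
}

// resumePoint returns how many leading chunks can be kept and the byte
// offset from which the upload must continue.
func resumePoint(chunks []remoteChunk, chunkSize int64) (keep int, offset int64) {
	byNo := make(map[int]remoteChunk, len(chunks))
	for _, c := range chunks {
		byNo[c.No] = c
	}
	for n := 1; ; n++ {
		c, ok := byNo[n]
		if !ok || !c.Intact || c.Size != chunkSize {
			return keep, offset // first gap, bad or short chunk stops reuse
		}
		keep++
		offset += c.Size
	}
}

func main() {
	chunks := []remoteChunk{
		{No: 2, Size: 100, Intact: true},
		{No: 1, Size: 100, Intact: true},
		{No: 3, Size: 40, Intact: true}, // short: interrupted mid-chunk
	}
	keep, offset := resumePoint(chunks, 100)
	fmt.Println(keep, offset) // 2 200
}
```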
NOTE: This change will create a new version of the chunker metadata and grow the number of tested combinations. I think we can commit this together with other chunker PRs on a dedicated branch, which will produce a beta release for public beta-testing. Later we can merge these commits from there into the master branch using a single metadata version number.
References
- Related to feature request #87 (Resume uploads)
- Depends on pull request #4547 (fs: add Resumer interface)
- Orthogonal to feature request #5041 (chunker: add support for multi-thread uploads)
- Orthogonal to discussion #4798 (Multi-thread upload for different backends)
- Related to thread https://forum.rclone.org/t/intelligent-faster-chunker-file-updates-on-checksum-enabled-remotes/22313/7