chunker: implement Resumer interface #5154

@ivandeex

Description

What is your current rclone version?

1.54.1

What problem are you trying to solve?

This ticket requests the Resume feature in the chunker backend.

from #87 (comment) by @mcalman:

I'm interested in addressing the case when an upload is interrupted for a large file, and must be restarted. It would be nice if the user was able to resume uploading that file from where they left off [...]
I have been looking into using the chunker backend to support an upload resume feature. I noticed that when an upload is done with chunker and is quit during a file upload, the chunks that have already been uploaded are left on the remote, but then ignored. I have been working on modifying rclone chunker to check for these existing chunks, and if present, use them rather than re-upload those chunks.

Note that here we ask only for sequential resume, which is unrelated to the multi-thread upload feature covered by requests #5041 (for chunker) and #4798 (general discussion).

How do you think rclone should be changed to solve that?

from the 1st #4547 (comment):

We can't just chain to the lower backend in the general case. If a file is chunked, its remote will chain to a small metadata object (or nothing if metadata is disabled). If it's not chunked, it can become chunked after resume, but we can't predict that [in the general case].

from the 2nd #4547 (comment):

Chunker can tolerate objects uploaded from multiple clients thanks to transactions [and save partially uploaded chunks per transaction].
Later, upon a resume request it can select the "best" incomplete transaction given the rolling hash state and size of already uploaded chunks.

from the 3rd #4547 (comment):

Golang's Hash interface allows saving/restoring intermediate hash state for any (TBC) type of hash.
[The common Resume handler will] keep it in the resume metadata json together with hash name,
[and will] negotiate with [chunker] whether operation should be continued from the last point or [retry] from the start

from the 4th #4547 (comment):

The use of intermediate (aka rolling or accrued) hashsums will prevent the following scenario:

  • user uploads a large file
  • network broken, upload canceled
  • source file is changed or another attempt is changing the partial upload on target
  • user asks to resume a file
  • rclone resumes (here we could have checked the validity of the partial upload and restarted from scratch)
  • after some hours rclone finds that fingerprint is wrong

from the 5th #4547 (comment):

[Let's] add a new per-transaction control chunk to save info about partial hash and [probably] hashes of uploaded chunks.

[Let's also] add code that selects the transaction to resume given a partial hash and the total size uploaded so far. Maybe select the "best" partial transaction (when rename is fast) or just pick a single partial transaction ID (when it's slow).

The implementation will obey the Resumer interface developed by PR #4547.

In the case of chunker, the resumer cache usage can be somewhat reduced because already uploaded chunks are isolated on the remote and marked with a "transaction ID". The resumer proper will just re-check them based on negotiation with chunker.

NOTE This change will create a new version of the chunker metadata and grow the number of tested combinations. I think we can commit this together with other chunker PRs on a dedicated branch which will produce a beta release for public beta-testing. Later we can merge these commits together from there on the master branch using a single metadata version number.
