march: fix deadlock in --no-traverse when all transfers skipped using sequence numbering#8648
march: fix deadlock in --no-traverse when all transfers skipped using sequence numbering#8648jeremy wants to merge 1 commit into
Conversation
When every source object already exists at the destination and --no-traverse is set, march deadlocked after 100 entries because dstChan lagged srcChan. Buffer all sources and destinations, then replay them through balanced channels. Introduced in v1.70.0 with callback-based syncing. * Replaces slice-accumulating sorter with O(checkers) streaming * Each incoming src entry gets a monotonic sequence number; a map buffers out-of-order destination HEAD results so dstChan is replayed in sequence order * Directories get a zero-cost placeholder to guarantee 1:1 src/dst event flow without blocking
ncw
left a comment
There was a problem hiding this comment.
This looks like a correct way of doing the fix, but it looks a lot more complicated than it needs to be.
I also have a small concern about the seq overflowing. It should be int64 as more than 4 billion files in a directory is not unheard of on s3!
Anyway I decided to see if I could come up with a simpler approach.
This is what I came up with - I would appreciate your feedback. The basic idea is to keep two channels, one for the workers and one for the output but instead of passing objects about, pass a chan to write/read the object. This then allows the workers to run concurrently and keep the output in order. I've explained that really badly - hopefully you'll find the code clearer!
v1.71.0-beta.8831.f41ac3342.fix-8656-march-deadlock on branch fix-8656-march-deadlock (uploaded in 15-30 mins)
|
Nice approach! Dig passing the "slot" channel down the pipeline so we get ordering for free, by construction. No notes. |
Thanks for the review @jeremy . Sorry about not merging your code - I started of noodling with the chan of chans idea to give you feedback on a simpler implementation but I couldn't get it to work and by the time I did get it to work I had implemented it completely! |
What is the purpose of this change?
When every source object already exists at the destination and --no-traverse is set, march deadlocked after 100 entries because dstChan lagged srcChan. Buffer all sources and destinations, then replay them through balanced channels. Introduced in v1.70.0 with callback-based syncing.
See #8647 for the revert-to-old-behavior fix and repro script. This PR attempts a fix that retains streaming behavior.
Using repro script from #8647:
Using rclone: ./rclone Version: rclone v1.71.0-DEV Creating 200 test files... Initial copy to destination... Testing --no-traverse with all files existing (10s timeout)... Transferred: 0 B / 0 B, -, 0 B/s, ETA - Checks: 200 / 200, 100%, Listed 200 Elapsed time: 0.0s ✅ PASS: Command completed successfullyWas the change discussed in an issue or in the forum before?
Checklist