s3.sync: Do not stat files that don't match filters #2105
The task pipeline that each command executes filters files *after* collecting information about them. For local files, this means that even files the user doesn't want to sync are still stat'ed, which is unnecessary. This change applies the filters received from the command line inside the `FileGenerator.should_ignore_file` method, so the list of files passed down the pipeline no longer contains files that are supposed to be excluded.
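To make the idea concrete, here is a minimal sketch of filtering *before* stat'ing. It is not the actual `FileGenerator` code: `list_files`, `is_excluded` and the injectable `stat` parameter are hypothetical names, and the sketch assumes exclude patterns are globs matched against paths relative to the sync root. The only point it shows is that a path failing the filter is dropped before any `stat()` call is made for it.

```python
import fnmatch
import os
from typing import Callable, Iterator, List, Tuple


def is_excluded(rel_path: str, exclude_patterns: List[str]) -> bool:
    """Hypothetical helper: True if rel_path matches any exclude glob.

    Note that fnmatch's `*` also matches `/`, so "src/*" covers nested paths.
    """
    return any(fnmatch.fnmatch(rel_path, pattern) for pattern in exclude_patterns)


def list_files(
    root: str,
    exclude_patterns: List[str],
    stat: Callable[[str], os.stat_result] = os.stat,
) -> Iterator[Tuple[str, os.stat_result]]:
    """Hypothetical generator yielding (relative_path, stat_result) pairs.

    The filter check runs first; the `stat` callable is only invoked for
    paths that survive it.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(full_path, root).replace(os.sep, "/")
            if is_excluded(rel_path, exclude_patterns):
                continue  # excluded: `stat` is never called for this path
            yield rel_path, stat(full_path)
```

Passing a counting wrapper as `stat` is an easy way for a unit test to check that excluded files are never stat'ed.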
I hope you guys like the idea; it also seems to fix #1117.
@clarete I think the approach is reasonable, and this is something we would pull in. The only thing I would be wary of is making sure the file comparisons in sync do not get out of order when comparing the two ordered lists. But I do not think that is an issue, because the comparison of files happens after the usual filtering, and you are just pushing the filtering to happen even earlier in the process. If you can get some unit tests up, we will take a closer look, make sure that none of the integration tests are broken, and proceed from there.
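The ordering concern can be illustrated with a small sketch. It assumes, hypothetically, that sync walks two lexicographically sorted key lists in lockstep and that the same per-key filter is applied to both sides. Dropping items from a sorted list keeps it sorted, so filtering early should yield the same pairs as filtering after the comparison. `compare_sorted` is a hypothetical helper, not the aws-cli comparator.

```python
from typing import Iterator, List, Optional, Tuple


def compare_sorted(
    src: List[str], dest: List[str]
) -> Iterator[Tuple[Optional[str], Optional[str]]]:
    """Hypothetical merge-style comparison of two sorted key lists.

    Yields (key, key) for keys on both sides, (key, None) for keys only in
    src, and (None, key) for keys only in dest.
    """
    i = j = 0
    while i < len(src) and j < len(dest):
        if src[i] == dest[j]:
            yield src[i], dest[j]
            i += 1
            j += 1
        elif src[i] < dest[j]:
            yield src[i], None  # only on the source side
            i += 1
        else:
            yield None, dest[j]  # only on the destination side
            j += 1
    # Whatever remains on either side has no counterpart.
    for key in src[i:]:
        yield key, None
    for key in dest[j:]:
        yield None, key
```

For example, with `src = ["a.txt", "src/x", "z.txt"]`, `dest = ["a.txt", "b.txt", "src/y"]` and a filter that drops keys starting with `src/`, filtering both lists first and filtering the comparison's output afterwards both give `[("a.txt", "a.txt"), (None, "b.txt"), ("z.txt", None)]`. The equivalence relies on the filter treating a key the same way on both sides.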
Still interested, just haven't had time yet. Will post results as soon as my weird schedule allows. Sorry about that!
@clarete Are you still planning on finishing this?
This patch still applies to current … In my case, I am periodically syncing a large, deep file tree from a NetApp NFS share; NetApp provides access to snapshots in … I have made a PR with a minor modification and an update to unit tests:
This probably helps with issue #1138.
I have a HUGE `~/src` directory that contains projects I clone from various version control systems, which makes me want to avoid backing them up to my personal S3 bucket to save some coins. For that, I used the `--exclude="src/*"` parameter (I also make sure the `aws` command is called from my `$HOME` directory, since I learned that the filters start matching from the current directory; more details can be found in issue #1588). Although it worked, it still scanned and stat'ed the entire directory.
This patch changes the `FileGenerator` class to take a `file_filter` parameter, which `.should_ignore_file` uses to match the path it receives and figure out whether that path is allowed by the filters.
This patch can be considered a work in progress because I only tested my specific need, and it still needs proper tests. I didn't write any yet because I wanted to ask whether the strategy I chose is good enough. If it is, I'll spend time writing the proper tests; otherwise, I'd love to hear a better idea, and I may be able to rework it if it's not too time-consuming.
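Below is a sketch of the "is this path allowed by the filters" check. It assumes patterns are anchored at a base directory (the current directory, per the behavior described above) and that the last matching filter wins, which is how we read the `aws s3` documentation on `--exclude`/`--include` precedence. `is_allowed` and the `(kind, pattern)` tuples are hypothetical; the real `file_filter` in the patch may look different.

```python
import fnmatch
import os
from typing import List, Tuple

# Hypothetical representation: ("exclude" | "include", glob), in command-line order.
Filter = Tuple[str, str]


def is_allowed(path: str, filters: List[Filter], base_dir: str) -> bool:
    """Hypothetical helper: True if `path` survives the filters.

    Both the path and every pattern are anchored at `base_dir`. Filters are
    evaluated in order and the last match decides; a path that matches no
    filter is included by default.
    """
    full_path = os.path.join(base_dir, path)
    allowed = True
    for kind, pattern in filters:
        if fnmatch.fnmatch(full_path, os.path.join(base_dir, pattern)):
            allowed = kind == "include"
    return allowed
```

With `base_dir="/home/me"`, `[("exclude", "src/*")]` rejects `src/proj/main.py`. Running the same filter with `base_dir="/"` leaves `home/me/src/x` untouched, because the pattern becomes `/src/*`, which illustrates why the working directory matters.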