Replaced glob with readdir-glob to be memory efficient#433
Merged
Conversation
…addir stream is paused when archiving is on-going
ctalkington
approved these changes
Jul 23, 2020
|
@Yqnn Is memory consumptions also fixed for archiver.append(file, {name: filename}) |
Contributor
Author
Nop, if you use |
|
@Yqnn Thanks. But is there any alternative to append(). I am passing buffer and filename to append() |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
As described in the following issue: #422, node-glob is not designed to handle a huge amount of files: it requires a quantify of memory that is proportional to the number of matched files.
Why ? Because, it lists only the folders that appears in the pattern, and it has to memorise all the found files to ensure a same file is not emitted twice.
It makes sense to proceed like this when the pattern matches only a little proportion of the filesystem.
But as it's not a common use case when creating an archive, it would be more efficient to list all the files, and then check if they match the given pattern.
Advantage:
Drawback:
archiver.glob('*.txt',{cwd: '/tmp'})has to be used instead ofarchiver.glob('/tmp/*.txt')The current PR implements this approach by replacing
globwithreaddir-globthat is memory-efficient.It also pauses the glob stream when archiving is on-going, to keep the memory usage stable.
Maybe it would be better to offer this as a new option, or only to replace the
directory()function.Feel free to give feedback :)