Replaced glob with readdir-glob to be memory efficient by Yqnn · Pull Request #433 · archiverjs/node-archiver

Yqnn · 2020-07-22T16:20:11Z

As described in issue #422, node-glob is not designed to handle a huge number of files: it requires a quantity of memory that is proportional to the number of matched files.

Why? Because it lists only the folders that appear in the pattern, and it has to remember every file it finds to ensure that the same file is not emitted twice.
It makes sense to proceed like this when the pattern matches only a small proportion of the filesystem.
But as that is not a common use case when creating an archive, it would be more efficient to list all the files and then check whether they match the given pattern.
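
The "list everything, then filter" idea can be sketched as a lazy directory walk. This is only an illustration of the approach, not the code of readdir-glob: `walkMatching` is a hypothetical name, and the `isTxt` predicate stands in for a real glob matcher. Each path is handed to the caller and then forgotten, so no set of already-seen files has to be kept.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Lazily walk `root` and yield the relative path of every file accepted by
// `matches`. Nothing is accumulated between yields, so memory depends on the
// directory depth and the size of one directory listing, not on the number
// of matched files.
function* walkMatching(root, matches, rel = '') {
  const entries = fs.readdirSync(path.join(root, rel), { withFileTypes: true });
  for (const entry of entries) {
    const relPath = path.posix.join(rel, entry.name);
    if (entry.isDirectory()) {
      yield* walkMatching(root, matches, relPath);
    } else if (matches(relPath)) {
      yield relPath;
    }
  }
}

// Demo on a throwaway directory.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'walk-demo-'));
fs.mkdirSync(path.join(demoRoot, 'sub'));
fs.writeFileSync(path.join(demoRoot, 'a.txt'), 'a');
fs.writeFileSync(path.join(demoRoot, 'b.log'), 'b');
fs.writeFileSync(path.join(demoRoot, 'sub', 'c.txt'), 'c');

// Stand-in for a glob matcher (assumption: a real one would test a pattern).
const isTxt = (relPath) => relPath.endsWith('.txt');

// Collected into an array only to print it; a real consumer would handle
// each path as it is yielded.
const found = [...walkMatching(demoRoot, isTxt)].sort();
console.log(found);
```

The cost of this design is that every directory under the root is visited, even those the pattern could never match.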

Advantages:

memory consumption is fixed, no matter the number of matched files
it's faster when the proportion of matching files is high

Drawbacks:

absolute glob patterns are not supported: archiver.glob('*.txt', {cwd: '/tmp'}) has to be used instead of archiver.glob('/tmp/*.txt')
it's slower when few files match in a big filesystem
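
For callers that currently pass absolute patterns, the rewrite into a `cwd` plus a relative pattern could be automated along these lines. `splitAbsoluteGlob` is a hypothetical helper, not part of archiver or readdir-glob, and it assumes POSIX-style paths with `/` separators.

```javascript
// Hypothetical helper: split an absolute glob into a `cwd` made of the
// leading literal segments and a pattern relative to that directory.
function splitAbsoluteGlob(absPattern) {
  const segments = absPattern.split('/');
  // A segment is "magic" if it contains any glob metacharacter.
  const hasMagic = (segment) => /[*?[\]{}()!]/.test(segment);

  // Advance over literal segments, always leaving at least the last
  // segment for the pattern.
  let i = 0;
  while (i < segments.length - 1 && !hasMagic(segments[i])) {
    i++;
  }

  const cwd = segments.slice(0, i).join('/') || '/';
  const pattern = segments.slice(i).join('/');
  return { cwd, pattern };
}

const simple = splitAbsoluteGlob('/tmp/*.txt');
// simple.cwd is '/tmp', simple.pattern is '*.txt'

const nested = splitAbsoluteGlob('/data/**/logs/*.log');
// nested.cwd is '/data', nested.pattern is '**/logs/*.log'

// With archiver, the result would then be used as:
//   archive.glob(simple.pattern, { cwd: simple.cwd });
console.log(simple, nested);
```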

The current PR implements this approach by replacing glob with readdir-glob, which is memory-efficient.
It also pauses the glob stream while archiving is in progress, to keep the memory usage stable.

Maybe it would be better to offer this as a new option, or only to replace the directory() function.
Feel free to give feedback :)


melitus · 2020-07-23T08:17:09Z

@Yqnn Is memory consumption also fixed for archiver.append(file, {name: filename})?

Yqnn · 2020-07-23T08:51:23Z

@Yqnn Is memory consumption also fixed for archiver.append(file, {name: filename})?

Nope. If you use append(), the library has to remember all the appended files, so it's not possible to make it memory-efficient in that case.
You have to put a throttling mechanism in place on your side. I guess you could check if archiver._queue.length is small enough before appending new files.
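
A minimal sketch of such a throttle, under stated assumptions: `createThrottledFeeder` is a hypothetical helper, entries come from a lazy iterator so that buffers are only created when needed, and the backlog size is read through a `pendingCount` callback. Since `_queue` is a private internal of archiver, how its length is exposed should be checked against the installed version; the demo uses a stand-in archive so that it runs without archiver.

```javascript
// Hypothetical helper (not part of archiver): append entries lazily, never
// letting more than `limit` appended entries wait at once.
// `source` is an iterator of { data, name } objects; `pendingCount` reports
// how many appended entries have not been processed yet.
function createThrottledFeeder(archive, source, pendingCount, limit) {
  let done = false;

  // Append until the backlog reaches `limit` or the source runs dry.
  // Returns the number of entries appended by this call.
  return function pump() {
    let appended = 0;
    while (!done && pendingCount() < limit) {
      const next = source.next();
      if (next.done) {
        done = true;
        archive.finalize(); // nothing left to append
        break;
      }
      archive.append(next.value.data, { name: next.value.name });
      appended++;
    }
    return appended;
  };
}

// Stand-in archive that only records pending entry names.
const fakeArchive = {
  pending: [],
  finalized: false,
  append(data, opts) { this.pending.push(opts.name); },
  finalize() { this.finalized = true; },
};

// Buffers are created only when the feeder asks for the next entry.
function* makeEntries(count) {
  for (let i = 0; i < count; i++) {
    yield { data: Buffer.from(`content ${i}`), name: `file-${i}.txt` };
  }
}

const pump = createThrottledFeeder(
  fakeArchive,
  makeEntries(5),
  () => fakeArchive.pending.length,
  2
);

const firstBatch = pump();   // fills the backlog up to the limit
fakeArchive.pending.shift(); // simulate the archive finishing one entry
const secondBatch = pump();  // tops the backlog up again
console.log(firstBatch, secondBatch, fakeArchive.pending);
```

With a real archive, `pump` would be called once to start and then again each time an entry has been processed, for example from an `'entry'` event listener. Whether the internal queue has already shrunk by the time that event fires is an assumption to verify.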

melitus · 2020-07-23T08:53:36Z

@Yqnn Thanks. But is there any alternative to append()? I am passing a buffer and a filename to append().

Yqnn force-pushed the readdirp branch from af806a1 to 6b2a90d on July 22, 2020 16:32

Replaced glob with readdir-glob to be memory efficient, and ensure readdir stream is paused when archiving is on-going

b31b705

Yqnn force-pushed the readdirp branch from 6b2a90d to b31b705 on July 22, 2020 16:35

ctalkington approved these changes Jul 23, 2020

ctalkington merged commit a4c4507 into archiverjs:master Jul 23, 2020

hyoo mentioned this pull request Sep 22, 2020

.glob() with options.cwd is not working at 5.x #455

Closed

j0k3r mentioned this pull request Mar 18, 2021

Update deps to latest version nfriedly/node-bestzip#43

Merged
