Memory issue with large amount of files #422

@Yqnn

Node.js crashes when using node-archiver on directories with a large number of files:

  • >200000 files : crashes during the archiving
  • >1000000 files : crashes before the archiving starts

The error message is something like:
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

I started to investigate why node-archiver has so much difficulty handling directories with a huge number of files, and I found two issues.

Issue 1
npm-glob, which is used for both glob() and directory(), seems highly inefficient.
The following code does nothing with the matches and should therefore use a fixed amount of memory, yet it actually needs a lot of memory.

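        // Assumption: `glob` is the npm glob package with its event-emitter API.
        var glob = require('glob');

        // Walk /folder/ and discard every match; this code retains nothing.
        var globber = glob('**', {stat: false, dot: true, cwd: '/folder/'});
        globber.on('match', () => {});
        globber.on('end', () => {
            console.log('End');
        });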

This simple code required 1.8 GB of memory when /folder/ contained 1000000 files.
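
For comparison, here is a minimal sketch of a directory walk whose memory use depends on the directory depth rather than on the number of files. It is not the code from node-archiver or npm-glob; `walk` and `countEntries` are hypothetical names, and it assumes a Node.js version that provides `fs.opendirSync` (12.12 or later). Symbolic links are yielded but not followed.

```javascript
const fs = require('fs');
const path = require('path');

// Lazily yield every entry under `root`, as a path relative to `root`.
// Entries are read from the directory handle in small batches, so only the
// handles of the directories currently being descended are kept open.
function* walk(root, rel = '') {
  const dir = fs.opendirSync(path.join(root, rel));
  try {
    let entry;
    while ((entry = dir.readSync()) !== null) {
      const child = path.join(rel, entry.name);
      yield child;
      if (entry.isDirectory()) {
        yield* walk(root, child);
      }
    }
  } finally {
    // Runs on normal completion and when the consumer stops early.
    dir.closeSync();
  }
}

// Usage: count entries without collecting them in an array.
function countEntries(root) {
  let n = 0;
  for (const _entry of walk(root)) n++;
  return n;
}
```

The synchronous calls keep the sketch short but block the event loop; `fs.promises.opendir` with `for await` would be the non-blocking variant of the same idea.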

Issue 2
In core.js, tasks are added to this._queue (or this._statQueue) as soon as files are found, so these queues grow in proportion to the number of files.
When there are more than 100000 files, this starts to become an issue.

Possible solution
For the first issue, replacing npm-glob with something more memory-efficient should do the job.
For the second one, new tasks should be added to a queue only when that queue is small enough.
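
The second idea can be sketched as follows. This is only an illustration of "add to the queue only when it is small enough", not the code in core.js or in the commit mentioned in this issue; `feedBounded`, `startTask`, `limit`, and `onDrain` are hypothetical names. The source can be any iterator, for example a lazy directory walker.

```javascript
// Hypothetical helper, not part of node-archiver.
// Pulls items from `source` (any iterator) only while fewer than `limit`
// tasks are in flight, so pending work never scales with the total count.
// `startTask(item, done)` must call `done()` exactly once when it finishes.
function feedBounded(source, startTask, limit, onDrain = () => {}) {
  let inFlight = 0;
  let exhausted = false;
  let pumping = false;

  function pump() {
    if (pumping) return; // a task finished synchronously; the outer loop continues
    pumping = true;
    while (!exhausted && inFlight < limit) {
      const next = source.next();
      if (next.done) {
        exhausted = true;
        break;
      }
      inFlight++;
      startTask(next.value, taskDone);
    }
    pumping = false;
    if (exhausted && inFlight === 0) onDrain();
  }

  function taskDone() {
    inFlight--;
    pump(); // a slot is free: pull the next item, if any
  }

  pump();
}
```

With this shape, the number of queued tasks is capped at `limit` whether the folder holds a thousand files or a million, at the cost of the producer occasionally waiting for the consumer.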

As a proof of concept, I've re-implemented the directory() function in the following commit: Yqnn@b90f4d1
It now requires a fixed amount of memory, no matter the number of files.
It also appears to be much faster than the original version.

I ran a test on a 10 GB folder containing 300000 files, using the following code:

        var archive = archiver('tar', {gzip: false});
        archive.directoryImproved(directory, false);
        archive.finalize();
        archive.pipe(lz4.createEncoderStream()).pipe(fs.createWriteStream('output.tar.lz4'));

  • With the current code: 1.3 GB, 12 minutes
  • With the patched code: 154 MB, 3 minutes
