
OutOfMemoryError spins library out of control #400

@TWiStErRob

Issue summary

nonapi.io.github.classgraph.fastzipfilereader.NestedJarHandler#close may never complete if the thread was interrupted, which is always the case when scanning fails. See below for details.

Background

I was writing some unit tests for a really basic Neo4J graph, but while Neo4J-OGM was looking for my entities during test setup, it never completed. After some very roundabout debugging (the async and OOM interplay was making me tear my hair out), I narrowed the root cause down to how ClassGraph works. I have 32 GB of memory, and by default the JVM caps the heap at 1/4 of physical memory per process, which means I cannot run much in parallel, so I have a global 256M override for each process. It works fine for Gradle, Neo4J, and other traditionally resource-hungry setups. It even works for my test setup most of the time, so for this report I lowered the memory limit further to make the repro fail consistently.

Repro Setup

Trivial library usage:

	// implementation "io.github.classgraph:classgraph:4.8.62"
	import io.github.classgraph.ClassGraph;
	import io.github.classgraph.ScanResult;

	ScanResult result = new ClassGraph()
			.whitelistPackages("pack.age")
			.verbose()
			.scan();

Plus a dependency with a nested JAR in it:

runtimeOnly "org.neo4j:neo4j-lucene-upgrade:3.4.9"

Non-trivial: resource-constrained memory: -Xmx128M (see build.gradle)

Full minimal repro can be found here: https://github.com/TWiStErRob/repros/tree/master/classgraph/oom-NestedJarHandler-spin

Repro steps

  1. gradlew run

or

  1. Import the project into IntelliJ IDEA
  2. Execute Main.main (should pass, depending on machine setup)
  3. Edit the Run Configuration to have VM options: -Xmx128M
  4. Run or debug again (will fail; expected: prints a line)

gradlew run.log

Pieces of the puzzle

What I found to be contributing to the issue.

Nested JAR files

runtimeOnly "org.neo4j:neo4j-lucene-upgrade:3.4.9"

An example from Neo4J/OGM, which contains !/lib/lucene-backward-codecs-5.5.0.jar.
A nested JAR like this is required to trigger the

                    } else {
                        // This path has one or more '!' sections.

code path in NestedJarHandler.

Large allocation to trigger OOM

This is baked into the library: MappedByteBufferResources allocates 64M per JAR file. That is quite large, considering there could be many JARs like this on the classpath, so many such allocations could be happening at the same time and consuming a lot of memory. Note: I have a 14-core/28-thread processor, and the library runs on 39 threads. So if all of them are processing nested JARs at once, about 2.5G of memory is required (39 × 64M). That doesn't sound "ultra-lightweight" :)
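A quick sanity check of that estimate, as a throwaway Java sketch. It assumes "64M" means 64 MiB and that each of the 39 threads holds exactly one such buffer at the same time (the worst case); `WorstCaseBuffers` is just a made-up name:

```java
// Worst-case heap needed if every scanner thread buffers one nested JAR at once.
final class WorstCaseBuffers {
    // Assumed size of one in-memory buffer: 64 MiB
    static final long BUFFER_BYTES = 64L * 1024 * 1024;

    static long worstCaseBytes(int threads) {
        return threads * BUFFER_BYTES;
    }

    public static void main(String[] args) {
        long mib = worstCaseBytes(39) / (1024 * 1024);
        System.out.println(mib + " MiB"); // prints "2496 MiB", roughly 2.5G
    }
}
```

With -Xmx128M, two such buffers alone would already take the entire heap.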

This allocation throws an OOM, which is caught by AutoCloseableExecutorService.afterExecute, and the thread is interrupted. I don't think this interrupt affects NestedJarHandler.close, so it may be a red herring, but this location came up during the investigation, so I thought I would mention it.
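For context, the JDK mechanism behind this is that an Error thrown by a task passed to ThreadPoolExecutor.execute is handed to the afterExecute hook before the worker thread dies. The sketch below is not ClassGraph's AutoCloseableExecutorService; `RecordingExecutor` is hypothetical, and the OOM is simulated by throwing it explicitly:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical executor that records whatever a task threw, to show that
// Errors such as OutOfMemoryError reach afterExecute.
class RecordingExecutor extends ThreadPoolExecutor {
    final AtomicReference<Throwable> lastFailure = new AtomicReference<>();

    RecordingExecutor() {
        super(1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        if (t != null) {
            // A library could react here, e.g. by interrupting other work
            lastFailure.set(t);
        }
    }

    static Throwable runFailingTask() throws InterruptedException {
        RecordingExecutor executor = new RecordingExecutor();
        // Simulated OOM; a real one would come from a failed allocation
        executor.execute(() -> { throw new OutOfMemoryError("simulated"); });
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
        return executor.lastFailure.get();
    }
}
```

The Error is still rethrown afterwards, so the worker thread's default uncaught-exception handler also prints its stack trace.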

Scanner.call catches and cleans up

When an exception occurs (e.g. our OOM), removeTemporaryFilesAfterScan is set to true and the threads are interrupted. The finally block then tries to close nestedJarHandler. The relevant parts of the method:

    @Override
    public ScanResult call() throws InterruptedException, CancellationException, ExecutionException {
        try {
            scanResult = openClasspathElementsThenScan();
        } catch (final Throwable e) {
            // Since an exception was thrown, remove temporary files
            removeTemporaryFilesAfterScan = true;

            // Stop any running threads (should not be needed, threads should already be quiescent)
            interruptionChecker.interrupt();
        } finally {
            if (removeTemporaryFilesAfterScan) {
                // If removeTemporaryFilesAfterScan was set, remove temp files and close resources,
                // zipfiles and modules
                nestedJarHandler.close(topLevelLog);
            }
        }
        return scanResult;
    }
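The effect of this ordering can be reproduced without ClassGraph. In this minimal sketch (class and method names are hypothetical), an exception in the try block leads to an interrupt in catch, and the "cleanup" in finally then runs with the interrupt flag still set, so its first interruptible call (Thread.sleep here, standing in for whatever blocks inside close) throws InterruptedException straight away:

```java
// Models the shape of Scanner.call: fail, interrupt, then clean up in finally.
final class InterruptInFinallyDemo {
    static String callLikeScanner() {
        String outcome = "not run";
        try {
            // Stands in for openClasspathElementsThenScan() failing, e.g. with an OOM
            throw new IllegalStateException("scan failed");
        } catch (IllegalStateException e) {
            // Stands in for interruptionChecker.interrupt(), which per the report
            // leaves the current thread interrupted
            Thread.currentThread().interrupt();
        } finally {
            // Stands in for nestedJarHandler.close(topLevelLog)
            outcome = cleanup();
        }
        return outcome;
    }

    static String cleanup() {
        try {
            Thread.sleep(10); // any interruptible call
            return "completed";
        } catch (InterruptedException e) {
            // sleep() threw at once because the flag was already set (and cleared it);
            // restore the flag, as the catch block in NestedJarHandler#close does
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }
}
```

Calling callLikeScanner() returns "interrupted" and leaves the calling thread interrupted.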

NestedJarHandler#close spins

See numbered inline comments for explanation.

            if (canonicalFileToPhysicalZipFileMap != null) {
                // (4) spins out of control, because (3)
                while (!canonicalFileToPhysicalZipFileMap.isEmpty()) {
                    try {
                        // (1) throws InterruptedException if thread was interrupted, which is always because Scanner error handling interrupted
                        for (final Entry<File, PhysicalZipFile> ent : canonicalFileToPhysicalZipFileMap.entries()) {
                            final PhysicalZipFile physicalZipFile = ent.getValue();
                            physicalZipFile.close();
                            // (3) never executed, because of ConcurrentHashMap's interruption in SingletonMap
                            canonicalFileToPhysicalZipFileMap.remove(ent.getKey());
                        }
                    } catch (final InterruptedException e) {
                        // (2) re-interrupts Thread
                        interruptionChecker.interrupt();
                    }
                }
                canonicalFileToPhysicalZipFileMap = null;
            }
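Here is a simplified, self-contained model of that loop. It is not the real NestedJarHandler: a plain Map stands in for canonicalFileToPhysicalZipFileMap, and the hypothetical keysInterruptibly stands in for the entries() call that the report says throws when the thread is interrupted. spinningClose follows the same catch-and-re-interrupt pattern, with an attempt cap added so the demo terminates; tolerantClose sketches one possible shape for the "Don't spin" proposal: clear the flag during cleanup and restore it afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified model of NestedJarHandler#close; all names are hypothetical.
final class CloseLoopDemo {
    // Throws (and clears the flag) if the thread is interrupted, like entries() per the report
    static List<String> keysInterruptibly(Map<String, String> map) throws InterruptedException {
        if (Thread.interrupted()) {
            throw new InterruptedException();
        }
        return new ArrayList<>(map.keySet());
    }

    // Same pattern as the reported loop, capped at maxAttempts.
    // Returns how many attempts were wasted without removing anything.
    static int spinningClose(Map<String, String> map, int maxAttempts) {
        int wasted = 0;
        while (!map.isEmpty() && wasted < maxAttempts) {
            try {
                for (String key : keysInterruptibly(map)) {
                    map.remove(key); // "close" the zip file and drop the entry
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // re-arms the very same failure
                wasted++;
            }
        }
        return wasted;
    }

    // Possible fix: clear the flag for the duration of cleanup, then restore it
    // so the caller still sees that an interrupt happened.
    static void tolerantClose(Map<String, String> map) {
        boolean wasInterrupted = Thread.interrupted(); // reads and clears
        try {
            while (!map.isEmpty()) {
                try {
                    for (String key : keysInterruptibly(map)) {
                        map.remove(key);
                    }
                } catch (InterruptedException e) {
                    // Interrupted again mid-cleanup: remember it, but keep going
                    wasInterrupted = true;
                }
            }
        } finally {
            if (wasInterrupted) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

With the interrupt flag set before the call, spinningClose(map, 1000) uses up all 1000 attempts without removing the entry, while tolerantClose empties the map and leaves the thread interrupted for the caller.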

Proposed solutions

  1. Quick fix that lets the library run to successful completion:
    Catch the OOM explicitly when allocating buf in MappedByteBufferResources, and spill to disk if it happens. This would be in line with how an IOException from makeTempFile is already caught.
  2. Lower memory usage (could be combined with the previous one):
    fastZipEntryToZipFileSliceMap knows the size of the nested JAR file (childZipEntry). If that were passed down, the decision to spill to disk could be made immediately, before allocating and without reading the whole stream.
  3. Don't spin:
    In any case, NestedJarHandler.close should probably be able to close canonicalFileToPhysicalZipFileMap even in an interrupted state, so that failed classpath scans terminate with the exception and callers can handle it accordingly.
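Proposals 1 and 2 could be combined roughly as follows. This is a hypothetical sketch, not ClassGraph's MappedByteBufferResources: it assumes the entry size is passed down (-1 when unknown), uses a 64 MiB in-memory limit to match the figure above, and spills to a temp file when the size is unknown, too large, or when allocating the buffer throws OutOfMemoryError.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of proposals 1 and 2; not ClassGraph's actual code.
final class SpillingReader {
    // Assumed in-memory limit, matching the 64M figure from the report
    static final long IN_MEMORY_LIMIT = 64L * 1024 * 1024;

    static final class Extracted {
        final byte[] bytes; // non-null when the nested JAR is held in memory
        final Path file;    // non-null when it was spilled to a temp file

        Extracted(byte[] bytes, Path file) {
            this.bytes = bytes;
            this.file = file;
        }
    }

    // knownSize: uncompressed size of the nested entry, or -1 if unknown (assumption)
    static Extracted extract(InputStream in, long knownSize) throws IOException {
        // Proposal 2: decide before allocating when the size rules out memory
        if (knownSize < 0 || knownSize > IN_MEMORY_LIMIT) {
            return spillToDisk(in);
        }
        byte[] buf;
        try {
            buf = new byte[(int) knownSize]; // exact size instead of a fixed 64M
        } catch (OutOfMemoryError e) {
            // Proposal 1: the heap cannot fit the buffer right now, so use disk instead
            return spillToDisk(in);
        }
        int read = in.readNBytes(buf, 0, buf.length);
        if (read != buf.length) {
            throw new EOFException("expected " + buf.length + " bytes, got " + read);
        }
        return new Extracted(buf, null);
    }

    static Extracted spillToDisk(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("nested-", ".jar");
        tmp.toFile().deleteOnExit();
        // REPLACE_EXISTING because createTempFile already created the file
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return new Extracted(null, tmp);
    }
}
```

Catching OutOfMemoryError is normally a last resort; it is reasonably contained here only because the failing allocation is a single large array that was never handed out. Entries of unknown size go straight to disk in this sketch; a real implementation might instead buffer up to the limit first.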
