
OOM upgrading from 9.4.1 to 9.5.0 #37753

@trask


Current Behavior

OOM stack (daemon log)

Exception in thread "JNA Cleaner" java.lang.OutOfMemoryError: Java heap space
Unexpected exception thrown.
org.gradle.internal.remote.internal.MessageIOException: Could not read message from '/127.0.0.1:42332'.
    at org.gradle.internal.remote.internal.inet.SocketConnection.receive(SocketConnection.java:96)
    at org.gradle.internal.remote.internal.hub.MessageHub$ConnectionReceive.run(MessageHub.java:270)
    ...
Caused by: java.lang.OutOfMemoryError: Java heap space

Problem in daemon expiration check
java.lang.OutOfMemoryError: Java heap space
    at java.base/java.lang.StringUTF16.compress(StringUTF16.java:211)
    at java.base/java.lang.String.<init>(String.java:303)
    at java.base/java.io.DataInputStream.readUTF(DataInputStream.java:550)
    at o.g.internal.serialize.InputStreamBackedDecoder.readString(InputStreamBackedDecoder.java:78)
    at o.g.launcher.daemon.registry.DaemonStopEvent$Serializer.read(DaemonStopEvent.java:116)
    at o.g.launcher.daemon.registry.DaemonRegistryContent$Serializer.readStopEvents(DaemonRegistryContent.java:130)
    at o.g.launcher.daemon.server.DaemonRegistryUnavailableExpirationStrategy.checkExpiration(DaemonRegistryUnavailableExpirationStrategy.java:63)
    at o.g.launcher.daemon.server.MasterExpirationStrategy.checkExpiration(MasterExpirationStrategy.java:80)
    at o.g.launcher.daemon.server.Daemon$DaemonExpirationPeriodicCheck.run(Daemon.java:273)

The OOM is in the daemon JVM itself — not a test worker fork (tests are skipped via -PskipTests=true).

Heap dump analysis (Eclipse MAT 1.16.1)

HeapDumpOnOutOfMemoryError produced a 5,452,521,642-byte dump (objects: 82,518,156). Total live retained ≈ 3.16 GB.
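
To confirm that a given JVM would actually write a dump like the one above, the flag can be read back at runtime. The sketch below is illustrative only: it assumes a HotSpot-based JVM (the `com.sun.management` API is not part of the Java SE standard), and the class name `HeapDumpOptionCheck` is made up for this example. The issue does not say how the flag was passed to the daemon; `org.gradle.jvmargs` in `gradle.properties` is one common way.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Gradle: reads HotSpot VM options of the current JVM.
final class HeapDumpOptionCheck {

    /** Returns the current value of a HotSpot VM option as a string, e.g. "true" or "false". */
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // getVMOption throws IllegalArgumentException if the option does not exist.
        VMOption option = bean.getVMOption(name);
        return option.getValue();
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError = " + vmOption("HeapDumpOnOutOfMemoryError"));
        System.out.println("HeapDumpPath = " + vmOption("HeapDumpPath"));
    }
}
```

Run inside the process of interest (for example from a small diagnostic task), this prints whether the dump is enabled and where it would be written.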

Suspect 1 — 677 MB (21.4%): build-operation queue not draining

Class: java.util.concurrent.ThreadPoolExecutor (Gradle's DefaultExecutorFactory$TrackedManagedExecutor)
Single-instance retained heap: 677,039,984 bytes
Total ThreadPoolExecutor retained (top-level dominator): 677 MB across 64 instances
Accumulation point: LinkedBlockingQueue$Node chain (677 MB)
Owner thread: Build operations Thread 4

Owner thread stack (parked acquiring a worker lock while the queue grows):

java.lang.Object.wait(Object.java:339)
o.g.internal.resources.DefaultResourceLockCoordinationService.withStateLock(DefaultResourceLockCoordinationService.java:109)
o.g.internal.work.DefaultWorkerLeaseService.acquireLocks(DefaultWorkerLeaseService.java:297)
o.g.internal.work.DefaultWorkerLeaseService.releaseWorkerLeaseAndWaitFor(DefaultWorkerLeaseService.java:476)
o.g.internal.work.DefaultWorkerLeaseService.acquireLocksWithoutWorkerLeaseWhileBlocked(DefaultWorkerLeaseService.java:465)
o.g.internal.work.DefaultWorkerLeaseService.withLocksAcquired(DefaultWorkerLeaseService.java:277)
o.g.internal.work.DefaultWorkerLeaseService.withLocks(DefaultWorkerLeaseService.java:270)
o.g.internal.work.DefaultWorkerLeaseService.runAsWorkerThread(DefaultWorkerLeaseService.java:129)
o.g.internal.operations.DefaultBuildOperationQueue$WorkerRunnable.runBatch(DefaultBuildOperationQueue.java:285)
o.g.internal.operations.DefaultBuildOperationQueue$WorkerRunnable.lambda$runOperations$0(DefaultBuildOperationQueue.java:246)
o.g.internal.operations.CurrentBuildOperationRef.with(CurrentBuildOperationRef.java:84)
o.g.internal.operations.DefaultBuildOperationQueue$WorkerRunnable.runOperations(DefaultBuildOperationQueue.java:243)
o.g.internal.operations.DefaultBuildOperationQueue$WorkerRunnable.run(DefaultBuildOperationQueue.java:236)
o.g.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
o.g.internal.concurrent.AbstractManagedExecutor$1.run(AbstractManagedExecutor.java:47)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
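
The general mechanism suspected here (a worker parked on a lock while an unbounded queue keeps accepting work) can be reproduced outside Gradle. The following is a minimal sketch using only `java.util.concurrent`; it is not Gradle code, and `BlockedWorkerQueueDemo`, the latch standing in for the worker lock, and the 1 KB payload are all assumptions made for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: one worker thread is parked, so every submitted task stays queued.
final class BlockedWorkerQueueDemo {

    /** Submits {@code tasks} runnables while the only worker is parked; returns the backlog size. */
    static int backlogWhileWorkerParked(int tasks) {
        CountDownLatch workerStarted = new CountDownLatch(1);
        CountDownLatch lockReleased = new CountDownLatch(1); // stands in for the worker lock
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>()); // unbounded queue
        try {
            executor.execute(() -> {
                workerStarted.countDown();
                try {
                    lockReleased.await(); // parked, like the owner thread in the stack above
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workerStarted.await();
            for (int i = 0; i < tasks; i++) {
                byte[] payload = new byte[1024]; // each queued runnable pins what it captures
                executor.execute(() -> { int unused = payload.length; });
            }
            // Nothing can be dequeued while the worker is parked, so the backlog equals `tasks`.
            return executor.getQueue().size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            lockReleased.countDown(); // unpark the worker so the queue drains
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("queued while parked: " + backlogWhileWorkerParked(1000));
    }
}
```

Because `LinkedBlockingQueue` has no capacity limit by default, nothing pushes back on the submitter; memory held by the queued runnables grows until the worker resumes.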

Suspect 2 — 488 MB (15.4%): Guava caches holding incremental-compile data

  • 34 com.google.common.cache.LocalCache$LocalManualCache instances retain 487,695,152 bytes.
  • Two single caches dominate: 281 MB and 157 MB retained.
  • 161 CompileJavaBuildOperationReportingCompiler instances retain 681 MB — roughly one per compileJava task in the build, and none released after the task completed.
  • Reachable via o.g.api.internal.tasks.compile.incremental.recomp.CurrentCompilationAccess and IncrementalResultStoringCompiler.

Per-class retention inside the caches (subset):

| Class | Objects | Retained (≥) |
| --- | --- | --- |
| o.g.api.internal.tasks.compile.incremental.deps.ClassSetAnalysisData | 4,444 | 840 MB |
| o.g.api.internal.tasks.compile.incremental.deps.ClassAnalysis | 1,006,487 | 208 MB |
| o.g.api.internal.tasks.compile.incremental.compilerapi.deps.DependentsSet$DefaultDependentsSet | 2,267,823 | 437 MB |
| com.google.common.cache.LocalCache$StrongAccessEntry | 1,462,462 | 613 MB |
| com.google.common.collect.RegularImmutableSet | 2,780,730 | 584 MB |
| com.google.common.collect.ImmutableMapEntry | 3,716,341 | 378 MB |
| o.g.internal.hash.HashCode$HashCode128 | 1,310,441 | 42 MB |

Suspect 3 — 685 MB: BuildOperationState / BuildOperationDescriptor

  • 177 BuildOperationState retain 685,215,552 bytes (also 685 MB via 177 BuildOperationDescriptor).
  • Each retains a CompileJavaBuildOperationReportingCompiler$1$1 (161 instances retain 685 MB cumulatively).
  • The build-operation tracking infrastructure is keeping every :compileJava build-op's full compilation context alive.
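
The retention pattern described in these bullets can be pictured with a toy model. Everything below is hypothetical (the `Tracker`, `OperationState`, and `CompileContext` types are invented for this sketch and do not mirror Gradle's classes); it only shows why a tracker that never drops finished entries keeps every per-task context strongly reachable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model, NOT Gradle code: a tracker that keeps per-operation state in a map.
final class RetentionModel {

    /** Stands in for a per-task compilation context (analysis data, caches, ...). */
    static final class CompileContext {
        final byte[] analysisData;
        CompileContext(int bytes) { this.analysisData = new byte[bytes]; }
    }

    /** Stands in for tracked operation state that references the context. */
    static final class OperationState {
        final CompileContext details;
        OperationState(CompileContext details) { this.details = details; }
    }

    static final class Tracker {
        private final Map<String, OperationState> states = new LinkedHashMap<>();

        void start(String taskPath, CompileContext context) {
            states.put(taskPath, new OperationState(context));
        }

        /** If releaseOnFinish is false the entry, and everything it references, stays reachable. */
        void finish(String taskPath, boolean releaseOnFinish) {
            if (releaseOnFinish) {
                states.remove(taskPath);
            }
        }

        long retainedBytes() {
            long total = 0;
            for (OperationState state : states.values()) {
                total += state.details.analysisData.length;
            }
            return total;
        }
    }

    /** Runs `tasks` start/finish cycles and reports the bytes still reachable from the tracker. */
    static long retainedAfterBuild(int tasks, int bytesPerTask, boolean releaseOnFinish) {
        Tracker tracker = new Tracker();
        for (int i = 0; i < tasks; i++) {
            String path = ":module" + i + ":compileJava";
            tracker.start(path, new CompileContext(bytesPerTask));
            tracker.finish(path, releaseOnFinish);
        }
        return tracker.retainedBytes();
    }

    public static void main(String[] args) {
        System.out.println("not released: " + retainedAfterBuild(161, 1024, false) + " bytes");
        System.out.println("released:     " + retainedAfterBuild(161, 1024, true) + " bytes");
    }
}
```

In the model, retained memory scales linearly with the number of finished tasks when entries are not released, which is the shape the dump shows for the 161 compiler instances. Whether Gradle 9.5.0 actually behaves this way is the open question of this report.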

Other notable retainers

| Object | Retained |
| --- | --- |
| o.g.api.tasks.util.internal.CachingPatternSpecFactory$CachingSpec (single instance) | 52 MB |
| o.g.api.tasks.util.internal.CachingPatternSpecFactory (single instance) | 41 MB |
| o.g.api.internal.cache.StringInterner (single instance) | 37 MB |
| 4,110,081 String instances | 370 MB |
| com.gradle.scan.plugin.internal.d.j.a.d (single instance, Develocity plugin internal) | 35 MB |

ClassLoader-level dominators

| ClassLoader | Retained (bytes) | % of heap |
| --- | --- | --- |
| o.g.internal.classloader.VisitableURLClassLoader @ 0x740699698 (Gradle core) | 1,279,746,256 | 40.48% |
| <system class loader> | 932,314,248 | 29.49% |
| o.g.initialization.MixInLegacyTypesClassLoader | 747,514,976 | 23.65% |
| VisitableURLClassLoader$InstrumentingVisitableURLClassLoader (×2) | 168,405,360 | 5.33% |
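
As a sanity check, the percentages quoted for the suspects can be re-derived from the class-loader figures. The sketch below estimates the total retained heap as the sum of the four class-loader rows divided by the sum of their percentages, then computes each suspect's share. The class and method names are made up for this example, and the estimate assumes the four rows do not overlap.

```java
// Hypothetical arithmetic check of the MAT figures quoted in this report.
final class RetainedShare {

    // Sum of the four class-loader rows, in bytes and in percent of heap.
    static final long CLASSLOADER_BYTES =
            1_279_746_256L + 932_314_248L + 747_514_976L + 168_405_360L;
    static final double CLASSLOADER_PERCENT = 40.48 + 29.49 + 23.65 + 5.33;

    /** Total retained heap implied by the class-loader rows (about 3.16 GB). */
    static double estimatedTotalBytes() {
        return CLASSLOADER_BYTES / (CLASSLOADER_PERCENT / 100.0);
    }

    /** Share of the estimated total, in percent. */
    static double sharePercent(long retainedBytes) {
        return 100.0 * retainedBytes / estimatedTotalBytes();
    }

    static double roundToTenth(double value) {
        return Math.round(value * 10.0) / 10.0;
    }

    public static void main(String[] args) {
        System.out.printf("total ~ %.2f GB%n", estimatedTotalBytes() / 1e9);
        System.out.println("Suspect 1: " + roundToTenth(sharePercent(677_039_984L)) + "%"); // 21.4%
        System.out.println("Suspect 2: " + roundToTenth(sharePercent(487_695_152L)) + "%"); // 15.4%
    }
}
```

The results match the 21.4% and 15.4% reported for Suspects 1 and 2. The suspects' retained sets may overlap (Suspect 2 and Suspect 3 both count the compiler instances), so their shares should not simply be added together.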

Diagnosis

The dump points at two coupled regressions in Gradle 9.5.0:

  1. compileJava build-operation graph is never released.
    161 CompileJavaBuildOperationReportingCompiler / IncrementalResultStoringCompiler / CurrentCompilationAccess instances — one per task — remain reachable after their tasks finish, each pinning a Guava LocalCache of ClassSetAnalysisData / ClassAnalysis / DependentsSet. Cumulative impact: ~840 MB of ClassSetAnalysisData alone, and 685 MB of BuildOperationState chained to them.
  2. DefaultBuildOperationQueue / TrackedManagedExecutor not draining.
    "Build operations Thread 4" is parked in DefaultWorkerLeaseService.acquireLocks while the executor's LinkedBlockingQueue keeps accumulating runnables (677 MB of LinkedBlockingQueue$Nodes). Combined with (1), every queued op pins another BuildOperationDescriptor chain.

Diagnostics artifact

The CI run for commit e484ca2d uploaded a gradle-daemon-diagnostics artifact (≈1.3 GB compressed, 5.4 GB heap dump) containing:

  • daemon.hprof — heap dump (5.4 GB)
  • gc.log — verbose G1 GC log (rotated)
  • daemon-logs/9.5.0/daemon-2478.out.log — Gradle daemon stdout
  • system-info.txt (output of df -h / free -m / uname -a)

Available for 7 days at https://github.com/open-telemetry/opentelemetry-java-instrumentation/actions/runs/25136630109.

Expected Behavior

(no OOM 😅)

Context (optional)

No response

Self-contained Reproducer Project

See open-telemetry/opentelemetry-java-instrumentation#18372

Gradle version

9.5.0

Gradle version that used to work

9.4.1

Build scan URL (optional)

https://gradle.com/s/pkxhvlxppwgv2

Your Environment (optional)

No response
