Fix runtime and classloader retention after ScriptingContainer terminate#9315
Fix runtime and classloader retention after ScriptingContainer terminate#9315jimtng wants to merge 10 commits intojruby:jruby-10.0from
Conversation
Clear lingering ThreadContext soft references during teardown, drop the embed LocalContext runtime reference, and avoid re-adopting the current thread while Ruby is shutting down. Also switch Java integration class caches to weak values and avoid recreating JavaSupport during teardown so terminated runtimes and their JRubyClassLoaders can be collected promptly. Add regression coverage for same-thread and worker-thread container termination. Fixes jrubyGH-9092 Signed-off-by: Jimmy Tanagra <jcode@tanagra.id.au>
|
There's clearly some issues here but I'd like to figure out how to land a fix. It seems that the weak references to the class values is too weak, and I'm not sure we should be using weak references at this point anyway. If they're weak and a JavaClass goes unreferenced for a couple of GC cycles, it will end up getting collected and a future access will cause it to be recreated. That's the source of all the reinitialized constant warnings and the reason the test/jruby tests fail here. Even a SoftReference is probably too weak, since it just delays the time until the JavaClass gets collected. Is that part of the patch necessary for this to work? Is clearing the reference to the JavaSupport class not enough? |
Signed-off-by: Jimmy Tanagra <jcode@tanagra.id.au>
Yes it's necessary. Reverting only the JavaSupport/ClassValue part causes all three GH-9092 regression tests in ScriptingContainerClassLeakTest to fail, and restoring it makes that test class pass again. So that part is necessary for the leak fix, not for normal runtime cache behavior. If you have a cleaner lifecycle-based alternative, I’m happy to switch to that. |
|
My question was really if we need both changes. Clearing the reference to the JavaSupport instance seems like it would be sufficient, and making the Java class references weak leads to other problems quickly. I can give it a try myself when I'm in the office. |
Signed-off-by: Jimmy Tanagra <jcode@tanagra.id.au>
Signed-off-by: Jimmy Tanagra <jcode@tanagra.id.au>
|
My question was really if we need both changes. Clearing the reference to the JavaSupport instance seems like it would be sufficient, and making the Java class references weak leads to other problems quickly. I can give it a try myself when I'm in the office. |
|
I've removed the weak reference and instead, explicitly clear it at teardown. It seems that most tests are passing except that test:mri:core:jit which seem flaky - sometimes java21 failed but java25 passed, and another time java21 passed, but java25 failed. Is this a known issue? |
|
That particular test does seem to be flaky on and off so I don't think it's related to your changes. I will take another look at that but I think we are probably okay with this set of changes. |
| private final ClassValue<JavaClass> javaClassCache; | ||
| private final ClassValue<RubyModule> proxyClassCache; | ||
| private final MapBasedClassValue<JavaClass> javaClassCache; | ||
| private final MapBasedClassValue<RubyModule> proxyClassCache; |
There was a problem hiding this comment.
now there's no way to switch back to proper ClassValue, AFAIU the MapBasedClassValue was really a work-around for Java ClassValue bugs in some releases.
There was a problem hiding this comment.
I need to revisit this behavior distinction. I'm not sure where we ended up in discussions with JDK core-libs folks about the weakness (or strength) of the ClassValue references.
This comes back to my concerns about the original super-weak change to the ClassValue that made it too weak and caused classes to get re-proxied repeatedly. If the hard reference is the concern, then clearing the old JavaSupport reference should be enough and further changes should be unnecessary. Perhaps I'm missing some subtlety here.
|
I think we're largely here but I echo @kares concerns about a few of these changes. It feels to me (begging pardon if this is not the case) that there's a smattering of related attempts to fix this issue that all got balled up in one PR, when the simple fix of fully clearing the JavaSupport reference might solve all of them at once (and in a similar way to other Let's try to finish this discussion and merge for 10.0.5.0, due out this week. |
|
I've asked copilot to write this response to explain the various changes: The PR actually started with that simpler approach — the first commit used Here's why Path 1 —
Path 2 — ThreadService
Path 3 — JDK
The |
Signed-off-by: Jimmy Tanagra <jcode@tanagra.id.au>
Heap histogram profiling (jcmd) showed three sources of per-reload accumulation after ScriptingContainer.terminate(): - `staticAssignedNames` and `instanceAssignedNames` in JavaSupportImpl used ClassValue.newInstance() (→ StableClassValue → JDK ClassValue), which stores entries in the Class object itself — an external GC root that survives the runtime. Switch both to MapBasedClassValue and clear them in tearDown(), matching the pattern already applied to javaClassCache and proxyClassCache. - `objectProxyCache` (ObjectProxyCache<IRubyObject,RubyClass>) was not cleared on tearDown(). Add ObjectProxyCache.clear() and call it from JavaSupport.tearDown(). Top histogram delta classes eliminated: HashMap$Node -3MB AssignedName -862KB ClassValue$Entry -128KB StableClassValue$StableValue -110KB ConcurrentWeakHashMap$* -16KB each Signed-off-by: Jimmy Tanagra <jcode@tanagra.id.au>
|
I found some more small leaks - patched. Now, after 20+ reloads I don't see any more heap growth, other than those caused by soft references, which takes 1000s to be released by the jvm by default. |
|
@jimtng Thanks for iterating on this. Regarding the justifications for the additional changes, most seem to make general sense. I believe the extra effort to clear references throughout the system is appropriate at teardown, and anywhere we are doing so we should be skeptical of decreasing the strength of those references. In other words, if we can actively clear those references and that is sufficient to eventually dereference and collect the entire runtime, no change should be necessary in the strength of those references (Weak or SoftReference additions should not be necessary). Put yet another way, we should only be using Weak or SoftReference in situations where a still-running JRuby instance does not vacate otherwise unreferenced objects OR if it is not possible at teardown to actively clear those references. This PR is about making teardown clean, so we should focus primarily on that aspect. This line from Copilot is also not correct (or perhaps it is correct but as a general statement about the behavior of GC:
This really applies to all objects in a GCed system. If there's no pressure to clear the GC, most GCs will not actively move to vacate the heap. You can see this effect by having an app essentially pause all operation and then forcing a GC some time later. In the absence of heap pressure due to ongoing allocation, objects will not be cleared. SoftReference is essentially just a WeakReference with a time delay from the last traversal. This is typically specified in time versus heap size, like milliseconds per megabyte. It can be tuned up or down, but the intent is that infrequently-accessed objects don't immediately get collected like you may see from WeakReference (and we have had bad tests come in from CRuby or ruby/spec that fail to consider this possibility).
And it appears that all tests are passing, which is a pretty good testament. I will review the latest round of changes and see if I'm comfortable including this in JRuby 10.0.5.0 (which has been delayed until Monday to consider this and other issues). |
headius
left a comment
There was a problem hiding this comment.
I'm still not following why the WeakReference is needed, nor why clearing the JavaSupport reference is not sufficient to clear any ClassValue it aggregates.
| private final AtomicLong threadCount = new AtomicLong(0); | ||
|
|
||
| /** | ||
| * Weak references to every SoftReference ever stored into this ThreadLocal (one per thread |
There was a problem hiding this comment.
I don't believe this is necessary. If the SoftReferences are held by ThreadService elsewhere, wrapping them in a WeakReference does not aid their being collected, and in the case where those SoftReferences do actually become dereferenced, we have just shifted the problem to a list of WeakReferences. Please explain if there's something I'm missing here. I've never seen a SoftReference wrapped in a WeakReference before.
| this.runtime = runtime; | ||
|
|
||
| this.javaClassCache = ClassValue.newInstance(klass -> new JavaClass(runtime, getJavaClassClass(), klass)); | ||
| this.javaClassCache = new MapBasedClassValue<>(klass -> new JavaClass(runtime, getJavaClassClass(), klass)); |
There was a problem hiding this comment.
Is this change just so that we can go clear those entries? Again I wonder why dereferencing the JavaSupport is not enough to dereference the ClassValue and all values it refers to.
|
I think this is the part that's confusing me:
ClassValue does store the values in a WeakHashMap with the ClassValue's "identity" object used as the weak key. The references to values are hard, but dereferencing the ClassValue should allow these WeakHashMaps to dereference the ClassValue identities and eventually clear that hard reference. So I believe I'm correct in thinking that clearing the JavaSupport reference is at least sufficient to ensure the proxy objects are vacated, but I recognize that may not happen as rapidly as we'd like (especially if there are hundreds or thousands of references from that ClassValue identity to our JavaClass proxies). Using MapBasedClassValue gives us the ability to actively clear those references, since we control the underlying map, albeit at the cost of a much longer teardown process than simply walking awaay from them. I'd ike to see what the heap curve looks like for each case. I suspect the regular ClassValue version will climb higher before clearing (due to WeakReference delays) but I would not expect it to continue "leaking" over time. So I propose a compromise: ship the logic from your patch that actively clears MapBasedClassValue caches, but do not use MapBasedClassValue by default. This will allow us the benefit of ClassValue's JDK optimizations for non-embedded use cases (faster access and less teardown overhead for command-line apps) while allowing embedded use cases a way to opt into active ClassValue clearing with MapBasedClassValue. It also minimizes the changes, since MapBasedClassValue is already configurable. What do you think? |
Based on my test, no it is not sufficient. I checked out jruby-10.0 and then only changed a single line in Ruby.java: Then I ran openhab with this JVM options: Then each time a new jruby engine is re-loaded I ran 5 GC calls to ensure that GC is performed. Given the small Xmx setting, on the 5th reload I eventually hit OOM: This seems to me that there is a genuine leak, and not merely SoftReferences waiting to be claimed. |
This reverts commit 77d52d4.
|
@jimtng I think you misunderstood me. I meant that the ClassValue change should not be necessary if we are clearing the reference to JavaSupport, since that should in turn cause the ClassValues to all be dereferenced. The other changes make sense. Edit: DEreferenced |
I tried reverting all the MapBasedClassValue back to ClassValue as they were before this PR. The result is OOM on the 5th reload in the same set up as above. This shows me that we do need to use MapBasedClassValue and manually clear them. This is way beyond my Java knowledge, so I asked copilot: The answer is right there in the // If we put any objects that reference an org.jruby.Ruby runtime
// (like RubyClass instances) in here we run into a circular
// reference situation that GC isn't handling where they will
// always be strongly referenced and never garbage collected.
// This appears to be a bug in OpenJDK. See jruby/jruby#3228.The exact cycle in
|
|
Ok, give me a day to verify this and try to fix the ClassValue cycle. |
To clarify, I started with the current PR 7a17756 and only reverted the changes that involved the use of MapBasedClassValue back to their pre-PR ClassValue, but leaving all the other changes in this PR. That didn't work, confirming that the changes to MapBasedClassValue are necessary. |
|
Wow nice find! I'll test in a few hours and report back. |
Signed-off-by: Jimmy Tanagra <jcode@tanagra.id.au>
Unfortunately this didn't work for me.
Under my test, it also caused OOM crash after 5 reloads. I confirmed that the modified StableValue constructor was called by inserting a log call. |
|
I tried asking copilot to go down this path and it tried setting calculator = null after it was used, but that too didn't fix the leak. This is what it's saying, FWIW: The The permanent strong chain
Why headius's test hides the leakHis script only touches 3 JI classes ( Why
|
|
This analysis is still wrong, or at least deeply misleading. My patience is wearing thing, but let's play this game a bit longer so we can get it right for 10.0.5.0. AI is often confidently incorrect
For some reason, Copilot ignored completely that my script ran in a loop, generating and terminating 10 runtime instances and then running GC. Each iteration takes around 1-2 seconds on my system once warmed up. After 5 minutes without my fix, it was consuming almost 500MB. The leak was obvious. Let's use that as a baseline. Five minutes is 600 seconds. We'll be generous and say that 50MB out of that 500 was always going to be occupied, leaving 450MB leaking. That works out to something like 0.75MB leaking per second. Now let's look at the fixed version that ran for 20 minutes. If it were still broken and every second leaked 0.75MB, that's 0.75MB * (1200s) = 900GB of heap memory that should have been permanently stuck. You saw both graphs... where 5 minutes without the fix climbed up to 500MB, the fixed version stayed within the 100-200MB range. The total available heap size never even got past 500MB. A leak was definitely fixed. It just wasn't the leak you saw when you tried to confirm it. It's shocking that Copilot would be so wrong about this. It's confidently giving you bad information. Beware. Java7ClassValue vs StableClassValueI said above:
The changes I reverted dd not completely reset JavaSupport and JavaSupportImpl. I just removed the direct construction of Here's the diff I applied to revert those changes: commit d88d692332ac4a1137f4532f6f627f84f128ba55
Author: Charles Oliver Nutter <headius@headius.com>
Date: Thu Apr 2 12:03:49 2026 -0500
Revert MapBasedClassValue changes
diff --git a/core/src/main/java/org/jruby/javasupport/JavaSupport.java b/core/src/main/java/org/jruby/javasupport/JavaSupport.java
index 75089d1d6b..6bf9499314 100644
--- a/core/src/main/java/org/jruby/javasupport/JavaSupport.java
+++ b/core/src/main/java/org/jruby/javasupport/JavaSupport.java
@@ -67,8 +67,8 @@ public abstract class JavaSupport {
protected final Ruby runtime;
@Deprecated(since = "9.4.3.0")
- private final MapBasedClassValue<JavaClass> javaClassCache;
- private final MapBasedClassValue<RubyModule> proxyClassCache;
+ private final ClassValue<JavaClass> javaClassCache;
+ private final ClassValue<RubyModule> proxyClassCache;
static final class UnfinishedProxy extends ReentrantLock {
final RubyModule proxy;
@@ -113,9 +113,9 @@ public abstract class JavaSupport {
public JavaSupport(final Ruby runtime) {
this.runtime = runtime;
- this.javaClassCache = new MapBasedClassValue<>(klass -> new JavaClass(runtime, getJavaClassClass(), klass));
+ this.javaClassCache = ClassValue.newWeakInstance(klass -> new JavaClass(runtime, getJavaClassClass(), klass));
- this.proxyClassCache = new MapBasedClassValue<>(this::computeProxyClass);
+ this.proxyClassCache = ClassValue.newWeakInstance(this::computeProxyClass);
// Proxy creation is synchronized (see above) so a HashMap is fine for recursion detection.
this.unfinishedProxies = new ConcurrentHashMap<>(8, 0.75f, 1);
@@ -397,8 +397,8 @@ public abstract class JavaSupport {
}
public void tearDown() {
- javaClassCache.clear();
- proxyClassCache.clear();
+// javaClassCache.clear();
+// proxyClassCache.clear();
unfinishedProxies.clear();
objectProxyCache.clear();
}
diff --git a/core/src/main/java/org/jruby/javasupport/JavaSupportImpl.java b/core/src/main/java/org/jruby/javasupport/JavaSupportImpl.java
index 21a81ae1b5..18738b1e54 100644
--- a/core/src/main/java/org/jruby/javasupport/JavaSupportImpl.java
+++ b/core/src/main/java/org/jruby/javasupport/JavaSupportImpl.java
@@ -53,10 +53,10 @@ import org.jruby.javasupport.proxy.JavaProxyClass;
*/
public class JavaSupportImpl extends JavaSupport {
- private final MapBasedClassValue<Map<String, AssignedName>> staticAssignedNames =
- new MapBasedClassValue<>(JavaSupportImpl::newAssignedNames);
- private final MapBasedClassValue<Map<String, AssignedName>> instanceAssignedNames =
- new MapBasedClassValue<>(JavaSupportImpl::newAssignedNames);
+ private final ClassValue<Map<String, AssignedName>> staticAssignedNames =
+ ClassValue.newInstance(JavaSupportImpl::newAssignedNames);
+ private final ClassValue<Map<String, AssignedName>> instanceAssignedNames =
+ ClassValue.newInstance(JavaSupportImpl::newAssignedNames);
public JavaSupportImpl(final Ruby runtime) {
super(runtime);
@@ -138,8 +138,8 @@ public class JavaSupportImpl extends JavaSupport {
public void tearDown() {
super.tearDown();
- staticAssignedNames.clear();
- instanceAssignedNames.clear();
+// staticAssignedNames.clear();
+// instanceAssignedNames.clear();
synchronized (javaProxyClasses) {
javaProxyClasses.clear();
diff --git a/core/src/main/java/org/jruby/util/collections/ClassValue.java b/core/src/main/java/org/jruby/util/collections/ClassValue.java
index 6641aea029..170a1ae543 100644
--- a/core/src/main/java/org/jruby/util/collections/ClassValue.java
+++ b/core/src/main/java/org/jruby/util/collections/ClassValue.java
@@ -54,6 +54,17 @@ public abstract class ClassValue<T> {
return Options.JI_CLASS_VALUES.load().function.apply(calculator);
}
+ /**
+ * Create a weak-valued ClassValue for caches whose computed values can retain a Ruby runtime.
+ *
+ * java.lang.ClassValue has historically caused retention cycles in OpenJDK when values are
+ * strongly held and those values reference JRuby runtime state. Using weak values here breaks
+ * that cycle while preserving class-keyed lookup semantics.
+ */
+ public static <T> ClassValue<T> newWeakInstance(ClassValueCalculator<T> calculator) {
+ return new Java7ClassValue<>(calculator);
+ }
+
@Deprecated(since = "9.4.13.0")
private static <T> ClassValue<T> newJava7Instance(ClassValueCalculator<T> calculator) {
return new Java7ClassValue<>(calculator);I suspect this is the real difference between your most recent run and mine. Perhaps I should have been more specific and said "after reverting the MapBasedClassValue changes but retaining the newWeakInstance change" or simply provided my entire diff. In any case, you were testing something different than I was. We'll mark that down as a miscommunication aggravated by bad AI analysis. It's notable that even though my run was using your weak proxy cache change, it still leaked 0.75MB per second. That's because the other two ClassValue in JavaSupportImpl went back to being StableClassValue (by default config) and that's likely where the retained memory came from. Copilot was correct that the The reference chain explanations... are correct. With all caches restored to StableClassValue, there is indeed a cycle from the classValueMap back to the StableClassValue.result, keeping it alive. That leaks much more quickly because it does actually retain a whole JRuby runtime (which is almost certainly much more than 3-4MB). Fixing the right problemThere's still a memory issue when using StableClassValue to caching the proxy instances, that much we can agree on. The fix may be to return to weak references, or perhaps we can avoid some additional hidden references that are not cleared when the runtime is torn down. Throwing out the class altogether is not the answer I'm looking for. In my last review, I asked of your MapBasedClassValue changes:
Let's drill down on this particular change. If I go back to stock JavaSupport/Impl (from 10.0.4.0, with the
We can configure it to use MapBasedClassValue instead of StableValue by specifying JVM property Running the same script as above, but with Here's a 10-minute run just to be sure:
But here's the real kicker: my reverted JavaSupport/Impl does not include your MapBasedClassValue clearing logic. My local JavaSupport/Impl diff against 10.0.4.0 is empty. We have configured away the proxy leak solely by switching to MapBasedClassValue. So the answer to "is this just so we can clear those entries?" appears to be "no", because MapBasedClassValue also avoids retaining runtime references from the bootstrap classloader GC root. That's why it is configurable. More historyThe use of ClassValue has gone through many iterations. Originally, we tried to just use the JDK ClassValue directly, guarded by a configuration property (since we still supported earlier Java versions at the time). We made Java7ClassValue opt-in and provided MapBasedClassValue to abstract the API away. Later, once we required Java 8 in JRuby 9k, we switched the default to Java7ClassValue. We quickly ran into the hard-referencing problem and a contributor helped us make Java7ClassValue hold weak references (#3228). The current "breaking" change may have arrived last year when I started switching the default back to a regular JDK ClassValue. (5ac95ea in #8844). The torture tests I ran against this did not appear to leak. Next, I discovered that ClassValue calculation was not idempotent and could execute more than once so I added StableClassValue and a configuration flag to select it (remaining changes in #8844). At this point there were still no obvious leaks, as proved by a 12 hour run of my torture test. There clearly is a leak now. So either my torture test was insufficient to show the leaks we are now seeing, or I later introduced another change that is causing the current leak. That's the fix I currently seek. |
|
The |
|
Well, how about that... the actual leak we seek to fix was also introduced in 8288cac. Here's the fix (the StableClassValue lock fix is also needed): diff --git a/core/src/main/java/org/jruby/javasupport/JavaSupport.java b/core/src/main/java/org/jruby/javasupport/JavaSupport.java
index e9c31e8778..150ee77ebb 100644
--- a/core/src/main/java/org/jruby/javasupport/JavaSupport.java
+++ b/core/src/main/java/org/jruby/javasupport/JavaSupport.java
@@ -114,21 +114,28 @@ public abstract class JavaSupport {
this.javaClassCache = ClassValue.newInstance(klass -> new JavaClass(runtime, getJavaClassClass(), klass));
- this.proxyClassCache = ClassValue.newInstance(this::computeProxyClass);
+ this.proxyClassCache = ClassValue.newInstance(new ProxyComputer(runtime)::computeProxyClass);
// Proxy creation is synchronized (see above) so a HashMap is fine for recursion detection.
this.unfinishedProxies = new ConcurrentHashMap<>(8, 0.75f, 1);
}
- /**
- * Because of the complexity of processing a given class and all its dependencies,
- * we opt to synchronize this logic. Creation of all proxies goes through here,
- * allowing us to skip some threading work downstream.
- */
- private synchronized RubyModule computeProxyClass(Class<?> klass) {
- RubyModule proxyKlass = Java.createProxyClassForClass(runtime, klass);
- JavaExtensions.define(runtime, klass, proxyKlass); // (lazy) load extensions
- return proxyKlass;
+ private static class ProxyComputer {
+ private final Ruby runtime;
+
+ ProxyComputer(Ruby runtime) {
+ this.runtime = runtime;
+ }
+ /**
+ * Because of the complexity of processing a given class and all its dependencies,
+ * we opt to synchronize this logic. Creation of all proxies goes through here,
+ * allowing us to skip some threading work downstream.
+ */
+ private synchronized RubyModule computeProxyClass(Class<?> klass) {
+ RubyModule proxyKlass = Java.createProxyClassForClass(runtime, klass);
+ JavaExtensions.define(runtime, klass, proxyKlass); // (lazy) load extensions
+ return proxyKlass;
+ }
}The problem was actually really simple: Five minutes of monitoring seemed sufficient to prove it's fixed:
This is again not an ideal fix; the synchronization is now against the (singleton) ProxyComputer rather than against JavaSupport, so it won't disallow other JavaSupport synchronizers from running at the same time. It should be replaced with a shared lock that all such synchronizers use. But it's probably good enough for now, and I'll clean it up with the rest of StableClassValue after we merge your other fixes. @jimtng I don't want you to think I was impatient with you. I know you were just trying to find a solution to a complex problem, and AI was the only tool you had to do it. Unfortunately in this case, it obscured the actual problem and led both of us around in circles. I've lost a couple of days dealing with this misleading analysis, when I probably should have just tried to reproduce your case and investigated it myself. My impatience was with an AI telling me I was wrong when I wasn't and backing it up with falsehoods. It's not your fault... you were just the middle man, and important information was getting lost along the way. If you find that there's still memory being leaked, just let me know. Leave Copilot out of it. |
|
Also, I take back what I said earlier. This analysis of the reference chain keeping everything alive was just wrong: The bug I finally fixed involved a reference from ClassValue to the compute function to JavaSupport to the ClassValue. It was a simple sequence and the AI description of it was completely wrong. In fact I'm pretty sure the only things that Copilot got right were:
The rest was noise, or worse than noise since it convinced you that it had found the actual problem and made me doubt my own analysis and understanding of the system. |
|
Fixes #9187 with my additions in place. |
Signed-off-by: Jimmy Tanagra <jcode@tanagra.id.au>
Signed-off-by: Jimmy Tanagra <jcode@tanagra.id.au>
is it though? recall there was a reason to sync against concurrent proxy initialization due issues when multiple threads initializing the same proxy. |
|
I'm going back over all the commits now because I no longer trust any of them. The most recent changes, which are clean and leak-free on my local system, fails a test that Copilot wrote. The first commit in the PR makes claims about a reference path that again seem to be incorrect. We won't be merging this as-is. |
The synchronization against a global lock is necessary, but I'm not sure if that global lock needs to be the JavaSupport instance. In any case, I am going to redo this whole thing and make all places synchronizing this way use a common Lock instance. |
| // ScriptingContainer -> SingleThreadLocalContextProvider -> LocalContext -> Ruby | ||
| // keeps the runtime (and its JRubyClassLoader) alive until this LocalContext | ||
| // is itself collected, which may be delayed by ScriptingContainer.finalize(). | ||
| runtime = null; |
There was a problem hiding this comment.
This change is accepted. It may not be necessary, but it's reasonable to clear a reference to the runtime. I'm adding a termination flag to ensure the context is not able to be re-vivified.
See #9359.
| * space on this runtime). Tracked so that teardown() can clear their referents even on | ||
| * threads we cannot schedule work on (adopted / pool threads). | ||
| */ | ||
| private final List<SoftReference<ThreadContext>> trackedRefs = new CopyOnWriteArrayList<>(); |
There was a problem hiding this comment.
This change is rejected.
- Despite the Copilot analysis below, the reference chain from the ThreadLocal (ThreadService) to the runtime has at least one more weak link: the ThreadLocalMap's Entry. When the ThreadService is cleared at teardown, the ThreadLocal no longer has any GC roots, and will be collected. This will in turn clear out the SoftReference to the ThreadContext and the rest of the graph.
- Accumulating SoftReference here introduces a new leak because as threads terminate during runtime there's nothing to clean their references out of this list.
| this.javaClassCache = ClassValue.newWeakInstance(klass -> new JavaClass(runtime, getJavaClassClass(), klass)); | ||
|
|
||
| this.proxyClassCache = ClassValue.newInstance(this::computeProxyClass); | ||
| this.proxyClassCache = ClassValue.newWeakInstance(this::computeProxyClass); |
There was a problem hiding this comment.
These changes are rejected. The problem here is not in the use of StableClassValue (ignoring a separate bug there that keeps it alive). The problem is that this::computeProxyClass introduces a reference chain from the ClassValue to the compute lambda to the JavaSupport to the ClassValue. We will fix this by removing the JavaSupport reference from that chain, allowing the ClassValue to clear properly when dereferenced.
See #9359
| * java.lang.ClassValue has historically caused retention cycles in OpenJDK when values are | ||
| * strongly held and those values reference JRuby runtime state. Using weak values here breaks | ||
| * that cycle while preserving class-keyed lookup semantics. | ||
| */ |
| threadService = new ThreadService(this); | ||
|
|
||
| // Release classloader resources | ||
| // Release classloader resources before tearing down the thread service. |
There was a problem hiding this comment.
This change is rejected, except for reordering the teardown sequence. LoadService does indeed call getCurrentContext, but it can simply be moved above the ThreadService teardown above. The order of the ThreadService and classloader termination is unimportant, because neither of them depends upon the other.
See #9359
|
Superseded by #9359 |






Clear lingering ThreadContext soft references during teardown, drop the embed LocalContext runtime reference, and avoid re-adopting the current thread while Ruby is shutting down.
Also switch Java integration class caches to weak values and avoid recreating JavaSupport during teardown so terminated runtimes and their JRubyClassLoaders can be collected promptly.
Add regression coverage for same-thread and worker-thread container termination.
Fixes #9092, Fixes #9070, Fixes openhab/openhab-jruby#520
Note: I asked an AI tool to help identify this issue. I verified the result on a patched JRuby 10.0.4.0 running inside openHAB, where the memory leak had previously been observed. After applying the fix, the GC consistently returned the heap to its original size, even after repeated script‑engine creation and teardown cycles.
Note: In order to get the JVM to aggressively perform GC, during testing I've started it with: