Skip to content

fix ClassTag#cache leak#9276

Merged
retronym merged 2 commits intoscala:2.12.xfrom
jtjeferreira:class-tag-leak
Nov 13, 2020
Merged

fix ClassTag#cache leak#9276
retronym merged 2 commits intoscala:2.12.xfrom
jtjeferreira:class-tag-leak

Conversation

@jtjeferreira
Copy link
Contributor

I was diagnosing some classloader leaks and the profiler pointed to ClassTag#cache

JDK issues about the problem:

Also this is not a problem in 2.13.x because the cache does not exist there. See gitter question https://gitter.im/scala/contributors?at=5f941163a0a3072b43a099b7

@scala-jenkins scala-jenkins added this to the 2.12.14 milestone Oct 24, 2020
var tag = ref.get
if (tag == null) {
cache.remove(runtimeClass1)
tag = cache.get(runtimeClass1).asInstanceOf[jWeakReference[ClassTag[T]]].get
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in theory this .get could return null...

@jtjeferreira
Copy link
Contributor Author

Would be interesting to write a test similar to the one in the JDK bug:

public static void main(String[] args) throws Throwable {
        for (long i = 0; i<10000000; i++) {
            File dir = new File("t/t.jar");
            URLClassLoader classLoader = new URLClassLoader(new URL[]{dir.toURI().toURL()});
            ClassValue cv = (ClassValue) classLoader.loadClass("MyClassValue").newInstance();
            Object value = cv.get(Integer.TYPE);
            assert value !=null;
            assert value.getClass().getClassLoader() == classLoader;
            classLoader.close();
        }

    } 

but I would need some help since this is my first contribution 😂

@SethTisue SethTisue modified the milestones: 2.12.14, 2.12.13 Oct 24, 2020
@lrytz
Copy link
Member

lrytz commented Oct 26, 2020

@retronym do you know why the cache doesn't exist in 2.13.x?

It was originally added for 2.12.x in #7879 and then merged forward in #8242, where you marked it as "PICK", but it doesn't show up in the diff.

@retronym
Copy link
Member

Thanks for bringing up this issue.

An alternative fix might be to limit our use of the cache to when val cl = runtimeClass.getClassLoader; cl == null || cl == classOf[scala.reflect.runtime.universe.type].getClassLoader. Classes loaded from child classloaders would not have entries for these caches installed in their ClassValue maps.

We use ClassValue in a few other places; they will also need an update.

I'll try out some variations of the fix and also try to replicate the leak in a test.

@retronym retronym self-assigned this Oct 27, 2020
@retronym
Copy link
Member

retronym commented Oct 27, 2020

Related fix for TypeTag reported in scala/bug#11767 and fixed in #8470 in a manner similar to your proposal.

@jtjeferreira
Copy link
Contributor Author

@retronym looking forward for your variations of the fix, but "do you know why the cache doesn't exist in 2.13.x?" as @lrytz asked? Removing the cache in 2.12.x could also be a fix :)

@retronym
Copy link
Member

retronym commented Oct 27, 2020 via email

@lrytz
Copy link
Member

lrytz commented Oct 28, 2020

... and I didn't spot it in the review

@retronym
Copy link
Member

retronym commented Nov 2, 2020

@jtjeferreira Could you describe the classloader heirarchy that showed up the problem? Is scala-library being loaded in temporary classloaders and leaking its ClassValue in the Class.classValueMap of classes from parent loaders? Or is a single load of scala-library that is used to form ClassTags over a user class in temporary/child classloader preventing unloading of that user class? I can't get any test to show a leak, but if I know which one I'm going for should be able to repro.

@retronym
Copy link
Member

retronym commented Nov 2, 2020

I've pushed a commit to deal with the race condition of the weak reference immediately getting cleared. I do the same as we do in TypeTag's use of ClassValue/WeakReference and just return an uncached result after removing a stale cache entry.

Baseline:

[info] Benchmark                                             Mode  Cnt   Score    Error   Units
[info] ClassTagBenchmark.lookupClassTag                      avgt   20  40.625 ±  2.595   ns/op
[info] ClassTagBenchmark.lookupClassTag:·gc.alloc.rate       avgt   20   0.001 ±  0.002  MB/sec
[info] ClassTagBenchmark.lookupClassTag:·gc.alloc.rate.norm  avgt   20  ≈ 10⁻⁴             B/op
[info] ClassTagBenchmark.lookupClassTag:·gc.count            avgt   20     ≈ 0           counts

This PR:

[info] Benchmark                                             Mode  Cnt   Score    Error   Units
[info] ClassTagBenchmark.lookupClassTag                      avgt   20  47.246 ±  4.295   ns/op
[info] ClassTagBenchmark.lookupClassTag:·gc.alloc.rate       avgt   20   0.001 ±  0.002  MB/sec
[info] ClassTagBenchmark.lookupClassTag:·gc.alloc.rate.norm  avgt   20  ≈ 10⁻⁴             B/op
[info] ClassTagBenchmark.lookupClassTag:·gc.count            avgt   20     ≈ 0           counts

With the cache disabled altogether:

[info] Benchmark                                                          Mode  Cnt     Score     Error   Units
[info] ClassTagBenchmark.lookupClassTag                                   avgt   20    48.622 ±   2.184   ns/op
[info] ClassTagBenchmark.lookupClassTag:·gc.alloc.rate                    avgt   20  2933.149 ± 131.283  MB/sec
[info] ClassTagBenchmark.lookupClassTag:·gc.alloc.rate.norm               avgt   20   224.000 ±   0.001    B/op
[info] ClassTagBenchmark.lookupClassTag:·gc.churn.PS_Eden_Space           avgt   20  2930.875 ± 352.368  MB/sec
[info] ClassTagBenchmark.lookupClassTag:·gc.churn.PS_Eden_Space.norm      avgt   20   223.683 ±  24.055    B/op
[info] ClassTagBenchmark.lookupClassTag:·gc.churn.PS_Survivor_Space       avgt   20     0.097 ±   0.038  MB/sec
[info] ClassTagBenchmark.lookupClassTag:·gc.churn.PS_Survivor_Space.norm  avgt   20     0.007 ±   0.003    B/op
[info] ClassTagBenchmark.lookupClassTag:·gc.count                         avgt   20   119.000            counts
[info] ClassTagBenchmark.lookupClassTag:·gc.time                          avgt   20   363.000                ms

In this microbenchmark, the weak reference version has similar execution time to the uncached version. But it does reduce allocation rates.

@jtjeferreira
Copy link
Contributor Author

@jtjeferreira Could you describe the classloader heirarchy that showed up the problem? Is scala-library being loaded in temporary classloaders and leaking its ClassValue in the Class.classValueMap of classes from parent loaders? Or is a single load of scala-library that is used to form ClassTags over a user class in temporary/child classloader preventing unloading of that user class? I can't get any test to show a leak, but if I know which one I'm going for should be able to repro.

This is from a play 2.7.x project when running the application. Each time the application is stopped and started new classloaders are created. Luckily I have this diagram that I used to understand what was going on:

image

Starting from the bottom:

  • user classes are on the leaf classloader reloader.currentApplicationClassLoader (play.runsupport.DelegatedResourcesClassLoader)
  • libraries and scala-library are on its great parent applicationLoader (play.runsupport.NamedURLClassLoader)
  • every time the application is stopped, everything below commonLoader (play.sbt.PlayCommands$$anon$1) should be GCd,

The workaround I used in playframework/playframework#10500 is to move loading of scala-library to play.sbt.PlayCommands$$anon$1 which is cached and survives reloads...

@jtjeferreira
Copy link
Contributor Author

I also found some interesting discussion here http://mail.openjdk.java.net/pipermail/mlvm-dev/2018-March/006847.html

@jtjeferreira
Copy link
Contributor Author

I was trying to get you a screenshot from the profiler (from running scripted tests of playframework/playframework#10500 without the fix ), but eclipse MAT does not report it, and my yourkit trial just expired (which AFAICT was the one that was showing it clearly)

@retronym
Copy link
Member

/rebuild

Copy link
Member

@retronym retronym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to reproduce with a test case. I've proceeded with my slight refactor of your original idea, together with a system property to disable the cache altogether.

@retronym retronym merged commit 2166efc into scala:2.12.x Nov 13, 2020
var tag = ref.get
if (tag == null) {
cache.remove(runtimeClass1)
tag = cache.computeTag(runtimeClass1).asInstanceOf[ClassTag[T]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be benefitial to add the tag back into the cache here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a rare case so I don't think its worth it.

@jtjeferreira
Copy link
Contributor Author

I wasn't able to reproduce with a test case. I've proceeded with my slight refactor of your original idea, together with a system property to disable the cache altogether.

Thanks for the merge! Do you plan to port this to 2.13?

@retronym
Copy link
Member

Yeah, I will correct the merge error and take this all over to 2.13.x after 2.13.4 is released.

@mkurz
Copy link
Contributor

mkurz commented Dec 10, 2020

Yeah, I will correct the merge error and take this all over to 2.13.x after 2.13.4 is released.

@retronym did you port this fix and #9361 to the 2.13.x branch yet?

retronym added a commit to retronym/scala that referenced this pull request Feb 5, 2021
retronym added a commit to retronym/scala that referenced this pull request Feb 7, 2021
lrytz pushed a commit to retronym/scala that referenced this pull request Feb 8, 2021
@retronym
Copy link
Member

@mkurz The forward port eventually landed in #9487.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants