Skip to content

Convert all types to heap types.#601

Merged
bsteffensmeier merged 8 commits into
ninia:dev_4.3from
bsteffensmeier:all-heap-types
Mar 27, 2025
Merged

Convert all types to heap types.#601
bsteffensmeier merged 8 commits into
ninia:dev_4.3from
bsteffensmeier:all-heap-types

Conversation

@bsteffensmeier

Copy link
Copy Markdown
Member

The primary goal of this change is to fix the crashes described in #599 when isolated sub-interpreters are used. The crashes are mostly a problem in the case where all interpreters are initialized on different threads at the same time.

I was not able to track down the exact source of crashes in the cpython code but it seems to be related to the internal type cache. I found that the crashes would not occur if I stopped using all our static types which also required removing a bunch of jep functionality. Since heap types are specific to an interpreter they are safer for isolated interpreters so I started testing using more heap types instead of removing functionality. Using heap types stopped the crashing. This change converts all of the jep types that were previously static types to be heap types instead. I have not proven definitively that all the types were involved in the crashes but a majority are so it seemed beneficial to do all of them for consistency.

The biggest downside to heap types is the type object can no longer be referenced directly in c. I extended the approach from #594 to store the type definitions in the module state. In the code that creates new PyJTypes there is a need to create many super types, methods, and fields and that needs a reference to the module state. To avoid redundant lookups of the module state when making new PyJTypes the module state is passed around between most of the pyjtype functions.

Another benefit of using heap types is that PyTypeObject used by static types is not part of the cpython Limited ABI but the functions for initializing the types are. This change brings us a little closer to being able to use the stable ABI to distribute a single python binary for multiple versions of cpython.

@bsteffensmeier bsteffensmeier requested a review from ndjensen March 16, 2025 01:46

@ndjensen ndjensen left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor comment.

Comment thread src/main/c/Objects/pyjclass.c
@ndjensen

Copy link
Copy Markdown
Member

Also Appveyor does not like your new test.

@ndjensen ndjensen left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job finding and correcting the memory leak of heap types.

}

Py_TYPE((PyObject*) self)->tp_free((PyObject*) self);
tp->tp_free((PyObject*) self);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we call tp_free() on this one?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyJObject and the subtypes that we create manually like PyJArray and PyJClass do not use garbage collection because they can't be involved in reference cycles so they need to be freed with PyObject_Del. When we dynamically make subtypes for subclasses, like java.util.ArrayList or any other Java class then python automatically adds cgarbage collection so PyObject_GC_Del needs to be called to free the memory. tp_free is always set to the correct option from those two so we don't have to try anything else to figure out which one is right.

@bsteffensmeier

Copy link
Copy Markdown
Member Author

Although the initial change to heap types was relatively straightforward further testing showed that memory management is quite a bit more complicated with heap types so there are follow-on commits to address issues I found. I now believe this change is complete and properly manages all the memory associated with the module state and heap types.

To test for memory leaks I used a loop that endlessly creates and closes SubInterpreters and monitored the process memory to see if it stabilized. At first I was using Python 3.13 and I noticed substantial leaks but after going back and testing old versions of jep that should not be leaking I found the exact same memory leaks. I eventually found a python bug which indicates recent versions of python leak memory when sub-interpreters are closed so I used python 3.11 for testing for memory leak and found there was memory leaks related to this changes.

I tracked the memory leaks down to the new type objects and by tracking all ref count changes to the types I found the refcount increases when an instance is created but does not decrease when the instance is freed. In 7603147 I added a decref to the tp_dealloc for all the heap allocated types to address this problem.

After that change I was still seeing a memory leak but I could make it go away by removing the java import hook before closing an interpreter. I didn't like that as a solution but this told me that the reference counts must be correct since cpython can free all the memory as long as I let it free the java import hook before closing the interpreter. Eventually I solved that leak in 99889f5 by implementing m_traverse on the module to let GC know about cycles involving the _jep module. I was not able to track down exactly why m_traverse fixes it but I suspect there was a reference cycle involving the module and the system module which holds the java import hook.

After 99889f5 I have not seen any memory leak in normal use cases. However in tracking down the need to implement m_traverse I had also found this python bug and was beginning to suspect we may need to implement GC and tp_traverse for all our types. I was able to trigger a leak by adding a list as a member of the java import hook and then adding the list to itself to make a reference cycle and then putting a pyjobject in the list. It's a rather contrived test but it proves that without implementing GC there will always be ways to theroetically have enough reference cycles to trigger a leak when closing a sub-interpreter so I decided to go ahead and implement gc in all our types in 3d64318 which solves the leaks even in my contrived test.

Finally while writing this comment I was reviewing the documentation for tp_dealloc and for Changing Static Types to Heap Types and they both mention calling PyObject_GC_UnTrack. I was not seeing any problems with this missing call, I suspect we may not need it since our types aren't directly involved in any reference cycles, however I want to follow best practices so in 344d76e I added calls to PyObject_GC_UnTrack.

Comment thread src/main/c/Objects/pyjfield.c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants