Add ability to create new interpreters that share interpreter state. #606
ndjensen
left a comment
You wrote a wall of text but my brain is wired into the old way of using Jep. Does this essentially remove the threading limitations of Jep? Because another interpreter can come in and access the same globals?
| #else | ||
| tstate = Py_NewInterpreter(); | ||
| #endif | ||
| PyObject *mod_main = PyImport_AddModule("__main__"); /* borrowed */ |
What's the purpose of importing __main__ here?
Sub-interpreter has done this as far back as I can find (3.2). I am not entirely sure why; I assume typical Python interpreters have a __main__ module and this helps compatibility.
Sorry, I didn't realize you had just moved the code.
| * interpreter state consists of <code>sys.modules</code> and other internal | ||
| * Python structures including the GIL. Using this method is similar to | ||
| * creating a {@link SharedInterpreter} but this can be used with | ||
| * {@link SubInterpreter}s. Interpreters that share interpreter state can |
So what happens if you try to use this with a SharedInterpreter? Having the return type be Interpreter instead of SubInterpreter seems weird to me if this is intended to be used with SubInterpreters. But I also understand it's probably desirable to return the interface instead of the Class. Hmmm....
This works with SharedInterpreters, but the ability to share interpreter state between shared interpreters is not interesting. This does add the ability to share globals with a SharedInterpreter, which seems useful. I think it is valuable to have this API treat both types of Interpreters the same so that anyone using this capability can easily switch between SubInterpreter and SharedInterpreter and use the same methods.
In the JavaDoc I wanted to call out the similarities to SharedInterpreter because the JavaDoc on SharedInterpreter expands more on what things are included in "interpreter state" and has some additional warnings. I didn't want to repeat everything here so I mentioned the similarity and linked the javadoc.
I don't really consider the Interpreters returned from this method as SubInterpreters even if they are created from a SubInterpreter, because SubInterpreters are usually independent from each other and this can be used to create many Interpreters that are not independent from each other. These are really just new threads accessing an existing Interpreter, but we don't really have a nomenclature for that.
Am I understanding correctly that SubInterpreters and SharedInterpreters can be bound to a new interpreter on another thread? So for shared interpreters the new feature is to also share the globals? That would be perfect :)
Yes, that is an accurate summary.
You are still only allowed one interpreter per thread, and each interpreter can only be used on the thread where it was created. But creating new interpreters that share globals basically lets you access an interpreter from any thread.
Thanks for working on this - this will be a game changer for multi-threaded Jep usage! :) 👍 Here are some spontaneous naming ideas: Ownership & lifecycle: Proposal: Disclaimer: I only have a high-level understanding of Jep and very little experience with the Python C API. ;) Thanks again for the hard work!
Thanks for all the ideas. I am really liking BoundInterpreter right now. AttachedInterpreter would be my second choice, I think those two are really synonyms. I am considering changing the function for creating the new interpreters from
Right now, from my reading of the C-API and the way we use it, there is no distinction between "main" and "attached". It is theoretically fine to close the main interpreter. It is also possible to create an attached interpreter from other attached interpreters, making a single interpreter both "main" and "attached". I have written the Jep code with that assumption so that all the shared state will be cleaned up when the final interpreter is closed. The caveat here is that I have not tested it. If anyone has the bandwidth to check out this branch and play with it, that would be phenomenal. I deliberately avoided mentioning much about the lifecycle because I am worried we might run into unexpected obstacles that require keeping the "main" interpreter open. I am prepared to start adding restrictions or caveats as we find problems, but I don't know of any problems yet.
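The cleanup rule described here (shared state is released only when the final interpreter using it is closed, and no interpreter is special) can be pictured as a reference count. The following is a toy model in plain Java, not Jep's implementation; `SharedState`, `Handle`, `attach`, and `detach` are hypothetical names chosen only for illustration.

```java
/** Hypothetical stand-in for state shared by several interpreter handles. */
final class SharedState {
    private int users;
    private boolean released;

    /** Registers one more handle on this state. */
    synchronized Handle attach() {
        if (released) {
            throw new IllegalStateException("shared state already released");
        }
        users++;
        return new Handle(this);
    }

    /** Returns true only when the caller was the final user. */
    synchronized boolean detach() {
        users--;
        if (users == 0) {
            released = true; // a real implementation would free native state here
        }
        return released;
    }

    synchronized boolean isReleased() {
        return released;
    }
}

/** One handle onto the shared state; no handle is marked as "main". */
final class Handle implements AutoCloseable {
    private final SharedState state;
    private boolean closed;

    Handle(SharedState state) {
        this.state = state;
    }

    /** Creates a sibling handle, so an "attached" handle can also act as a source. */
    Handle attach() {
        if (closed) {
            throw new IllegalStateException("handle is closed");
        }
        return state.attach();
    }

    @Override
    public void close() {
        if (!closed) { // closing twice must not decrement the count twice
            closed = true;
            state.detach();
        }
    }
}
```

In this model, closing the first handle while a second one is still open leaves the state alive; only the last `close()` releases it, whichever handle that happens to be.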
That is definitely the direction where I would like to take this. I see this as the first step where we can try out the ideas of expanded sharing between interpreters and get feedback on how useful it is and if there are unexpected side effects or compatibility issues. I don't plan on trying to implement that for the next Jep release but possibly something to put in next year. For those types of use cases this PR will allow people to create an interpreter on a daemon thread and then never use it again and achieve the same basic idea of a thread-independent interpreter state.
I appreciate the extra input and discussion.
| JepConfig config = new JepConfig(); | ||
| config.classLoader = this.classLoader; | ||
| config.interactive = this.interactive; | ||
| return new Jep(config, false, this.memoryManager, this, shareGlobals) { |
Would it be technically feasible to return a SharedInterpreter or a SubInterpreter depending on the actual type of this? Possibly through generics (self type) or by overriding this method on the SharedInterpreter and SubInterpreter subclasses. Here you commented that the last instance will take ownership of the state. So when attaching to a SubInterpreter, it would make sense to get a SubInterpreter returned. Once the original SubInterpreter dies, we still have a SubInterpreter (or SharedInterpreter). In that case we wouldn't need a dedicated name for this "bound" interpreter. Maybe the method name could also be "onThisThread" - ideally you could call it multiple times and always get the thread local instance returned. In that case the shareGlobals option is difficult. I would rather see the returned interpreter to be "the same" just usable on another thread.
Is there a use case where someone would need a SubInterpreter or SharedInterpreter specifically? After construction the two behave identically, so there is no reason I can think of that someone using this API would need to make a distinction. Since the method signature just returns an Interpreter, we can adjust the specific type in future versions if we find it is necessary.
Two problems I see with reusing a ThreadLocal instance are ensuring it is closed before the thread ends and accommodating use cases which may want to reuse a thread for a different interpreter. Many use cases may not have these problems but in those cases I think it is simple enough to add a layer with a thread local outside of jep.
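A layer of this kind, kept outside of Jep, could look roughly like the sketch below. It is a generic helper that caches one `AutoCloseable` per thread and makes the close step explicit; `PerThread`, `get`, and `release` are hypothetical names. The factory that actually creates the interpreter is left abstract because the name of the new method is not shown in this conversation.

```java
import java.util.function.Supplier;

/** Hypothetical helper: caches one closeable resource per thread. */
final class PerThread<T extends AutoCloseable> {
    private final Supplier<T> factory;
    private final ThreadLocal<T> slot = new ThreadLocal<>();

    /** The factory runs on whichever thread first calls get(). */
    PerThread(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the calling thread's instance, creating it on first use. */
    T get() {
        T value = slot.get();
        if (value == null) {
            value = factory.get();
            slot.set(value);
        }
        return value;
    }

    /** Call on the owning thread before it ends or is reused for something else. */
    void release() {
        T value = slot.get();
        if (value != null) {
            slot.remove();
            try {
                value.close();
            } catch (Exception e) {
                throw new IllegalStateException("close failed", e);
            }
        }
    }
}
```

The helper does not solve the two problems mentioned above; it only makes them the application's responsibility, since the application knows when a thread is about to exit or be repurposed and can call `release()` at that point.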
| } | ||
| getMemoryManager().closeInterpreter(this); | ||
| boolean closeInterp = getMemoryManager().closeInterpreter(this); |
I see this variable is passed to close(), but what does it do? Can you name the variable something that makes it clear? It seems weird to pass a boolean closeInterp to a method named close().
I renamed it to closeInterpState and also added javadoc to the native close() in an attempt to explain it.
This adds a new method to Interpreter for creating a new Interpreter on a different thread that shares interpreter state with an existing interpreter.
The Python C-API has always distinguished between the thread state and the interpreter state, but Jep has limited this functionality: every sub-interpreter has a dedicated thread state and interpreter state that are exclusively linked, while every shared interpreter uses a new thread state tied to a single interpreter state. This change gives users of Jep more flexibility, similar to the C-API, so that any interpreter state can be used from a new thread by creating a thread state.
I anticipate this being useful for sub-interpreters with complex initialization that are used from multiple threads. For example, I have seen cases of thread pools where each thread has an identical SubInterpreter. This new API would allow that to be replaced with a single SubInterpreter, and other threads in the pool could create Interpreters on demand that use the same interpreter state. This would avoid the cost of initializing multiple independent interpreters and make it simpler to scale the thread pool to a different number of threads as needed.
When creating an interpreter that shares interpreter state, it is also possible to share global variables. SharedInterpreters can already share some Python objects using PyObjects or by storing state in modules, but I think the ability to share globals creates a more obvious way to share things between interpreters. It also allows the new interpreters that share interpreter state to act as a clone of the interpreter they are created from.
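To make the distinction between interpreter state, per-thread interpreters, and globals concrete, here is a toy model in plain Java. It is not Jep code and does not use the Jep API; `ModelState`, `ModelInterpreter`, and `newSharing` are hypothetical names, and the model assumes the creating call is made on the thread that will use the new interpreter.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy "interpreter state": shared by every interpreter created from it. */
final class ModelState {
    final Map<String, Object> modules = new HashMap<>();
}

/** Toy interpreter: a handle onto a ModelState, bound to one thread. */
final class ModelInterpreter {
    private final ModelState state;
    private final Map<String, Object> globals;
    private final Thread owner = Thread.currentThread();

    /** Creates an independent interpreter with a fresh state. */
    ModelInterpreter() {
        this(new ModelState(), new HashMap<String, Object>());
    }

    private ModelInterpreter(ModelState state, Map<String, Object> globals) {
        this.state = state;
        this.globals = globals;
    }

    /** Hypothetical name: a new interpreter on the calling thread, same state. */
    ModelInterpreter newSharing(boolean shareGlobals) {
        return new ModelInterpreter(state,
                shareGlobals ? globals : new HashMap<String, Object>());
    }

    void set(String name, Object value) {
        checkThread();
        synchronized (state) { // stands in for the GIL, which belongs to the state
            globals.put(name, value);
        }
    }

    Object get(String name) {
        checkThread();
        synchronized (state) {
            return globals.get(name);
        }
    }

    void addModule(String name, Object module) {
        checkThread();
        synchronized (state) {
            state.modules.put(name, module);
        }
    }

    Object getModule(String name) {
        checkThread();
        synchronized (state) {
            return state.modules.get(name);
        }
    }

    /** Each interpreter may only be used on the thread that created it. */
    private void checkThread() {
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("wrong thread for this interpreter");
        }
    }
}
```

In the model, modules are always visible through every interpreter on the same state, while globals are visible only when `shareGlobals` is true, which mirrors the behaviour described in this PR.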
Another possible use case is to allow access to interpreters from nearly arbitrary threads. An application could potentially initialize interpreters in the background and, rather than using asynchronous mechanisms to execute on the interpreter thread, simply create a new interpreter sharing state with a background interpreter whenever access to an interpreter is needed. Creating an interpreter that shares interpreter state is faster than creating a brand new interpreter because it is not necessary to initialize the interpreter state again. A new interpreter with shared interpreter state will allow the caller to do anything they would do with the original interpreter on the existing thread.
One area I struggle with on this change is what to call the interpreters it creates. If anyone has ideas how to clarify the naming, I would appreciate input. I have started to refer to the new interpreters as "interpreters that share interpreter state with an existing interpreter", but that is quite a mouthful and probably not very clear to anyone who is not familiar with the C-API. I would like to call them something more catchy like SharedInterpreter or SharedStateInterpreter, but since we already have a SharedInterpreter that is defined more narrowly, I think it is confusing to also call these "shared". I considered calling the new interpreters SharedSubInterpreters, but that is confusing because I wanted to expose the ability to share globals with SharedInterpreters, which fits into the same API but would not create a SharedSubInterpreter. Ultimately it wasn't necessary to define a name in code since there is only one function to access this functionality. But in writing the javadoc, release notes, and this PR, I still feel it is not clear what to call an interpreter that shares interpreter state with an existing interpreter. I have also considered CloneInterpreter, but that implies a copy of the interpreter state when really the new and old interpreter continue to reference the same interpreter state. I avoided InterpreterView because it implies the new interpreter is read-only when it can be used for reading and writing.