Add ability to create new interpreters that share interpreter state. #606

Merged
bsteffensmeier merged 4 commits into ninia:dev_4.3 from bsteffensmeier:shared-interpreter-state
Sep 5, 2025
@bsteffensmeier

Member

This adds a new method to Interpreter for creating a new Interpreter on a different thread that shares interpreter state with an existing interpreter.

The Python C-API has always distinguished between the thread state and the interpreter state, but Jep has limited how the two can be combined: every sub-interpreter has a dedicated thread state and interpreter state that are exclusively linked, while every shared interpreter uses a new thread state tied to a single interpreter state. This change gives users of Jep flexibility similar to the C-API, so that any interpreter state can be used from a new thread by creating a thread state for it.

I anticipate this being useful for sub-interpreters that have complex initialization and are used from multiple threads. For example, I have seen thread pools where each thread has an identical SubInterpreter. This new API would allow that to be replaced with a single SubInterpreter, and other threads in the pool could create Interpreters on demand that use the same interpreter state. This would avoid the cost of initializing multiple independent interpreters and make it simpler to scale the thread pool to a different number of threads as needed.
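The thread pool scenario above can be sketched roughly as follows. This is only a toy model under stated assumptions, not Jep code: `ToyState`, `ToyHandle`, and `PoolSketch` are hypothetical stand-ins for the expensive interpreter state and the lightweight per-thread interpreters, and no real Jep API is called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for expensive, shareable interpreter state. Not a Jep class. */
final class ToyState {
    /** Counts how often the costly initialization ran. */
    static final AtomicInteger initCount = new AtomicInteger();
    final Map<String, Object> globals = new ConcurrentHashMap<>();

    ToyState() {
        initCount.incrementAndGet(); // pretend this is the complex initialization
        globals.put("greeting", "hello");
    }
}

/** Hypothetical stand-in for a cheap interpreter bound to the creating thread. */
final class ToyHandle implements AutoCloseable {
    private final ToyState state;
    private final Thread owner = Thread.currentThread();

    ToyHandle(ToyState state) {
        this.state = state;
    }

    Object getValue(String name) {
        // Model the rule that a handle is only usable on the thread that created it.
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("used from the wrong thread");
        }
        return state.globals.get(name);
    }

    @Override
    public void close() {
        // Nothing to free in the toy model.
    }
}

final class PoolSketch {
    /** Initializes the state once, then lets each task create a handle on its worker thread. */
    static List<String> run(int threads, int tasks) {
        ToyState shared = new ToyState();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final int id = i;
                futures.add(pool.submit(() -> {
                    try (ToyHandle handle = new ToyHandle(shared)) {
                        return handle.getValue("greeting") + " " + id;
                    }
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

In this sketch the pool size can change freely because the expensive `ToyState` constructor runs once, no matter how many worker threads create handles.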

When creating an interpreter that shares interpreter state, it is also possible to share global variables. SharedInterpreters can already share some Python objects using PyObjects or by storing state in modules, but I think the ability to share globals creates a more obvious way to share things between interpreters and allows the new interpreters that share interpreter state to act as a clone of the interpreter they are created from.

Another possible use case is to allow access to interpreters from nearly arbitrary threads. An application could potentially initialize interpreters in the background and, rather than using asynchronous mechanisms to execute on the interpreter thread, simply create a new interpreter sharing state with a background interpreter whenever access to an interpreter is needed. Creating an interpreter that shares interpreter state is faster than creating a brand new interpreter because it is not necessary to initialize the interpreter state again. A new interpreter with shared interpreter state will allow the caller to do anything they would do with the original interpreter on the existing thread.

One area I struggle with on this change is what to call the interpreters it creates. If anyone has ideas on how to clarify the naming, I would appreciate input. I have started to refer to the new interpreters as "interpreters that share interpreter state with an existing interpreter", but that is quite a mouthful and probably not very clear to anyone who is not familiar with the C-API. I would like to call them something more catchy like SharedInterpreter or SharedStateInterpreter, but since we already have a SharedInterpreter that is defined more narrowly, I think it is confusing to also call these "shared". I considered calling the new interpreters SharedSubInterpreters, but that is confusing because I wanted to expose the ability to share globals with SharedInterpreters, which fits into the same API but would not create a SharedSubInterpreter. Ultimately it wasn't necessary to define a name in code since there is only one function to access this functionality. But in writing the javadoc, release notes, and this PR, I still feel it is not clear what to call an interpreter that shares interpreter state with an existing interpreter. I have also considered CloneInterpreter, but that implies a copy of the interpreter state when really the new and old interpreters continue to reference the same interpreter state. I avoided InterpreterView because it implies the new interpreter is read-only when it can be used for reading and writing.

@bsteffensmeier bsteffensmeier requested a review from ndjensen May 26, 2025 22:26

@ndjensen ndjensen left a comment

Member

You wrote a wall of text but my brain is wired into the old way of using Jep. Does this essentially remove the threading limitations of Jep? Because another interpreter can come in and access the same globals?

Comment thread src/main/c/Jep/pyembed.c
#else
tstate = Py_NewInterpreter();
#endif
PyObject *mod_main = PyImport_AddModule("__main__"); /* borrowed */

Member

What's the purpose of importing __main__ here?

Member Author

Sub-interpreter has done this as far back as I can find (3.2). I am not entirely sure why; I assume typical Python interpreters have a __main__ module and this helps compatibility.

Member

Sorry, I didn't realize you had just moved the code.

Comment thread src/main/java/jep/Interpreter.java Outdated
* interpreter state consists of <code>sys.modules</code> and other internal
* Python structures including the GIL. Using this method is similar to
* creating a {@link SharedInterpreter} but this can be used with
* {@link SubInterpreter}s. Interpreters that share interpreter state can

Member

So what happens if you try to use this with a SharedInterpreter? Having the return type be Interpreter instead of SubInterpreter seems weird to me if this is intended to be used with SubInterpreters. But I also understand it's probably desirable to return the interface instead of the Class. Hmmm....

Member Author

This works with SharedInterpreters, but the ability to share interpreter state between shared interpreters is not interesting. This does add the ability to share globals with a SharedInterpreter, which seems useful. I think it is valuable to have this API treat both types of Interpreters the same so that anyone using this capability can easily switch between SubInterpreter and SharedInterpreter and use the same methods.

In the JavaDoc I wanted to call out the similarities to SharedInterpreter because the JavaDoc on SharedInterpreter expands more on what things are included in "interpreter state" and has some additional warnings. I didn't want to repeat everything here so I mentioned the similarity and linked the javadoc.

I don't really consider the Interpreters returned from this method to be SubInterpreters, even if they are created from a SubInterpreter, because SubInterpreters are usually independent from each other and this can be used to create many Interpreters that are not independent from each other. These are really just new threads accessing an existing Interpreter, but we don't really have a nomenclature for that.

Member

I agree with this.

Am I understanding correctly that SubInterpreters and SharedInterpreters can be bound to a new interpreter on another thread? So for shared interpreters the new feature is to also share the globals? That would be perfect :)

Member Author

> Am I understanding correctly that SubInterpreters and SharedInterpreters can be bound to a new interpreter on another thread? So for shared interpreters the new feature is to also share the globals? That would be perfect :)

Yes, that is an accurate summary.

@bsteffensmeier

Member Author

> You wrote a wall of text but my brain is wired into the old way of using Jep. Does this essentially remove the threading limitations of Jep? Because another interpreter can come in and access the same globals?

You are still only allowed one interpreter per thread and each interpreter can only be used on the thread where it was created. But creating new interpreters that share globals basically lets you access an interpreter from any thread.

@jonny88

jonny88 commented May 28, 2025

Thanks for working on this - this will be a game changer for multi-threaded Jep usage! :) 👍

Here are some spontaneous naming ideas:
ThreadInterpreter, AttachedInterpreter, LinkedInterpreter, BoundInterpreter, ShadowInterpreter, ProxyInterpreter, FriendInterpreter

Ownership & lifecycle:
Is there a concept of “main” vs. “attached” interpreters, where the main interpreter owns the state? Or are all interpreters peers? For example, if the main interpreter is closed, do its attached interpreters die immediately, or is the shared state only cleaned up once the last interpreter closes?

Proposal:
What if we explicitly expose a thread-independent InterpreterState in Java, and let clients construct lightweight, thread-bound Interpreter instances from that state? This would remove the need to tie an interpreter’s lifetime (or its state) to a specific thread.

Disclaimer: I only have a high-level understanding of Jep and very little experience with the Python C API. ;)

Thanks again for the hard work!

@bsteffensmeier

Member Author

> Here are some spontaneous naming ideas: ThreadInterpreter, AttachedInterpreter, LinkedInterpreter, BoundInterpreter, ShadowInterpreter, ProxyInterpreter, FriendInterpreter

Thanks for all the ideas. I am really liking BoundInterpreter right now. AttachedInterpreter would be my second choice; I think those two are really synonyms. I am considering renaming the function for creating the new interpreters from useThread() to bindThread(). Then in the javadoc and other places it feels much more natural to say you are creating a new interpreter bound to an existing interpreter. The only ambiguity I see with that name is that it may not apply for the entire lifecycle of the interpreter. If you make a BoundInterpreter and then close the "main" Interpreter, it isn't bound to the other interpreter anymore. But "bound" seems very applicable at the time of creation, and that is really the only time we currently need to talk about them.

> Ownership & lifecycle: Is there a concept of “main” vs. “attached” interpreters, where the main interpreter owns the state? Or are all interpreters peers? For example, if the main interpreter is closed, do its attached interpreters die immediately, or is the shared state only cleaned up once the last interpreter closes?

Right now, from my reading of the C-API and the way we use it, there is no distinction between "main" and "attached". It is theoretically fine to close the main interpreter. It is also possible to create an attached interpreter from other attached interpreters, making a single interpreter both "main" and "attached". I have written the Jep code with that assumption, so that all the shared state will be cleaned up when the final interpreter is closed. The caveat here is that I have not tested it. If anyone has the bandwidth to check out this branch and play with it, that would be phenomenal. I deliberately avoided mentioning much about the lifecycle because I am worried we might run into unexpected obstacles that require keeping the "main" interpreter open. I am prepared to start adding restrictions or caveats as we find problems, but I don't know of any problems yet.
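To make the intended lifecycle concrete, here is a minimal reference-counting sketch of "all interpreters are peers and the last one to close frees the shared state". It is a toy model, not the code in this branch: `CountedState` and `CountedHandle` are hypothetical names, and the boolean returned by `detach()` only loosely mirrors the flag that the memory manager hands to the native close in this PR.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical shared state that tracks how many handles still reference it. */
final class CountedState {
    private final AtomicInteger handles = new AtomicInteger();
    private volatile boolean released;

    void attach() {
        handles.incrementAndGet();
    }

    /** Returns true when the caller was the last handle and the state must be freed. */
    boolean detach() {
        boolean last = handles.decrementAndGet() == 0;
        if (last) {
            released = true; // a real implementation would free the interpreter state here
        }
        return last;
    }

    boolean isReleased() {
        return released;
    }
}

/** Hypothetical handle; every handle is a peer, there is no special "main" handle. */
final class CountedHandle implements AutoCloseable {
    private final CountedState state;
    private boolean closed;

    /** Creates a fresh state with this handle as its first user. */
    CountedHandle() {
        this(new CountedState());
    }

    private CountedHandle(CountedState state) {
        this.state = state;
        state.attach();
    }

    /** Creates another handle on the same state; works from any open handle. */
    CountedHandle bind() {
        if (closed) {
            throw new IllegalStateException("handle is closed");
        }
        return new CountedHandle(state);
    }

    boolean stateReleased() {
        return state.isReleased();
    }

    @Override
    public void close() {
        if (closed) {
            return;
        }
        closed = true;
        // Always drop this handle's own resources; the shared state goes only with the last handle.
        state.detach();
    }
}
```

With this model, closing the first handle while bound handles are still open leaves the state alive, and the state is released only when the final handle closes, whichever handle that happens to be.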

> Proposal: What if we explicitly expose a thread-independent InterpreterState in Java, and let clients construct lightweight, thread-bound Interpreter instances from that state? This would remove the need to tie an interpreter’s lifetime (or its state) to a specific thread.

That is definitely the direction I would like to take this. I see this as the first step, where we can try out the ideas of expanded sharing between interpreters and get feedback on how useful it is and whether there are unexpected side effects or compatibility issues. I don't plan on trying to implement that for the next Jep release, but it is possibly something to put in next year. For those types of use cases this PR will allow people to create an interpreter on a daemon thread and then never use it again, achieving the same basic idea of a thread-independent interpreter state.

> Disclaimer: I only have a high-level understanding of Jep and very little experience with the Python C API. ;)
>
> Thanks again for the hard work!

I appreciate the extra input and discussion.

JepConfig config = new JepConfig();
config.classLoader = this.classLoader;
config.interactive = this.interactive;
return new Jep(config, false, this.memoryManager, this, shareGlobals) {

Would it be technically feasible to return a SharedInterpreter or a SubInterpreter depending on the actual type of this? Possibly through generics (self type) or by overriding this method on the SharedInterpreter and SubInterpreter subclasses. Here you commented that the last instance will take ownership of the state. So when attaching to a SubInterpreter, it would make sense to get a SubInterpreter returned. Once the original SubInterpreter dies, we still have a SubInterpreter (or SharedInterpreter). In that case we wouldn't need a dedicated name for this "bound" interpreter. Maybe the method name could also be "onThisThread"; ideally you could call it multiple times and always get the thread-local instance returned. In that case the shareGlobals option is difficult. I would rather the returned interpreter be "the same", just usable on another thread.

Member Author

Is there a use case where someone would need a SubInterpreter or SharedInterpreter specifically? After construction the two behave identically, so there is no reason I can think of that someone using this API would need to make a distinction. Since the method signature just returns an Interpreter, we can adjust the specific type in future versions if we find it is necessary.

Two problems I see with reusing a ThreadLocal instance are ensuring it is closed before the thread ends and accommodating use cases which may want to reuse a thread for a different interpreter. Many use cases may not have these problems but in those cases I think it is simple enough to add a layer with a thread local outside of jep.
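As a rough illustration of such an outside layer, here is a minimal sketch assuming the application supplies its own factory. `PerThread` is a hypothetical helper, not part of Jep; the factory would wrap whatever call creates the bound interpreter, and since that method name was still under discussion in this PR the sketch takes a plain `Supplier` instead of calling any Jep method.

```java
import java.util.function.Supplier;

/**
 * Hypothetical application-side helper that keeps one closeable resource per thread.
 * The factory runs on the calling thread, so the resource is created where it is used.
 */
final class PerThread<T extends AutoCloseable> {
    private final Supplier<T> factory;
    private final ThreadLocal<T> current = new ThreadLocal<>();

    PerThread(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns this thread's instance, creating it on first use. */
    T get() {
        T value = current.get();
        if (value == null) {
            value = factory.get();
            current.set(value);
        }
        return value;
    }

    /**
     * Closes and forgets this thread's instance. The caller must invoke this before
     * the thread ends or before reusing the thread for a different interpreter.
     */
    void release() {
        T value = current.get();
        if (value != null) {
            current.remove();
            try {
                value.close();
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        }
    }
}
```

The sketch does not solve either problem automatically. It only gives the application one place, `release()`, to close the instance before the thread ends or before the thread is reused for a different interpreter.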

Comment thread src/main/java/jep/Jep.java Outdated
}

- getMemoryManager().closeInterpreter(this);
+ boolean closeInterp = getMemoryManager().closeInterpreter(this);

Member

I see this variable is passed to close(), but what does it do? Can you name the variable something that makes it clear? It seems weird to pass a boolean closeInterp to a method named close().

Member Author

I renamed it to closeInterpState and also added javadoc to the native close() in an attempt to explain it.

@bsteffensmeier bsteffensmeier merged commit efed1d6 into ninia:dev_4.3 Sep 5, 2025
1 check passed