Conversation
This is to follow up PR #30630, we need to have GIL when calling jit::toPyObject(), for some binded functions need to be taged with GIL release if underneath C++ codes requires GIL. so 1. pyRef::to_here() and pyRef::local_value() added GIL 2. pyRef::pickle and pyRef::unpickle() added GIL release tag 3. in request_callback_impl, also added GIL as needed 4. for typeParser, use cached jitCompilationUnit_, also clean it up in cleanUp() function Differential Revision: [D19612337](https://our.internmc.facebook.com/intern/diff/D19612337/) [ghstack-poisoned]
This is to follow up PR #30630, we need to have GIL when calling jit::toPyObject(), for some binded functions need to be taged with GIL release if underneath C++ codes requires GIL. so 1. pyRef::to_here() and pyRef::local_value() added GIL 2. pyRef::pickle and pyRef::unpickle() added GIL release tag 3. in request_callback_impl, also added GIL as needed 4. for typeParser, use cached jitCompilationUnit_, also clean it up in cleanUp() function Differential Revision: [D19612337](https://our.internmc.facebook.com/intern/diff/D19612337/) ghstack-source-id: 97355818 Pull Request resolved: #32748
💊 CircleCI build failures summary and remediationsAs of commit 6855011:
Detailed failure analysisOne may explore the probable reasons each build failed interactively on the Dr. CI website. 🕵️ 1 new failure recognized by patternsThe following build failures do not appear to be due to upstream breakage:
|
mrshenli
left a comment
There was a problem hiding this comment.
LGTM! Thanks!
I just have some minor questions inline.
Regarding the test failure, looks like test_quantized_rnn is cleared on master. Could you please try rebase to the current master to trigger tests again?
| { | ||
| // acquiring GIL as torch::jit::toPyObject creates new py::object without | ||
| // grabbing the GIL. | ||
| pybind11::gil_scoped_acquire ag; |
There was a problem hiding this comment.
(can be done in followup PR) do we need to use PROFILE_GIL_SCOPED_ACQUIRE to profile this?
| pySerialize_ = py::none(); | ||
| pyHandleException_ = py::none(); | ||
| jitCompilationUnit_ = nullptr; | ||
| typeParser_ = nullptr; |
There was a problem hiding this comment.
Curious, it does not seem ScriptTypeParser depends on Python. Is setting it to nullptr here just being defensive or will it indeed hit error in certain destruction order?
There was a problem hiding this comment.
just be defensive, do you think whether there is an issue to do so?
| // __setstate__ | ||
| return PyRRef::unpickle(t); | ||
| }), | ||
| py::call_guard<py::gil_scoped_release>()) |
There was a problem hiding this comment.
would I be correct if I assume this GIL acquirement is moved to rref_impl.cpp?
There was a problem hiding this comment.
I added this tag, mainly because PyRRef::unpickle() called pythonrpchandler related stuff that requires GIL. but yes, we moved GIL to rref_impl.cpp as well
There was a problem hiding this comment.
Not important, but to be accurate.
In PyRRef::unpickle(), it seems PythonRpcHandler::getInstance().parseTypeFromStr(rfd.type_str_); does not require GIL.
There was a problem hiding this comment.
yes, but it could wait for pythonrpchandler to be constructed in another thread, and that thread is trying to aquire GIL, if we do not release GIL in unpickle, it could result in deadlock
| if (rref_->isPyObj()) { | ||
| // UserRRef<py::object>::toHere() calls python_rpc_handler which acquires | ||
| // GIL. | ||
| return jit::toPyObject( |
There was a problem hiding this comment.
2 facts that help me understand jit::toPyObject() (which returns a py::object) requires GIL.
- Constructing and destructing an
py::objectC++ object changes the Python object's ref count.
Copy constructor; always increases the reference count.
Destructor; automatically calls
handle::dec_ref().
Ref: https://pybind11.readthedocs.io/en/stable/reference.html#_CPPv4N6object6objectERK6object
- Changing Python object's ref count requires acquiring GIL.
The Python interpreter is not fully thread-safe. In order to support multi-threaded Python programs, there’s a global lock, called the global interpreter lock or GIL, that must be held by the current thread before it can safely access Python objects.
Without the lock, even the simplest operations could cause problems in a multi-threaded program: for example, when two threads simultaneously increment the reference count of the same object, the reference count could end up being incremented only once instead of twice.
Ref: https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock
Combing them together, when constructing or using a py:object C++ object, the GIL should be acquired.
|
|
||
| py::tuple RRefForkData::toPyTuple() const { | ||
| // add GIL as it is contructing a py::object | ||
| pybind11::gil_scoped_acquire ag; |
There was a problem hiding this comment.
Do we need GIL for RRefForkData::fromPyTuple as well, since RRefForkData::fromPyTuple is using the py::object subtype, py::tuple? or it has already acquired GIL, how?
There was a problem hiding this comment.
sounds good, let me add GIL for fromPyTuple as well
There was a problem hiding this comment.
it's a const reference of the py::tuple and does not seem to create/delete py::object. Is it still necessary to use GIL in fromPyTuple.
There was a problem hiding this comment.
I guess GIL here can help avoid underneath python object pointed by pyTuple could be mutated?
There was a problem hiding this comment.
@mrshenli Accessing (both read and write) a Python object also requires GIL, otherwise, other thread could get GIL and invalidate that Python object.
|
@mrshenli I think we added GIL for all py::objects after adding GIL to fromPyTuple |
This is to follow up PR #30630, we need to have GIL when calling jit::toPyObject(), for some binded functions need to be taged with GIL release if underneath C++ codes requires GIL. so 1. pyRef::to_here() and pyRef::local_value() added GIL 2. pyRef::pickle and pyRef::unpickle() added GIL release tag 3. in request_callback_impl, also added GIL as needed 4. for typeParser, use cached jitCompilationUnit_, also clean it up in cleanUp() function Differential Revision: [D19612337](https://our.internmc.facebook.com/intern/diff/D19612337/) [ghstack-poisoned]
Pull Request resolved: #32748 This is to follow up PR #30630, we need to have GIL when calling jit::toPyObject(), for some binded functions need to be taged with GIL release if underneath C++ codes requires GIL. so 1. pyRef::to_here() and pyRef::local_value() added GIL 2. pyRef::pickle and pyRef::unpickle() added GIL release tag 3. in request_callback_impl, also added GIL as needed 4. for typeParser, use cached jitCompilationUnit_, also clean it up in cleanUp() function ghstack-source-id: 97373011 Differential Revision: [D19612337](https://our.internmc.facebook.com/intern/diff/D19612337/)
|
|
||
| py::tuple RRefForkData::toPyTuple() const { | ||
| // add GIL as it is contructing a py::object | ||
| pybind11::gil_scoped_acquire ag; |
There was a problem hiding this comment.
it's a const reference of the py::tuple and does not seem to create/delete py::object. Is it still necessary to use GIL in fromPyTuple.
Summary: Pull Request resolved: pytorch#32748 This is to follow up PR pytorch#30630, we need to have GIL when calling jit::toPyObject(), for some binded functions need to be taged with GIL release if underneath C++ codes requires GIL. so 1. pyRef::to_here() and pyRef::local_value() added GIL 2. pyRef::pickle and pyRef::unpickle() added GIL release tag 3. in request_callback_impl, also added GIL as needed 4. for typeParser, use cached jitCompilationUnit_, also clean it up in cleanUp() function ghstack-source-id: 97373011 Test Plan: unit test Differential Revision: D19612337 fbshipit-source-id: 4d09f9b52ba626545ae7d31fea6b671301ed3890
|
This pull request has been merged in b5d8982. |
1 similar comment
|
This pull request has been merged in b5d8982. |
Summary: Pull Request resolved: pytorch#32748 This is to follow up PR pytorch#30630, we need to have GIL when calling jit::toPyObject(), for some binded functions need to be taged with GIL release if underneath C++ codes requires GIL. so 1. pyRef::to_here() and pyRef::local_value() added GIL 2. pyRef::pickle and pyRef::unpickle() added GIL release tag 3. in request_callback_impl, also added GIL as needed 4. for typeParser, use cached jitCompilationUnit_, also clean it up in cleanUp() function ghstack-source-id: 97373011 Test Plan: unit test Differential Revision: D19612337 fbshipit-source-id: 4d09f9b52ba626545ae7d31fea6b671301ed3890
Stack from ghstack:
This is to follow up PR #30630, we need to have GIL when calling jit::toPyObject(), for some binded functions need to be taged with GIL release if underneath C++ codes requires GIL. so
Differential Revision: D19612337