Reduce function calling overhead by ~25% in ReferencePool::update_counts:#1608
Reduce function calling overhead by ~25% in ReferencePool::update_counts:#1608davidhewitt merged 1 commit intoPyO3:mainfrom
Conversation
davidhewitt
left a comment
There was a problem hiding this comment.
Awesome! These are some really great optimizations which many people will be happy with. I think there's a potential deadlock though (which we can avoid), see comment below.
src/gil.rs
Outdated
| } | ||
|
|
||
| for ptr in swap_vec_with_lock!(self.pointers_to_decref) { | ||
| for ptr in swap_vec!(ops.1) { |
There was a problem hiding this comment.
We need to be careful here. Decrementing references can run arbitrary Python drop code, which can lead to GIL release which may eventually deadlock because we hold the lock here. (I think this mutex is also non reentrant so may deadlock on a single thread.)
After swapping ops.1 we should release the lock before decreasing any references.
There was a problem hiding this comment.
Good call, should be good now.
…nts: 1) Place both increfs and decrefs behind a single mutex, rather than two. Even uncontended mutexes aren't free to acquire. 2) Keep a dirty tracking bool to avoid acquiring any mutexes at all in the common case of no modifications (because everything happened with the GIL held)
davidhewitt
left a comment
There was a problem hiding this comment.
Looks great! Thanks very much 😌
|
Looks like I was late for the party... Very interesting to see this optimization works, thanks. |
Benchmarks
Before
With single lock
With both single lock and dirty flag