Skip to content

Reduce function calling overhead by ~25% in ReferencePool::update_counts:#1608

Merged
davidhewitt merged 1 commit intoPyO3:mainfrom
alex:opt-no-args
May 15, 2021
Merged

Reduce function calling overhead by ~25% in ReferencePool::update_counts:#1608
davidhewitt merged 1 commit intoPyO3:mainfrom
alex:opt-no-args

Conversation

@alex
Copy link
Copy Markdown
Member

@alex alex commented May 15, 2021

  1. Place both increfs and decrefs behind a single mutex, rather than two. Even uncontended mutexes aren't free to acquire.
  2. Keep a dirty tracking bool to avoid acquiring any mutexes at all in the common case of no modifications (because everything happened with the GIL held)

Benchmarks

Before

(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 63.6 nsec per loop
(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 63.7 nsec per loop
(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 64.1 nsec per loop
(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 63.6 nsec per loop
(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 63.5 nsec per loop

With single lock

(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 51.9 nsec per loop
(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 52 nsec per loop
(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 52.9 nsec per loop
(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 51.8 nsec per loop
(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 54.7 nsec per loop

With both single lock and dirty flag

(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 47.2 nsec per loop
(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 46.7 nsec per loop
(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 46.6 nsec per loop
(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 46.6 nsec per loop
(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 47.1 nsec per loop

Copy link
Copy Markdown
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! These are some really great optimizations which many people will be happy with. I think there's a potential deadlock though (which we can avoid), see comment below.

src/gil.rs Outdated
}

for ptr in swap_vec_with_lock!(self.pointers_to_decref) {
for ptr in swap_vec!(ops.1) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to be careful here. Decrementing references can run arbitrary Python drop code, which can lead to GIL release which may eventually deadlock because we hold the lock here. (I think this mutex is also non reentrant so may deadlock on a single thread.)

After swapping ops.1 we should release the lock before decreasing any references.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, should be good now.

…nts:

1) Place both increfs and decrefs behind a single mutex, rather than two. Even uncontended mutexes aren't free to acquire.
2) Keep a dirty tracking bool to avoid acquiring any mutexes at all in the common case of no modifications (because everything happened with the GIL held)
Copy link
Copy Markdown
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks very much 😌

@davidhewitt davidhewitt merged commit c48e5b0 into PyO3:main May 15, 2021
@alex alex deleted the opt-no-args branch May 15, 2021 23:53
@kngwyu
Copy link
Copy Markdown
Member

kngwyu commented May 16, 2021

Looks like I was late for the party... Very interesting to see this optimization works, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants