Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-47053: Reduce deoptimization in BINARY_OP_INPLACE_ADD_UNICODE #31318

Merged
merged 9 commits into from Mar 25, 2022

Conversation

sweeneyde
Copy link
Member

@sweeneyde sweeneyde commented Feb 13, 2022

Hopefully "left-hand side is the same as assignment target" is more stable and less miss-prone than Py_REFCNT == 2.

Note that PyUnicode_Append already has lots of overhead, and it checks if it's safe to work in place.

See faster-cpython/ideas#269

https://bugs.python.org/issue47053

@sweeneyde sweeneyde changed the title Don't de-optimize BINARY_OP_INPLACE_ADD_UNICODE if the refcounts are too big Reduce deoptimization in BINARY_OP_INPLACE_ADD_UNICODE Feb 13, 2022
@sweeneyde
Copy link
Member Author

@sweeneyde sweeneyde commented Mar 17, 2022

A benchmark:

from pyperf import Runner, perf_counter
import sys

LENGTHS = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9]
DATA = [('a',) * x for x in LENGTHS]

def bench(loops):
    data = DATA * loops
    t0 = perf_counter()
    for s in data:
        res = ''
        for c in s:
            res += c
    return perf_counter() - t0

runner = Runner()
runner.bench_time_func("inplace add str", bench)
Mean +- std dev: [main_concat] 4.84 us +- 0.06 us -> [nomiss] 4.24 us +- 0.04 us: 1.14x faster

@sweeneyde sweeneyde changed the title Reduce deoptimization in BINARY_OP_INPLACE_ADD_UNICODE bpo-47053: Reduce deoptimization in BINARY_OP_INPLACE_ADD_UNICODE Mar 17, 2022
@sweeneyde sweeneyde marked this pull request as ready for review Mar 17, 2022
@sweeneyde sweeneyde requested a review from markshannon as a code owner Mar 17, 2022
@sweeneyde sweeneyde requested review from brandtbucher and markshannon and removed request for markshannon Mar 17, 2022
@markshannon
Copy link
Member

@markshannon markshannon commented Mar 18, 2022

Looks good for the microbenchmark, do you have numbers for the full suite?

This is the stats for this PR?

@sweeneyde
Copy link
Member Author

@sweeneyde sweeneyde commented Mar 19, 2022

Looks good for the microbenchmark, do you have numbers for the full suite?

On my not-that-stable laptop with GCC on WSL, using --enable-optimizations --with-lto, I get a 1.02x faster geometric mean: https://gist.github.com/sweeneyde/7fb779d28c55ba4b5e8d40f0bf8f596f

This is the stats for this PR?

Yes, that's correct.

@sweeneyde
Copy link
Member Author

@sweeneyde sweeneyde commented Mar 19, 2022

The previous microbenchmark was with MSCV, I got some different results with related benchmarks on GCC:

from pyperf import Runner, perf_counter
import sys

LENGTHS = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9]
DATA = [('a',) * x for x in LENGTHS]

def bench1(loops):
    data = DATA * loops
    t0 = perf_counter()
    for s in data:
        res = ''
        for c in s:
            res += c
    return perf_counter() - t0


def bench2(loops):
    data = []
    n = 10_000
    for i in range(loops):
        data.append(("a" * n, "a"))
        data.append(("", "a"))
    t0 = perf_counter()
    for s, c in data:
        s += c
        s += c
        s += c
    return perf_counter() - t0

runner = Runner()
runner.bench_time_func("bench1", bench1)
runner.bench_time_func("bench2", bench2)
bench1: Mean +- std dev: [bench_concat_main] 2.48 us +- 0.11 us -> [bench_concat_nomiss] 2.66 us +- 0.15 us: 1.07x slower
bench2: Mean +- std dev: [bench_concat_main] 976 ns +- 24 ns -> [bench_concat_nomiss] 866 ns +- 18 ns: 1.13x faster

Geometric mean: 1.02x faster

It could be just a random result of how the PGO decides to shuffle things around?

@markshannon
Copy link
Member

@markshannon markshannon commented Mar 25, 2022

~0% misses is a very clear improvement on 96.6%.

I'm not worried that the benchmarks numbers are a bit noisy, given this is a clear improvement.

@markshannon markshannon merged commit cca43b7 into python:main Mar 25, 2022
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants