bpo-44187: Quickening infrastructure #26264
Conversation
@markshannon One thing I noticed is the introduction of the HotPy prefix, which does not exist in CPython. I understand that this code is inherited from some previous projects.
…tation to make it less surprising.
Here's my first round of comments; I haven't gotten to specialize.c yet, but I figured I'd send this in case I don't get to that tonight (which is likely).
Misc/NEWS.d/next/Core and Builtins/2021-05-20-12-43-04.bpo-44187.3lk0L1.rst
Okay, I got through everything this time, and I think I understand it. Some suggestions to make it easier to understand for new readers.
static uint8_t adaptive[256] = { 0 };

static uint8_t cache_requirements[256] = { 0 };
gvanrossum
May 26, 2021
Member
These variables could use some comment explaining their purpose. (Also, maybe we should plan to generate these from info added to opcode.py, like opcode_targets.h?)
cache_offset = i/2;
}
else if (oparg > 255) {
    /* Cannot access required cache_offset */
gvanrossum
May 26, 2021
Member
Maybe in some kind of debug mode it would be nice to report whether this happens at all? If we see this frequently we need to change the strategy. OTOH maybe we never expect it and we could put assert(0) here???
    /* Cannot access required cache_offset */
    continue;
}
cache_offset += need;
gvanrossum
May 26, 2021
Member
Looks like this will over-count if there are eligible opcodes with an EXTENDED_ARG prefix.
@@ -1343,6 +1343,14 @@ eval_frame_handle_pending(PyThreadState *tstate)
#define JUMPTO(x) (next_instr = first_instr + (x))
#define JUMPBY(x) (next_instr += (x))

/* Get opcode and opcode from original instructions, not quickened form. */
iritkatriel
May 26, 2021
Member
Suggested change
- /* Get opcode and opcode from original instructions, not quickened form. */
+ /* Get opcode and oparg from original instructions, not quickened form. */
return &last_cache_plus_one[-1-n].entry;
}

/* Following two functions determine the index of a cache entry from the
iritkatriel
May 26, 2021
Member
This comment doesn't seem correct - they take the index and return oparg or offset.
@gvanrossum , @iritkatriel I think I've addressed all your comments.
Looking good. Maybe we should ask Dino to have a look? (Between this PR and PEP 659 he should have enough to judge the design, right?) Also, please look at the compiler warnings found by GitHub Actions for Windows (x64).
<instr 0> <instr 1> <instr 2> <instr 3> <--- co->co_first_instr
<instr 4> <instr 5> <instr 6> <instr 7>
...
<instr N-1>
gvanrossum
May 28, 2021
Member
Maybe M-1? The number of cache entries doesn't have to match the number of instructions.
}
previous_opcode = opcode;
}
return cache_offset+1;
gvanrossum
May 28, 2021
Member
Suggested change
- return cache_offset+1;
+ return cache_offset + 1; // One extra for the count entry
I've been implementing some specialization of
I've implemented those in the last three commits.
}
Py_ssize_t size = PyBytes_GET_SIZE(code->co_code);
int instr_count = (int)(size/sizeof(_Py_CODEUNIT));
if (instr_count > MAX_SIZE_TO_QUICKEN) {
iritkatriel
Jun 5, 2021
Member
Would it be possible instead to quicken the first 5000 instructions and then exit? (That would avoid a cliff where a minor change in the code tips it over the no-optimization limit and makes it run much slower.)
bluss
Jun 6, 2021
There's a memory cost to creating a duplicate code array, so it would risk wasting memory disproportionately.
If you want to schedule another build, you need to add the " |
Windows tests were failing before this PR.
Merged 001eb52 into python:main
@@ -73,9 +73,10 @@ def get_pooled_int(value):
alloc_deltas = [0] * repcount
fd_deltas = [0] * repcount
getallocatedblocks = sys.getallocatedblocks
getallocatedblocks = sys.getallocatedblocks
methane
Jun 8, 2021
Member
why?
First step toward implementing PEP 659.
https://bugs.python.org/issue44187