YJIT: Allow inlining ISEQ calls with a block#9622
Conversation
09b9d49 to
92acf33
Compare
92acf33 to
5357897
Compare
|
I don't quite understand how this PR works. Please bear with me. The code for send is so complex that I find it hard to follow sometimes. You say it's duplicating the ISEQ for each block. Conceptually, I understand what that means. However, I don't understand why this has an impact on the maximum number of versions in the JIT. Are multiple entry blocks being generated for the ISEQs based on the block given in the call? I guess I'm wondering: would it be possible to somehow duplicate the ISEQ directly at the interpreter level? That might be slightly less complex on the JIT side, and it would avoid having to add extra state to the context? |
Whenever it duplicates each basic block of Say, we have: module Kernel
def tap
yield(self) # basic block 1: putself + invokeblock (version a, version b)
self # basic block 2: putself + leave (version a, version b)
end
end
1.tap { :a }
2.tap { :b }
They would be located linearly in the generated code. It's basically inlining except the frame push/pop for
If "entry blocks" means This PR duplicates blocks of
Technically, it's possible if you rewrite how the interpreter does method lookup. The lookup is currently based on It would also mean to add extra memory overhead to allocate more ISEQs and add more GC pressure because of them. So, yes, it's possible, but harder to implement, uses more memory, and makes things slower in unrelated places. |
| // If the callee has :inline_block annotation and the callsite has a block ISEQ, | ||
| // duplicate a callee block for each block ISEQ to make its `yield` monomorphic. | ||
| if let (Some(BlockHandler::BlockISeq(iseq)), true) = (block, builtin_attrs & BUILTIN_ATTR_INLINE_BLOCK != 0) { | ||
| callee_ctx.set_inline_block(iseq); |
There was a problem hiding this comment.
I guess this is the magic sauce. Because the callee_ctx is probably doesn't match anything that currently exists, it's going to generate a new block directly following the call.
This PR introduces an annotation
Primitive.attr! :inline_blockto allow monomorphizingyieldin methods likeArray#each#9533.yjit-bench
This PR or
Array#eachin Ruby (#9533) doesn't have a significant impact on benchmarks by itself.However, combining this PR and
Array#eachin Ruby with the annotation (branch), it gives the following speedups.hexapdfis 7% fasterchunky-pngandruby-lspare 2% fasteractiverecord,liquid-c, andrailsbenchare 1% fasterSee also: the number of Ruby block calls from
Array#eachruby/ruby-bench#168Memory usage
Running 25 itrs on
lobsterswith this PR andArray#eachusing the annotation (branch),code_region_sizewas increased by 1.2%, andyjit_alloc_sizewas increased by 4.5%.For optimizing the size of
Context, I'm thinking of having a globalHashMaplike{ iseq1 => 1, iseq2 => 2, ... }and store anu16integer instead of anu64ISEQ pointer as a next step, which would likely decreaseyjit_alloc_size.Before
After
microbenchmark
This PR makes the synthetic benchmark of megamorphic
yieldinInteger#times4.5x faster.