refac: extract InstructionContext for Interpreter and Host#2530
refac: extract InstructionContext for Interpreter and Host#2530sergey-melnychuk wants to merge 5 commits intobluealloy:mainfrom
InstructionContext for Interpreter and Host#2530Conversation
CodSpeed Performance ReportMerging #2530 will degrade performances by 3.02%Comparing Summary
Benchmarks breakdown
|
|
|
||
| // Execute instruction. | ||
| instruction_table[opcode as usize](self, host) | ||
| let mut context = InstructionContext { |
There was a problem hiding this comment.
Constructing this is expensive, move it to run_plain and change step to receive InstructionContext
There was a problem hiding this comment.
Hi @rakita , thank you for your comment.
Constructing this is expensive,
What makes you think that? Runtime costs seem to be just two pointers copied, which is far from expensive I'd say.
move it to
run_plainand changestepto receiveInstructionContext
Tried that, borrow checker goes crazy about multiple mutable borrows of self in that case (unless I miss some more elegant way to deal with it, then ofc any hints would be much appreciated).
So this seems to be the least intrusive integration point for the InstructionContext that does not require further de-coupling Interpreter from actual execution internals.
There was a problem hiding this comment.
step logic would need to go outside of Interpreter, or you could add step inside InstructionContext
There was a problem hiding this comment.
This is very hot loop, so any overhead is noticeable.
There was a problem hiding this comment.
Thank you @rakita , it makes way more sense now.
There was a problem hiding this comment.
It is big, and it probably related to the loop. I have idea to use only pointers to check for the end of execution, it could potentially make it faster. But would try to make a poc in separate PR
There was a problem hiding this comment.
It is big, and it probably related to the loop. I have idea to use only pointers to check for the end of execution, it could potentially make it faster. But would try to make a poc in separate PR
Maybe I knew the cause of the performance degradation, we use a wrapper type for combine two references for easy of use.
pub struct InstructionContext<'a, H: ?Sized, ITy: InterpreterTypes> {
pub host: &'a mut H,
pub interpreter: &'a mut Interpreter<ITy>,
}and used as a function parameter.
pub type Instruction<W, H> = fn(&mut context::InstructionContext<'_, H, W>);Here we used &mut InstructionContext instead of &mut Interpreter and &mut Host. But here we use indirect addressing, aka reference to reference,
double addressing: &mut InstructionContext -> &mut host OR &mut interpreter
which results in an extra ptr read instruction at least. For example:
pub struct Wrapper<'a>(&'a i32, &'a i32);
pub fn foo2(c: Wrapper){
println!("{}, {}", c.0, c.1);
}
pub fn foo3(c: &Wrapper){
println!("{}, {}", c.0, c.1);
}
So, we only need to use InstructionContext directly instead of a reference and optimized by rust's zero-cost-abstraction at compile time. For example, using ctx.host and ctx.interpreter in fn data_load is equivalent to using host and interpreter immediately like below:
- pub type Instruction<W, H> = fn(&mut context::InstructionContext<'_, H, W>);
+ pub type Instruction<W, H> = fn(context::InstructionContext<'_, H, W>);
pub fn data_load<WIRE: InterpreterTypes, H: Host + ?Sized>(
- context: &mut InstructionContext<'_, H, WIRE>,
+ context: InstructionContext<'_, H, WIRE>,
) cc @rakita
There was a problem hiding this comment.
This is worth checking out!
@sergey-melnychuk Can you try it?
- pub type Instruction<W, H> = fn(&mut context::InstructionContext<'_, H, W>);
+ pub type Instruction<W, H> = fn(context::InstructionContext<'_, H, W>);
pub fn data_load<WIRE: InterpreterTypes, H: Host + ?Sized>(
- ctx: &mut InstructionContext<'_, H, WIRE>,
+ ctx: InstructionContext<'_, H, WIRE>,
) It is big, and it probably related to the loop. I have idea to use only pointers to check for the end of execution, it could potentially make it faster. But would try to make a poc in separate PR
Maybe I knew the cause of the performance degradation, we use a wrapper type for combine two references for easy of use.
pub struct InstructionContext<'a, H: ?Sized, ITy: InterpreterTypes> { pub host: &'a mut H, pub interpreter: &'a mut Interpreter<ITy>, }and used as a function parameter.
pub type Instruction<W, H> = fn(&mut context::InstructionContext<'_, H, W>);Here we used
&mut InstructionContextinstead of&mut Interpreterand&mut Host. But here we use indirect addressing, akareference to reference,double addressing: &mut InstructionContext -> &mut host OR &mut interpreterwhich results in an extra ptr read instruction at least. For example:
pub struct Wrapper<'a>(&'a i32, &'a i32); pub fn foo2(c: Wrapper){ println!("{}, {}", c.0, c.1); } pub fn foo3(c: &Wrapper){ println!("{}, {}", c.0, c.1); }So, we only need to use `InstructionContext` directly instead of a reference and optimized by rust's zero-cost-abstraction at compile time. For example, using `ctx.host` and `ctx.interpreter` in `fn data_load` is equivalent to using `host` and `interpreter` immediately like below:
- pub type Instruction<W, H> = fn(&mut context::InstructionContext<'_, H, W>); + pub type Instruction<W, H> = fn(context::InstructionContext<'_, H, W>); pub fn data_load<WIRE: InterpreterTypes, H: Host + ?Sized>( - context: &mut InstructionContext<'_, H, WIRE>, + context: InstructionContext<'_, H, WIRE>, )cc @rakita
There was a problem hiding this comment.
Have tried it here #2548
Abd it works!
@sergey-melnychuk can you enable me to push this to your branch?
fd50740 to
e5ac2c9
Compare
e321f28 to
028c3e9
Compare
d20cd50 to
7e66fd2
Compare
|
@sergey-melnychuk I will take this over to push it over the line, amazing work btw |
|
Subseed with #2548 |

Closes #2359