Skip to content

refac: extract InstructionContext for Interpreter and Host#2530

Closed
sergey-melnychuk wants to merge 5 commits intobluealloy:mainfrom
sergey-melnychuk:sm/issues/2359/extract-instruction-context
Closed

refac: extract InstructionContext for Interpreter and Host#2530
sergey-melnychuk wants to merge 5 commits intobluealloy:mainfrom
sergey-melnychuk:sm/issues/2359/extract-instruction-context

Conversation

@sergey-melnychuk
Copy link
Copy Markdown
Contributor

Closes #2359

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq bot commented May 20, 2025

CodSpeed Performance Report

Merging #2530 will degrade performances by 3.02%

Comparing sergey-melnychuk:sm/issues/2359/extract-instruction-context (7e66fd2) with main (03cf306)

Summary

❌ 2 regressions
✅ 25 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark BASE HEAD Change
burntpix 509.2 ms 525 ms -3.02%
snailtracer 184.6 ms 188.9 ms -2.28%


// Execute instruction.
instruction_table[opcode as usize](self, host)
let mut context = InstructionContext {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Constructing this is expensive, move it to run_plain and change step to receive InstructionContext

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rakita , thank you for your comment.

Constructing this is expensive,

What makes you think that? Runtime costs seem to be just two pointers copied, which is far from expensive I'd say.

move it to run_plain and change step to receive InstructionContext

Tried that, borrow checker goes crazy about multiple mutable borrows of self in that case (unless I miss some more elegant way to deal with it, then ofc any hints would be much appreciated).

So this seems to be the least intrusive integration point for the InstructionContext that does not require further de-coupling Interpreter from actual execution internals.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

step logic would need to go outside of Interpreter, or you could add step inside InstructionContext

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very hot loop, so any overhead is noticeable.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @rakita , it makes way more sense now.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is big, and it probably related to the loop. I have idea to use only pointers to check for the end of execution, it could potentially make it faster. But would try to make a poc in separate PR

Copy link
Copy Markdown
Contributor

@dyxushuai dyxushuai May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is big, and it probably related to the loop. I have idea to use only pointers to check for the end of execution, it could potentially make it faster. But would try to make a poc in separate PR

Maybe I knew the cause of the performance degradation, we use a wrapper type for combine two references for easy of use.

pub struct InstructionContext<'a, H: ?Sized, ITy: InterpreterTypes> {
    pub host: &'a mut H,
    pub interpreter: &'a mut Interpreter<ITy>,
}

and used as a function parameter.

pub type Instruction<W, H> = fn(&mut context::InstructionContext<'_, H, W>);

Here we used &mut InstructionContext instead of &mut Interpreter and &mut Host. But here we use indirect addressing, aka reference to reference,

double addressing: &mut InstructionContext -> &mut host OR &mut interpreter

which results in an extra ptr read instruction at least. For example:

pub struct Wrapper<'a>(&'a i32, &'a i32);

pub fn foo2(c: Wrapper){
    println!("{}, {}", c.0, c.1);
}

pub fn foo3(c: &Wrapper){
    println!("{}, {}", c.0, c.1);
}
image

So, we only need to use InstructionContext directly instead of a reference and optimized by rust's zero-cost-abstraction at compile time. For example, using ctx.host and ctx.interpreter in fn data_load is equivalent to using host and interpreter immediately like below:

- pub type Instruction<W, H> = fn(&mut context::InstructionContext<'_, H, W>);
+ pub type Instruction<W, H> = fn(context::InstructionContext<'_, H, W>);

pub fn data_load<WIRE: InterpreterTypes, H: Host + ?Sized>(
-	context: &mut InstructionContext<'_, H, WIRE>,
+	context: InstructionContext<'_, H, WIRE>,
) 

cc @rakita

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is worth checking out!

Copy link
Copy Markdown
Contributor

@dyxushuai dyxushuai May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is worth checking out!

@sergey-melnychuk Can you try it?

- pub type Instruction<W, H> = fn(&mut context::InstructionContext<'_, H, W>);
+ pub type Instruction<W, H> = fn(context::InstructionContext<'_, H, W>);

pub fn data_load<WIRE: InterpreterTypes, H: Host + ?Sized>(
-	ctx: &mut InstructionContext<'_, H, WIRE>,
+	ctx: InstructionContext<'_, H, WIRE>,
) 

It is big, and it probably related to the loop. I have idea to use only pointers to check for the end of execution, it could potentially make it faster. But would try to make a poc in separate PR

Maybe I knew the cause of the performance degradation, we use a wrapper type for combine two references for easy of use.

pub struct InstructionContext<'a, H: ?Sized, ITy: InterpreterTypes> {
    pub host: &'a mut H,
    pub interpreter: &'a mut Interpreter<ITy>,
}

and used as a function parameter.

pub type Instruction<W, H> = fn(&mut context::InstructionContext<'_, H, W>);

Here we used &mut InstructionContext instead of &mut Interpreter and &mut Host. But here we use indirect addressing, aka reference to reference,

double addressing: &mut InstructionContext -> &mut host OR &mut interpreter

which results in an extra ptr read instruction at least. For example:

pub struct Wrapper<'a>(&'a i32, &'a i32);

pub fn foo2(c: Wrapper){
    println!("{}, {}", c.0, c.1);
}

pub fn foo3(c: &Wrapper){
    println!("{}, {}", c.0, c.1);
}
image So, we only need to use `InstructionContext` directly instead of a reference and optimized by rust's zero-cost-abstraction at compile time. For example, using `ctx.host` and `ctx.interpreter` in `fn data_load` is equivalent to using `host` and `interpreter` immediately like below:
- pub type Instruction<W, H> = fn(&mut context::InstructionContext<'_, H, W>);
+ pub type Instruction<W, H> = fn(context::InstructionContext<'_, H, W>);

pub fn data_load<WIRE: InterpreterTypes, H: Host + ?Sized>(
-	context: &mut InstructionContext<'_, H, WIRE>,
+	context: InstructionContext<'_, H, WIRE>,
) 

cc @rakita

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have tried it here #2548

Abd it works!

@sergey-melnychuk can you enable me to push this to your branch?

@sergey-melnychuk sergey-melnychuk force-pushed the sm/issues/2359/extract-instruction-context branch from fd50740 to e5ac2c9 Compare May 24, 2025 12:26
@sergey-melnychuk sergey-melnychuk force-pushed the sm/issues/2359/extract-instruction-context branch from e321f28 to 028c3e9 Compare May 26, 2025 13:17
@rakita
Copy link
Copy Markdown
Member

rakita commented May 27, 2025

@sergey-melnychuk I will take this over to push it over the line, amazing work btw

@rakita
Copy link
Copy Markdown
Member

rakita commented May 27, 2025

Subseed with #2548

@rakita rakita closed this May 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make InstructionContext that contains refs of Interpreter and Host

3 participants