Skip to content

Don't copy VMBuiltinFunctionsArray into each VMContext#3741

Merged
alexcrichton merged 2 commits intobytecodealliance:mainfrom
alexcrichton:builtins-nocopy
Jan 28, 2022
Merged

Don't copy VMBuiltinFunctionsArray into each VMContext#3741
alexcrichton merged 2 commits intobytecodealliance:mainfrom
alexcrichton:builtins-nocopy

Conversation

@alexcrichton
Copy link
Copy Markdown
Member

This is another PR along the lines of "let's squeeze all possible
performance we can out of instantiation". Before this PR we would copy,
by value, the contents of VMBuiltinFunctionsArray into each
VMContext allocated. This array of function pointers is modestly-sized
but growing over time as we add various intrinsics. Additionally it's
the exact same for all VMContext allocations.

This PR attempts to speed up instantiation slightly by instead storing
an indirection to the function array. This means that calling a builtin
intrinsic is a tad bit slower since it requires two loads instead of one
(one to get the base pointer, another to get the actual address).
Otherwise though VMContext initialization is now simply setting one
pointer instead of doing a memcpy from one location to another.

With some macro-magic this commit also replaces the previous
implementation with one that's more const-friendly which also gets us
compile-time type-checks of libcalls as well as compile-time
verification that all libcalls are defined.

Overall, as with #3739, the win is very modest here. Locally I measured
a speedup from 1.9us to 1.7us taken to instantiate an empty module with
one function. While small at these scales it's still a 10% improvement!

This is another PR along the lines of "let's squeeze all possible
performance we can out of instantiation". Before this PR we would copy,
by value, the contents of `VMBuiltinFunctionsArray` into each
`VMContext` allocated. This array of function pointers is modestly-sized
but growing over time as we add various intrinsics. Additionally it's
the exact same for all `VMContext` allocations.

This PR attempts to speed up instantiation slightly by instead storing
an indirection to the function array. This means that calling a builtin
intrinsic is a tad bit slower since it requires two loads instead of one
(one to get the base pointer, another to get the actual address).
Otherwise though `VMContext` initialization is now simply setting one
pointer instead of doing a `memcpy` from one location to another.

With some macro-magic this commit also replaces the previous
implementation with one that's more `const`-friendly which also gets us
compile-time type-checks of libcalls as well as compile-time
verification that all libcalls are defined.

Overall, as with bytecodealliance#3739, the win is very modest here. Locally I measured
a speedup from 1.9us to 1.7us taken to instantiate an empty module with
one function. While small at these scales it's still a 10% improvement!
Copy link
Copy Markdown
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

for i in 0..ptrs.len() {
debug_assert!(ptrs[i] != 0, "index {} is not initialized", i);
impl VMBuiltinFunctionsArray {
pub fn new() -> &'static Self {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than new(), can we call this static_instance() or something like that? As written it was somewhat concerning to read VMBuiltinFunctionsArray::new() in the initialization path -- it looked like an allocation of an owned thing.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha looks like I can do one better and have const INIT: VMBuiltinFunctionsArray = ...!

delta: u64,
memory_index: u32,
) -> usize {
) -> *mut u8 {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function doesn't return an actual pointer, right?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, this is just how the type-checking with the macro worked out.

@alexcrichton alexcrichton merged commit a25f7bd into bytecodealliance:main Jan 28, 2022
@alexcrichton alexcrichton deleted the builtins-nocopy branch January 28, 2022 22:24
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Feb 2, 2022
…lliance#3741)

* Don't copy `VMBuiltinFunctionsArray` into each `VMContext`

This is another PR along the lines of "let's squeeze all possible
performance we can out of instantiation". Before this PR we would copy,
by value, the contents of `VMBuiltinFunctionsArray` into each
`VMContext` allocated. This array of function pointers is modestly-sized
but growing over time as we add various intrinsics. Additionally it's
the exact same for all `VMContext` allocations.

This PR attempts to speed up instantiation slightly by instead storing
an indirection to the function array. This means that calling a builtin
intrinsic is a tad bit slower since it requires two loads instead of one
(one to get the base pointer, another to get the actual address).
Otherwise though `VMContext` initialization is now simply setting one
pointer instead of doing a `memcpy` from one location to another.

With some macro-magic this commit also replaces the previous
implementation with one that's more `const`-friendly which also gets us
compile-time type-checks of libcalls as well as compile-time
verification that all libcalls are defined.

Overall, as with bytecodealliance#3739, the win is very modest here. Locally I measured
a speedup from 1.9us to 1.7us taken to instantiate an empty module with
one function. While small at these scales it's still a 10% improvement!

* Review comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants