Don't copy VMBuiltinFunctionsArray into each VMContext #3741
Merged
alexcrichton merged 2 commits into bytecodealliance:main on Jan 28, 2022
Conversation
This is another PR along the lines of "let's squeeze all possible performance we can out of instantiation". Before this PR we would copy, by value, the contents of `VMBuiltinFunctionsArray` into each `VMContext` allocated. This array of function pointers is modestly sized but growing over time as we add various intrinsics. Additionally it's the exact same for all `VMContext` allocations.

This PR attempts to speed up instantiation slightly by instead storing an indirection to the function array. This means that calling a builtin intrinsic is a tad slower since it requires two loads instead of one (one to get the base pointer, another to get the actual address). Otherwise, though, `VMContext` initialization is now simply setting one pointer instead of doing a `memcpy` from one location to another.

With some macro magic this commit also replaces the previous implementation with one that's more `const`-friendly, which also gets us compile-time type-checks of libcalls as well as compile-time verification that all libcalls are defined.

Overall, as with bytecodealliance#3739, the win is very modest here. Locally I measured a speedup from 1.9us to 1.7us to instantiate an empty module with one function. While small at these scales, it's still a 10% improvement!
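The core of the change can be sketched as follows. This is an illustrative sketch only, not wasmtime's real definitions: the field layouts, the three-element `usize` table, and `new_vmcontext` are all stand-ins invented for this example.

```rust
// Sketch of the before/after layout. Before this PR, VMContext embedded
// the whole array by value (a memcpy per instantiation); after, it
// stores one pointer to a single shared, compile-time-initialized table.
#[repr(C)]
struct VMBuiltinFunctionsArray {
    // The real table holds typed libcall function pointers; plain
    // usizes stand in for them here.
    ptrs: [usize; 3],
}

impl VMBuiltinFunctionsArray {
    // One shared instance, built at compile time, used by every VMContext.
    const INIT: VMBuiltinFunctionsArray = VMBuiltinFunctionsArray {
        ptrs: [1, 2, 3], // placeholder "addresses"
    };
}

#[repr(C)]
struct VMContext {
    // Previously: `builtin_functions: VMBuiltinFunctionsArray` by value;
    // now initialization is a single pointer store.
    builtin_functions: *const VMBuiltinFunctionsArray,
}

// Hypothetical constructor standing in for VMContext initialization.
fn new_vmcontext() -> VMContext {
    VMContext {
        builtin_functions: &VMBuiltinFunctionsArray::INIT,
    }
}

fn main() {
    let cx = new_vmcontext();
    // Calling a builtin now costs two loads: one for the base pointer,
    // one for the entry itself.
    let entry = unsafe { (*cx.builtin_functions).ptrs[1] };
    assert_eq!(entry, 2);
}
```

This is the trade-off the PR describes: one extra load on each intrinsic call in exchange for eliminating the per-instantiation `memcpy`.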
fitzgen
approved these changes
Jan 28, 2022
cfallin
approved these changes
Jan 28, 2022
crates/runtime/src/vmcontext.rs
Outdated
for i in 0..ptrs.len() {
    debug_assert!(ptrs[i] != 0, "index {} is not initialized", i);

impl VMBuiltinFunctionsArray {
    pub fn new() -> &'static Self {
Member
Rather than `new()`, can we call this `static_instance()` or something like that? As written it was somewhat concerning to read `VMBuiltinFunctionsArray::new()` in the initialization path -- it looked like an allocation of an owned thing.
Member
Author
Aha, looks like I can do one better and have `const INIT: VMBuiltinFunctionsArray = ...`!
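The `const`-friendly, macro-checked table the author describes could look something like the sketch below. The macro shape, the libcall names, and their signatures are illustrative assumptions, not wasmtime's actual `libcalls!`-style macro.

```rust
// Hypothetical macro that declares the builtin table from a signature
// list. Because INIT is a const, every listed libcall must exist with
// exactly the declared signature, or the crate fails to compile --
// giving the compile-time type-checks the PR description mentions.
macro_rules! define_builtins {
    ($( $name:ident( $($arg:ty),* ) -> $ret:ty; )*) => {
        #[repr(C)]
        struct VMBuiltinFunctionsArray {
            $( $name: unsafe extern "C" fn($($arg),*) -> $ret, )*
        }

        impl VMBuiltinFunctionsArray {
            // Compile-time initialization: each fn item coerces to its
            // declared fn-pointer type, type-checking every entry.
            const INIT: VMBuiltinFunctionsArray = VMBuiltinFunctionsArray {
                $( $name: $name, )*
            };
        }
    };
}

// Illustrative libcalls standing in for the real intrinsics.
unsafe extern "C" fn memory32_grow(delta: u64, memory_index: u32) -> usize {
    let _ = (delta, memory_index);
    0
}
unsafe extern "C" fn table_grow(delta: u32) -> u32 {
    delta
}

define_builtins! {
    memory32_grow(u64, u32) -> usize;
    table_grow(u32) -> u32;
}

fn main() {
    // The shared table is referenced without any per-instance copy.
    let table = &VMBuiltinFunctionsArray::INIT;
    let grown = unsafe { (table.table_grow)(5) };
    assert_eq!(grown, 5);
}
```

Listing a libcall that isn't defined, or defining one with a mismatched signature, becomes a compile error rather than a runtime surprise, which is the verification benefit described in the PR.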
bjorn3
reviewed
Jan 28, 2022
    delta: u64,
    memory_index: u32,
-) -> usize {
+) -> *mut u8 {
Contributor
This function doesn't return an actual pointer, right?
Member
Author
Correct, this is just how the type-checking with the macro worked out.
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request on Feb 2, 2022
Don't copy `VMBuiltinFunctionsArray` into each `VMContext` (bytecodealliance#3741)