Add basic support for profiling Pulley #10034
Merged
alexcrichton merged 10 commits into bytecodealliance:main on Jan 16, 2025
This commit adds basic support for profiling the Pulley interpreter. This was partially achievable previously through the use of native profilers, but the downside of that approach is that, while you can find hot instructions, it's not clear in what context they are being executed nor which functions are hot. The goal of this profiler is to show Pulley bytecode and the time spent in the bytecode itself, to better understand the shape of code around a hot instruction and, for example, to identify new macro opcodes.

The general structure of this new profiler is:

* There is a compile-time feature for Pulley, off by default, where, when enabled, Pulley will record its current program counter into an `AtomicUsize` before each instruction.
* When the CLI has `--profile pulley`, Wasmtime will spawn a sampling thread in the same process which will periodically read from this `AtomicUsize` to record where the program is currently executing.
* The Pulley profiler additionally records all bytecode through the use of the `ProfilingAgent` trait to ensure that the recording has access to all bytecode as well.
* Samples are taken throughout the process and emitted to a `pulley-$pid.data` file. This file is then interpreted and printed by an "example" program `profiler-html.rs` in the `pulley/examples` directory.

The end result is that hot functions of Pulley bytecode can be seen, and instructions are annotated with how frequently they were executed. This enables finding hot loops and understanding more about the whole loop, the bytecodes that were selected, and such.
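To make the sampling design described in this PR concrete, here is a minimal, self-contained sketch of the idea: an interpreter loop publishes its program counter into an `AtomicUsize`, and a separate thread samples it periodically and builds a histogram. This is not the code from the PR. The names `run_interpreter`, `spawn_sampler`, and `profile_demo`, the 100µs sampling interval, and the fake 8-instruction program are all assumptions made for illustration, and no real Wasmtime or Pulley APIs are used.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the interpreter loop: publishes the current
/// program counter before "executing" each instruction.
pub fn run_interpreter(current_pc: &AtomicUsize, program_len: usize, iterations: usize) {
    for _ in 0..iterations {
        for pc in 0..program_len {
            // Relaxed ordering: the sampler only needs an approximate view.
            current_pc.store(pc, Ordering::Relaxed);
            // Placeholder for the real work of an instruction.
            std::hint::black_box(pc);
        }
    }
}

/// Spawns a sampling thread that periodically reads the shared PC until
/// `stop` is set, then returns the number of samples seen at each PC.
pub fn spawn_sampler(
    current_pc: Arc<AtomicUsize>,
    stop: Arc<AtomicBool>,
    interval: Duration,
) -> thread::JoinHandle<BTreeMap<usize, u64>> {
    thread::spawn(move || {
        let mut hits = BTreeMap::new();
        loop {
            // Take a sample first so that at least one is always recorded.
            let pc = current_pc.load(Ordering::Relaxed);
            *hits.entry(pc).or_insert(0u64) += 1;
            if stop.load(Ordering::Relaxed) {
                break;
            }
            thread::sleep(interval);
        }
        hits
    })
}

/// Runs the fake interpreter alongside the sampler and returns the histogram.
pub fn profile_demo() -> BTreeMap<usize, u64> {
    let pc = Arc::new(AtomicUsize::new(0));
    let stop = Arc::new(AtomicBool::new(false));
    let sampler = spawn_sampler(pc.clone(), stop.clone(), Duration::from_micros(100));
    run_interpreter(&pc, 8, 200_000);
    stop.store(true, Ordering::Relaxed);
    sampler.join().expect("sampler thread panicked")
}
```

Calling `profile_demo()` yields a map from program counter to sample count; the exact counts vary from run to run because sampling is timing-dependent. In this sketch the histogram stays in memory, whereas the PR describes writing samples to a `pulley-$pid.data` file for later rendering.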
alexcrichton commented on Jan 16, 2025
Comment on lines -34 to +39

    - type Handler = fn(Interpreter<'_>) -> Done;
    + /// ABI signature of each opcode handler.
    + ///
    + /// Note that this "explodes" the internals of `Interpreter` to individual
    + /// arguments to help get them all into registers.
    + type Handler = fn(&mut MachineState, UnsafeBytecodeStream, ExecutingPcRef<'_>) -> Done;
Member
Author
I'll note that this change was done to guarantee that these three components of `Interpreter` are passed in registers. I was worried about crossing a threshold where, if `Interpreter` got too big, it would be passed by reference instead of being "exploded" into components like we want.
Member
Author
I'll also note that `ExecutingPcRef` is a zero-sized type when profiling is disabled; otherwise it's pointer-sized.
Member
Makes sense, thanks for the explanation.
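As a rough illustration of the zero-sized-type point in this thread, the sketch below defines two hypothetical stand-ins for `ExecutingPcRef`, one for each configuration, and compares their sizes. The names `PcRefEnabled`, `PcRefDisabled`, and `pc_ref_sizes` are invented for this example. The real type in the PR is selected by a compile-time feature rather than by two separate structs, and its actual definition may differ.

```rust
use std::marker::PhantomData;
use std::mem::size_of;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical "profiling enabled" variant: holds a reference to the shared
/// slot that the sampler reads, so it is one pointer wide.
#[derive(Clone, Copy)]
pub struct PcRefEnabled<'a>(&'a AtomicUsize);

impl<'a> PcRefEnabled<'a> {
    pub fn new(slot: &'a AtomicUsize) -> Self {
        Self(slot)
    }

    /// Publishes the program counter of the instruction about to execute.
    pub fn record(&self, pc: usize) {
        self.0.store(pc, Ordering::Relaxed);
    }
}

/// Hypothetical "profiling disabled" variant: carries only a lifetime, so it
/// occupies no space and passing it as an argument costs nothing.
#[derive(Clone, Copy, Default)]
pub struct PcRefDisabled<'a>(PhantomData<&'a AtomicUsize>);

impl<'a> PcRefDisabled<'a> {
    /// No-op with the same shape as `PcRefEnabled::record`.
    pub fn record(&self, _pc: usize) {}
}

/// Returns the sizes of the (enabled, disabled) variants in bytes.
pub fn pc_ref_sizes() -> (usize, usize) {
    (
        size_of::<PcRefEnabled<'static>>(),
        size_of::<PcRefDisabled<'static>>(),
    )
}
```

Because both variants expose the same `record` method, a handler can call it unconditionally. With the disabled variant the call is an empty function on a zero-sized value, which the compiler can remove entirely.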
fitzgen approved these changes on Jan 16, 2025
github-merge-queue bot pushed a commit that referenced this pull request on Jan 16, 2025
* Add basic support for profiling Pulley
* Add missing source file
* Check the profile-pulley feature in CI
* Miscellaneous fixes for CI
* Fix type-checking of `become` on nightly Rust
* Fix more misc CI issues
* Fix dispatch in tail loop
* Update test expectations
* Review comments