pulley: Implement interpreter-to-host calls #9665
alexcrichton merged 2 commits into bytecodealliance:main
This commit is an initial stab at implementing interpreter-to-host communication in Pulley. The basic problem is that Pulley needs the ability to call back into Wasmtime to implement tasks such as `memory.grow`, imported functions, etc. For native platforms this is a simple `call_indirect` operation in Cranelift but the story for Pulley must be different because it's effectively switching from interpreted code to native code.

The initial idea for this in bytecodealliance#9651 is replaced here and looks mostly similar but with a few changes. The overall structure of how this works is:

* A new `call_indirect_host` opcode is added to Pulley.
* Function signatures that can be called from Pulley bytecode are statically enumerated at build-time.
  * This enables the implementation of `call_indirect_host` to take an immediate of which signature is being used and cast the function pointer to the right type.
* A new pulley-specific relocation is added to Cranelift for this opcode.
  * `RelocDistance::Far` calls to a name trigger the use of `call_indirect_host`.
  * The relocation is filled in by Wasmtime after compilation where the signature number is inserted.
* A new `NS_*` value for user-function namespaces is reserved in `wasmtime-cranelift` for this new namespace of functions.
* Code generation for Pulley in `wasmtime-cranelift` now has Pulley-specific handling of the wasm-to-host transition where all previous `call_indirect` instructions are replaced with a call to a "backend intrinsic" which gets lowered to a `call_indirect_host`.

Note that most of this still isn't hooked up everywhere in Wasmtime. That means that the testing here is pretty light at this time. It'll require a fair bit more work to get everything fully integrated from Wasmtime in Pulley. This is expected to be one of the significant remaining chunks of work and should help unblock future testing (or make those diffs smaller ideally).
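To make the "immediate selects the signature, then cast the function pointer" idea concrete, here is a minimal sketch. It is not the code in this PR: `HostSig`, its two variants, the `call_indirect_host` function shape, and the `host_double`/`host_add` helpers are all hypothetical names invented for illustration, and the real interpreter reads arguments from its own register file rather than a slice.

```rust
/// Hypothetical set of host-callable signatures, fixed at build time.
/// (Illustrative only; the real enumeration lives in Wasmtime.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum HostSig {
    /// extern "C" fn(u64) -> u64
    UnaryU64,
    /// extern "C" fn(u64, u64) -> u64
    BinaryU64,
}

impl HostSig {
    /// Decode the immediate carried by the (sketched) opcode.
    pub fn from_imm(imm: u8) -> Option<HostSig> {
        match imm {
            0 => Some(HostSig::UnaryU64),
            1 => Some(HostSig::BinaryU64),
            _ => None,
        }
    }
}

/// Sketch of interpreting a host call: `imm` names the signature, `ptr` is
/// the raw host function pointer, and `regs` stands in for argument registers.
/// Returns `None` for an unknown immediate or missing arguments.
///
/// # Safety
/// `ptr` must point to a function whose real type matches the signature
/// selected by `imm`.
pub unsafe fn call_indirect_host(imm: u8, ptr: *const (), regs: &[u64]) -> Option<u64> {
    match HostSig::from_imm(imm)? {
        HostSig::UnaryU64 => {
            let a = *regs.first()?;
            // Cast the untyped pointer to the type the immediate promises.
            let f: extern "C" fn(u64) -> u64 = unsafe { std::mem::transmute(ptr) };
            Some(f(a))
        }
        HostSig::BinaryU64 => {
            let a = *regs.first()?;
            let b = *regs.get(1)?;
            let f: extern "C" fn(u64, u64) -> u64 = unsafe { std::mem::transmute(ptr) };
            Some(f(a, b))
        }
    }
}

/// Hypothetical host functions used only to exercise the sketch.
pub extern "C" fn host_double(x: u64) -> u64 {
    x * 2
}

pub extern "C" fn host_add(a: u64, b: u64) -> u64 {
    a + b
}
```

Because the set of signatures is closed, every cast sits in one `match` arm, so the interpreter never calls through a pointer whose type it has not statically listed.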
abrown left a comment:
Makes sense to get something working.
This commit is a continuation of the plan of implementing host calls in Pulley through bytecodealliance#9665, bytecodealliance#9675, and bytecodealliance#9693. Here the `Compiler::call_indirect_host` method is updated to take a new type, `HostCall`, which indicates what type of host call is being performed. This is then serialized to a 32-bit integer which will be present in the Pulley instruction being generated. This 32-bit integer will then be used to perform a dispatch (the dispatch is left for a future PR with more Pulley integration). This new `HostCall` structure is defined with `BuiltinFunctionIndex` internally. Additionally, a new `ComponentBuiltinFunctionIndex` is added to enumerate the same set of indexes for components as well. Along the way the split between component transcoders and builtins was removed; they're now all lumped together in one macro for builtins, since there's no need to have two separate macros. This new `HostCall` is used to implement the `call_indirect_host` instruction for Pulley to fill out an unimplemented piece of code.
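As a rough illustration of "serialized to a 32-bit integer", the sketch below packs two kinds of builtin index into one flat `u32` space and decodes it again. Everything here is an assumption made for the example: `HostCallSketch`, its two variants, and the `NUM_BUILTINS`/`NUM_COMPONENT_BUILTINS` counts are hypothetical stand-ins, not the actual `wasmtime_environ::HostCall` definition or its encoding.

```rust
/// Hypothetical counts; the real lists of builtins are macro-generated.
pub const NUM_BUILTINS: u32 = 8;
pub const NUM_COMPONENT_BUILTINS: u32 = 4;

/// Illustrative stand-in for a "which host call is this?" type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum HostCallSketch {
    /// Index into the core builtin list.
    Builtin(u32),
    /// Index into the component builtin list.
    ComponentBuiltin(u32),
}

impl HostCallSketch {
    /// Flatten to the 32-bit value that would be stored in the instruction.
    pub fn to_u32(self) -> u32 {
        match self {
            HostCallSketch::Builtin(i) => {
                debug_assert!(i < NUM_BUILTINS);
                i
            }
            HostCallSketch::ComponentBuiltin(i) => {
                debug_assert!(i < NUM_COMPONENT_BUILTINS);
                // Component builtins are numbered after the core builtins.
                NUM_BUILTINS + i
            }
        }
    }

    /// Recover the host call from the 32-bit value, rejecting values that
    /// fall outside both ranges.
    pub fn from_u32(raw: u32) -> Option<HostCallSketch> {
        if raw < NUM_BUILTINS {
            Some(HostCallSketch::Builtin(raw))
        } else if raw - NUM_BUILTINS < NUM_COMPONENT_BUILTINS {
            Some(HostCallSketch::ComponentBuiltin(raw - NUM_BUILTINS))
        } else {
            None
        }
    }
}
```

A flat numbering like this lets a later dispatch step be a single `match` or table lookup on the integer, which is presumably why the dispatch can be deferred to a follow-up PR.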
/// Pulley - call a host function indirectly where the embedder resolving
/// this relocation needs to fill in the expected signature.
PulleyCallIndirectHost,
Does the signature actually need to be resolved at reloc time? Can't it be done at compile time and embedded in the instruction itself?
The address of any host function obviously needs to be resolved at reloc time (this is a bit of an aside, because my understanding is that we aren't actually embedding any host function addresses in the Pulley bytecode). The signature, however, doesn't seem like it should need to be resolved at reloc time.
I think what you're thinking of is already done, actually, but the phrasing here is ambiguous. "Reloc time" technically happens twice: once when linking things into artifacts and again when loading the artifacts. Putting the signature into the instruction happens in the first of these, at link time. The relocation here is needed because the `UserExternalName` isn't available during compilation, only after the compile has finished, so that level of relocation processing is required to stuff it in.
Otherwise, though, there's no runtime relocation when we load the bytecode itself; it's all frozen and loaded as-is from disk or the compile artifact.
Ahhhh, so it is the function's id/code that is being reloc'd at link time here? That makes sense to me. When I read "signature" I was thinking "parameter and result types" and perhaps "calling convention", which happens to align with `cranelift_codegen::ir::Signature`.
Can we replace "signature" with "code" or "id" in these bits?
Good point, yeah. This is also something that changed halfway through the design and I didn't get around to updating all the docs.
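The link-time step discussed in this thread (fill in the id once the external name is known, then never touch the bytecode again) can be sketched roughly as follows. This is an illustration under assumptions, not Wasmtime's linker: `HostCallReloc`, `apply_host_call_relocs`, the 4-byte little-endian slot, and the `resolve` callback are all hypothetical.

```rust
/// Hypothetical record of one pending host-call relocation.
pub struct HostCallReloc {
    /// Byte offset of the 4-byte immediate inside the bytecode.
    pub offset: usize,
    /// Stand-in for the external name recorded during compilation.
    pub name_index: u32,
}

/// Patch each relocation slot with the id chosen for its name.
/// This models the link-time pass; nothing is patched again at load time.
pub fn apply_host_call_relocs(
    code: &mut [u8],
    relocs: &[HostCallReloc],
    resolve: impl Fn(u32) -> u32,
) -> Result<(), String> {
    for reloc in relocs {
        let end = reloc
            .offset
            .checked_add(4)
            .ok_or_else(|| String::from("relocation offset overflows"))?;
        let slot = code
            .get_mut(reloc.offset..end)
            .ok_or_else(|| format!("relocation at {} is out of bounds", reloc.offset))?;
        // Assumption for this sketch: the immediate is stored little-endian.
        slot.copy_from_slice(&resolve(reloc.name_index).to_le_bytes());
    }
    Ok(())
}
```

After this pass the buffer can be written to disk as-is, which matches the point above that loading involves no further relocation.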
    _tmp: Writable<Reg>,
    info: CallInfo<()>,
) -> SmallVec<[Self::I; 2]> {
    match dest {
Should we also check the calling conventions here? That was what I was (hackily) using to distinguish between pulley-to-pulley and pulley-to-host before. I like reloc-distance better, but maybe we should be asserting that pulley-to-pulley always uses `tail` and pulley-to-host always uses `systemv` (which is a bit of a lie) or something like that?
I think that's reasonable yeah, I'll try to go back and add some assertions.
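One possible shape for the suggested check, as a hedged sketch only: the enums below (`Distance`, `Conv`, `CallKind`) and `classify_call` are hypothetical stand-ins rather than Cranelift's `RelocDistance` or `CallConv`, and the sketch returns an error where the real backend might use an assertion.

```rust
/// Stand-in for the relocation distance of the callee name.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Distance {
    Near,
    Far,
}

/// Stand-in for the calling convention on the call's signature.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Conv {
    Tail,
    SystemV,
}

/// Which kind of call the backend would emit.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CallKind {
    PulleyToPulley,
    PulleyToHost,
}

/// Pick the call kind from the distance, and cross-check it against the
/// calling convention so a mismatch is reported instead of silently lowered.
pub fn classify_call(distance: Distance, conv: Conv) -> Result<CallKind, String> {
    match (distance, conv) {
        (Distance::Near, Conv::Tail) => Ok(CallKind::PulleyToPulley),
        (Distance::Far, Conv::SystemV) => Ok(CallKind::PulleyToHost),
        (d, c) => Err(format!("unexpected calling convention {c:?} for {d:?} call")),
    }
}
```

Keeping distance as the deciding input and the convention as a cross-check follows the preference stated above for reloc-distance over calling convention.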
Handling review comments from bytecodealliance#9665 and fully updating documentation to reflect the mid-PR design shift to the currently-landed state.
* Enumerate all host calls in `wasmtime_environ::HostCall`

  This commit is a continuation of the plan of implementing host calls in Pulley through #9665, #9675, and #9693. Here the `Compiler::call_indirect_host` method is updated to take a new type, `HostCall`, which indicates what type of host call is being performed. This is then serialized to a 32-bit integer which will be present in the Pulley instruction being generated. This 32-bit integer will then be used to perform a dispatch (the dispatch is left for a future PR with more Pulley integration).

  This new `HostCall` structure is defined with `BuiltinFunctionIndex` internally. Additionally, a new `ComponentBuiltinFunctionIndex` is added to enumerate the same set of indexes for components as well. Along the way the split between component transcoders and builtins was removed; they're now all lumped together in one macro for builtins, since there's no need to have two separate macros.

  This new `HostCall` is used to implement the `call_indirect_host` instruction for Pulley to fill out an unimplemented piece of code.

* Rename `max` to `len`