-
Notifications
You must be signed in to change notification settings - Fork 702
Closed
Description
For the discussion of the design and pros and cons of an expressionless encoding. I'll update this opening comment with the current proposal and rationale.
The proposal:
- A set of operators (aka instructions) that read their arguments from local variables (aka registers) and write their results to local variables so that wasm code can be expressed without expressions. These operators would return no values if used in an expression, e.g the same result as
nop. For example:(set_local $r1 (i32.add (get_local $v1) (get_local $v2)))becomes(i32.add $r1 $v1 $v2). - Some opcodes would have have variations that accept immediate constants. This is noted to significantly improve the encoding efficiency. For example
(i32.add/lc $r1 $v1 10). - The existing parametric operators
selectandset_localbecome type specific. Thus all local variable indexes are type specific and use a type specific index space which helps compression and simplifies a prediction algorithm by avoiding special consider for these operators. - The exiting
ifoperator might no longer be as useful and might be removed, or kept for efficiently encoding diamond control flow patterns, but might be subsumed by an alternative more efficient encoding ofblock, and one suggestion is to replaceifbyunlessReplace theifoperator by anunlessoperator. #676. - The break operators would have no expression value. Encoding the number of values would not be necessary, a small encoding advantage.
- Functions could only return values via the
returnoperator. The proposedendoperator might want to accept arguments. - Function multiple return values are supported by the call operators writing to multiple local variables, and the
returnoperator reading a local variable for each returned value. - A predictor is defined that assumes roughly an expression stack allocation pattern in the local variables. A simple proposal is to track the index of the last local variable written and for each type. The arguments of an operator are predicted to access the last value written, with an assumption of single use and a stack allocation pattern, so each access hit decreases the predicted index of that type, and the arguments are assumed to be read last-to-first. The result is predicted to be the next index. This proposal avoids a stack-top register.
- The type of the last variable written might also be a good predictor of the type of the next operator. Where special opcodes are defined that omit arguments that are prediction hits, these might also usefully be selected based on type of the last variable written to handle the frequent case by using a smaller set of opcodes. For example, to select between
i32.addi64.addf32.addandf64.addbased on the last variable written.
Rationale:
- Prediction can replace the advantages of the expression stack, and a simple predictor does a relatively good job. Consider if the current wasm expressions implicitly used local variables to store expression temporary values rather than a separate stack. The encoding efficiency has not changed, but now there is the option to re-use these values. Next examine each operator, and note that it is now reading and writing to local variables, but with some implicit get_local and set_local operations. The encoding efficiency has not change, but the runtime complexity of supporting expressions can be replaced by a local variable predictor, and a simple one at that, that predicts an expression-stack allocation pattern. The fallback, no prediction hit, operators would then accept general local variable indexes and these would more efficiently handle the non-expression patterns. The prediction might be exploited by simply assigning the index zero to a hit, and leaving it to the compressor to assign a short code.
For example the follow expression converted to expressionless statements would have a perfect prediction hit rate:
(i32.add
(i32.add
(i32.add (i32.const 1)
(i32.const 2))
(i32.const 3))
(i32.add (i32.const 4)
(i32.const 5)))
=>
(i32.const $ti1 1)
(i32.add/lc $ti1 $ti1 2)
(i32.add/lc $ti1 $ti1 3)
(i32.const $ti2 4)
(i32.add/lc $ti2 $ti2 5)
(i32.add $ti1 $ti1 $ti2)
- The expression based predictor would bias wasm to expression patterns, give an incentive for developers to emit code in this allocation pattern and to reorder operators to fit this pattern, just as the existing expression support does.
- Existing strategies such as returning a value from store operators could be expressed in a predictor too, by simply setting the predicted next read index to the index of the local variable holding this argument. But adding such complexity to a predictor might have a small return. The last value written by
set_localwould be the predicted to be the next value read, so that pattern is optimized well, and changing the order of the arguments to the store operators would fit a simpler predictor. - The simple predictor proposed keeps an index per type, so unlike an expression stack it can efficiently pipeline operators working on different types of data. For example it could efficiently encode two interleaved expression patterns working on different types of data which could not have been expressed with expressions with a single stack.
- The simple predictor has no top-of-stack, and resets with each write operation. Code could jump between expression stacks with little cost if this happened to help. An operator that writes results that are not used will like cause a write prediction miss on the next write which will reset the prediction index.
- The simple predictor does not update the predicted index on a read miss. This supports common patterns in which one argument is outside the expression stack, for example one argument is a function argument in a local variable.
- A producer does not need to use the predictor, and could just emit the raw indexes, which might suit a producer within a web browser which is not compressing the encoder data.
- In contrast to a post-order encoding of a group of
get_local/operator/set_local, bundling the local variable indexes into the operator allow their type to be known when decoding which supports using type specific variable index spaces and avoids redundant types on immediate constants when they are also bundled. - These expressionless operators could replace the existing expression based encodings, or be a parallel alternative.
- Annotating code with debug information or meta information might be simpler with a flat expressionless encoding.
- Interpreters would be able to allocate a single block of stack on entry, and would not need to dynamically allocate room for expression temporary values.
- Expressions demand some infrastructure, so there is a runtime complexity.
- Returning values from blocks is pushing the 'expression pattern' and adding complexity to do so. Often multiple values are passed around, and the 'expression pattern' can be pushed a little further with multiple value support. Adding
block1can push it a little further. Operators could be added to produce any number and order of values, but the options to consume the values are still very limited to single use and: destructuring, pass-through parametric operators, result values, arguments to a call. A type system is need to validate these operators. It seems like a lot of language development work and a lot of potential bike-shedding, and the product might become rather specific and baggage from other perspectives and presentation forms. - A decoder that rebuilds expressions for presentation would have the option to use block local variables and other programming conveniences to present the code, even where these were considered too much or a burden for the wasm language.
- Having immediate local variables makes multiple use of values in variables more efficient. For example constants loaded into variables can be used repeatedly with lower encoding size than with the wasm expressions which need a separate
get_localfor each constant use. If it helped runtime compilers then these might be declarable at the start of the function and read-only which might allow a compiler to put them in a constant pool - that is constant pool references sharing the index space with local variables. - There is a demo of AngryBots converted to an expressionless form but using the existing encoding and now updated for the post-order encoding. It can probably be decoded to wast using sexpr-wasm. It is a simple expansion of expressions and not a global function reallocation of local variables, and is 138% of the original size. When encoded into operators that read and write from immediate local variables, and with some opcodes supporting immediate arguments, the file size is 96% of the original size and this is without using
ifjust blocks and breaks and without a predictor (which may not change the uncompressed size if just using a zero index for a hit). Using prediction and specialized opcodes for predicted operands reduces the file size to 80% of the original size. - A patch for Firefox to add some support for this proposal will run AngryBots encoded using 'register' base operators. Note this has not yet been updated for the post-order encoding. This currently uses opcode
0xbbto enable this alternative mode for operators, and per function. The uncompressed file size is 113% of the original. This does not yet implement opcodes with immediate constants which improves the encoding efficiency or the prediction model.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels