Implemented instruction encoder (Closes #129)#254
Conversation
cfbe0aa to
5ccf3c3
Compare
|
First of all again thank you very much for this very high quality PR! I did still not check all the code in detail yet, but already have some input:
From what I see, the
Do you have a list of all affected instructions?
For the
That was as well our approach in our early encoder draft. Should work pretty well.
I like that idea. Binary size is important for some micro-controllers and stuff, but I personally would always prioritize performance over size.
I think we should brainstorm about this point. I would definitely love to fully support this somehow. Most users won't care if the resulting bytecode in the process of
Probably we can choose a similar approach like in the decoder tables where we have different structs for Another point that would be great for the future is support for the conditional feature defines to disable AVX512 / KNC entirely to reduce binay size (and probably gain a pint of additional performance). |
|
I pushed scripts: https://github.com/mappzor/zydis-db/tree/encoder/ZydisEncoder They require Python 3, no external dependencies. Generator uses As mentioned main loop does same processing as ZydisEditor however it does some extra merging as well (allowed CPU modes, operand sizes, address sizes). Not everything there is very elegant but it just works. After main loop there are 2 elimination passes: one for mandatory prefixes and another one for hidden operand scaling. Hidden operand scaling is probably the biggest new concept introduced to instruction definitions and there is some non-trivial processing being done here (not in the elimination pass itself but mostly later inside You may notice that nothing gets ever removed from instruction list and that's intended. I flag instructions with extra attributes for various reasons but don't remove them. This way
Yes, control is provided where it's absolutely necessary e.g. branching. Whole idea behind things like
Sorting process will involve calculating things like "minimal instruction size for this definition", maybe maximal as well. I believe that implementation process here will shed some light on whole "user-requested size" idea and probably help to discover funny edge cases like always.
Second part that's missing is of course about those damned displacements. Use-case you've mentioned was in my head since day 1. I believe it will be a very common one and I will probably even need this for myself. That's what motivated me to take some steps in this direction already. Luckily number of users that need size-optimal transformations for AVX-512 instructions is probably quite low, so while this item should get done eventually it's not the most urgent thing. The more I think about it, the more I lean towards pre-computing scales and putting them into tables. Especially after considering next point...
I fully agree. While it's tricky to strike a balance between size, performance and ISA-support levels for decoder I don't think we have that issue with encoder. I can't think of any use-case where x86 code generation happens but environment is so constrained that few kilobytes of extra data make a drastic difference. However I can think of many use-cases where a lot of code needs to be generated as fast as possible so for me it's simple, performance is king here. Btw just to let you know what we are talking about here's some data about current tables. Under MSVC, natural size of a single entry in main table was 48 bytes. After rearranging members, dropping enum types, packing and using bitfields this size got reduced to only 8 bytes per entry and that's where we stand today. With 7708 definitions (if I remember correctly) that's
I thought about this a lot and wanted to avoid it at least for initial implementation (as we discussed some weeks ago) because it complicated lookups. I think as we go forward this might be a good step to take though. It will enable us to have compile-time features for encoder just like decoder has.
No, I will try to fetch. It's a bunch of AMD-specific stuff if I remember correctly. Thing to consider here is that there are only 2 configurations here: normal and
Second was very minor. I wanted to have something like |
|
List of offending |
|
I added support for compressed displacements. After closer look I've changed my mind about pre-computing, so everything is handled via code. |
I tried this and as expected improvement is quite significant. For a simple benchmark I decoded and re-encoded every instruction from |
|
Very nice! Definitely sounds more than worth to add these few bytes. |
There was a problem hiding this comment.
The quality of this PR is just insane, mad respect. The design looks very well thought-out. You managed to emulate our code-style so close that it's hard to believe given that we don't have it written down anywhere.
I'm nowhere near having gone though all the details yet, but I figured I'd leave some preliminary comments. I'll try to do a full review within this week.
athre0z
left a comment
There was a problem hiding this comment.
Some more comments, still not through with all of it!
| ZYDIS_ENCODABLE_BRANCH_TYPE_NONE, | ||
| ZYDIS_ENCODABLE_BRANCH_TYPE_SHORT, | ||
| ZYDIS_ENCODABLE_BRANCH_TYPE_NEAR16, | ||
| ZYDIS_ENCODABLE_BRANCH_TYPE_NEAR32, | ||
| ZYDIS_ENCODABLE_BRANCH_TYPE_NEAR64, | ||
| ZYDIS_ENCODABLE_BRANCH_TYPE_FAR16, | ||
| ZYDIS_ENCODABLE_BRANCH_TYPE_FAR32, | ||
| ZYDIS_ENCODABLE_BRANCH_TYPE_FAR64, |
There was a problem hiding this comment.
For de-deplication, would it make sense to merge this with the decoder's ZydisBranchType, extending the decoder variant with the size suffixes included in this one?
There was a problem hiding this comment.
From encoder's perspective that's a desirable change. I needed a type that essentially encodes a tuple (branch_type, branch_immediate_value_size) hence ZydisEncodableBranchType was born. Merging these types would make translation between them unnecessary and that's a welcome change.
Now let's look at decoder impact. In decoder branch_type provides branch type ONLY and if anyone needs information about its size it must be extracted separately. Only downside is that if you are interested in just branch type e.g. NEAR you would need to test for NEAR16 | NEAR32 | NEAR64. Is that a big deal? Probably not.
Only piece of code that uses branch_type in decoder is this one:
instruction->meta.branch_type = definition->branch_type;
ZYAN_ASSERT((instruction->meta.branch_type == ZYDIS_BRANCH_TYPE_NONE) ||
((instruction->meta.category == ZYDIS_CATEGORY_CALL) ||
(instruction->meta.category == ZYDIS_CATEGORY_COND_BR) ||
(instruction->meta.category == ZYDIS_CATEGORY_UNCOND_BR) ||
(instruction->meta.category == ZYDIS_CATEGORY_RET)));
So no major problems here as well. I'd say I'm in favor of such change for the sake of simplicity. However I will need to count on you guys because this requires generator change (definition->branch_type as seen above).
There was a problem hiding this comment.
I would be fine with converting the decoder flag enum to flags adding a combined mask for backwards compatibility.
We just have to check if the decoder always easily knows the size (e.g. even in minimal mode). The tables only save the NEAR/FAR/SHORT because some of the instructions are affected by the 66h prefix.
There was a problem hiding this comment.
Note that this would still be a breaking change even if defining it as flags, because for the mask to work, you have to do & BRANCH_TYPE whereas existing code will most likely currently use == BRANCH_TYPE.
Unrelated to the comment above, if we do flags, instead of having a mask, we could also represent it like this:
enum ZydisBranchFlags {
ZYDIS_BRANCH_FLAGS_16 = 1 << 0,
ZYDIS_BRANCH_FLAGS_32 = 1 << 1,
ZYDIS_BRANCH_FLAGS_64 = 1 << 2,
ZYDIS_BRANCH_FLAGS_SHORT = 1 << 3,
ZYDIS_BRANCH_FLAGS_NEAR = 1 << 4,
ZYDIS_BRANCH_FLAGS_FAR = 1 << 5,
};The drawback with this approach is that it would allow for nonsensical combinations of flags like short+64.
|
@mappzor @flobernd what do you think about preparing to merge this PR mostly as-is for now, converting the remaining open points into issues? For me personally, none of them are deal-breakers, just overall clean-up and enhancements. Since this PR is so big, I feel like it would make more sense to have dedicated issues for discussing the remaining stuff properly. If you both agree, I'd probably perform an interactive rebase to clean up the commit history, rebase on master and merge. |
|
@athre0z I'd like to finish size-optimal outputs and regression tests as a part of this PR (there are some other minor improvements as well). I'm really close, hopefully will push this tonight. From review items I definitely want to implement
Some commits probably can be merged but it's not like we have a ton of "fixed a typo", etc. commits. Having some history for feature-like things and bigger changes is generally a good practice. |
Alright!
I agree. In fact, this is exactly what caused my suggestion to merge and discuss this in a dedicated issue! :) I typed up an alternative proposal, and before posting it, I figured that it might be better to discuss it separately.
Yes, I intend to keep the majority of them, just filtering out the fix-up commits. |
| ], start=0) | ||
|
|
||
|
|
||
| ZydisMnemonic = IntEnum('ZydisMnemonic', [ |
There was a problem hiding this comment.
Possible future improvement: at least make ZydisMnemonic and ZydisRegister auto-generated. Ideally whole file can be auto-generated. There are two ways to go about it:
- use existing generator to do heavy-lifting here (good for mnemonics and registers, but not for other enums)
- at the cost of extra python dependency it's possible to parse C header files and generate all types at run-time
There was a problem hiding this comment.
Very nice work on the regression tests :-)
I personally prefer the second option here as it reduces the amount of places where we have to update enums (especially the ones different to ZydisMnemonic and ZydisRegister). The current generator would only be able to auto-generate ZydisMnemonic. For ZydisRegister there is a second generator, but I never finished it (it was supposed to generate the C enums and as well the Delphi ones for the primary generator from a json file).
|
Some changes are still under way. I've spent more time than expected on regression tests because my idea has changed drastically. At first I just wanted to upload my crash files and make a simple script to run fuzz binaries on them. However I thought I'd be great to have a human-readable database of tests, so I invented |
|
After today's changes we have size-optimal outputs, better fuzzer, bunch of bugs fixed (resistance against ill-formed encoder requests, found with improved fuzzer), more test cases, As far as I'm concerned, it's ready to merge :) |

Design goals
ZydisEncoderRequest+ZydisEncoderEncodeInstruction).ZydisEncoderDecodedInstructionToEncoderRequest).ZydisEncoderRequeststructure.Non-goals
ZydisEncoderDecodedInstructionToEncoderRequestand back toZydisEncoderEncodeInstructionwithout any modifications, output generated by the encoder might be different although semantically equivalent to original instruction.REXwhen it's not necessary, prioritizing 2-byteVEXwhen possible, not emitting useless legacy prefixes at all times, using optimalModR/M/SIBencodings, optimizing sizes of immediates while taking advantage of sign-extending behaviors. Some possible future improvements will be discussed later.Design
ZydisEncoderRequeststructure toZydisEncoderEncodeInstructionfunction.ZydisEncoderDecodedInstructionToEncoderRequestfacilitates one of most important use cases for this feature: decoding (-> modification) -> conversion to encoder request -> re-encoding. It's a lightweight function that performs minimal validation and simply moves data around. It will never crash but it might produce invalid encoder request when malformedZydisDecodedInstructionis passed to it (btw you shouldn't do that in general). Corrupting some critical fields like e.g. operand count will be detected though, soZYAN_STATUS_INVALID_ARGUMENTmight be reported. Don't forget to check for errors :)ZydisEncoderRequestshould be self-explanatory but some of them are results of non-trivial design decisions.allowed_encodingsmost of the time, most users will set this toZYDIS_ENCODABLE_ENCODING_DEFAULT. This field can be used to gain some control over physical encoding e.g. you can effectively disable AVX-512 by prohibitingEVEXencodings. This is important because many instructions can have same mnemonic but definitions for more than one prefix (e.g.VEXandEVEX).branch_typegives granular control over branch size and type (near,far,wherever you are), used only for control-flow instructions.address_size_hint/operand_size_hintfor vast majority of instructions those hints are simply ignored. For some instructions though, they serve a very important purpose of expressing semantics that cannot be derived from explicit/implicit operands. String instructions (e.g.stosfamily) are a good example as they take no arguments but hidden operands can scale with address size attribute.evex/mvexsubstructures contain information about features specific to those prefixes. Don't specify broadcast if instruction uses static broadcast.zeroing_maskshould beZYAN_TRUEif instruction uses forced zeroing mask.ZydisEncoderOperand.reg.is4that's probably one of my favourite special cases in whole ISA.is4is meant to denote 4th source operand. If it's 4th source why is it necessary to specify that? Because with some instructions more than one configuration is possible and 4th operand becomes... 3rd operand. Brilliant and perfectly logical! This field helps to resolve that ambiguity.66/67prefixes to match user-requested semantics.Implementation
Conceptually encoding process is divided into 4 phases:
ZydisEncoderCheckRequestSanity)ZydisFindMatchingDefinition) - parts of encoder request are matched against definition candidates, tons of checks happen here to ensure a proper instruction will be generatedZydisBuildInstruction) - if matching definition is found, encoder converts user-provided operands to members ofZydisEncoderInstructionusing instruction definition as a blueprintZydisEmitInstruction) - instruction is emitted byte by byte, this is the simplest part of the process, only error that can happen at this stage is running out of space in user-provided bufferTODOs, known issues, limitations, future improvements, etc.
Item tags:
Items:
ZydisFuzzIncan be selected viaZYDIS_FUZZ_TARGETdefine. I think it would be better to move shared code to a separate file and make 3 separate projects for different fuzz targets (every project would include shared source file + target-only file).ededfuzz target by adding extra validation step. Semantics in encoder request should be compared with decoded instruction to make sure encoder doesn't misrepresent anything. It's not critical becausededtarget should catch such cases but it would be nice to have.cd8_scalevalue in code is just hell. It's possible to cheat and add it to encoder's table but this will waste space as onlyEVEX/MVEXneeds this. Currently 32-bit displacements are forced to avoid the problem (2 places).Closes #129