Conversation
0633749 to
1241203
Compare
a3e7003 to
ad99855
Compare
athre0z
left a comment
There was a problem hiding this comment.
Excellent PR, all good changes. Reviewed everything except for the fuzzing logic, which I want to take a look at when I'm less tired, presumably sometime on the weekend.
Do we intend to keep the API as-is, or do we turn the ZydisDecoderDecodeBuffer function into ZydisDecoderDecodeBufferEx later, providing a simpler variant that automatically decodes the operands as well, naming it ZydisDecoderDecodeBuffer later? I originally thought this would be the way to go, but looking at the formatter interface, we'd also have to support the struct filled by the simple variant in that one, resulting in an explosion of complexity that I'd strongly prefer to avoid.
I'm kind of leaning towards keeping the interface as proposed in this PR as our proper interface and instead providing an even simpler function that fills an even larger struct that also does formatting in one wash, the audience of which would primarily be beginner users.
In any case, I'd be happy to keep as-is and discuss this in a separate issue later.
| /** | ||
| * Internal EVEX-specific information. | ||
| */ | ||
| struct |
There was a problem hiding this comment.
Did you intentionally not ifdef away these when EVEX support is compiled out? I guess we don't really need to if the struct is purely internal, just wondering if you meant to do that.
There was a problem hiding this comment.
Yes, that was intentional. I thought about this for quite some time. As we rely on the user to allocate our structs, we have to make sure the input struct size matches our internal expectations. In a pure C/C++ environment, we could probably ifdef-ing these members away without any problem, but that's a different story for bindings. We would always have to check if the struct size is matching (e.g. going the Windows API way and introducing a size field in the ZydisDecodedInstruction that needs to be initialized by the user - extra complexity which I don't really like).
I think we should discuss this in detail. One good solution might be to introduce a ZYDIS_SIZE_OPTIMIZED_UNSAFE_FOR_BINDINGS option. This would indicate that Zydis is used in a pure C/C++ project and always built on demand (-> we know the feature defines of headers and code files are the same). This would as well allow us to e.g. change the ZydisDecodedOperand struct to unions or to introduce bit-fields in the public API.
There was a problem hiding this comment.
I think we should discuss this in detail.
Agreed, but this definitely deserves a dedicated issue for proper discussion -- let's not do that in this PR.
There was a problem hiding this comment.
let's not do that in this PR
Will create a separate issue tomorrow.
mappzor
left a comment
There was a problem hiding this comment.
Nice changes :)
Some thoughs about API (@athre0z):
Do we intend to keep the API as-is, or do we turn the ZydisDecoderDecodeBuffer function into ZydisDecoderDecodeBufferEx later, providing a simpler variant that automatically decodes the operands as well, naming it ZydisDecoderDecodeBuffer later? I originally thought this would be the way to go, but looking at the formatter interface, we'd also have to support the struct filled by the simple variant in that one, resulting in an explosion of complexity that I'd strongly prefer to avoid.
I'm kind of leaning towards keeping the interface as proposed in this PR as our proper interface and instead providing an even simpler function that fills an even larger struct that also does formatting in one wash, the audience of which would primarily be beginner users.
What about this:
ZydisDecoderDecodeBuffer-> function that decodes instruction + operands (simple utility function)ZydisDecoderDecodeInstruction-> minimal decoding (renamedZydisDecoderDecodeBufferfrom this PR)ZydisDecoderDecodeOperands-> as is in this PR
In this concept signature for ZydisDecoderDecodeBuffer would change to accept operands, so it would be another breaking change since v3. This has some advantages though:
- problem you highlighted (with having one big struct) is avoided
- function still does what it did in v3 - decodes instruction and operands, just the output is split into new v4-fashion
- this is easy to explain in porting guide and keeps migration rather staigtforward. For most existing users documentation of new
ZydisDecoderDecodeBufferwould introduce them to a new concept of decoding instruction and operands separately while immediately informing them it's optional to do so because they can keep usingZydisDecoderDecodeBufferafter making minimal adjustments to existing codebases.
| # define ZYDIS_INSTRUCTION_DEFINITION_BASE \ | ||
| ZyanU16 mnemonic ZYAN_BITFIELD(ZYDIS_MNEMONIC_REQUIRED_BITS); \ | ||
| ZyanU8 operand_count ZYAN_BITFIELD( 4); \ | ||
| ZyanU8 operand_count_visible ZYAN_BITFIELD( 3); \ |
There was a problem hiding this comment.
I like this change, it allows converting some obsolete checks inside ZydisIsDefinitionCompatible into assertion.
I like the idea of providing one function that combines instruction- and operand-decoding. However I would probably just name the utility function to something like |
Both sound good to me. However if latter one is preferred I'd go with |
a8fcd64 to
1199d4b
Compare
1199d4b to
e8f71e6
Compare
| else | ||
| if (!operand->ignore_seg_override && | ||
| instruction->attributes & ZYDIS_ATTRIB_HAS_SEGMENT_ES) | ||
| { | ||
| operands[i].mem.segment = ZYDIS_REGISTER_ES; | ||
| } | ||
| else | ||
| if (!operand->ignore_seg_override && | ||
| instruction->attributes & ZYDIS_ATTRIB_HAS_SEGMENT_FS) | ||
| { | ||
| operands[i].mem.segment = ZYDIS_REGISTER_FS; | ||
| } | ||
| else | ||
| if (!operand->ignore_seg_override && | ||
| instruction->attributes & ZYDIS_ATTRIB_HAS_SEGMENT_GS) | ||
| { | ||
| operands[i].mem.segment = ZYDIS_REGISTER_GS; | ||
| } | ||
| else | ||
| { | ||
| if (operands[i].mem.segment == ZYDIS_REGISTER_NONE) | ||
| { | ||
| if ((operands[i].mem.base == ZYDIS_REGISTER_RSP) || | ||
| (operands[i].mem.base == ZYDIS_REGISTER_RBP) || | ||
| (operands[i].mem.base == ZYDIS_REGISTER_ESP) || | ||
| (operands[i].mem.base == ZYDIS_REGISTER_EBP) || | ||
| (operands[i].mem.base == ZYDIS_REGISTER_SP) || | ||
| (operands[i].mem.base == ZYDIS_REGISTER_BP)) | ||
| { | ||
| operands[i].mem.segment = ZYDIS_REGISTER_SS; | ||
| } | ||
| else | ||
| { | ||
| operands[i].mem.segment = ZYDIS_REGISTER_DS; | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Not a fan of the rightward drift here, perhaps just going with inverted conditions and gotos here might be preferable? Could also do do { .. } while (ZYAN_FALSE); and break out of it, but that's essentially just a less honest goto.
f756d6a to
e8f71e6
Compare
There was a problem hiding this comment.
LGTM! :-) One remaining open comment, but it's just style / not a blocker.
Edit: Ah, wait! We should also update the porting guide.
|
Will update the porting guide before merge. Open points for later:
|
Closes #110