Decouple operand decoding by flobernd · Pull Request #278 · zyantific/zydis

flobernd · 2021-12-01T09:16:05Z

Closes #110

athre0z

Excellent PR, all good changes. Reviewed everything except for the fuzzing logic, which I want to take a look at when I'm less tired, presumably sometime on the weekend.

Do we intend to keep the API as-is, or do we turn the ZydisDecoderDecodeBuffer function into ZydisDecoderDecodeBufferEx later, providing a simpler variant that automatically decodes the operands as well, naming it ZydisDecoderDecodeBuffer later? I originally thought this would be the way to go, but looking at the formatter interface, we'd also have to support the struct filled by the simple variant in that one, resulting in an explosion of complexity that I'd strongly prefer to avoid.

I'm kind of leaning towards keeping the interface as proposed in this PR as our proper interface and instead providing an even simpler function that fills an even larger struct that also does formatting in one wash, the audience of which would primarily be beginner users.

In any case, I'd be happy to keep as-is and discuss this in a separate issue later.

examples/RewriteCode.c

include/Zydis/DecoderTypes.h

include/Zydis/SharedTypes.h

athre0z · 2021-12-02T23:23:04Z

include/Zydis/DecoderTypes.h

+    /**
+     * Internal EVEX-specific information.
+     */
+    struct


Did you intentionally not ifdef away these when EVEX support is compiled out? I guess we don't really need to if the struct is purely internal, just wondering if you meant to do that.

Yes, that was intentional. I thought about this for quite some time. As we rely on the user to allocate our structs, we have to make sure the input struct size matches our internal expectations. In a pure C/C++ environment, we could probably ifdef-ing these members away without any problem, but that's a different story for bindings. We would always have to check if the struct size is matching (e.g. going the Windows API way and introducing a size field in the ZydisDecodedInstruction that needs to be initialized by the user - extra complexity which I don't really like).

I think we should discuss this in detail. One good solution might be to introduce a ZYDIS_SIZE_OPTIMIZED_UNSAFE_FOR_BINDINGS option. This would indicate that Zydis is used in a pure C/C++ project and always built on demand (-> we know the feature defines of headers and code files are the same). This would as well allow us to e.g. change the ZydisDecodedOperand struct to unions or to introduce bit-fields in the public API.

I think we should discuss this in detail.

Agreed, but this definitely deserves a dedicated issue for proper discussion -- let's not do that in this PR.

let's not do that in this PR

Will create a separate issue tomorrow.

src/Decoder.c

include/Zydis/Decoder.h

include/Zydis/DecoderTypes.h

include/Zydis/Formatter.h

mappzor

Nice changes :)

Some thoughs about API (@athre0z):

Do we intend to keep the API as-is, or do we turn the ZydisDecoderDecodeBuffer function into ZydisDecoderDecodeBufferEx later, providing a simpler variant that automatically decodes the operands as well, naming it ZydisDecoderDecodeBuffer later? I originally thought this would be the way to go, but looking at the formatter interface, we'd also have to support the struct filled by the simple variant in that one, resulting in an explosion of complexity that I'd strongly prefer to avoid.

I'm kind of leaning towards keeping the interface as proposed in this PR as our proper interface and instead providing an even simpler function that fills an even larger struct that also does formatting in one wash, the audience of which would primarily be beginner users.

What about this:

ZydisDecoderDecodeBuffer -> function that decodes instruction + operands (simple utility function)
ZydisDecoderDecodeInstruction -> minimal decoding (renamed ZydisDecoderDecodeBuffer from this PR)
ZydisDecoderDecodeOperands -> as is in this PR

In this concept signature for ZydisDecoderDecodeBuffer would change to accept operands, so it would be another breaking change since v3. This has some advantages though:

problem you highlighted (with having one big struct) is avoided
function still does what it did in v3 - decodes instruction and operands, just the output is split into new v4-fashion
this is easy to explain in porting guide and keeps migration rather staigtforward. For most existing users documentation of new ZydisDecoderDecodeBuffer would introduce them to a new concept of decoding instruction and operands separately while immediately informing them it's optional to do so because they can keep using ZydisDecoderDecodeBuffer after making minimal adjustments to existing codebases.

README.md

mappzor · 2021-12-03T01:09:51Z

include/Zydis/Internal/SharedData.h

 #   define ZYDIS_INSTRUCTION_DEFINITION_BASE \
        ZyanU16 mnemonic                       ZYAN_BITFIELD(ZYDIS_MNEMONIC_REQUIRED_BITS); \
        ZyanU8 operand_count                   ZYAN_BITFIELD( 4); \
+        ZyanU8 operand_count_visible           ZYAN_BITFIELD( 3); \


I like this change, it allows converting some obsolete checks inside ZydisIsDefinitionCompatible into assertion.

src/Encoder.c

tools/ZydisFuzzEncoder.c

flobernd · 2021-12-03T06:49:13Z

@mappzor

What about this: [...]

I like the idea of providing one function that combines instruction- and operand-decoding. However I would probably just name the utility function to something like ZydisDecoderDecode/ZydisDecoderFullDecode. Renaming ZydisDecoderDecodeBuffer -> ZydisDecoderDecodeInstruction actually makes sense!

mappzor · 2021-12-03T07:09:13Z

I like the idea of providing one function that combines instruction- and operand-decoding. However I would probably just name the utility function to something like ZydisDecoderDecode/ZydisDecoderFullDecode.

Both sound good to me. However if latter one is preferred I'd go with ZydisDecoderDecodeFull to stay consistent with ZydisDecoderDecode(Instruction|Operands). It's important to document that change in porting guide and it's probably best place to mention that alternative approach to decoding exists in v4.

athre0z · 2021-12-05T22:18:29Z

src/Decoder.c

+                    else
+                        if (!operand->ignore_seg_override &&
+                            instruction->attributes & ZYDIS_ATTRIB_HAS_SEGMENT_ES)
+                        {
+                            operands[i].mem.segment = ZYDIS_REGISTER_ES;
+                        }
+                        else
+                            if (!operand->ignore_seg_override &&
+                                instruction->attributes & ZYDIS_ATTRIB_HAS_SEGMENT_FS)
+                            {
+                                operands[i].mem.segment = ZYDIS_REGISTER_FS;
+                            }
+                            else
+                                if (!operand->ignore_seg_override &&
+                                    instruction->attributes & ZYDIS_ATTRIB_HAS_SEGMENT_GS)
+                                {
+                                    operands[i].mem.segment = ZYDIS_REGISTER_GS;
+                                }
+                                else
+                                {
+                                    if (operands[i].mem.segment == ZYDIS_REGISTER_NONE)
+                                    {
+                                        if ((operands[i].mem.base == ZYDIS_REGISTER_RSP) ||
+                                            (operands[i].mem.base == ZYDIS_REGISTER_RBP) ||
+                                            (operands[i].mem.base == ZYDIS_REGISTER_ESP) ||
+                                            (operands[i].mem.base == ZYDIS_REGISTER_EBP) ||
+                                            (operands[i].mem.base == ZYDIS_REGISTER_SP) ||
+                                            (operands[i].mem.base == ZYDIS_REGISTER_BP))
+                                        {
+                                            operands[i].mem.segment = ZYDIS_REGISTER_SS;
+                                        }
+                                        else
+                                        {
+                                            operands[i].mem.segment = ZYDIS_REGISTER_DS;
+                                        }
+                                    }
+                                }


Not a fan of the rightward drift here, perhaps just going with inverted conditions and gotos here might be preferable? Could also do do { .. } while (ZYAN_FALSE); and break out of it, but that's essentially just a less honest goto.

src/Formatter.c

src/Encoder.c

athre0z

LGTM! :-) One remaining open comment, but it's just style / not a blocker.

Edit: Ah, wait! We should also update the porting guide.

flobernd · 2021-12-07T06:08:33Z

Will update the porting guide before merge.

Open points for later:

Discussion: Feature IFDEFS and bitstructs in public API (Discussion: Feature IFDEFs and unions/bitfields in public API structs #280)

I think we should discuss this in detail. One good solution might be to introduce a ZYDIS_SIZE_OPTIMIZED_UNSAFE_FOR_BINDINGS option. This would indicate that Zydis is used in a pure C/C++ project and always built on demand (-> we know the feature defines of headers and code files are the same). This would as well allow us to e.g. change the ZydisDecodedOperand struct to unions or to introduce bit-fields in the public API.

Discussion: Remove ZydisFormatterFormatEx
Optimize encoder (operand masks)

flobernd requested a review from athre0z December 1, 2021 09:16

flobernd force-pushed the deccouple-operand-decoding branch 2 times, most recently from 0633749 to 1241203 Compare December 1, 2021 17:20

flobernd marked this pull request as ready for review December 2, 2021 13:56

flobernd force-pushed the deccouple-operand-decoding branch 2 times, most recently from a3e7003 to ad99855 Compare December 2, 2021 14:34

athre0z reviewed Dec 3, 2021

View reviewed changes

mappzor reviewed Dec 3, 2021

View reviewed changes

flobernd force-pushed the deccouple-operand-decoding branch 7 times, most recently from a8fcd64 to 1199d4b Compare December 5, 2021 10:30

Decouple operand decoding

e8f71e6

flobernd force-pushed the deccouple-operand-decoding branch from 1199d4b to e8f71e6 Compare December 5, 2021 10:34

athre0z reviewed Dec 5, 2021

View reviewed changes

flobernd force-pushed the deccouple-operand-decoding branch from f756d6a to e8f71e6 Compare December 6, 2021 14:43

athre0z previously approved these changes Dec 6, 2021

View reviewed changes

Update porting guide

d3a9175

flobernd dismissed athre0z’s stale review via d3a9175 December 7, 2021 10:00

flobernd merged commit 979f0be into master Dec 7, 2021

flobernd deleted the deccouple-operand-decoding branch December 7, 2021 10:28

flobernd mentioned this pull request Dec 22, 2021

Add decoder-mode to skip decoding of hidden operands #105

Closed

Uh oh!

Conversation

flobernd commented Dec 1, 2021

Uh oh!

athre0z left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

athre0z Dec 2, 2021

Choose a reason for hiding this comment

Uh oh!

flobernd Dec 3, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

athre0z Dec 3, 2021

Choose a reason for hiding this comment

Uh oh!

flobernd Dec 3, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mappzor left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mappzor Dec 3, 2021

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

flobernd commented Dec 3, 2021

Uh oh!

mappzor commented Dec 3, 2021

Uh oh!

athre0z Dec 5, 2021

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

athre0z left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

flobernd commented Dec 7, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

flobernd Dec 3, 2021 •

edited

Loading

flobernd Dec 3, 2021 •

edited

Loading

athre0z left a comment •

edited

Loading

flobernd commented Dec 7, 2021 •

edited

Loading