Skip to content

Cleanup byte swapping utilities to generate optimal code on the platforms we care about.#11394

Closed
resistor wants to merge 1 commit intopytorch:masterfrom
resistor:byteorder
Closed

Cleanup byte swapping utilities to generate optimal code on the platforms we care about.#11394
resistor wants to merge 1 commit intopytorch:masterfrom
resistor:byteorder

Conversation

@resistor
Copy link
Contributor

@resistor resistor commented Sep 7, 2018

While the use of memcpy as part of the byte swapping sequence looks funky, all major
compilers recognize and optimize this pattern reliably, resulting in essentially
optimal code generation.

For example, decodeUInt32LE goes from this on iOS arm64:

    ldrb    w8, [x0, #3]
    ldrb    w9, [x0, #2]
    bfi     w8, w9, #8, #8
    ldrb    w9, [x0, #1]
    bfi     w8, w9, #16, #8
    ldrb            w9, [x0]
    bfi     w8, w9, #24, #8
    mov      x0, x8
    ret

To this:

    ldr             w8, [x0]
    rev     w0, w8
    ret

@resistor resistor force-pushed the byteorder branch 2 times, most recently from bc610a9 to 232402d Compare September 7, 2018 20:43
…orms we care about.

While the use of memcpy as part of the byte swapping sequence looks funky, all major
compilers recognize and optimize this pattern reliably, resulting in essentially
optimal code generation.

For example, decodeUInt32LE goes from this on iOS arm64:
        ldrb    w8, [x0, pytorch#3]
        ldrb    w9, [x0, pytorch#2]
        bfi     w8, w9, pytorch#8, pytorch#8
        ldrb    w9, [x0, pytorch#1]
        bfi     w8, w9, pytorch#16, pytorch#8
        ldrb            w9, [x0]
        bfi     w8, w9, pytorch#24, pytorch#8
        mov      x0, x8
        ret

To this:
        ldr             w8, [x0]
        rev     w0, w8
        ret
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resistor has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resistor is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resistor has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

PenghuiCheng pushed a commit to PenghuiCheng/pytorch that referenced this pull request Sep 11, 2018
…orms we care about. (pytorch#11394)

Summary:
While the use of memcpy as part of the byte swapping sequence looks funky, all major
compilers recognize and optimize this pattern reliably, resulting in essentially
optimal code generation.

For example, decodeUInt32LE goes from this on iOS arm64:
>         ldrb    w8, [x0, pytorch#3]
>         ldrb    w9, [x0, pytorch#2]
>         bfi     w8, w9, pytorch#8, pytorch#8
>         ldrb    w9, [x0, pytorch#1]
>         bfi     w8, w9, pytorch#16, pytorch#8
>         ldrb            w9, [x0]
>         bfi     w8, w9, pytorch#24, pytorch#8
>         mov      x0, x8
>         ret

To this:
>         ldr             w8, [x0]
>         rev     w0, w8
>         ret
Pull Request resolved: pytorch#11394

Reviewed By: SsnL

Differential Revision: D9728659

Pulled By: resistor

fbshipit-source-id: 9afbd4adfad1d1fb7b01f1179e6707ee21fa726f
@ezyang ezyang added the merged label Jun 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants