Add MLFloat16 support for LayerNormalization, SkipLayerNormalization by amarin16 · Pull Request #22063 · microsoft/onnxruntime

amarin16 · 2024-09-11T18:05:02Z

Add MLFloat16 support for:

LayerNormalization
SimplifiedLayerNormalization
SkipLayerNormalization
SkipSimplifiedLayerNormalization

There are existing LayerNormTest unit tests that cover the MLFloat16 functionality for LayerNormalization once MLFloat16 is registered (for example LayerNormTest.LayerNorm_Scale_Float16Input).

Similarly, there are unit tests such as SkipLayerNormTest.SkipLayerNormBatch1_Float16 that cover MLFloat16 inputs for SkipLayerNormalization.
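For reference, here is a minimal sketch of what these four operators compute on MLFloat16 input. It assumes the usual definitions: mean/variance normalization over the last axis for LayerNormalization, RMS normalization with no mean subtraction and no beta for SimplifiedLayerNormalization, and `input + skip (+ optional bias)` normalized for the Skip variants, all with fp32 accumulation. fp16 values are carried as raw `uint16_t` bit patterns. `HalfBitsToFloat` and `NormalizeRowFp16` are hypothetical helpers, not onnxruntime's kernel code, and the output is left in fp32 for brevity (a real fp16 kernel would round back to half).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: decode IEEE 754 binary16 bits into a float
// (handles signed zero, subnormals, infinity and NaN).
float HalfBitsToFloat(std::uint16_t h) {
  const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
  std::uint32_t exp = (h >> 10) & 0x1Fu;
  std::uint32_t mant = h & 0x3FFu;
  std::uint32_t bits;
  if (exp == 0 && mant == 0) {
    bits = sign;  // +/- 0
  } else if (exp == 0) {
    // Subnormal half: shift until the implicit leading bit appears.
    exp = 127 - 15 + 1;
    while ((mant & 0x400u) == 0) {
      mant <<= 1;
      --exp;
    }
    bits = sign | (exp << 23) | ((mant & 0x3FFu) << 13);
  } else if (exp == 0x1F) {
    bits = sign | 0x7F800000u | (mant << 13);  // infinity or NaN
  } else {
    bits = sign | ((exp + 127 - 15) << 23) | (mant << 13);  // rebias exponent
  }
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Hypothetical helper: normalize one row of fp16 values, accumulating in fp32.
//  - skip/bias empty, simplified == false: LayerNormalization
//  - skip/bias empty, simplified == true:  SimplifiedLayerNormalization (RMS)
//  - skip non-empty: SkipLayerNormalization / SkipSimplifiedLayerNormalization
//    (input + skip [+ bias] is what gets normalized)
// beta may be empty (treated as zeros). Output stays in fp32 for brevity.
std::vector<float> NormalizeRowFp16(const std::vector<std::uint16_t>& input,
                                    const std::vector<std::uint16_t>& skip,
                                    const std::vector<std::uint16_t>& bias,
                                    const std::vector<std::uint16_t>& gamma,
                                    const std::vector<std::uint16_t>& beta,
                                    float epsilon, bool simplified) {
  const std::size_t n = input.size();
  std::vector<float> x(n);
  for (std::size_t i = 0; i < n; ++i) {
    float v = HalfBitsToFloat(input[i]);
    if (!skip.empty()) v += HalfBitsToFloat(skip[i]);
    if (!bias.empty()) v += HalfBitsToFloat(bias[i]);
    x[i] = v;
  }
  // The simplified (RMS) variant does not subtract the mean.
  float mean = 0.0f;
  if (!simplified) {
    for (float v : x) mean += v;
    mean /= static_cast<float>(n);
  }
  float sum_sq = 0.0f;
  for (float v : x) sum_sq += (v - mean) * (v - mean);
  const float inv_std = 1.0f / std::sqrt(sum_sq / static_cast<float>(n) + epsilon);
  std::vector<float> out(n);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = (x[i] - mean) * inv_std * HalfBitsToFloat(gamma[i]);
    if (!beta.empty()) out[i] += HalfBitsToFloat(beta[i]);
  }
  return out;
}
```

For example, passing a non-empty `skip` together with `simplified = true` models SkipSimplifiedLayerNormalization. Accumulating the sums in fp32 rather than fp16 avoids the precision loss that comes from summing many values with a 10-bit mantissa.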

tianleiwu · 2024-09-11T18:41:21Z

Could it be faster to use MlasConvertHalfToFloatBuffer to convert inputs?

amarin16 · 2024-09-11T21:31:16Z

@tianleiwu From what I can see, the logic inside MLFloat16.ToFloat() is very similar to the logic inside MlasConvertHalfToFloatBuffer.

tianleiwu · 2024-09-11T21:43:01Z

That's the slow path. The fast path uses an assembly kernel built on AVX_NE_CONVERT instructions, which might be faster. That is not guaranteed, since holding the cast inputs in temporary buffers adds extra memory I/O; we may need to run a benchmark to see whether it helps.
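To make the trade-off concrete, here is a minimal sketch of the two options, assuming fp16 values carried as raw `uint16_t` bits: converting each element inline in every pass (as a per-element `ToFloat()` call would) versus converting the row once into a scratch fp32 buffer, as a bulk converter like `MlasConvertHalfToFloatBuffer` would. `ConvertHalfToFloatBuffer`, `HalfBitsToFloat` and the `Moments*` functions are hypothetical; the stand-in converter is scalar, so this shows only the data flow, and any gain from the real SIMD path would still have to be measured.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: decode IEEE 754 binary16 bits into a float.
float HalfBitsToFloat(std::uint16_t h) {
  const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
  std::uint32_t exp = (h >> 10) & 0x1Fu;
  std::uint32_t mant = h & 0x3FFu;
  std::uint32_t bits;
  if (exp == 0 && mant == 0) {
    bits = sign;  // +/- 0
  } else if (exp == 0) {
    exp = 127 - 15 + 1;  // subnormal: renormalize
    while ((mant & 0x400u) == 0) {
      mant <<= 1;
      --exp;
    }
    bits = sign | (exp << 23) | ((mant & 0x3FFu) << 13);
  } else if (exp == 0x1F) {
    bits = sign | 0x7F800000u | (mant << 13);  // infinity or NaN
  } else {
    bits = sign | ((exp + 127 - 15) << 23) | (mant << 13);
  }
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Hypothetical scalar stand-in for a bulk converter such as
// MlasConvertHalfToFloatBuffer (the real MLAS routine may use SIMD kernels).
void ConvertHalfToFloatBuffer(const std::uint16_t* src, float* dst, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) dst[i] = HalfBitsToFloat(src[i]);
}

struct Moments {
  float mean;
  float variance;
};

// Option 1: convert on the fly. No scratch memory, but every element is
// converted once per pass (twice here: mean pass + variance pass).
Moments MomentsConvertInline(const std::vector<std::uint16_t>& x) {
  const float n = static_cast<float>(x.size());
  float sum = 0.0f;
  for (std::uint16_t h : x) sum += HalfBitsToFloat(h);
  const float mean = sum / n;
  float sum_sq = 0.0f;
  for (std::uint16_t h : x) {
    const float d = HalfBitsToFloat(h) - mean;
    sum_sq += d * d;
  }
  return {mean, sum_sq / n};
}

// Option 2: convert the row once into an fp32 scratch buffer, then run both
// passes in fp32. Costs an extra write and re-read of n floats.
Moments MomentsConvertBuffered(const std::vector<std::uint16_t>& x) {
  std::vector<float> buf(x.size());
  ConvertHalfToFloatBuffer(x.data(), buf.data(), x.size());
  const float n = static_cast<float>(buf.size());
  float sum = 0.0f;
  for (float v : buf) sum += v;
  const float mean = sum / n;
  float sum_sq = 0.0f;
  for (float v : buf) sum_sq += (v - mean) * (v - mean);
  return {mean, sum_sq / n};
}
```

Both functions yield the same moments; the buffered version trades the second round of conversions for a write and re-read of `n` floats, which is the extra I/O that may or may not pay off.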

tianleiwu · 2024-09-12T01:25:57Z

Please fix the formatting, for example with:

pip install -r requirements-lintrunner.txt
pip install lintrunner
lintrunner init
lintrunner -a

tianleiwu · 2024-09-12T01:26:51Z

/azp run Linux CPU CI Pipeline, Windows CPU CI Pipeline

azure-pipelines · 2024-09-12T01:27:06Z

Azure Pipelines successfully started running 2 pipeline(s).

tianleiwu · 2024-09-12T03:59:16Z

SkipLayerNormTest.SkipLayerNormBatch1 unit test failed:
https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1491341&view=logs&j=39f143cf-c993-54bc-2606-bcdaa7decea3&t=2af06c87-8e25-5eee-1e83-1b24e38a9f8f&l=8588

amarin16 · 2024-09-12T17:51:28Z

It seems to be passing in the latest build, as well as locally.

xadupre · 2024-09-12T17:58:51Z

It seems to work. You still need to update the Markdown documentation pages. You can regenerate them, or fix the differences manually by looking at the job output, which diffs the current version against the automatically generated one: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1491838&view=logs&j=7f366e99-16b2-52cc-e1ff-653af284e397&t=5c9cf234-f957-5fdc-9c37-89899bf73c0c&l=58.

yufenglee · 2024-09-23T22:34:51Z

How does the performance compare with the fp32 version?

- Added a microbenchmark for the `LayerNormalization` MLFloat16 support added in #22063.
- Updated the `LayerNormalization` MLFloat16 implementation to improve the latency.

```
----------------------------------------------------------------------------------------------
Original MLFloat16 support                                  Time          CPU   Iterations
----------------------------------------------------------------------------------------------
BM_LayerNormalization<MLFloat16, float>/1/real_time     15599 us     15625 us           47
BM_LayerNormalization<MLFloat16, float>/1/real_time     14714 us     14824 us           39
BM_LayerNormalization<MLFloat16, float>/1/real_time     14634 us     14688 us           50
----------------------------------------------------------------------------------------------
Updated MLFloat16 support                                   Time          CPU   Iterations
----------------------------------------------------------------------------------------------
BM_LayerNormalization<MLFloat16, float>/1/real_time      7276 us      7254 us           84
BM_LayerNormalization<MLFloat16, float>/1/real_time      6820 us      6720 us           93
BM_LayerNormalization<MLFloat16, float>/1/real_time      6840 us      6882 us           84
```
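A quick way to read these benchmark results is to average the `real_time` column for each variant. The sketch below does only that arithmetic on the three runs shown; it compares the original and updated MLFloat16 paths, not fp16 against fp32. `Mean`, `Speedup` and the array names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <numeric>

// real_time values in microseconds, copied from the three runs per variant.
constexpr std::array<double, 3> kOriginalUs = {15599.0, 14714.0, 14634.0};
constexpr std::array<double, 3> kUpdatedUs = {7276.0, 6820.0, 6840.0};

// Arithmetic mean of the runs.
double Mean(const std::array<double, 3>& runs) {
  return std::accumulate(runs.begin(), runs.end(), 0.0) /
         static_cast<double>(runs.size());
}

// Ratio of mean latencies: values above 1 mean the updated path is faster.
double Speedup() { return Mean(kOriginalUs) / Mean(kUpdatedUs); }
```

With these numbers the mean real_time falls from about 14,982 us to about 6,979 us, a speedup of roughly 2.15x.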

amarin16 added 6 commits September 11, 2024 07:07

Add MLFloat16 support for LayerNormalization

6c53ff2

register LayerNormalization

af04260

inline convert functions

1a6c1d8

a few renames

38a24b8

enable_if_t, is_same_v

079f0c9

Add MLFloat16 support for SkipLayerNormalization, SkipSimplifiedLayerNormalization

ba1bbdd

github-advanced-security AI found potential problems Sep 11, 2024

amarin16 added 3 commits September 11, 2024 15:29

register LayerNormalization, SimplifiedLayerNormalization in cpu_contrib_kernels

9b0de4b

add constexpr

d2e0b91

reorder

246332e

xadupre reviewed Sep 12, 2024 (5 reviews)

Comment threads on onnxruntime/contrib_ops/cpu/skip_layer_norm.cc (5, all Outdated)

amarin16 added 3 commits September 12, 2024 08:06

lint

32e8c76

Fix null check

fdc0ac4

save a cast

16f7c01

Update documentation

de49a89

amarin16 marked this pull request as ready for review September 12, 2024 20:41

Merge branch 'main' into dev/amarin16/layer_norm

484dc17

yufenglee reviewed Sep 18, 2024

Comment thread onnxruntime/contrib_ops/cpu/skip_layer_norm.cc Outdated

use fp32 output buffer to avoid a conversion

fc625ea

github-advanced-security AI found potential problems Sep 18, 2024

amarin16 added 5 commits September 18, 2024 06:15

fix check warnings

25bfdce

fix lint error

ea6e388

lint

134a609

use size_t instead of int64_t

528e1e2

add cast to fix pipeline errors

a7d056c

yufenglee approved these changes Sep 23, 2024

amarin16 merged commit eb2506d into microsoft:main Sep 24, 2024

amarin16 mentioned this pull request Sep 25, 2024

Add microbenchmark for layer normalization and improve latency #22223

Merged

5 participants
