fix LayerNorm f16 CPU implementation #22479

Merged
fs-eire merged 5 commits into main from fs-eire/fix-layernorm-cpu-f16 on Oct 18, 2024

Conversation

@fs-eire (Contributor) commented Oct 17, 2024

Description

The recent PR #22223 introduced two bugs in the CPU LayerNorm f16 implementation:

  • possible access to nullptr for bias
    `const TensorShape& bias_shape = bias->Shape();` will crash when `bias` does not exist. (Surprisingly, this case does not appear to be covered by any test.)
    • fix: guard with a pointer check (see the sketch after this list)
  • a race condition inside ComputeJob
    `ComputeJob()` is dispatched to the thread pool and internally modifies `LayerNormImpl::scale_fp32_` and `LayerNormImpl::bias_fp32_`, which are `std::unique_ptr`s and are not thread-safe.
    • fix: move the modification of `LayerNormImpl::scale_fp32_` and `LayerNormImpl::bias_fp32_` out of `ComputeJob()` and into `LayerNormImpl::ComputeWithoutContext()`. A race may still be possible because `ConcurrentRunSupported` is set to `true` for the CPU EP, so an `OrtMutex` was added.
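
A minimal, self-contained sketch of the pointer guard for the first bug, in plain C++. The `Tensor`/`TensorShape` stand-ins and the `BiasSize` helper are illustrative assumptions to make the snippet compile on its own, not the actual onnxruntime code:

```cpp
#include <cstddef>

// Stand-ins for onnxruntime's Tensor/TensorShape, only so the sketch
// is self-contained; the real types live in the ORT codebase.
struct TensorShape {
  std::size_t Size() const { return 8; }
};
struct Tensor {
  TensorShape shape;
  const TensorShape& Shape() const { return shape; }
};

// The bug: calling bias->Shape() unconditionally crashes when the
// optional bias input is absent. The fix is a null-pointer guard.
std::size_t BiasSize(const Tensor* bias) {
  if (bias == nullptr) return 0;  // bias is an optional input
  return bias->Shape().Size();
}
```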

This should fix the recent flaky tests as well.

Comment thread onnxruntime/core/providers/cpu/nn/layer_norm_impl.h Outdated
amarin16 previously approved these changes Oct 17, 2024
Comment thread onnxruntime/core/providers/cpu/nn/layer_norm_impl.h Outdated
Comment thread onnxruntime/core/providers/cpu/nn/layer_norm_impl.cc Outdated
@tianleiwu (Contributor) left a comment

Mutex is not needed. See other comments.

@fs-eire (Contributor, Author) commented Oct 17, 2024

Updated to resolve the comments:

  • mutex removed.
  • the `unique_ptr` members holding the float buffers are now assigned only in `Prepack()`
    • if prepack is used, always use the stored `prepacked_scale_fp32_data_` and `prepacked_bias_fp32_data_`.
    • if prepack is not used, the scale and bias are not initializers, so always convert them from f16 to f32 per call (see the sketch after this list).
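
A self-contained sketch of that scheme in plain C++. The class and method names and the `HalfToFloat` placeholder are illustrative assumptions, not the actual ORT code:

```cpp
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Placeholder conversion; ORT converts real IEEE f16 (MLFloat16) values.
static float HalfToFloat(uint16_t bits) { return static_cast<float>(bits); }

class LayerNormSketch {
 public:
  // Runs once, single-threaded, only when scale is an initializer.
  void PrePack(const std::vector<uint16_t>& scale_f16) {
    auto buf = std::make_unique<std::vector<float>>();
    for (uint16_t h : scale_f16) buf->push_back(HalfToFloat(h));
    prepacked_scale_fp32_ = std::move(buf);
  }

  // May run concurrently. Reads the prepacked buffer when present;
  // otherwise converts into a function-local buffer. No member is
  // written here, so no mutex is needed.
  float SumScale(const std::vector<uint16_t>& scale_f16) const {
    std::vector<float> local;
    const std::vector<float>* scale = prepacked_scale_fp32_.get();
    if (scale == nullptr) {
      for (uint16_t h : scale_f16) local.push_back(HalfToFloat(h));
      scale = &local;
    }
    float sum = 0.0f;
    for (float v : *scale) sum += v;
    return sum;
  }

 private:
  std::unique_ptr<std::vector<float>> prepacked_scale_fp32_;
};
```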

fs-eire requested a review from tianleiwu on October 17, 2024 at 22:46
tianleiwu previously approved these changes Oct 17, 2024
Comment thread onnxruntime/test/onnx/microbenchmark/layer_normalization.cc Outdated
Comment thread onnxruntime/core/providers/cpu/nn/layer_norm_impl.cc Outdated
fs-eire merged commit b4cb937 into main on Oct 18, 2024
fs-eire deleted the fs-eire/fix-layernorm-cpu-f16 branch on October 18, 2024 at 01:49
guschmue pushed a commit that referenced this pull request Oct 18, 2024
tianleiwu pushed a commit that referenced this pull request Oct 18, 2024
apsonawane pushed a commit that referenced this pull request Oct 22, 2024
sophies927 added the cherry-picked label on Oct 22, 2024
@snnn (Contributor) commented Sep 5, 2025

This PR has been cherry-picked into the rel-1.20.0 branch in PR #22526. Removing the release:1.20.0 label.

Labels: cherry-picked

5 participants