Use templates instead of macro when defining Vec256<BFloat16> bin operators #35844
xuhdev wants to merge 5 commits into gh/xuhdev/69/base
…rators Also, bitwise operators can operate on the underlying __m256i representation directly instead of making expensive conversions to float16. [ghstack-poisoned]
    auto o2 = func(a_hi, b_hi); \
    return cvtfp32_bf16(o1, o2); \
template<typename Op>
Vec256<BFloat16> inline bfloat16_binary_op_as_fp32(const Vec256<BFloat16>& a, const Vec256<BFloat16>& b, Op op) {
@XiaobingSuper Do you think this function can also be used for implementing operators >, <, >=, and <=? Now #35117 should be waiting on these operators.
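To make the question concrete, here is a minimal scalar sketch of the "widen to fp32, apply a callable, narrow back" pattern. It is not the PR's code: `BF16x4`, `bf16_to_f32`, `f32_to_bf16`, `mask_f32`, `binary_op_as_fp32`, `add`, and `greater_than` are all hypothetical stand-ins, the narrowing step simply truncates (no rounding, no special NaN handling), and the comparison assumes the result should be an all-ones or all-zeros lane mask.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical scalar stand-in for Vec256<BFloat16>: four raw bfloat16 bit patterns.
using BF16x4 = std::array<std::uint16_t, 4>;

// A bfloat16 value is the upper 16 bits of an IEEE-754 float32.
inline float bf16_to_f32(std::uint16_t h) {
  std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Simplified narrowing (assumption): truncate the low 16 bits, no rounding.
inline std::uint16_t f32_to_bf16(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return static_cast<std::uint16_t>(bits >> 16);
}

// Build a float whose bits are all ones (true) or all zeros (false).
inline float mask_f32(bool on) {
  std::uint32_t bits = on ? 0xFFFFFFFFu : 0u;
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// One template replaces a per-operator macro: the operation is passed as a callable.
template <typename Op>
inline BF16x4 binary_op_as_fp32(const BF16x4& a, const BF16x4& b, Op op) {
  BF16x4 out{};
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i] = f32_to_bf16(op(bf16_to_f32(a[i]), bf16_to_f32(b[i])));
  }
  return out;
}

// Arithmetic operator expressed through the helper.
inline BF16x4 add(const BF16x4& a, const BF16x4& b) {
  return binary_op_as_fp32(a, b, [](float x, float y) { return x + y; });
}

// Comparison expressed through the same helper; each lane becomes a mask.
inline BF16x4 greater_than(const BF16x4& a, const BF16x4& b) {
  return binary_op_as_fp32(a, b, [](float x, float y) { return mask_f32(x > y); });
}
```

In this sketch a comparison fits the same helper only because truncation passes the mask bits through unchanged; a narrowing conversion that rounds or canonicalizes NaNs might not preserve the mask, so that would need checking against the real conversion routine.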
}

Vec256<BFloat16> inline operator&(const Vec256<BFloat16>& a, const Vec256<BFloat16>& b) {
  return _mm256_and_si256(a, b);
Isn't this an instruction for signed integers, not for floats? It used to be _mm256_and_ps, which is indeed an instruction for floats. Ah, never mind, I see what you are doing.
Yes. The point is that it is not necessary to convert to float in this case, because a bitwise operator produces the same bits regardless of how those bits are interpreted. There are two different instructions for integers and floats only because they apply directly to different register types (__m256i and __m256).
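A small scalar check of that claim, as a sketch rather than the PR's code: widening a bfloat16 bit pattern to fp32 is a 16-bit left shift, so AND-ing the widened values and shifting back gives the same result as AND-ing the 16-bit patterns directly. The names `widen_bits`, `narrow_bits`, `and_via_fp32_bits`, `and_direct`, and `and_paths_agree` are hypothetical, and the narrowing here is plain truncation, which ignores any NaN special-casing a real conversion routine may apply.

```cpp
#include <cstdint>

// Widening bfloat16 to fp32 places the 16 bits in the upper half of the word.
inline std::uint32_t widen_bits(std::uint16_t h) {
  return static_cast<std::uint32_t>(h) << 16;
}

// Simplified narrowing (assumption): keep the upper 16 bits, no rounding.
inline std::uint16_t narrow_bits(std::uint32_t w) {
  return static_cast<std::uint16_t>(w >> 16);
}

// Old-style path: widen both operands, AND in the wide domain, narrow back.
inline std::uint16_t and_via_fp32_bits(std::uint16_t a, std::uint16_t b) {
  return narrow_bits(widen_bits(a) & widen_bits(b));
}

// Direct path: AND the raw 16-bit patterns.
inline std::uint16_t and_direct(std::uint16_t a, std::uint16_t b) {
  return static_cast<std::uint16_t>(a & b);
}

// Compare both paths for every `a` and a strided sample of `b`.
inline bool and_paths_agree() {
  for (std::uint32_t a = 0; a <= 0xFFFFu; ++a) {
    for (std::uint32_t b = 0; b <= 0xFFFFu; b += 251) {
      const auto x = static_cast<std::uint16_t>(a);
      const auto y = static_cast<std::uint16_t>(b);
      if (and_via_fp32_bits(x, y) != and_direct(x, y)) {
        return false;
      }
    }
  }
  return true;
}
```

The same argument applies to OR and XOR, since the low 16 bits of both widened operands are zero and stay zero under those operations.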
…rators (pytorch#35844) Summary: Pull Request resolved: pytorch#35844 Also, bitwise operators can operate on the underlying __m256i representation directly instead of making expensive conversions to float16. Test Plan: Imported from OSS Differential Revision: D20927639 Pulled By: ngimel fbshipit-source-id: 148c503df090580c8504f0df8d6ed2648d614120
Also, bitwise operators can operate on the underlying __m256i
representation directly instead of making expensive conversions to
float32.
Differential Revision: D20927639