Conversation
force-pushed from 44d82c9 to 66b1684
Codecov Report

@@            Coverage Diff             @@
##           master    #1190      +/-   ##
==========================================
- Coverage   96.86%   96.79%   -0.08%
==========================================
  Files         212      213       +1
  Lines        8274     8403     +129
==========================================
+ Hits         8015     8134     +119
- Misses        259      269      +10

Continue to review the full report at Codecov.
force-pushed from 66b1684 to c6de8be
@marehr ping
include/seqan3/core/simd/concept.hpp
  // requires std::Same<decltype(a - b), simd_t>;
  template <typename simd_t>
- SEQAN3_CONCEPT Simd = requires (simd_t a, simd_t b)
+ SEQAN3_CONCEPT Simd = requires (std::remove_reference_t<simd_t> a, std::remove_reference_t<simd_t> b)
a and b should be without std::remove_reference_t<> because the operations (like a == b, a != b) should still be valid for reference types. And from a user's perspective, they should be able to expect that an expression a == b works for the given type.
Suggested change:
- SEQAN3_CONCEPT Simd = requires (std::remove_reference_t<simd_t> a, std::remove_reference_t<simd_t> b)
+ SEQAN3_CONCEPT Simd = requires (simd_t a, simd_t b)
The remaining uses of std::remove_reference_t<> in this concept are okay.
On a side note: what happens with const simds? Should we introduce a Writable/MutableSimd concept?
I thought about it and, honestly, I think we need to distinguish between them. Also, for many operations we seldom update a vector in place but instead get a new one returned. That's at least my impression of how we use it in the algorithms.
 * \see seqan3::detail::is_native_builtin_simd_v
 */
template <typename builtin_simd_t>
constexpr bool is_native_builtin_simd_v = is_native_builtin_simd<builtin_simd_t>::value;
Why not evaluate it here as a lambda function? That would be more readable.
We could also just define the bool constants directly if we don't need it as a type anyway. As written, it follows the STL way of providing unary type traits.
        return this_view->padding_value;
    }
    else
    { // only increment if not at end.
    // Thus, for the 8 sequences we need to load two times 16 consecutive bytes to fill the matrix.
    // This quadratic byte matrix can be transposed efficiently with simd instructions.
    constexpr int8_t max_size = simd_traits<max_simd_t>::length;
    constexpr int8_t num_chunks = max_size / chunk_size;
Elsewhere you called this chunks_per_load.
Suggested change:
- constexpr int8_t num_chunks = max_size / chunk_size;
+ constexpr int8_t num_chunks = chunks_per_load;
    // To fill the 16x16 matrix we need four 8x8 matrices.
    // Thus, for the 8 sequences we need to load two times 16 consecutive bytes to fill the matrix.
    // This quadratic byte matrix can be transposed efficiently with simd instructions.
    constexpr int8_t max_size = simd_traits<max_simd_t>::length;
Suggested change:
- constexpr int8_t max_size = simd_traits<max_simd_t>::length;
+ constexpr int8_t max_size = simd_traits<simd_t>::max_length;
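The arithmetic behind the chunked load in the snippet above can be made concrete with example values (a 16-lane register and chunks of 8 sequences — these numbers are illustrative, not read from seqan3's traits):

```cpp
#include <cstdint>

// Example values standing in for simd_traits<max_simd_t>::length and the
// chunk size from the comment above: 8 sequences, 16-byte registers.
constexpr int8_t max_size = 16;                            // lanes per register
constexpr int8_t chunk_size = 8;                           // sequences per chunk
constexpr int8_t chunks_per_load = max_size / chunk_size;  // loads needed per row

// Two loads of 16 consecutive bytes each fill the 16x16 byte matrix
// for the 8 sequences, matching the comment in the diff.
static_assert(chunks_per_load == 2);
```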
force-pushed from a1eee15 to 0328b07
@marehr OK, I think I addressed all your issues so far. Ready for the next ones 😏
marehr left a comment
Puh, I think the high-level design seems fine, but under the hood it is pretty messy.
    {
        detail::transpose_matrix_sse4(matrix);
    }
    else // Element wise transpose matrix which is possibly auto vectorised.
I tested everything: for SSE4, AVX2, AVX512, and no extension at all.
I meant that it auto vectorises.
Yes, I tested with auto-vectorisation, and in fact the intrinsics version was roughly 20% faster. So I decided to add it, but kept the auto-vectorised path available for larger instruction sets for now.
template <Simd target_simd_t, Simd source_simd_t>
constexpr target_simd_t upcast_signed(source_simd_t const & src)
{
    if constexpr (simd_traits<source_simd_t>::max_length == 16) // SSE4
This works for now, but I'm not really a fan of the current design. It does not check whether the current architecture really supports SSE4, AVX2, and AVX512.
It will fail ungracefully if you create a simd vector that has AVX512 size, but the architecture does not include AVX512.
Well, I hope you don't mind that I really don't care about corner cases right now. We don't even have a proper testing system for this yet. I'm not sure if you plan to add one sometime soon, but it was already quite a bit of work to test everything properly by hand. In general, the whole design can/should be adapted to the SIMD proposal, but that is not yet relevant. We can make it safe once we have the algorithms.
        debug_stream << "\n\n";
    }
    return 0;
}
No output? It would be helpful to provide the expected output.
You mean a file containing the output? Or a comment with the output?
    {
        auto & it = cached_iter[i];
        max_simd_type & tmp = matrix[pos];
        tmp = simd::fill<max_simd_type>(~0);
Why not fill it here with the padding value and omit the ~0 semantics?
Because the padding value is based on the scalar type of the target vector, which might be bigger than one byte.
force-pushed from 0328b07 to e1ef5f5
@marehr I either addressed all your requests or answered your comments.
Thank you, I'll have a (second) look :)
force-pushed from e1ef5f5 to 86a55af
@marehr I know you already agreed to everything, but I applied 99% of your suggestions. Maybe you still want to have a look?
Implements the to_simd view which does AoS to SoA transformation: