Conversation
force-pushed from 44d82c9 to 66b1684
Codecov Report

@@            Coverage Diff             @@
##           master    #1190      +/-   ##
==========================================
- Coverage   96.86%   96.79%   -0.08%
==========================================
  Files         212      213       +1
  Lines        8274     8403     +129
==========================================
+ Hits         8015     8134     +119
- Misses        259      269      +10

Continue to review the full report at Codecov.
force-pushed from 66b1684 to c6de8be
@marehr ping
include/seqan3/core/simd/concept.hpp
  // requires std::Same<decltype(a - b), simd_t>;
  template <typename simd_t>
- SEQAN3_CONCEPT Simd = requires (simd_t a, simd_t b)
+ SEQAN3_CONCEPT Simd = requires (std::remove_reference_t<simd_t> a, std::remove_reference_t<simd_t> b)
a and b should be without std::remove_reference_t<> because the operations (like a == b, a != b) should still be valid for reference types. And from a user's perspective, they should be able to expect that an expression a == b works for the given type.
Suggested change:
- SEQAN3_CONCEPT Simd = requires (std::remove_reference_t<simd_t> a, std::remove_reference_t<simd_t> b)
+ SEQAN3_CONCEPT Simd = requires (simd_t a, simd_t b)
The remaining uses of std::remove_reference_t<> in this concept are okay.
On a side note: what happens with const simds? Should we introduce a Writable/MutableSimd concept?
I thought about it and, honestly, I think we need to distinguish between them. Also, for many operations we seldom update a vector in place but instead get a new one returned. That's at least my impression of how we use it in the algorithms.
 * \see seqan3::detail::is_native_builtin_simd_v
 */
template <typename builtin_simd_t>
constexpr bool is_native_builtin_simd_v = is_native_builtin_simd<builtin_simd_t>::value;
Why not evaluate it here as a lambda function? That would be more readable.
We could also just define the bool constants directly if we don't need it as a type anyway. As written, it follows the STL way of providing unary type traits.
        return this_view->padding_value;
    }
    else
    { // only increment if not at end.
    // Thus, for the 8 sequences we need to load two times 16 consecutive bytes to fill the matrix.
    // This quadratic byte matrix can be transposed efficiently with simd instructions.
    constexpr int8_t max_size = simd_traits<max_simd_t>::length;
    constexpr int8_t num_chunks = max_size / chunk_size;
Elsewhere you called this chunks_per_load.
Suggested change:
- constexpr int8_t num_chunks = max_size / chunk_size;
+ constexpr int8_t num_chunks = chunks_per_load;
    // To fill the 16x16 matrix we need four 8x8 matrices.
    // Thus, for the 8 sequences we need to load two times 16 consecutive bytes to fill the matrix.
    // This quadratic byte matrix can be transposed efficiently with simd instructions.
    constexpr int8_t max_size = simd_traits<max_simd_t>::length;
Suggested change:
- constexpr int8_t max_size = simd_traits<max_simd_t>::length;
+ constexpr int8_t max_size = simd_traits<simd_t>::max_length;
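The arithmetic behind the chunked load in the snippet above can be made concrete with example values (a 16-lane register and chunks of 8 sequences — these numbers are illustrative, not read from seqan3's traits):

```cpp
#include <cstdint>

// Example values standing in for simd_traits<max_simd_t>::length and the
// chunk size from the comment above: 8 sequences, 16-byte registers.
constexpr int8_t max_size = 16;                            // lanes per register
constexpr int8_t chunk_size = 8;                           // sequences per chunk
constexpr int8_t chunks_per_load = max_size / chunk_size;  // loads needed per row

// Two loads of 16 consecutive bytes each fill the 16x16 byte matrix
// for the 8 sequences, matching the comment in the diff.
static_assert(chunks_per_load == 2);
```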
force-pushed from a1eee15 to 0328b07
@marehr OK, I think I addressed all your issues so far. Ready for the next ones 😏
marehr left a comment
Puh, I think the high-level design seems fine, but under the hood it is pretty messy.
    {
        detail::transpose_matrix_sse4(matrix);
    }
    else // Element wise transpose matrix which is possibly auto vectorised.
I tested everything: for SSE4, AVX2, AVX512, and no extension at all.
I meant that it auto vectorises.
Yes, I tested with auto-vectorisation, and in fact the intrinsics version was roughly 20% faster. So I decided to add it, but kept the auto-vectorised path available for larger instruction sets for now.
template <Simd target_simd_t, Simd source_simd_t>
constexpr target_simd_t upcast_signed(source_simd_t const & src)
{
    if constexpr (simd_traits<source_simd_t>::max_length == 16) // SSE4
This works for now, but I'm not really a fan of the current design. It does not check whether the current architecture really supports SSE4, AVX2, and AVX512.
It will fail ungracefully if you create a simd vector that has AVX512 size, but the architecture does not include AVX512.
Well, I hope you don't mind that I really don't care about corner cases right now. We don't even have a proper testing system for this yet. I'm not sure if you plan to add one sometime soon, but it was already quite a bit of work to test everything properly by hand. In general, the whole design can/should be adapted to the SIMD proposal, but that is not yet relevant. We can make it safe once we have the algorithms.
        debug_stream << "\n\n";
    }
    return 0;
}
No output? It would be helpful to provide the expected output.
You mean a file containing the output? Or a comment with the output?
    {
        auto & it = cached_iter[i];
        max_simd_type & tmp = matrix[pos];
        tmp = simd::fill<max_simd_type>(~0);
Why not fill it here with the padding value and omit the ~0 semantics?
Because the padding value is based on the scalar type of the target vector, which might be bigger than one byte.
force-pushed from 0328b07 to e1ef5f5
@marehr I either addressed all your requests or answered your comments.
Thank you, I'll have a (second) look :)
force-pushed from e1ef5f5 to 86a55af
@marehr I know you already agreed to everything, but I applied 99% of your suggestions. Maybe you still want to have a look?
Implements the to_simd view which does AoS to SoA transformation: