<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Dirty hands coding - Comments: Vectorizing small fixed-size sort</title><link href="https://dirtyhandscoding.github.io/posts/vectorizing-small-fixed-size-sort.html/" rel="alternate"></link><link href="/feeds/comment.vectorizing-small-fixed-size-sort.atom.xml" rel="self"></link><id>https://dirtyhandscoding.github.io/posts/vectorizing-small-fixed-size-sort.html/</id><updated>2021-08-11T11:46:00+07:00</updated><entry><title>Posted by: dirtyhandscoding</title><link href="https://dirtyhandscoding.github.io/posts/vectorizing-small-fixed-size-sort.html/#comment-2md" rel="alternate"></link><published>2021-08-11T11:46:00+07:00</published><updated>2021-08-11T11:46:00+07:00</updated><author><name>dirtyhandscoding</name></author><id>tag:dirtyhandscoding.github.io,2021-08-11:/posts/vectorizing-small-fixed-size-sort.html//comment-2md</id><summary type="html">&lt;p&gt;Only SSE is used in the article, not even AVX.
The code repo contains an AVX implementation, but it runs slower than SSE.
The size of the problem is very small, so it is hard to benefit from wider vectorization, and for larger sizes, vectorized code will lose to &lt;em&gt;O(N …&lt;/em&gt;&lt;/p&gt;</summary><content type="html">&lt;p&gt;Only SSE is used in the article, not even AVX.
The code repo contains an AVX implementation, but it runs slower than SSE.
The size of the problem is very small, so it is hard to benefit from wider vectorization, and for larger sizes, vectorized code will lose to &lt;em&gt;O(N log N)&lt;/em&gt; solutions.
So, I don't think AVX-512 will help here.&lt;/p&gt;
&lt;p&gt;Also, I think AVX-512 is still unavailable in consumer CPUs, so I cannot test it anyway.&lt;/p&gt;
&lt;p&gt;As for use case, I don't have one.
I did it purely for fun after seeing many questions on Stack Overflow.
Judging by the links I provided at the beginning of the article, the question posters did not describe their use cases either.&lt;/p&gt;</content><category term="Uncategorized"></category></entry><entry><title>Posted by: Touisteur</title><link href="https://dirtyhandscoding.github.io/posts/vectorizing-small-fixed-size-sort.html/#comment-1md" rel="alternate"></link><published>2021-08-11T00:54:00+02:00</published><updated>2021-08-11T00:54:00+02:00</updated><author><name>Touisteur</name></author><id>tag:dirtyhandscoding.github.io,2021-08-11:/posts/vectorizing-small-fixed-size-sort.html//comment-1md</id><summary type="html">&lt;p&gt;Neat! It makes me think of prefix sums and all the things you can use them for.&lt;/p&gt;
&lt;p&gt;I'm thinking that for a selection algorithm, the 'moving the result cells to their new position' step could be skipped, but I'm wondering how it would finish in that case.&lt;/p&gt;
&lt;p&gt;Once more a really excellent, excellent …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Neat! It makes me think of prefix sums and all the things you can use them for.&lt;/p&gt;
&lt;p&gt;I'm thinking that for a selection algorithm, the 'moving the result cells to their new position' step could be skipped, but I'm wondering how it would finish in that case.&lt;/p&gt;
&lt;p&gt;Once more, a really excellent, excellent post. Can't wait to see what AVX-512 could bring here.&lt;/p&gt;
&lt;p&gt;Pure curiosity (and to connect dots): Can you (please?) share your use case(s?) for wanting to sort lots of small arrays so fast?&lt;/p&gt;
&lt;p&gt;Thanks a lot.&lt;/p&gt;</content><category term="Uncategorized"></category></entry></feed>