Skip to content

Improve codegen for mixing in length#112

Merged
tkaitchuck merged 1 commit intotkaitchuck:masterfrom
stepantubanov:opt-add-length-codegen
Feb 27, 2022
Merged

Improve codegen for mixing in length#112
tkaitchuck merged 1 commit intotkaitchuck:masterfrom
stepantubanov:opt-add-length-codegen

Conversation

@stepantubanov
Copy link
Copy Markdown
Contributor

@stepantubanov stepantubanov commented Feb 25, 2022

Minor detail, but produces +17% speed improvement for small strings/byte slices (x86-64).

Measured on Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz (Skylake).

RUSTFLAGS="-C target-cpu=native" cargo bench aeshash/string

Before:
aeshash/string          time:   [69.371 ns 69.966 ns 70.507 ns]

After:
aeshash/string          time:   [56.352 ns 56.830 ns 57.309 ns]                           
                        change: [-18.211% -17.441% -16.670%] (p = 0.00 < 0.05)

Before (relevant part):

   	mov	  rsi, qword ptr [rbx + 16]	# length
   	vmovq	  rax, xmm1			# enc[0]
   	add	  rax, rsi
   	vmovq	  xmm3, rax
   	vpblendd  xmm11, xmm3, xmm1, 12

In order to add the length it extracts enc[0] into general-purpose register, adds there and then moves it back to vector register (and uses blend).

After:

   	vmovq	xmm3, qword ptr [rbx + 16]	# length
   	vpaddq	xmm11, xmm3, xmm1		# enc[0] += length

LLVM able to recognize there is no need to go through GRP and blend.

@tkaitchuck tkaitchuck self-requested a review February 26, 2022 06:09
Comment thread src/operations.rs Outdated

#[inline(always)]
pub(crate) fn add_in_length(enc: &mut u128, len: u64) {
#[cfg(all(target_feature = "sse2", not(miri)))]
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This refers to sse2 and below it refers to sss3, but it looks like these were both meant to be the same so as to provide two alternative paths.

Comment thread src/operations.rs Outdated
use core::arch::x86_64::*;

unsafe {
let enc = std::ptr::addr_of_mut!(*enc);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't assume std here.

Comment thread src/operations.rs

unsafe {
let enc = std::ptr::addr_of_mut!(*enc);
let len = _mm_cvtsi64_si128(len as i64);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this 64 bit only? It is giving a compile error on i686.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, 64-bit. I've added target_arch to the cfg on this branch.

@stepantubanov
Copy link
Copy Markdown
Contributor Author

@tkaitchuck Thanks for the review, I've just pushed updated diff with the fixes.

@tkaitchuck tkaitchuck merged commit fbd7485 into tkaitchuck:master Feb 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants