Dear zlib-ng community,
I'm building zlib-ng version 2.0.6 with the nvc 23.9-0 compiler on a 64-bit Linux system and a Meteor Lake CPU. This CPU supports the avx_vnni instruction set. The build is performed as part of creating a Docker image. Containers based on that image run fine on the system on which the image was created. However, when I pull the image on another system with an older CPU of Cascade Lake-X architecture (this CPU supports avx512_vnni but not avx_vnni) and try to run a container there, it fails with Illegal instruction (core dumped).
The backtrace looks like this:
(gdb) bt
#0 adler32_avx2 () at /.../zlib-ng/arch/x86/adler32_avx.c:78
#1 0x00007ffff1cf9636 in adler32_stub () at /.../zlib-ng/functable.c:238
#2 0x00007ffff1cfe4b3 in inflate () at /.../zlib-ng/inflate.c:1054
(gdb) display/i $pc
=> 0x7ffff1d06926 <adler32_avx2()+742>: {vex} vpdpwssd %ymm2,%ymm7,%ymm5
As can be seen from the debugger, the application crashes because it tries to execute instruction vpdpwssd from the avx_vnni instruction set on a system without support for it.
An excerpt from running objdump -d -S .../arch/x86/adler32_avx.c.o looks as follows:
2cc: 0f 1f 40 00 nopl 0x0(%rax)
__m256i vbuf = _mm256_loadu_si256((__m256i*)buf);
2d0: c5 fe 6f 36 vmovdqu (%rsi),%ymm6
buf += 32;
2d4: 48 83 c6 20 add $0x20,%rsi
2d8: c4 e2 4d 04 f9 vpmaddubsw %ymm1,%ymm6,%ymm7
2dd: c4 e2 4d 04 f3 vpmaddubsw %ymm3,%ymm6,%ymm6
2e2: c5 55 f2 c0 vpslld %xmm0,%ymm5,%ymm8
vs1 = _mm256_add_epi32(vsum1, vs1);
2e6: c4 e2 45 52 ea {vex} vpdpwssd %ymm2,%ymm7,%ymm5
vsum2 = _mm256_add_epi32(vsum2, vs2);
2eb: c4 e2 4d 52 e2 {vex} vpdpwssd %ymm2,%ymm6,%ymm4
vs2 = _mm256_add_epi32(vsum2, vs1_0);
2f0: c5 bd fe e4 vpaddd %ymm4,%ymm8,%ymm4
}
2f4: 48 39 ce cmp %rcx,%rsi
It seems like the compiler actually replaces intrinsic _mm256_add_epi32 by the instruction vpdpwssd. This is a problem because zlib-ng checks for the AVX2 instruction set at runtime but seems to have no influence about the real instructions the compiler might have been generated for the given intrinsics.
In tried the same with versions 2.0.7 and 2.2.5 with the same result.
Given this, I have the following questions:
- Am I doing something obviously wrong here?
- Is there any guarantee from the compiler side that written intrinsics are never optimized/replace using other (more modern) instructions?
- If there is no guarantee, the current approach seems broken to me. Would you agree?
- Is there any way to create a portable build of zlib-ng on a modern system without disabling AVX2 entirely?
Thank you very much for your time!
Dear zlib-ng community,
I'm building zlib-ng version 2.0.6 with the nvc 23.9-0 compiler on a 64-bit Linux system and a Meteor Lake CPU. This CPU supports the
avx_vnniinstruction set. The build is performed as part of creating a Docker image. Containers based on that image run fine on the system on which the image was created. However, when I pull the image on another system with an older CPU of Cascade Lake-X architecture (this CPU supportsavx512_vnnibut notavx_vnni) and try to run a container there, it fails withIllegal instruction (core dumped).The backtrace looks like this:
As can be seen from the debugger, the application crashes because it tries to execute instruction
vpdpwssdfrom theavx_vnniinstruction set on a system without support for it.An excerpt from running
objdump -d -S .../arch/x86/adler32_avx.c.olooks as follows:It seems like the compiler actually replaces intrinsic
_mm256_add_epi32by the instructionvpdpwssd. This is a problem because zlib-ng checks for the AVX2 instruction set at runtime but seems to have no influence about the real instructions the compiler might have been generated for the given intrinsics.In tried the same with versions 2.0.7 and 2.2.5 with the same result.
Given this, I have the following questions:
Thank you very much for your time!