Skip to content

NVC compiler support #1985

@PERFACCT-JS

Description

@PERFACCT-JS

Dear zlib-ng community,

I'm building zlib-ng version 2.0.6 with the nvc 23.9-0 compiler on a 64-bit Linux system and a Meteor Lake CPU. This CPU supports the avx_vnni instruction set. The build is performed as part of creating a Docker image. Containers based on that image run fine on the system on which the image was created. However, when I pull the image on another system with an older CPU of Cascade Lake-X architecture (this CPU supports avx512_vnni but not avx_vnni) and try to run a container there, it fails with Illegal instruction (core dumped).

The backtrace looks like this:

(gdb) bt
#0  adler32_avx2 () at /.../zlib-ng/arch/x86/adler32_avx.c:78
#1  0x00007ffff1cf9636 in adler32_stub () at /.../zlib-ng/functable.c:238
#2  0x00007ffff1cfe4b3 in inflate () at /.../zlib-ng/inflate.c:1054

(gdb) display/i $pc
=> 0x7ffff1d06926 <adler32_avx2()+742>:	{vex} vpdpwssd %ymm2,%ymm7,%ymm5

As can be seen from the debugger, the application crashes because it tries to execute instruction vpdpwssd from the avx_vnni instruction set on a system without support for it.

An excerpt from running objdump -d -S .../arch/x86/adler32_avx.c.o looks as follows:

 2cc:	0f 1f 40 00          	nopl   0x0(%rax)
           __m256i vbuf = _mm256_loadu_si256((__m256i*)buf);
 2d0:	c5 fe 6f 36          	vmovdqu (%rsi),%ymm6
           buf += 32;
 2d4:	48 83 c6 20          	add    $0x20,%rsi
 2d8:	c4 e2 4d 04 f9       	vpmaddubsw %ymm1,%ymm6,%ymm7
 2dd:	c4 e2 4d 04 f3       	vpmaddubsw %ymm3,%ymm6,%ymm6
 2e2:	c5 55 f2 c0          	vpslld %xmm0,%ymm5,%ymm8
           vs1 = _mm256_add_epi32(vsum1, vs1);
 2e6:	c4 e2 45 52 ea       	{vex} vpdpwssd %ymm2,%ymm7,%ymm5
           vsum2 = _mm256_add_epi32(vsum2, vs2);
 2eb:	c4 e2 4d 52 e2       	{vex} vpdpwssd %ymm2,%ymm6,%ymm4
           vs2   = _mm256_add_epi32(vsum2, vs1_0);
 2f0:	c5 bd fe e4          	vpaddd %ymm4,%ymm8,%ymm4
       }
 2f4:	48 39 ce             	cmp    %rcx,%rsi

It seems like the compiler actually replaces intrinsic _mm256_add_epi32 by the instruction vpdpwssd. This is a problem because zlib-ng checks for the AVX2 instruction set at runtime but seems to have no influence about the real instructions the compiler might have been generated for the given intrinsics.

In tried the same with versions 2.0.7 and 2.2.5 with the same result.

Given this, I have the following questions:

  1. Am I doing something obviously wrong here?
  2. Is there any guarantee from the compiler side that written intrinsics are never optimized/replace using other (more modern) instructions?
  3. If there is no guarantee, the current approach seems broken to me. Would you agree?
  4. Is there any way to create a portable build of zlib-ng on a modern system without disabling AVX2 entirely?

Thank you very much for your time!

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions