NVC compiler support #1985

Dear *zlib-ng* community,

I'm building *zlib-ng* version *2.0.6* with the [nvc 23.9-0 compiler](https://docs.nvidia.com/hpc-sdk/archive/23.9/index.html) on a 64-bit Linux system with a *Meteor Lake* CPU. This CPU supports the `avx_vnni` instruction set. The build runs as part of creating a Docker image. Containers based on that image run fine on the system where the image was built. However, when I pull the image on another system with an older *Cascade Lake-X* CPU (which supports `avx512_vnni` but not `avx_vnni`) and try to run a container there, it fails with `Illegal instruction (core dumped)`.

The backtrace looks like this:
```
(gdb) bt
#0  adler32_avx2 () at /.../zlib-ng/arch/x86/adler32_avx.c:78
#1  0x00007ffff1cf9636 in adler32_stub () at /.../zlib-ng/functable.c:238
#2  0x00007ffff1cfe4b3 in inflate () at /.../zlib-ng/inflate.c:1054

(gdb) display/i $pc
=> 0x7ffff1d06926 <adler32_avx2()+742>:	{vex} vpdpwssd %ymm2,%ymm7,%ymm5
```

As the debugger output shows, the application crashes because it tries to execute the `vpdpwssd` instruction from the `avx_vnni` instruction set on a system that does not support it.
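To double-check which of the relevant instruction sets a given machine reports, something like the following sketch could be used. The bit positions are the ones documented for CPUID leaf 7 in the Intel SDM; the names `vnni_features`, `decode_leaf7` and `query_cpu` are made up for this illustration and are not part of *zlib-ng*. It assumes a GCC-compatible `<cpuid.h>` (I have not verified that nvc ships one), and it only looks at the CPUID bits, not at whether the OS has enabled the corresponding register state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bits in CPUID leaf 7 (per the Intel SDM). */
#define CPUID7_0_EBX_AVX2        (1u << 5)   /* leaf 7, subleaf 0, EBX bit 5  */
#define CPUID7_0_ECX_AVX512_VNNI (1u << 11)  /* leaf 7, subleaf 0, ECX bit 11 */
#define CPUID7_1_EAX_AVX_VNNI    (1u << 4)   /* leaf 7, subleaf 1, EAX bit 4  */

/* Hypothetical helper type for this example only. */
struct vnni_features {
    bool avx2;
    bool avx512_vnni;
    bool avx_vnni;
};

/* Decode the three feature flags from raw CPUID register values. */
struct vnni_features decode_leaf7(uint32_t ebx0, uint32_t ecx0, uint32_t eax1) {
    struct vnni_features f;
    f.avx2        = (ebx0 & CPUID7_0_EBX_AVX2) != 0;
    f.avx512_vnni = (ecx0 & CPUID7_0_ECX_AVX512_VNNI) != 0;
    f.avx_vnni    = (eax1 & CPUID7_1_EAX_AVX_VNNI) != 0;
    return f;
}

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>

/* Query the running CPU; assumes a GCC-compatible <cpuid.h>. */
struct vnni_features query_cpu(void) {
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    uint32_t ebx0 = 0, ecx0 = 0, eax1 = 0;

    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        unsigned int max_subleaf = eax;  /* subleaf 0 reports the highest subleaf in EAX */
        ebx0 = ebx;
        ecx0 = ecx;
        if (max_subleaf >= 1 &&
            __get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx)) {
            eax1 = eax;
        }
    }
    return decode_leaf7(ebx0, ecx0, eax1);
}
#endif
```

On the two machines described above, I would expect `query_cpu()` to report `avx_vnni` as set only on the *Meteor Lake* system and `avx512_vnni` as set only on the *Cascade Lake-X* system, with `avx2` set on both.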

An excerpt from running `objdump -d -S .../arch/x86/adler32_avx.c.o` looks as follows:
```
 2cc:	0f 1f 40 00          	nopl   0x0(%rax)
           __m256i vbuf = _mm256_loadu_si256((__m256i*)buf);
 2d0:	c5 fe 6f 36          	vmovdqu (%rsi),%ymm6
           buf += 32;
 2d4:	48 83 c6 20          	add    $0x20,%rsi
 2d8:	c4 e2 4d 04 f9       	vpmaddubsw %ymm1,%ymm6,%ymm7
 2dd:	c4 e2 4d 04 f3       	vpmaddubsw %ymm3,%ymm6,%ymm6
 2e2:	c5 55 f2 c0          	vpslld %xmm0,%ymm5,%ymm8
           vs1 = _mm256_add_epi32(vsum1, vs1);
 2e6:	c4 e2 45 52 ea       	{vex} vpdpwssd %ymm2,%ymm7,%ymm5
           vsum2 = _mm256_add_epi32(vsum2, vs2);
 2eb:	c4 e2 4d 52 e2       	{vex} vpdpwssd %ymm2,%ymm6,%ymm4
           vs2   = _mm256_add_epi32(vsum2, vs1_0);
 2f0:	c5 bd fe e4          	vpaddd %ymm4,%ymm8,%ymm4
       }
 2f4:	48 39 ce             	cmp    %rcx,%rsi
```

It seems that the compiler actually replaces the `_mm256_add_epi32` intrinsic with the `vpdpwssd` instruction. This is a problem because *zlib-ng* checks for the AVX2 instruction set at runtime but has no control over the instructions the compiler actually generates for the given intrinsics.
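My guess at why the compiler considers this substitution legal is sketched below as a scalar model of a single 32-bit lane. It assumes that the `_mm256_add_epi32` in the loop is fed by a `_mm256_madd_epi16` (`vpmaddwd`), which the excerpt above does not show directly. In that case `vpdpwssd` computes the same value as the two separate instructions in one step. The function names are made up for this illustration.

```c
#include <stdint.h>

/* One 32-bit lane of _mm256_madd_epi16 (vpmaddwd):
 * multiply two pairs of signed 16-bit values and add the products.
 * The sum is done in unsigned arithmetic so it wraps like the hardware. */
int32_t madd_lane(int16_t a0, int16_t a1, int16_t b0, int16_t b1) {
    uint32_t p0 = (uint32_t)((int32_t)a0 * (int32_t)b0);
    uint32_t p1 = (uint32_t)((int32_t)a1 * (int32_t)b1);
    return (int32_t)(p0 + p1);
}

/* One 32-bit lane of _mm256_add_epi32 (vpaddd): wrapping 32-bit add. */
int32_t add_lane(int32_t x, int32_t y) {
    return (int32_t)((uint32_t)x + (uint32_t)y);
}

/* One 32-bit lane of vpdpwssd (the non-saturating variant):
 * acc + a0*b0 + a1*b1, with wrapping 32-bit accumulation. */
int32_t dpwssd_lane(int32_t acc, int16_t a0, int16_t a1, int16_t b0, int16_t b1) {
    uint32_t p0 = (uint32_t)((int32_t)a0 * (int32_t)b0);
    uint32_t p1 = (uint32_t)((int32_t)a1 * (int32_t)b1);
    return (int32_t)((uint32_t)acc + p0 + p1);
}
```

For any inputs, `dpwssd_lane(acc, a0, a1, b0, b1)` equals `add_lane(acc, madd_lane(a0, a1, b0, b1))`. So the fused instruction is a valid rewrite as far as the result is concerned, as long as the compiler believes the target CPU supports `avx_vnni`.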

I tried the same with versions *2.0.7* and *2.2.5*, with the same result.

Given this, I have the following questions:
1. Am I doing something obviously wrong here?
2. Is there any guarantee from the compiler side that intrinsics are never optimized into, or replaced by, other (more modern) instructions?
3. If there is no guarantee, the current approach seems broken to me. Would you agree?
4. Is there any way to create a portable build of *zlib-ng* on a modern system without disabling AVX2 entirely?

Thank you very much for your time!
