Skip to content

"Illegal instruction" when running simple array additions #10330

@ulido

Description

@ulido

Numpy crashes with an "illegal instruction" when running a simple array addition (past a certain array size) when running inside a VM under Xen. It is probably somehow related to #9534. I narrowed it down the commit 37740eb, which introduced AVX2 simd code:

Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.__version__
'1.12.0.dev0+37740eb'
>>> a = numpy.arange(15)
>>> a[1:] + a[:-1]
Illegal instruction (core dumped)

I also tried the current master, which shows the same error.

It is working fine up to the preceding commit eeb4e17:

Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.__version__
'1.12.0.dev0+eeb4e17'
>>> a = numpy.arange(15)
>>> a[1:] + a[:-1]
array([ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27])`

Note that /proc/cpuinfo doesn't list AVX2:

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 63
model name	: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
stepping	: 2
microcode	: 0x31
cpu MHz		: 2494.282
cache size	: 30720 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes rdrand hypervisor lahf_lm abm fsgsbase tsc_adjust bmi1 smep bmi2 erms invpcid
bugs		:
bogomips	: 4988.56
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 63
model name	: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
stepping	: 2
microcode	: 0x31
cpu MHz		: 2494.282
cache size	: 30720 KB
physical id	: 0
siblings	: 4
core id		: 2
cpu cores	: 4
apicid		: 2
initial apicid	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes rdrand hypervisor lahf_lm abm fsgsbase tsc_adjust bmi1 smep bmi2 erms invpcid
bugs		:
bogomips	: 4988.56
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

processor	: 2
vendor_id	: GenuineIntel
cpu family	: 6
model		: 63
model name	: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
stepping	: 2
microcode	: 0x31
cpu MHz		: 2494.282
cache size	: 30720 KB
physical id	: 0
siblings	: 4
core id		: 4
cpu cores	: 4
apicid		: 4
initial apicid	: 4
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes rdrand hypervisor lahf_lm abm fsgsbase tsc_adjust bmi1 smep bmi2 erms invpcid
bugs		:
bogomips	: 4988.56
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 63
model name	: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
stepping	: 2
microcode	: 0x31
cpu MHz		: 2494.282
cache size	: 30720 KB
physical id	: 0
siblings	: 4
core id		: 6
cpu cores	: 4
apicid		: 6
initial apicid	: 6
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes rdrand hypervisor lahf_lm abm fsgsbase tsc_adjust bmi1 smep bmi2 erms invpcid
bugs		:
bogomips	: 4988.56
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

I tried to understand how numpy decides when to use the avx2 simd codepaths but failed miserably.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions