Segfaults or wrong code execution on Intel Skylake / Kaby Lake CPUs with hyperthreading enabled #7452

@vicuna

Original bug ID: 7452
Reporter: enguerrand
Assigned to: @mshinwell
Status: closed (set by @mshinwell on 2017-06-09T17:02:32Z)
Resolution: not a bug
Priority: normal
Severity: crash
Platform: Linux
OS: Debian
Version: 4.03.0
Target version: later
Category: back end (clambda to assembly)
Monitored by: @gasche @ygrek @yallop @alainfrisch

Bug description

While recently switching a 4.02.3 codebase to 4.03, we ran into random compiler crashes and, more rarely, cases where the compiler generated bad assembly code (which then failed to assemble) or trapped on an instruction while running.

These problems occur on an OCaml source file generated using the Extprot library.

The problem does not happen every time.
Most of the time the file compiles successfully, but given enough retries the compiler eventually crashes. Example dmesg output after a few crashes:

[22241.838551] ocamlopt.opt[48175]: segfault at ffffffffffde7768 ip 000055f75e412e3c sp 00007ffc3ee31de0 error 7 in ocamlopt.opt[55f75e0b6000+613000]
[22985.879907] ocamlopt.opt[48221]: segfault at af8 ip 00005564455169bd sp 00007ffc9f36b130 error 4 in ocamlopt.opt[556445006000+613000]
[23936.341126] ocamlopt.opt[48306]: segfault at 5837 ip 00005641554a16c8 sp 00007ffe1278f8e0 error 4 in ocamlopt.opt[56415514a000+613000]
[25395.780978] ocamlopt.opt[48445]: segfault at ffffffffffde7608 ip 0000557e25ea5cf4 sp 00007ffc2eac79d0 error 5 in ocamlopt.opt[557e25b49000+613000]

The backtraces obtained for these crashes do not always point to the same place. Example backtraces can be found in the attached archive.

More rarely, the compiler generates an assembly file that then fails to assemble:
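
Because the bracketed load address of ocamlopt.opt differs in every dmesg line above, the raw ip values cannot be compared directly. A minimal sketch for making them comparable, assuming the kernel log format shown above, subtracts the start of the mapping from ip. The helper name segv_offset is hypothetical, not part of any tool mentioned in this report:

```shell
#!/bin/sh
# Hypothetical helper: print the offset of the faulting instruction
# relative to the start of the mapping reported by the kernel, for one
# "segfault at ... ip ... in name[base+size]" log line.
segv_offset() {
    line=$1
    # Hex value following " ip ".
    ip=$(printf '%s\n' "$line" | sed -n 's/.* ip \([0-9a-f]*\) .*/\1/p')
    # Hex base address inside the trailing "[base+size]".
    base=$(printf '%s\n' "$line" | sed -n 's/.*\[\([0-9a-f]*\)+[0-9a-f]*]$/\1/p')
    # Give up on lines that do not match the expected format.
    [ -n "$ip" ] && [ -n "$base" ] || return 1
    printf '0x%x\n' "$((0x$ip - 0x$base))"
}
```

Applied to the first two lines of the log, this gives 0x35ce3c and 0x5109bd, that is, two different locations within the binary.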

/tmp/camlasmc92578.s: Assembler messages:
/tmp/camlasmc92578.s:1005308: Error: operand type mismatch for `add'

where line 1005308 is: add $2300, $5199

Or:

/tmp/camlasm601e1c.s: Assembler messages:
/tmp/camlasm601e1c.s:820172: Error: operand type mismatch for `or'

where line 820172 is: orq $139950828249720, %rax

So far we have not noticed any misbehaviour in a successfully compiled and running instance of this file, but the issue is still very new to us, so we will be watching it closely.

Steps to reproduce

The problem does not happen every time; at least, it does not crash on every build. We sometimes have to retry for 30 minutes before seeing a crash.

Steps to reproduce:

OCaml 4.03 and 4.04 have both been seen to trigger the problem.
A sample file is attached as the test case used to reproduce the problem. The Extprot library must be installed in order to compile it, since the file was generated using Extprot (we use the latest version from Opam).

The test case can be found in the attachment (test.ml).

To reproduce:
Just compile this file, preferably in a loop, with this command:

while ocamlfind opt -c -g -bin-annot -ccopt -g -ccopt -O2 -ccopt -Wextra -ccopt '-Wstrict-overflow=5' -thread -w +a-4-40..42-44-45-48-58 -w -27-32 -package extprot test.ml -o test.cmx; do echo "ok"; done
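
Since a crash can take many iterations to appear, it may help to record which attempt failed and to keep that attempt's output. The following is only a sketch of one way to do this, not the procedure we used: run_until_failure and the LOG_DIR variable are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to MAX times and stop at the
# first failure, keeping the combined stdout/stderr of the failing
# attempt in $LOG_DIR (current directory by default).
# Usage: run_until_failure MAX command [args...]
run_until_failure() {
    max=$1
    shift
    dir=${LOG_DIR:-.}
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        log="$dir/run.$attempt.log"
        if ! "$@" >"$log" 2>&1; then
            echo "failed on attempt $attempt (output kept in $log)"
            return 1
        fi
        # Successful attempts are not interesting; drop their logs.
        rm -f "$log"
        attempt=$((attempt + 1))
    done
    echo "no failure in $max attempts"
}
```

For example: run_until_failure 1000 ocamlfind opt -c -g -bin-annot -ccopt -g -ccopt -O2 -ccopt -Wextra -ccopt '-Wstrict-overflow=5' -thread -w +a-4-40..42-44-45-48-58 -w -27-32 -package extprot test.ml -o test.cmx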

Additional information

  • Even if the crash does not occur for some time, once it has occurred at least once the probability of the compiler crashing again seems to increase.

  • The crash was observed with both ocamlopt and ocamlopt.opt.
