
Enable native code on aarch64-*-openbsd* #11092

Merged
avsm merged 2 commits into ocaml:trunk from madroach:OpenBSD_aarch64 on Mar 21, 2022


madroach (Contributor) commented Mar 6, 2022

This follows up on #10828 (comment)

OCaml builds fine on OpenBSD/aarch64; the testsuite is currently running.

kit-ty-kate (Member)

I tried the same thing on FreeBSD/aarch64 and NetBSD/aarch64, but a few tests in the testsuite fail.

In FreeBSD:

List of unexpected errors:
    tests/parallel/'domain_serial_spawn_burn.ml' with 1.1 (bytecode)
    tests/parallel/'domain_serial_spawn_burn.ml' with 1.2 (native)

In NetBSD:

List of unexpected errors:
    tests/lib-systhreads/'testyield.ml' with 1.1.2 (native)
    tests/memory-model/'forbidden.ml' with 1 (native)
    tests/runtime-errors/'stackoverflow.ml' with 1.1.1 (run)
    tests/parallel/'domain_serial_spawn_burn.ml' with 1.1 (bytecode)
    tests/parallel/'domain_parallel_spawn_burn.ml' with 1.2 (native)
    tests/parallel/'domain_serial_spawn_burn.ml' with 1.2 (native)
    tests/memory-model/'forbidden.ml' with 2 (bytecode)

(Both tested on a Raspberry Pi 4.)

Did you not get any errors in the testsuite with OpenBSD?

madroach (Contributor, Author) commented Mar 6, 2022

OpenBSD -current as of February 1st, with OCaml trunk as of today:

List of failed tests:
    tests/effects/'evenodd.ml' with 1 (native)
    tests/lib-threads/'sockets.ml' with 1.1.2 (native)
    tests/lib-threads/'pr4466.ml' with 1.1 (native)
    tests/lib-threads/'sockets.ml' with 1.1.1 (bytecode)
    tests/effects/'evenodd.ml' with 2 (bytecode)
    tests/lib-threads/'pr5325.ml' with 1.2 (native)
    tests/lib-threads/'pr5325.ml' with 1.1 (bytecode)

List of unexpected errors:
    tests/memory-model/'publish.ml' with 1 (native)
    tests/memory-model/'forbidden.ml' with 1 (native)
    tests/memory-model/'publish.ml' with 2 (bytecode)
    tests/parallel/'mctest.ml' with 1.2 (native)
    tests/parallel/'pingpong.ml' with 1.2 (native)
    tests/parallel/'pingpong.ml' with 1.1 (bytecode)
    tests/weak-ephe-final/'weaklifetime_par.ml' with 1 (native)
    tests/parallel/'mctest.ml' with 1.1 (bytecode)
    tests/weak-ephe-final/'weaklifetime_par.ml' with 2 (bytecode)
    tests/memory-model/'forbidden.ml' with 2 (bytecode)

I encountered the evenodd and forbidden failures on amd64, too. They disappeared after raising ulimit -d above 2 GB and setting TIMEOUT=3600. My RockPro64 is still running the testsuite with the increased limits.

Current state: Wall clock: tests/parallel/mctest.ml took 3101.31s

madroach (Contributor, Author) commented Mar 6, 2022

List of failed tests:
    tests/lib-threads/'sockets.ml' with 1.1.2 (native)
    tests/lib-threads/'pr4466.ml' with 1.1 (native)
    tests/lib-threads/'sockets.ml' with 1.1.1 (bytecode)
    tests/lib-threads/'pr5325.ml' with 1.2 (native)
    tests/lib-threads/'pr5325.ml' with 1.1 (bytecode)

Summary:
  3144 tests passed
   70 tests skipped
    5 tests failed
  181 tests not started (parent test skipped or failed)
    0 unexpected errors
  3400 tests considered

madroach (Contributor, Author) commented Mar 6, 2022

The failing tests sockets.ml, pr4466.ml and pr5325.ml were caused by strict packet filter (pf) rules on my loopback interface. They pass after relaxing the pf rules.

madroach (Contributor, Author) commented Mar 7, 2022

No failing tests when running with ulimit -d 3145728 (a 3 GiB data segment limit, given in kbytes) and env OCAML_TEST_SIZE=1 gmake SHOW_TIMINGS=true TIMEOUT=3600 all:

Summary:
  3150 tests passed
   70 tests skipped
    0 tests failed
  181 tests not started (parent test skipped or failed)
    0 unexpected errors
  3401 tests considered

6182.41s user 1235.63s system 71% cpu 2:52:10.55 total
This is on OpenBSD -current as of March 6th.

The slowest tests by a huge margin were (times in seconds):

    tests/parallel/domain_parallel_spawn_burn.ml: 919.11
    tests/parallel/domain_serial_spawn_burn.ml: 5266.94

madroach marked this pull request as ready for review on March 7, 2022 15:57
kit-ty-kate (Member)

The slowest tests by a huge margin were:

    tests/parallel/domain_parallel_spawn_burn.ml: 919.11
    tests/parallel/domain_serial_spawn_burn.ml: 5266.94

Yeah, I noticed that as well on FreeBSD. These are the ones that time out with the default value of TIMEOUT.

I was also able to get no failures at all on FreeBSD with TIMEOUT=3600; however, when I tried again a few times under the same conditions, I got the following timeout on one occasion:

List of unexpected errors:
    tests/weak-ephe-final/'weaklifetime_par.ml' with 2 (bytecode)

Summary:
  3163 tests passed
   68 tests skipped
    0 tests failed
  169 tests not started (parent test skipped or failed)
    1 unexpected errors
  3401 tests considered
#### Something failed. Exiting with error status.
$ ulimit -a
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          1048576
-s: stack size (kbytes)             1048576
-c: core file size (blocks)         unlimited
-m: resident set size (kbytes)      unlimited
-l: locked-in-memory size (kbytes)  unlimited
-u: processes                       8384
-n: file descriptors                113337
-b: socket buffer size (bytes)      unlimited
-v: virtual memory size (kbytes)    unlimited
-p: pseudo-terminals                unlimited
-w: swap size (kbytes)              unlimited
-k: kqueues                         unlimited
-o: umtx shared locks               unlimited

Did you try several times? Were the results stable on OpenBSD?

madroach (Contributor, Author) commented Mar 7, 2022

No, I ran it only once. I'll run it a few more times.

gasche (Member) commented Mar 7, 2022

In both cases, I think the observed slowdowns could be explained by unfairness in the system thread scheduler (I vaguely remember fairness issues causing failures in other systhreads tests in the past).

The spawn_burn tests do a lot of computation and have a lot of contention on the domain synchronization mechanisms. They involve running GCs in an infinite loop while some "work domains" try to get a fixed amount of work done. If the scheduler always gives CPU time to the GC-ing domains and never to the "work domains", the test could go on indefinitely. We could avoid this by fixing in advance the number of iterations of the GC loops.
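To make the proposed fix concrete, here is a minimal sketch of the bounded-loop idea, assuming OCaml >= 5.0 (where Domain and Atomic are in the standard library). It is not the code of domain_serial_spawn_burn.ml or domain_parallel_spawn_burn.ml; the names gc_burner, worker, run, gc_iterations and work_items, and the constants, are hypothetical. The point it illustrates is that when both the GC-burning loops and the work loop have a fixed iteration count, every domain terminates after a bounded amount of work, whatever order the OS scheduler runs them in.

```ocaml
(* Hypothetical sketch, assuming OCaml >= 5.0 (Domain, Atomic in stdlib).
   Not taken from the actual spawn_burn tests. *)

(* Number of GC iterations per burner domain, fixed in advance. *)
let gc_iterations = 200

(* Fixed amount of work the worker domain must complete. *)
let work_items = 1_000

(* Burner domain: forces a bounded number of minor collections.
   The fragile alternative would be an unbounded loop such as
     while not (Atomic.get work_done) do Gc.minor () done
   whose termination depends on the worker getting scheduled. *)
let gc_burner () =
  for _ = 1 to gc_iterations do
    Gc.minor ()
  done

(* Worker domain: allocates a little and records its progress. *)
let worker progress () =
  for _ = 1 to work_items do
    ignore (Sys.opaque_identity (Array.make 16 0));
    Atomic.incr progress
  done

(* Spawn [burners] GC-burning domains plus one worker, join them all,
   and return how many work items the worker completed. *)
let run ~burners =
  let progress = Atomic.make 0 in
  let gc_domains = List.init burners (fun _ -> Domain.spawn gc_burner) in
  let work_domain = Domain.spawn (worker progress) in
  Domain.join work_domain;
  List.iter Domain.join gc_domains;
  Atomic.get progress

let () =
  Printf.printf "work items completed: %d\n" (run ~burners:2)
```

Because neither loop waits on the other, the total running time is bounded by the fixed iteration counts rather than by scheduler fairness; the trade-off is that the test no longer keeps GC pressure up for exactly as long as the workers run.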

weaklifetime_par depends on domain-local GC statistics in a way that may also make it very sensitive to fairness between domains: its termination condition for each domain is that the domain has "seen" 5 major GCs. Under some conditions, however, the number of major GCs may not be counted accurately and basically never increases from some domains' perspective (if they never start a major GC themselves, and there aren't enough minor collections for them to "sample" the major GCs done by the other domain). We mentioned avenues for improving the test in #11008 (comment). (The "bug" discussed in that issue is not the fairness issue per se, but fixing one would probably fix the other as well.) I'm planning to work on this in the coming weeks.

madroach (Contributor, Author) commented Mar 7, 2022

OK, thanks for your insights. Those issues don't seem to be specific to aarch64, but rather scheduler/OS specific. So those very slow tests are not a blocker for this PR.

kit-ty-kate (Member) left a comment

Those issues don't seem to be specific to aarch64, but rather scheduler/OS specific. So those very slow tests are not a blocker for this PR.

I agree.

  • I opened #11096 for FreeBSD as well, since it works fine there too and behaves the same.
  • I also opened #11097 for NetBSD. It needed a bit more work to function properly, but it works fine with TIMEOUT=3900 in the same manner.

avsm merged commit 7130374 into ocaml:trunk on Mar 21, 2022
shindere added a commit that referenced this pull request on Mar 22, 2022

shindere (Contributor)

Just pushed f225a9a to fix this PR's Changes entry.

5 participants