Make the testsuite more paranoid about the results by dra27 · Pull Request #975 · ocaml/ocaml

dra27 · 2016-12-16T20:30:03Z

This GPR introduces two changes to the operation of the full testsuite in an attempt to catch more misconfiguration errors. Both new checks apply only if no tests have failed.

The first commit adds a cookie testsuite/tests/total-tests which should contain the total number of tests in the testsuite. Since 4.03.0, all tests are supposed to be considered on all platforms and either pass or be formally skipped.
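A minimal sketch of the check described above, written as a POSIX shell function rather than the PR's actual Makefile code. The function name `check_total` and its arguments are hypothetical. It assumes the caller has already counted every test that reported a result (passed, failed or skipped) and that the cookie file holds a single number.

```shell
# Hypothetical sketch, not the PR's Makefile code: compare the number of
# tests considered (passed + failed + skipped) with the expected total.
#   check_total FILE CONSIDERED
#   FILE        file holding the expected number, e.g. the total-tests cookie
#   CONSIDERED  number of tests that reported any result in the log
check_total() {
  expected=$(cat "$1") || return 1
  considered=$2
  if [ "$considered" -lt "$expected" ]; then
    # Too few results: more likely a test that skipped without printing
    # "=> skipped" than a count that ought to be lowered.
    echo "Only $considered of $expected tests reported a result;" \
      "look for tests that skip silently (expected total is in $1)" >&2
    return 1
  elif [ "$considered" -gt "$expected" ]; then
    # Too many results: usually a newly added test.
    echo "$considered tests reported a result but $1 expects $expected;" \
      "if you added tests, update $1" >&2
    return 1
  fi
}
```

Calling it as `check_total testsuite/tests/total-tests "$considered"` would fail the run in either direction. Keeping the two messages separate follows the reasoning later in the thread: too few results usually points at a test that skipped silently, while too many usually means a test was added.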

The second adds a file testsuite/tests/known-skips which contains tagged lines indicating which tests are expected to be skipped on each platform and configuration. The idea here is to detect CI misconfiguration or unexpected environment changes (e.g. the Microsoft C# compiler disappearing).

The analysis of skipped tests adds quite a few lines to the Makefile, but I anticipate this being completely replaced with an OCaml script once the new ocamltest is driving the entire testsuite, so I hope it can be considered a temporary addition.

At the moment, this will cause the testsuite for msvc64 without flambda to fail because I don't wish to add special handling for this case when #974 will eliminate the need for it.

dra27

This is currently going through precheck as well.

dra27 · 2016-12-16T20:30:32Z

testsuite/Makefile

 FIND=find
 include ../config/Makefile
+# Borrowed from ../Makefile.tools
+BYTECODE_ONLY=[ "$(ARCH)" = "none" -o "$(ASM)" = "none" ]


Seemed safer to copy this small definition rather than risk including Makefile.tools here.

dra27 · 2016-12-16T20:30:57Z

testsuite/Makefile

+           $(MAKE) RAN_ALL=1 report ; \
+         else \
+           $(MAKE) report ; \
+         fi


I quickly tested this on Ubuntu with GNU parallel and appear not to have broken it...

dra27 · 2016-12-16T20:31:25Z

testsuite/Makefile

+  $(if $(filter false,$(FLAMBDA)),\|!flambda) \
+  $(if $(shell which csc.exe 2>/dev/null),,\|!csharp) \
+  \|!system-[^,}]*\) \
+  [,}]


@damiendoligez is intended to appreciate my Herculean efforts to keep the lines below 80 chars.........

gasche · 2016-12-16T21:07:00Z

I haven't reviewed the Makefile virtuoso work, but I think the idea is good in principle.

On the other hand, I'm a bit worried by the idea of having a single synchronization point in the form of a counter shared by all tests. This means that anyone adding a new test (we frequently ask external contributors to do so) should know about this file, and also that it will become a very frequent point of version-control conflicts.

Would it be possible to distribute the counting of new tests, for example by asking each sub-directory in tests to have a count file with a number, and then sum all numbers and compare that to the number of testcases run?
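A minimal sketch of this distributed variant, assuming (hypothetically) that each directory under the tests root holds a file named `count` containing a single number. The function name `sum_counts` is also illustrative. The total it prints would take the place of the single shared counter.

```shell
# Hypothetical sketch: add up the numbers in every `count` file below a
# directory and print the total (0 if there are none).
#   sum_counts ROOT
sum_counts() {
  # awk reads each count file as its own input, so a file lacking a
  # trailing newline cannot run into the next one as `cat` could.
  find "$1" -type f -name count -exec awk '{ print $1 }' {} + |
    awk '{ total += $1 } END { print total + 0 }'
}
```

Because each `count` file changes only when a test is added to its own directory, two branches that add tests in different directories no longer touch the same line.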

Another thing I would like is a very clear error message when this check fails, for the case where an external contributor unfamiliar with the testsuite forgets to do the right thing and then sees the failure. In particular, it would be good if the error message explained how to fix the issue.

dra27 · 2016-12-16T21:18:56Z

Temporarily ignoring the carnage I've unleashed on Travis, AppVeyor and the precheck CI!

I thought about the single synchronisation point too. My thinking was that it encourages rebasing before merging (which I approve of in any case, even if a merge commit is then created), and that our merge rate is not so fast that a conflict would cost more than a quick fix-up commit in the unlikely event of a race. So I thought it better to wait and see whether this becomes a regular problem and then do as you suggest. What do you think?

It will be rare for testsuite/tests/known-skips to need changing in a set of changes not written by an "advanced" developer. For the much more frequently updated testsuite/tests/total-tests, there are messages on L215 and L219 for the two cases where the number differs. Or do you think I should make them more explicit?

gasche · 2016-12-16T21:25:25Z

I foresee that it will be an actual problem for different reasons:

- silly conflicts like this require O(1) work per branch, but become tiresome when rebasing or merging two development branches together, as you have noticed with the Changes file
- my use-case of junior contributors is full of people who are not necessarily familiar with GitHub workflows, and for whom "please rebase this before I merge" can add a technical barrier to contribution that I would rather avoid (in particular, people are generally not familiar with interactive rebase, so if they have more than one commit, extra fixup commits are a source of pain)

If you feel that the decentralized solution I sketched adds massive technical difficulties, then it's a reason not to do it. But if it's "not too hard" I would be interested in going this route right away.

I agree that known-skips is fine as a centralized file. I hadn't looked at the current error message, sorry; I was just thinking out loud. The error message is actually fine (I would mention which file gives the predicted number in both the -lt and -gt cases, but that's nitpicking).

dra27 · 2016-12-16T21:39:31Z

That's a compelling reason for having individual count files - I'll change that!

I'd intentionally made the -gt message vague because it's very unlikely that the correct fix is to lower the count in total-tests if fewer tests are being considered than should be. On the other hand, individual counts will allow a clearer error message to help pinpoint which Makefile is probably responsible for the error. Usually it will be that some skipped tests aren't displaying "=> skipped" (there was at least one fix for this in 4.04).

dra27 · 2016-12-17T11:32:28Z

OK, so splitting the total-tests out into individual count files revealed that two tests weren't being analysed properly (non-critically - failures would still have been detected)!

I used a very quick awk script to generate those count files - I might "polish" (if one can polish an awk script) it to be used to verify count files against _log for debugging as the current set-up doesn't make it immediately obvious where the incorrect count file is (that said, it's fairly easy to go through the commit history on trunk and see what's changed since it was last working).
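The verification mentioned above could look roughly like the following sketch. It is not dra27's awk script. It assumes a simplified log format in which each directory begins with a line `Running tests from 'DIR' ...` and each result line contains `=> passed`, `=> failed` or `=> skipped`, and it assumes a `count` file in every directory. Both function names are hypothetical.

```shell
# Hypothetical sketch: print "DIR N" for each directory in LOG, where N is
# the number of result lines seen after that directory's header.
#   per_dir_counts LOG
per_dir_counts() {
  awk -v q="'" '
    /^Running tests from / { dir = $4; gsub(q, "", dir); next }
    /=> (passed|failed|skipped)/ { n[dir]++ }
    END { for (d in n) print d, n[d] }
  ' "$1" | sort
}

# Compare those counts with ROOT/DIR/count and name every directory whose
# count file disagrees (or is missing); return 1 if any does.
#   check_dir_counts LOG ROOT
check_dir_counts() {
  mismatches=$(per_dir_counts "$1" | while read -r dir n; do
    expected=$(cat "$2/$dir/count" 2>/dev/null) || expected=missing
    [ "$expected" = "$n" ] ||
      echo "$dir: $n results in log, count file says $expected"
  done)
  if [ -n "$mismatches" ]; then
    printf '%s\n' "$mismatches" >&2
    return 1
  fi
}
```

Unlike a single total, a mismatch here names the directory, and hence the Makefile, most likely to be at fault.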

Hopefully, having rebased, Travis will come back with success and AppVeyor with a known failure. I've obviously written something that's not 101% POSIX so of course OpenBSD is failing.......

testsuite/tests/known-skips records the names of tests (or test directories) which are known to be skipped on given platforms or configurations and make all (or make parallel) fails if either additional tests are skipped or ones which are expected to be skipped are not.
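The two failure modes described here (extra skips, and expected skips that did not happen) amount to a set difference in each direction. A sketch, assuming the tagged known-skips file has already been filtered down to the tests expected to be skipped on the current configuration, one name per line, and that the tests actually skipped have been extracted from the log in the same form. The function name `check_skips` is hypothetical.

```shell
# Hypothetical sketch: compare expected skips with actual skips.
# Both files list one test name per line; blank lines are ignored.
# Differences are reported on stderr and the function returns 1.
#   check_skips EXPECTED ACTUAL
check_skips() {
  awk '
    NF == 0 { next }
    FILENAME == ARGV[1] { expected[$0] = 1; next }
    { actual[$0] = 1 }
    END {
      bad = 0
      for (t in actual)
        if (!(t in expected)) { print "unexpectedly skipped: " t; bad = 1 }
      for (t in expected)
        if (!(t in actual)) { print "expected to be skipped but was not: " t; bad = 1 }
      exit bad
    }
  ' "$1" "$2" >&2
}
```

Testing `FILENAME == ARGV[1]` rather than the common `NR == FNR` idiom keeps the check correct when the expected list is empty, which can happen on a configuration where nothing is expected to be skipped.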

dra27 · 2016-12-17T14:10:23Z

OK, switched from grep to egrep to keep the purists happy!

I'm leaving force-pushing this branch back to precheck until the Windows slaves have been fixed again, but I have changes here for openbsd-32 and zsystems-64 which should "just work", so hopefully, when this branch goes through precheck again, it will come back with an expected failure for msvc64 only.

shindere · 2016-12-20T15:57:23Z

Gabriel Scherer (2016/12/16 13:07 -0800):

> Would it be possible to distribute the counting of new tests, for example by asking each sub-directory in `tests` to have a `count` file with a number, and then sum all numbers and compare that to the number of testcases run?

Rather than numbers, I suggest that each subdirectory contain a file with the names of the tests. Well, that's the approach currently used to find ocamltest tests: each migrated subdirectory contains an ocamltests (note the final s) file containing the names of the test files to pass to ocamltest.
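A sketch of this names-based alternative, assuming each directory's `ocamltests` file lists one test file name per line (blank lines ignored). The helper `list_expected` is hypothetical. The number of lines it prints gives the expected total, and the names themselves could be compared against the tests that reported a result.

```shell
# Hypothetical sketch: print DIR/NAME for every test named in an
# `ocamltests` file below ROOT, one per line, directories in sorted order.
#   list_expected ROOT
list_expected() {
  find "$1" -type f -name ocamltests | sort | while read -r f; do
    dir=$(dirname "$f")
    awk -v d="$dir" 'NF > 0 { print d "/" $1 }' "$f"
  done
}
```

For example, `list_expected testsuite/tests | wc -l` would recover the number a `count` file records, without anyone having to maintain it by hand.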

xavierleroy · 2019-10-15T14:53:28Z

Nearly three years and a full conversion to ocamltest later, is this PR still relevant?

shindere · 2019-10-24T07:03:40Z

Xavier Leroy (2019/10/15 07:53 -0700):

> Nearly three years and a full conversion to ocamltest later, is this PR still relevant?

Well, the code is probably no longer usable in its current state, but the idea of being able to say whether or not it is normal for a test to be skipped on a given platform looks very interesting to me. I'll try to see whether and how this can be achieved in our current set-up.

dra27 added the no-change-entry-needed label Dec 16, 2016

dra27 force-pushed the paranoid-testsuite branch 2 times, most recently from ecdc338 to 6bb5c3e on December 17, 2016 09:27

Fail testsuite if not all tests considered

e74513e

Total number of tests which should be considered is recorded in testsuite/tests/total-tests. make report fails if the total number of tests considered does not match the total number.

dra27 force-pushed the paranoid-testsuite branch from 6bb5c3e to 3aa1de7 on December 17, 2016 11:27

dra27 force-pushed the paranoid-testsuite branch from 3aa1de7 to 8bf4f24 on December 17, 2016 11:58

dra27 added 3 commits December 17, 2016 14:04

Fix output of two tests

cb2e591

Two tests with custom Makefiles weren't "conforming" to summarize.awk's expectations of test output resulting in two tests not being reported in bookkeeping totals.

Split testsuite/tests/total-tests

2b72a1e

testsuite/tests/total-tests split into individual count files in each directory.

dra27 force-pushed the paranoid-testsuite branch from 8bf4f24 to 2b72a1e on December 17, 2016 14:05

dra27 mentioned this pull request Dec 19, 2016

Enable msvc64 asmcomp tests #974

Closed

damiendoligez added this to the 4.07-or-later milestone Sep 29, 2017

dra27 mentioned this pull request Oct 24, 2017

Improve error handling while running the testsuite #1443

Merged

damiendoligez removed this from the consider-for-4.07 milestone Jun 4, 2018

dra27 mentioned this pull request Jul 2, 2018

Testsuite reports: move failed tests and Summary to the end #1871

Merged

dra27 mentioned this pull request Apr 30, 2020

unboxed-primitive-args: test does not require unix #9517

Merged

dra27 closed this Nov 25, 2020

dra27 deleted the paranoid-testsuite branch July 6, 2021 14:48

stedolan pushed a commit to stedolan/ocaml that referenced this pull request Mar 21, 2023

Make [Invalid] take a structured form for invalid terms (ocaml#975)

469e56e

stedolan pushed a commit to stedolan/ocaml that referenced this pull request Mar 21, 2023

Revert "Make [Invalid] take a structured form for invalid terms" (oca…

4da84c4

…ml#1062) Revert "Make [Invalid] take a structured form for invalid terms (ocaml#975)" This reverts commit 469e56e.
