Skip to content

Make the testsuite more paranoid about the results#975

Closed
dra27 wants to merge 4 commits intoocaml:trunkfrom
dra27:paranoid-testsuite
Closed

Make the testsuite more paranoid about the results#975
dra27 wants to merge 4 commits intoocaml:trunkfrom
dra27:paranoid-testsuite

Conversation

@dra27
Copy link
Copy Markdown
Member

@dra27 dra27 commented Dec 16, 2016

This GPR introduces two changes to the operation of the full testsuite to attempt to catch more misconfiguration errors. Both of these new tests only apply if no tests have failed.

The first commit adds a cookie testsuite/tests/total-tests which should contain the total number of tests in the testsuite. Since 4.03.0, all tests are supposed to be considered on all platforms and either pass or be formally skipped.

The second adds a file testsuite/tests/known-skips which contains tagged lines indicating which tests are expected to be skipped on each platform and configuration. The idea here is to detect CI-misconfiguration or unexpected environment changes (e.g. Microsoft C sharp compiler disappearing).

The analysis of skipped tests adds quite a few lines to the Makefile, but I anticipate this being completely replaced with an OCaml script once the new ocamltest is driving the entire testsuite, so I hope it can be considered a temporary addition.

At the moment, this will cause the testsuite for msvc64 without flambda to fail because I don't wish to add special handling for this case when #974 will eliminate the need for it.

Copy link
Copy Markdown
Member Author

@dra27 dra27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is currently going through precheck as well.

FIND=find
include ../config/Makefile
# Borrowed from ../Makefile.tools
BYTECODE_ONLY=[ "$(ARCH)" = "none" -o "$(ASM)" = "none" ]
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seemed safer to copy this small definition rather than risk including Makefile.tools here

$(MAKE) RAN_ALL=1 report ; \
else \
$(MAKE) report ; \
fi
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I quickly tested this on Ubuntu with GNU parallel and appear not to have broken it...

$(if $(filter false,$(FLAMBDA)),\|!flambda) \
$(if $(shell which csc.exe 2>/dev/null),,\|!csharp) \
\|!system-[^,}]*\) \
[,}]
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@damiendoligez is intended to appreciate my Herculean efforts to keep the lines below 80 chars.........

@gasche
Copy link
Copy Markdown
Member

gasche commented Dec 16, 2016

I haven't reviewed the Makefile virtuoso work, but I think the idea is good in principle.

On the other hand, I'm a bit worried by the idea of having a single synchronization point in the form of a counter shared by all tests. This means that anyone adding a new test (we frequently ask external contributors to do so) should know about this file, and also that it will become a very frequent point of version-control conflicts.

Would it be possible to distribute the counting of new tests, for example by asking each sub-directory in tests to have a count file with a number, and then sum all numbers and compare that to the number of testcases run?

Another thing that I would like to have is a very clear error message when this test goes wrong, in the use-case again that an external contributor unfamiliar with the testsuite would forgot to do the right thing and notice the failure. In particular, it would be good if the error message actually explained how to fix the issue.

@dra27
Copy link
Copy Markdown
Member Author

dra27 commented Dec 16, 2016

Temporarily ignoring the carnage I've unleashed on Travis, AppVeyor and the precheck CI!

I thought about the single synchronisation point too - my thinking was that it encourages rebase prior to merge (which I personally approve of anyway, even if a merge commit is created anyway) but also that our merge rate is not so fast that this would take longer than a quick fixing commit in the unlikely event of a race condition. So I thought it better to wait and see if this becomes a regular problem and then do as you suggest. What do you think?

It will be rare that testsuite/tests/known-skips needs changing for a set of changes not being written by an "advanced" developer. For the much more frequently updated testsuite/tests/total-tests there are messages on L215 and L219 for the two cases where the number differs - or do you think I should make them more explicit?

@gasche
Copy link
Copy Markdown
Member

gasche commented Dec 16, 2016

I foresee that it will be an actual problem for different reasons:

  • silly conflicts like this require O(1) work per branch, but become tiresome when rebasing or merging two development branches together -- as you have noticed from the Changes stuff
  • my use-case of junior contributors is full of people that are not necessarily familiar with github workflows, and for which "please rebase this before I merge" can add an additional technical barrier to contribution that I would rather avoid (in particular, people are generally not familiar with interactive-rebase, so if they have more than one commit extra fixup commits are a source of pain)

If you feel that the decentralized solution I sketched adds massive technical difficulties, then it's a reason not to do it. But if it's "not too hard" I would be interested in going this route right away.

I agree that known-skips are fine as a centralized file. I hadn't looked at the current error message, sorry, was just thinking out loud. The error message is actually fine (I would mention which file gives the predicted number in both the -lt and -gt cases, but that's nitpicking).

@dra27
Copy link
Copy Markdown
Member Author

dra27 commented Dec 16, 2016

That's a compelling reason for having individual count files - I'll change that!

I'd intentionally made the -gt message unclear because it's very unlikely that the correct fix is to lower the count in total-tests if fewer tests are being considered than should be. On the other hand, individual counts will allow a clearer error message to help pinpoint which Makefile is probably responsible for the error. Usually it will be that some skipped tests aren't displaying "=> skipped" (there was at least one fix for this in 4.04)

@dra27 dra27 force-pushed the paranoid-testsuite branch 2 times, most recently from ecdc338 to 6bb5c3e Compare December 17, 2016 09:27
Total number of tests which should be considered is recorded in
testsuite/tests/total-tests. make report fails if the total number of
tests considered does not match the total number.
@dra27 dra27 force-pushed the paranoid-testsuite branch from 6bb5c3e to 3aa1de7 Compare December 17, 2016 11:27
@dra27
Copy link
Copy Markdown
Member Author

dra27 commented Dec 17, 2016

OK, so splitting the total-tests out into individual count files revealed that two tests weren't being analysed properly (non-critically - failures would still have been detected)!

I used a very quick awk script to generate those count files - I might "polish" (if one can polish an awk script) it to be used to verify count files against _log for debugging as the current set-up doesn't make it immediately obvious where the incorrect count file is (that said, it's fairly easy to go through the commit history on trunk and see what's changed since it was last working).

Hopefully, having rebased, Travis will come back with success and AppVeyor with a known failure. I've obviously written something that's not 101% POSIX so of course OpenBSD is failing.......

@dra27 dra27 force-pushed the paranoid-testsuite branch from 3aa1de7 to 8bf4f24 Compare December 17, 2016 11:58
testsuite/tests/known-skips records the names of tests (or test
directories) which are known to be skipped on given platforms or
configurations and make all (or make parallel) fails if either
additional tests are skipped or ones which are expected to be skipped
are not.
Two tests with custom Makefiles weren't "conforming" to summarize.awk's
expectations of test output resulting in two tests not being reported
in bookkeeping totals.
testsuite/tests/total-tests split into individual count files in each
directory.
@dra27 dra27 force-pushed the paranoid-testsuite branch from 8bf4f24 to 2b72a1e Compare December 17, 2016 14:05
@dra27
Copy link
Copy Markdown
Member Author

dra27 commented Dec 17, 2016

OK, switched from grep to egrep to keep the purists happy!

I'm leaving force pushing this branch back to precheck until the Windows slaves have been fixed again, but I have changes here for openbsd-32 and zsystems-64 which should "just work" - so hopefully, when this branch is goes through precheck again it will come back with an expected fail for msvc64 only.

@shindere
Copy link
Copy Markdown
Contributor

shindere commented Dec 20, 2016 via email

@xavierleroy
Copy link
Copy Markdown
Contributor

Nearly three years and a full conversion to ocamltest later, is this PR still relevant?

@shindere
Copy link
Copy Markdown
Contributor

shindere commented Oct 24, 2019 via email

@dra27 dra27 closed this Nov 25, 2020
@dra27 dra27 deleted the paranoid-testsuite branch July 6, 2021 14:48
stedolan pushed a commit to stedolan/ocaml that referenced this pull request Mar 21, 2023
stedolan pushed a commit to stedolan/ocaml that referenced this pull request Mar 21, 2023
…ml#1062)

Revert "Make [Invalid] take a structured form for invalid terms (ocaml#975)"

This reverts commit 469e56e.
EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024
* Add XenServer (Cloud Software Group) job

XenServer (formerly part of Citrix) has an open position for work on its
Toolstack, which is written in OCaml.
Citrix is now part of Cloud Software Group,
but XenServer is no longer part of Citrix: it is part of Cloud Software
Group directly, so use Cloud Software Group as the company name.

For now this links to the logo on cloud.com while we are still sorting
out a new XenServer website.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>

* industrial users: XenServer: Citrix -> XenServer

See https://www.cloud.com:
* XenServer used to be part of Citrix (originally called XenServer, then
  rebranded as Citrix Hypervisor)
* Citrix is now part of Cloud Software Group
* XenServer is no longer part of Citrix (except perhaps as a legal
  entity), and there is no OCaml use in Citrix itself as far as I'm aware
* XenServer is part of Cloud Software Group directly

Also update the logo (we are still sorting out our website, for now this is
the logo we currently use on Twitter).

Signed-off-by: Edwin Török <edwin.torok@cloud.com>

* fixup! Add XenServer (Cloud Software Group) job

---------

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants