Make the testsuite more paranoid about the results #975
dra27 wants to merge 4 commits into ocaml:trunk from
Conversation
dra27 left a comment
This is currently going through precheck as well.
```makefile
FIND=find
include ../config/Makefile
# Borrowed from ../Makefile.tools
BYTECODE_ONLY=[ "$(ARCH)" = "none" -o "$(ASM)" = "none" ]
```
Seemed safer to copy this small definition rather than risk including Makefile.tools here
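For context, the copied definition is just a shell test for a bytecode-only configuration. A minimal sketch of how such a guard behaves (the variable values here are assumed for illustration, not taken from any real configuration):

```shell
# Hedged sketch: how a BYTECODE_ONLY-style test is evaluated.
# ARCH/ASM values are assumed for illustration.
ARCH=none
ASM=none
if [ "$ARCH" = "none" -o "$ASM" = "none" ]; then
  echo "bytecode-only: skipping native-code tests"
else
  echo "running native-code tests"
fi
```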
```makefile
	  $(MAKE) RAN_ALL=1 report ; \
	else \
	  $(MAKE) report ; \
	fi
```
I quickly tested this on Ubuntu with GNU parallel and appear not to have broken it...
```makefile
	$(if $(filter false,$(FLAMBDA)),\|!flambda) \
	$(if $(shell which csc.exe 2>/dev/null),,\|!csharp) \
	\|!system-[^,}]*\) \
	[,}]
```
@damiendoligez is invited to appreciate my Herculean efforts to keep the lines below 80 chars...
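The fragment above assembles an egrep alternation of skip tags from the build configuration. An illustrative shell sketch of the same idea (the tag names and the csc.exe probe mirror the excerpt, but the exact values are assumed):

```shell
# Illustrative sketch: build an alternation of skip tags from the
# configuration, in the spirit of the Makefile fragment above.
# Tag names are assumed, not the PR's exact format.
FLAMBDA=false
tags='!system-x'
if [ "$FLAMBDA" = "false" ]; then tags="$tags|!flambda"; fi
if ! which csc.exe >/dev/null 2>&1; then tags="$tags|!csharp"; fi
echo "skip-tag pattern: ($tags)"
```

The resulting pattern would then be fed to egrep to filter the known-skips list down to the entries relevant on this platform.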
I haven't reviewed the Makefile virtuoso work, but I think the idea is good in principle.

On the other hand, I'm a bit worried by the idea of having a single synchronization point in the form of a counter shared by all tests. This means that anyone adding a new test (we frequently ask external contributors to do so) should know about this file, and also that it will become a very frequent point of version-control conflicts. Would it be possible to distribute the counting of new tests, for example by asking each sub-directory in `tests` to have a `count` file with a number, and then sum all numbers and compare that to the number of testcases run?

Another thing that I would like to have is a very clear error message when this test goes wrong, again for the use-case of an external contributor unfamiliar with the testsuite who would forget to do the right thing and notice the failure. In particular, it would be good if the error message actually explained how to fix the issue.
Temporarily ignoring the carnage I've unleashed on Travis, AppVeyor and the precheck CI!

I thought about the single synchronisation point too - my thinking was that it encourages a rebase prior to merge (which I personally approve of, even if a merge commit is created anyway), but also that our merge rate is not so fast that this would take longer than a quick fixing commit in the unlikely event of a race condition. So I thought it better to wait and see if this becomes a regular problem, and then do as you suggest. What do you think?

It will be rare that testsuite/tests/known-skips needs changing for a set of changes not being written by an "advanced" developer. For the much more frequently updated testsuite/tests/total-tests there are messages on L215 and L219 for the two cases where the number differs - or do you think I should make them more explicit?
I foresee that it will be an actual problem, for different reasons.

If you feel that the decentralized solution I sketched adds massive technical difficulties, then it's a reason not to do it. But if it's "not too hard", I would be interested in going this route right away.

I agree that known-skips is fine as a centralized file.

I hadn't looked at the current error message, sorry - I was just thinking out loud. The error message is actually fine (I would mention which file gives the predicted number in both of the messages).
That's a compelling reason for having individual count files - I'll change that!

I'd intentionally made the -gt message unclear, because it's very unlikely that the correct fix is to lower the count in total-tests if fewer tests are being considered than should be. On the other hand, individual counts will allow a clearer error message to help pinpoint which Makefile is probably responsible for the error. Usually it will be that some skipped tests aren't displaying "=> skipped" (there was at least one fix for this in 4.04).
Force-pushed ecdc338 to 6bb5c3e (Compare)
The total number of tests which should be considered is recorded in testsuite/tests/total-tests. make report fails if the number of tests considered does not match this total.
Force-pushed 6bb5c3e to 3aa1de7 (Compare)
OK, so splitting total-tests out into individual count files revealed that two tests weren't being analysed properly (non-critically - failures would still have been detected)!

I used a very quick awk script to generate those count files - I might "polish" it (if one can polish an awk script) so that it can be used to verify count files against _log for debugging, as the current set-up doesn't make it immediately obvious where the incorrect count file is (that said, it's fairly easy to go through the commit history on trunk and see what's changed since it was last working).

Hopefully, having rebased, Travis will come back with success and AppVeyor with a known failure. I've obviously written something that's not 101% POSIX, so of course OpenBSD is failing...
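The awk script itself isn't shown in the thread; a hypothetical reconstruction of the idea, counting per-directory results from a summarize.awk-style log (the log format and field positions are assumed):

```shell
# Hypothetical reconstruction: count "=> passed"/"=> skipped" results per
# test directory in a _log-style file. Log format is assumed here.
cat > _log.sample <<'EOF'
Running tests from tests/basic ...
 ... testcase a.ml => passed
 ... testcase b.ml => skipped
Running tests from tests/misc ...
 ... testcase c.ml => passed
EOF
awk '/Running tests from/  { dir = $4 }
     /=> (passed|skipped)/ { count[dir]++ }
     END { for (d in count) print d, count[d] }' _log.sample | sort
rm _log.sample
```

Each output line could then be diffed against the corresponding per-directory count file to pinpoint the mismatch.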
Force-pushed 3aa1de7 to 8bf4f24 (Compare)
testsuite/tests/known-skips records the names of tests (or test directories) which are known to be skipped on given platforms or configurations. make all (or make parallel) fails if either additional tests are skipped, or tests which are expected to be skipped are not.
Two tests with custom Makefiles weren't "conforming" to summarize.awk's expectations of test output, resulting in two tests not being reported in the bookkeeping totals.
testsuite/tests/total-tests split into individual count files in each directory.
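The split-out scheme amounts to summing per-directory count files and comparing the result with the number of tests actually considered. A hedged sketch (the count-file name "test-count" is assumed for illustration, not the PR's actual choice):

```shell
# Illustrative sketch: sum per-directory count files into a grand total.
# The file name "test-count" is an assumption for this example.
mkdir -p demo/tests/a demo/tests/b
echo 3 > demo/tests/a/test-count
echo 2 > demo/tests/b/test-count
total=$(find demo/tests -name test-count -exec cat {} + \
        | awk '{ s += $1 } END { print s }')
echo "expected total: $total"
rm -r demo
```

A mismatch between this sum and the number of results in _log would then fail the build, with the offending directory locatable file by file.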
Force-pushed 8bf4f24 to 2b72a1e (Compare)
OK, switched from grep to egrep to keep the purists happy!

I'm leaving off force-pushing this branch back to precheck until the Windows slaves have been fixed again, but I have changes here for openbsd-32 and zsystems-64 which should "just work" - so hopefully, when this branch goes through precheck again, it will come back with an expected fail for msvc64 only.
Gabriel Scherer (2016/12/16 13:07 -0800):
> Would it be possible to distribute the counting of new tests, for
> example by asking each sub-directory in `tests` to have a `count` file
> with a number, and then sum all numbers and compare that to the number
> of testcases run?

Rather than numbers, I suggest that each subdirectory contain a file with the names of the tests. Well, that's the approach currently used to find ocamltest tests: each migrated subdirectory contains an ocamltests (note the final s) file containing the names of the test files to pass to ocamltest.
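The ocamltests-file approach replaces counting with an explicit list that a driver can iterate over. A small illustrative sketch (the directory and test names here are made up):

```shell
# Illustrative: an ocamltests file lists the test files for ocamltest to
# run in that directory. Directory and test names are made up.
mkdir -p demo-ocamltests-dir
printf '%s\n' 'foo.ml' 'bar.ml' > demo-ocamltests-dir/ocamltests
while read -r t; do
  echo "would pass to ocamltest: $t"
done < demo-ocamltests-dir/ocamltests
rm -r demo-ocamltests-dir
```

Listing names rather than a bare number also means a missing or extra test can be reported by name, not just as an off-by-one in a total.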
Nearly three years and a full conversion to ocamltest later, is this PR still relevant? |
Xavier Leroy (2019/10/15 07:53 -0700):
> Nearly three years and a full conversion to ocamltest later, is this
> PR still relevant?

Well, the code is probably not usable any longer in its current state, but the idea of being able to say whether it's normal or not that a test is skipped on a given platform looks very interesting to me. I'll try to see whether/how this can be achieved in our current set-up.
This GPR introduces two changes to the operation of the full testsuite to attempt to catch more misconfiguration errors. Both of these new tests only apply if no tests have failed.
The first commit adds a cookie testsuite/tests/total-tests which should contain the total number of tests in the testsuite. Since 4.03.0, all tests are supposed to be considered on all platforms and either pass or be formally skipped.
The second adds a file testsuite/tests/known-skips which contains tagged lines indicating which tests are expected to be skipped on each platform and configuration. The idea here is to detect CI misconfiguration or unexpected environment changes (e.g. the Microsoft C# compiler disappearing).
The analysis of skipped tests adds quite a few lines to the Makefile, but I anticipate this being completely replaced with an OCaml script once the new ocamltest is driving the entire testsuite, so I hope it can be considered a temporary addition.
At the moment, this will cause the testsuite for msvc64 without flambda to fail because I don't wish to add special handling for this case when #974 will eliminate the need for it.