Conversation
Now that this test uses the correct values, each subtest fails 5% of the runs, as expected. I ran 1000 tests, and had:
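As a sanity check on what to expect from repeated runs: the failure count over N independent runs is binomially distributed. A minimal sketch (plain Python; the 1000 runs and 5% rate are taken from the comment above):

```python
from math import sqrt

# Failures over repeated runs follow Binomial(n, p): with n = 1000 runs
# and a 5% per-run failure rate, this gives the expected count and spread.
n, p = 1000, 0.05
mean = n * p                # expected number of failing runs
sd = sqrt(n * p * (1 - p))  # standard deviation of that count

print(f"expect about {mean:.0f} failures, give or take {sd:.1f}")
```

So roughly 50 failures out of 1000, with a spread of about 7 either way.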
What chance of getting a false positive is comfortable?
I think I can live with a 99.99% success rate. But at what point is the test still useful? |
Catching egregious failures of uniformity is about the best we can hope for. Normal builds have 8 runs exercising the tests; extended builds have 17. With alpha = 0.9999, the probabilities of all three tests passing across all builds are 0.9976 and 0.9949 respectively (under the uniformity hypothesis).
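The arithmetic behind those probabilities, assuming three independent subtests per run, each passing with probability alpha (a sketch of the calculation, not the actual test harness):

```python
# P(every subtest passes everywhere) = alpha ** (runs * subtests),
# assuming all subtests are independent and each passes with
# probability alpha under the uniformity hypothesis.
alpha = 0.9999  # per-subtest pass probability
subtests = 3

for runs, label in ((8, "normal"), (17, "extended")):
    p_all = alpha ** (runs * subtests)
    print(f"{label}: {p_all:.4f}")
```

This reproduces the 0.9976 (normal) and 0.9949 (extended) figures above.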
More useful might be printing the critical value at which the test actually fails, and then escalating successively from a warning to an error.
You can also do something like run the test 100 times, and then check
that the failure rate is the expected 5%.
I wonder what things like dieharder do.
TestU01, dieharder, and NIST's SP 800-90B are testing for something different: they attempt to test that each bit of the data is IID. I'm attempting to test that values generated over a range are uniform. E.g. a range 0-2 won't have evenly distributed bits (four zero bits, two one bits over all output values). Likewise, a range 0-4 won't have equally probable bits (the '4' bit is only set once, the others twice each). Repeating the test and checking the failure rate has the same issue -- it's a binomial distribution instead of χ², and a critical value still needs to be chosen.
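The bit-imbalance point is easy to verify by enumerating each range once; this sketch (a hypothetical helper, not code from the PR) counts how often each bit position is set:

```python
# Count set bits per position across every value of an inclusive range.
# Uniformly distributed *values* need not give uniformly distributed *bits*.
def ones_per_bit(lo, hi, width):
    counts = [0] * width
    for v in range(lo, hi + 1):
        for b in range(width):
            counts[b] += (v >> b) & 1
    return counts

print(ones_per_bit(0, 2, 2))  # [1, 1]: two one bits, four zero bits in total
print(ones_per_bit(0, 4, 3))  # [2, 2, 1]: the '4' bit is set only once
```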
Running multiple tests and checking for the 5% failure rate works. I've thrown something together in #8830 which does this. The underlying assumption is of independence -- both of the samplings within a test and between tests. Because the data are sourced from a DRBG, this assumption could be suspect; however, a CSRNG should be designed to minimise any dependence. Each of the tests passes 95% of the time, which gives a binomial distribution for which a 99.99% critical value can be calculated. This means less than 0.1% of normal test runs will false positive and about 0.2% of extended test runs will. These numbers seem livable, and we can move the critical value either way pretty easily. Great idea :)
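One way to compute that critical value from the binomial distribution (a standard-library-only sketch; the 100-repetition count here is illustrative, not necessarily the number used in #8830):

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def critical_failures(n, p=0.05, confidence=0.9999):
    """Smallest failure count k with P(X <= k) >= confidence; seeing
    more than k failures out of n repetitions flags the generator."""
    for k in range(n + 1):
        if binom_cdf(k, n, p) >= confidence:
            return k
    return n

# With 100 repetitions of a test that fails 5% of the time by design,
# exceeding this count happens less than 0.01% of the time by chance.
print(critical_failures(100))  # -> 15
```

`math.comb` keeps the coefficients exact, so this is fine for small n; for thousands of repetitions a log-space or incomplete-beta formulation would be more robust.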
And how will this be explained to users, who are pretty well trained to see a "test failure" as a blocker?
Replaced by #8830 |