Fix chi-square test #8826

Closed

kroeckx wants to merge 1 commit into openssl:master from kroeckx:bnrand_chisq

Conversation

@kroeckx
Member

@kroeckx kroeckx commented Apr 25, 2019

No description provided.

@kroeckx
Member Author

kroeckx commented Apr 25, 2019

Now that this test uses the correct values, each subtest fails 5% of the runs, as expected.

I ran 1000 tests, and had:

  • test 1 failed 54 times.
  • test 2 failed 64 times.
  • test 3 failed 49 times.
  • One or more tests failed 157 times.
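The arithmetic behind those counts can be sketched as follows (theoretical expectations only, not a rerun of the OpenSSL test):

```python
# Expected failure counts when each of 3 independent subtests
# (correctly) fails at the 5% significance level, over 1000 runs.
runs = 1000
p_fail = 0.05     # per-subtest significance level
subtests = 3

expected_per_test = runs * p_fail          # 50 expected failures per subtest
p_any = 1 - (1 - p_fail) ** subtests       # P(at least one subtest fails)
expected_any = runs * p_any                # ~142.6 expected combined failures

print(expected_per_test, p_any, expected_any)
```

The observed 54/64/49 per-test failures and the 157 combined failures are all within ordinary binomial fluctuation of these expectations (the standard deviation of the combined count is about 11).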

@paulidale
Contributor

What chance of getting a false positive is comfortable?
The critical value can be adjusted to suit.

@kroeckx
Member Author

kroeckx commented Apr 25, 2019

I think I can live with a 99.99% success rate. But at what point is the test still useful?

@paulidale
Contributor

Catching egregious failures of uniformity is about the best we can hope for.

For normal builds, 8 of them run the tests; extended has 17 doing so. With alpha = 0.9999, the probabilities of all three tests passing across all builds are 0.9976 and 0.9949 respectively (under the uniformity hypothesis).
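Those two probabilities can be checked directly, assuming 3 subtests per build and independence across builds:

```python
# With a per-subtest pass probability of alpha = 0.9999, the chance
# that all 3 subtests pass in every build is alpha^(3 * builds).
alpha = 0.9999
subtests = 3

p_normal = alpha ** (subtests * 8)     # 8 normal builds  -> ~0.9976
p_extended = alpha ** (subtests * 17)  # 17 extended builds -> ~0.9949

print(round(p_normal, 4), round(p_extended, 4))
```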

@paulidale
Contributor

More useful might be printing out the critical value which the test actually fails at and producing successively a warning and then an error.

@kroeckx
Member Author

kroeckx commented Apr 25, 2019 via email

@paulidale
Contributor

TestU01, dieharder, and NIST's SP 800-90B are testing for something different. They attempt to test that each bit of the data is IID. I'm attempting to test that values generated over a range are uniform. E.g. a range 0-2 won't have evenly distributed bits (four zero bits, two one bits over all output values). Likewise a range 0-4 won't have equally probable bits (the '4' bit is only set once, the others twice each).
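The bit-bias examples above can be verified with a small counting script (illustrative only; the function name is mine, not from the OpenSSL code):

```python
def bit_counts(upper, width):
    """Count how often each bit position is set across values 0..upper."""
    counts = [0] * width
    for v in range(upper + 1):
        for b in range(width):
            if v >> b & 1:
                counts[b] += 1
    return counts

# Range 0-2 (values 0, 1, 2): six bits in total, only two of them ones.
print(bit_counts(2, 2))   # [1, 1]
# Range 0-4 (values 0..4): the '4' bit is set once, the others twice each.
print(bit_counts(4, 3))   # [2, 2, 1]
```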

Repeating the test and checking the failure rate has the same issue -- it's a binomial distribution instead of χ2 -- a critical value still needs to be chosen.

@paulidale
Contributor

Running multiple tests and checking for the 5% failure rate works. I've thrown something together in #8830 which does this. The underlying assumption is independence -- both of the samplings within a test and between tests. Because the data are sourced from a DRBG, this assumption could be suspect; however, a CSRNG should be designed to minimise any dependence.

Each of the tests passes 95% of the time which gives a binomial distribution for which a 99.99% critical value can be calculated. This means less than 0.1% of normal test runs will false positive and about 0.2% of extended test runs will. These numbers seem livable and we can move the critical value either way pretty easily.

Great idea :)
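A sketch of how such a binomial critical value could be computed (an assumption about the approach, not the actual code in #8830; the run count of 100 is illustrative):

```python
import math

def binom_sf(n, p, k):
    """P(X > k) for X ~ Binomial(n, p), via the exact tail sum."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k + 1, n + 1))

def critical_value(n, p=0.05, alpha=1e-4):
    """Smallest failure threshold k such that P(failures > k) < alpha
    when each repetition of the test fails independently with rate p."""
    k = 0
    while binom_sf(n, p, k) >= alpha:
        k += 1
    return k

# Allow up to this many failures in 100 repetitions before the overall
# test is declared a (99.99% confidence) failure.
print(critical_value(100))
```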

@richsalz
Contributor

And how will this be explained to users, who are pretty trained to see a "test failure" as a blocker?

@kroeckx
Member Author

kroeckx commented May 21, 2019

Replaced by #8830

@kroeckx kroeckx closed this May 21, 2019
