<random>: Improve binomial_distribution accuracy#1531
<random>: Improve binomial_distribution accuracy#1531StephanTLavavej merged 13 commits intomicrosoft:mainfrom
Conversation
- For small mean, replaces Poisson approximation with exact waiting time method based on cumulative sum of exponential variates. - Uses cheap but rigorous approximation of exit condition for fast return on first iteration.
|
BTW, for reference, the underlying algorithm is from "Non-Uniform Random Variate Generation", Luc Devroye, section X.4, "Second waiting time method". |
I think I'm going to restore the original version anyway, because |
StephanTLavavej
left a comment
There was a problem hiding this comment.
Looks good to me - the logic here is significantly beyond my experience so I really appreciate the PR. 😻 I will push changes for 2 trivial things I noticed.
statementreply
left a comment
There was a problem hiding this comment.
The proposed change doesn't correctly handle the case that n * (1-p) < 1.
When p > 0.5, except for the small n case, the original algorithm generates B(n, p) as n - B(n, 1-p). See binomial_distribution::param_type::_Init():
_Pp = _Px < 0.5 ? _Px : (1.0 - _Px);
_Mean = _Tx * _Pp;and the return below:
return _Par0._Px == _Par0._Pp ? _Res : static_cast<_Ty>(_Par0._Tx - _Res);Test case:
#include <cmath>
#include <iostream>
#include <random>
#include <vector>
using namespace std;
int main() {
constexpr int it_max = 10'000'000;
constexpr int n = 25;
constexpr double mean = n - 0.99;
constexpr double p = mean / n;
constexpr double var = n * p * (1.0 - p);
mt19937 gen;
binomial_distribution<> dist(n, p);
vector<int> counts(n + 1);
for (int i = 0; i < it_max; ++i) {
++counts[static_cast<size_t>(dist(gen))];
}
double sample_mean = 0.0;
for (size_t i = 1; i < counts.size(); ++i) {
sample_mean += static_cast<double>(i) * counts[i];
}
sample_mean /= it_max;
cout << "Expected mean: " << mean << endl
<< "Observed mean: " << sample_mean << endl;
double sample_var = 0.0;
for (size_t i = 0; i < counts.size(); ++i) {
sample_var += counts[i] * pow(i - sample_mean, 2);
}
sample_var /= it_max - 1;
cout << "Expected variance: " << var << endl
<< "Observed variance: " << sample_var << endl;
return 0;
}Pre-PR output:
Expected mean: 24.01
Observed mean: 24.0104
Expected variance: 0.950796
Observed variance: 0.988557
Post-PR output:
Expected mean: 24.01
Observed mean: 0.990254
Expected variance: 0.950796
Observed variance: 0.950525
(Sorry I missed this in my previous review)
|
@StephanTLavavej I've pushed a bugfix based on @statementreply's feedback after your approval. |
CaseyCarter
left a comment
There was a problem hiding this comment.
FYI: I've pushed a merge with main, and updated the list of libcxx test skips to indicate that my guess that #1530 caused std/numerics/rand/rand.dis/rand.dist.bern/rand.dist.bern.bin/eval.PR44847.pass.cpp to fail was wrong.
|
Thanks for noticing and fixing this accuracy bug! 🎯 🎯 🎯 |
Fixes #1530.