Add warnings about too few or too many samples #210
Conversation
pyperf/_bench.py (outdated)

        https://en.wikipedia.org/wiki/Sample_size_determination#Estimation_of_a_mean
        """
        # Get the means of the values per run
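The Wikipedia section linked in the docstring gives the standard sample-size estimate for a mean, n = (z·σ/E)², where E is the acceptable error on the estimate. A minimal, self-contained sketch of that formula (this is an illustration of the linked statistics, not pyperf's actual `required_nsamples()` implementation; the function name and `rel_error` parameter here are hypothetical):

```python
import math

def required_nsamples(stdev, mean, z=1.96, rel_error=0.01):
    # Standard sample-size estimate for a mean (normal approximation):
    #   n = (z * sigma / E)^2
    # E is the acceptable absolute error, expressed here as a fraction
    # of the mean (rel_error * mean). z=1.96 gives a 95% confidence level.
    if stdev < 0:
        raise ValueError("stdev must be >= 0")
    margin = rel_error * mean
    return math.ceil((z * stdev / margin) ** 2)

# Example: mean of 1.0s with a stdev of 0.05s, targeting 1% error
print(required_nsamples(0.05, 1.0))  # -> 97
```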
Why not compute the mean only once, over all values of all runs?
Because for some benchmarks, cache effects are visible within the same process. For example, pylint takes about 30% longer during the first iteration than the subsequent 2 iterations. One could argue that's a bad benchmark, but it's common enough that we should control for it. There's some more discussion here: faster-cpython/bench_runner#318 (comment)
That said, it's definitely worth putting a comment about that here.
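The effect described above can be sketched with toy numbers (the values below are illustrative, not real pylint timings): pooling every value across runs mixes cold first iterations with warm ones, while taking one mean per run keeps the warm-up cost inside each sample.

```python
import statistics

# Values grouped by process (run); the first iteration of each run is
# slower due to warm-up/cache effects, as in the pylint example above.
runs = [
    [1.30, 1.00, 1.01],  # run 1: first iteration ~30% slower
    [1.29, 0.99, 1.02],  # run 2
    [1.31, 1.01, 1.00],  # run 3
]

# Pooling all values mixes cold and warm iterations into one noisy sample...
pooled_stdev = statistics.stdev([v for run in runs for v in run])

# ...whereas one mean per run absorbs the warm-up cost, so the spread of
# the per-run means reflects genuine between-process noise.
run_means = [statistics.mean(run) for run in runs]
between_run_stdev = statistics.stdev(run_means)

print(pooled_stdev > between_run_stdev)  # -> True
```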
pyperf/_cli.py (outdated)

    lines.append(
        "Consider passing processes=%d to the Runner constructor to save time." %
        required_nsamples
    )
This warning may be a little bit annoying. Maybe only show it in the "pyperf check" command? https://pyperf.readthedocs.io/en/latest/cli.html#check-cmd
Yeah, that's a good idea. We can run check in our own infra, which is good enough for me.
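One way to picture the suggested change: keep the message from the quoted `_cli.py` hunk, but emit it only from the `pyperf check` code path instead of after every run. The helper and argument names below are hypothetical, not pyperf's real API:

```python
def check_nsamples_warnings(required_nsamples, actual_nsamples):
    # Hypothetical sketch: build the advice lines that "pyperf check"
    # could print, using the message from the quoted _cli.py hunk.
    lines = []
    if required_nsamples is not None and actual_nsamples > required_nsamples:
        lines.append(
            "Consider passing processes=%d to the Runner constructor "
            "to save time." % required_nsamples)
    return lines

# More samples were collected than needed -> advice is shown
print(check_nsamples_warnings(5, 20))
# Fewer samples than the threshold -> no advice
print(check_nsamples_warnings(5, 3))  # -> []
```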
pyperf/_bench.py (outdated)

        raise ValueError("MAD must be >= 0")
        return value

    def required_nsamples(self):
If you want to add a public function, please document it at: https://pyperf.readthedocs.io/en/latest/api.html#benchmark-class
Good point. I think we do want it to be public (for the same reason the other statistics methods are public).
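If it does go public, a documentation entry in the style of the existing Benchmark methods on the linked api page could look roughly like this (the wording and return behavior here are a guess, not the final docs):

```rst
.. method:: Benchmark.required_nsamples()

   Return an estimate of the number of samples (processes) needed for
   the benchmark result to be considered stable, or ``None`` if the
   estimate cannot be computed.
```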
pyperf/tests/test_perf_cli.py (outdated)

    def test_check_stable(self):
        stdout = self.run_command('check', TELCO)
        self.assertTrue(
I suggest using assertIn() instead.
Co-authored-by: Victor Stinner <vstinner@python.org>
vstinner left a comment:
The code change LGTM. I didn't check the maths behind required_nprocesses().
Related to python/pyperformance#372. Once this is merged, we can reduce the number of iterations in pyperformance.