DOC: Add Karl Pearson's reference to chi-square test#13971
tupui merged 7 commits into scipy:master
Pearson's original paper was important and I think it's an improvement to list it among chi-square test references.
The references section lists papers and articles that the author consulted to write either the code or the documentation. In this case, it seems the paper you mention was not used for either. If you think some part of the documentation or code can be improved using your reference, it would be nice to add that too. For example, you could add some documentation that explains the chi-squared test using your reference and cite the paper there.
@tirthasheshpatel Great suggestion. I've included an explanation of when low frequencies warrant using Fisher's Exact test as a higher-power alternative. I've also added the minimum count of 13 described in the original chi-square paper.
Also, you have proposed this change from your master branch, which is not good practice. In the future, please propose changes from a feature branch. Thanks!
Co-authored-by: Tirth Patel <tirthasheshpatel@gmail.com>
Looks like the build agent for Windows 64-bit is going exceptionally slow, causing one of the checks to fail. I do not have dev.azure.com credentials to re-run failed builds. How would you like me to proceed?
tirthasheshpatel left a comment
LGTM now. Thanks, @nightvision04! (As this is a documentation change, I don't think the awaiting workflows are necessary before merging)
tupui left a comment
LGTM as well. I checked the documentation build so no need for extra CI. I am merging then. Thanks @nightvision04 for contributing! And thanks @tirthasheshpatel for the review.
* master: (164 commits)
  DOC: Add Karl Pearson's reference to chi-square test (scipy#13971)
  BLD: fix build warnings for causal/anticausal pointers in ndimage
  MAINT: stats: Fix unused imports and a few other issues related to imports.
  DOC: fix typo
  MAINT: Remove duplicate calculations in sokalmichener
  BUG: spatial: fix weight handling of `distance.sokalmichener`.
  DOC: update Readme (scipy#13910)
  MAINT: QMCEngine d input validation (scipy#13940)
  MAINT: forward port 1.6.3 relnotes
  REL: add PEP 621 (project metadata in pyproject.toml) support
  EHN: signal: make `get_window` supports `general_cosine` and `general_hamming` window functions. (scipy#13934)
  ENH/DOC: pydata sphinx theme polishing (scipy#13814)
  DOC/MAINT: Add copyright notice to qmc.primes_from_2_to (scipy#13927)
  BUG: DOC: signal: fix need argument config and add missing doc link for `signal.get_window`
  DOC: fix subsets docstring (scipy#13926)
  BUG: signal: fix get_window argument handling and add tests. (scipy#13879)
  ENH: stats: add 'alternative' parameter to ansari (scipy#13650)
  BUG: Reactivate conda environment in init
  MAINT: use dict built-in rather than OrderedDict
  Revert "CI: Add nightly release of NumPy in linux workflows (scipy#13876)" (scipy#13909)
  ...
"If one or more frequencies ..."
I don't think that's true. Fisher's exact test is in general very conservative and doesn't have large power. Besides, Fisher's exact test is for 2-by-2 tables, while Pearson's chi-square test is the same for an arbitrary number of components.
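To make the difference in scope concrete, here is a minimal standard-library sketch. It is not SciPy's implementation: the helper names `fisher_exact_2x2` and `pearson_chisquare_statistic` are hypothetical, the two-sided rule used for the exact test (sum the probabilities of all tables no more likely than the observed one) is one common convention, and the example counts are made up for illustration.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def pearson_chisquare_statistic(observed, expected=None):
    """Pearson's statistic for any number of categories (uniform expectation by default)."""
    if expected is None:
        expected = [sum(observed) / len(observed)] * len(observed)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


p_fisher = fisher_exact_2x2([[3, 1], [1, 3]])                     # 2x2 tables only
stat_six = pearson_chisquare_statistic([16, 18, 16, 14, 12, 12])  # six categories
print(p_fisher, stat_six)
```

The exact test is tied to the hypergeometric distribution of a single 2-by-2 table, whereas the Pearson statistic is the same sum over however many categories there are; its p-value would then come from a chi-square distribution with the appropriate degrees of freedom.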
I didn't verify the "with greater statistical power" part. Sorry! I think you are right here. It's better not to comment about it unless we are absolutely sure.
I agree, but would you agree with reformulating the docs to say that Fisher's exact test is better suited to small sample sizes? (I think that is more widely accepted, and it is also stated on the Wikipedia page.) I can do a partial revert of this in a new PR if you agree. Otherwise, I can do a full revert of that statement. Thanks very much for verifying this! Feel free to share any other thoughts you may have.
Fisher's exact test maintains size, that is, its type I error rate is at or below alpha (0.05). (But because of the discrete sample space, it can be far below the significance level.) Some references recommend tests that maintain size on average (rejection rate approximately equal to alpha) instead of always. That makes the test less conservative, but it overrejects in some cases. I think the chi-square test becomes liberal in small samples, similarly to the Wald test.
It's not clear whether a very conservative test is very useful either, although it is still better than a very liberal one. Maybe make the statement a bit broader and say that "exact tests, such as Fisher's exact test" are recommended in small samples because they do not overreject. Barnard's test is an unconditional exact test, and there are a few alternatives that are approximately exact; those could be preferred to Fisher's exact test because they are less conservative in small samples.
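The size argument can be checked by brute force in a simpler setting. The sketch below is an illustration under stated assumptions, not the 2-by-2 Fisher case itself: it uses the one-sample analogue (two categories, so the chi-square test has 1 degree of freedom and the exact test is a two-sided binomial test), and the values `n = 10`, `p = 0.1` and all function names are chosen here purely for illustration. It enumerates every possible outcome under the null hypothesis and adds up the probability of those that each test rejects.

```python
from math import comb, erfc, sqrt

ALPHA = 0.05


def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def chisquare_pvalue(k, n, p):
    """Pearson chi-square goodness-of-fit p-value for two categories (1 df)."""
    observed = (k, n - k)
    expected = (n * p, n * (1 - p))
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    return erfc(sqrt(stat / 2))


def exact_pvalue(k, n, p):
    """Two-sided exact binomial p-value: outcomes no more likely than k."""
    p_obs = binom_pmf(k, n, p)
    probs = (binom_pmf(j, n, p) for j in range(n + 1))
    return sum(q for q in probs if q <= p_obs * (1 + 1e-9))


def size(pvalue, n, p, alpha=ALPHA):
    """Actual type I error rate: probability under H0 that the test rejects."""
    return sum(binom_pmf(k, n, p) for k in range(n + 1) if pvalue(k, n, p) <= alpha)


n, p = 10, 0.1
chi2_size = size(chisquare_pvalue, n, p)
exact_size = size(exact_pvalue, n, p)
print(f"chi-square size: {chi2_size:.4f}")
print(f"exact size:      {exact_size:.4f}")
```

For this particular choice the chi-square test rejects about 7% of the time at a nominal 5% level (liberal), while the exact test rejects about 1.3% of the time (conservative, far below alpha because of the discrete sample space). Other values of `n` and `p` give different numbers, and for some the chi-square test is conservative too.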
Thank you for this correction. I could have worded it more softly and included this motivation.
Makes sense. Thanks! I will rephrase this in a new PR unless @nightvision04 wants to take that up.
If you want, you could submit another PR rephrasing the statement as @josef-pkt said. Would you be willing to do that?
Sure can, @tirthasheshpatel. I'll have it up in the next few days.
Reference issue
Closes #13970
What does this implement/fix?
Added Pearson's landmark paper to the chi-square references.