
DOC: Add Karl Pearson's reference to chi-square test #13971

Merged
tupui merged 7 commits into scipy:master from nightvision04:master on May 3, 2021

Conversation

@nightvision04
Contributor

Pearson's original paper was important and I think it's an improvement to list it among chi-square test references.

Reference issue

Closes #13970

What does this implement/fix?

Added Pearson's landmark paper to the chi-square references.

@nightvision04 nightvision04 changed the title Added Karl Pearson's reference to chi-square test ENH: Added Karl Pearson's reference to chi-square test May 2, 2021
@nightvision04 nightvision04 changed the title ENH: Added Karl Pearson's reference to chi-square test ENH: Add Karl Pearson's reference to chi-square test May 2, 2021
@nightvision04 nightvision04 changed the title ENH: Add Karl Pearson's reference to chi-square test DOC: Add Karl Pearson's reference to chi-square test May 2, 2021
@tirthasheshpatel
Contributor

I often work with chi-square tests and found it surprising that Pearson's paper wasn't found among the references. Is this something we can add?

The references section contains papers/articles which the author has referred to, to either write the code or the documentation. In this case, it seems like the author didn't refer to the paper you have mentioned to do either of those. If you think some part of the documentation or code can be improved using your reference, it would be nice to add that too. For example, you could add some documentation to explain the chi-squared test using your reference and cite the paper there.

@nightvision04
Contributor Author

The references section contains papers/articles which the author has referred to, to either write the code or the documentation. In this case, it seems like the author didn't refer to the paper you have mentioned to do either of those. If you think some part of the documentation or code can be improved using your reference, it would be nice to add that too. For example, you could add some documentation to explain the chi-squared test using your reference and cite the paper there.

@tirthasheshpatel Great suggestion. I've included an explanation of when low frequencies warrant using Fisher's exact test as a higher-power alternative. I've also added the minimum count of 13 described in the original chi-square paper.
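The "frequencies less than 5" rule of thumb is usually stated in terms of expected cell counts rather than observed ones. A minimal sketch for checking it, assuming a hypothetical 2 x 2 table and SciPy's `scipy.stats.contingency.expected_freq` helper:

```python
import numpy as np
from scipy.stats.contingency import expected_freq

# Hypothetical observed counts for a 2 x 2 contingency table.
table = np.array([[3, 9],
                  [7, 2]])

# Expected counts under independence: row_total * col_total / grand_total.
expected = expected_freq(table)

# Cells that fall below the common threshold of 5 expected observations.
small = expected < 5
print(expected)
print("cells with expected count < 5:", int(small.sum()))
```

In this made-up table the cells flagged by expected count (the second row) are not the same cells whose observed counts are below 5, so it matters which frequencies the rule refers to.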

Comment thread scipy/stats/stats.py Outdated
@tirthasheshpatel
Contributor

tirthasheshpatel commented May 2, 2021

Also, you opened this PR from your master branch. This is not good practice. In the future, please propose changes from a feature branch. Thanks!

Comment thread scipy/stats/stats.py Outdated
@tylerjereddy tylerjereddy added the Documentation Issues related to the SciPy documentation. Also check https://github.com/scipy/scipy.org label May 2, 2021
nightvision04 and others added 2 commits May 2, 2021 12:45
Co-authored-by: Tirth Patel <tirthasheshpatel@gmail.com>
@nightvision04
Contributor Author

Looks like the build agent for Windows 64-bit is running exceptionally slowly, causing one of the checks to fail. I don't have dev.azure.com credentials to re-run failed builds.

How would you like me to proceed?

Contributor

@tirthasheshpatel tirthasheshpatel left a comment


LGTM now. Thanks, @nightvision04! (As this is a documentation change, I don't think the awaiting workflows are necessary before merging)

@ilayn ilayn added this to the 1.7.0 milestone May 3, 2021
Member

@tupui tupui left a comment


LGTM as well. I checked the documentation build, so there is no need for extra CI. I am merging then. Thanks @nightvision04 for contributing! And thanks @tirthasheshpatel for the review.

@tupui tupui merged commit 6e5c9ee into scipy:master May 3, 2021
patnr added a commit to patnr/scipy that referenced this pull request May 3, 2021
* master: (164 commits)
  DOC: Add Karl Pearson's reference to chi-square test (scipy#13971)
  BLD: fix build warnings for causal/anticausal pointers in ndimage
  MAINT: stats: Fix unused imports and a few other issues related to imports.
  DOC: fix typo
  MAINT: Remove duplicate calculations in sokalmichener
  BUG: spatial: fix weight handling of `distance.sokalmichener`.
  DOC: update Readme (scipy#13910)
  MAINT: QMCEngine d input validation (scipy#13940)
  MAINT: forward port 1.6.3 relnotes
  REL: add PEP 621 (project metadata in pyproject.toml) support
  EHN: signal: make `get_window` supports `general_cosine` and `general_hamming` window functions. (scipy#13934)
  ENH/DOC: pydata sphinx theme polishing (scipy#13814)
  DOC/MAINT: Add copyright notice to qmc.primes_from_2_to (scipy#13927)
  BUG: DOC: signal: fix need argument config and add missing doc link for `signal.get_window`
  DOC: fix subsets docstring (scipy#13926)
  BUG: signal: fix get_window argument handling and add tests. (scipy#13879)
  ENH: stats: add 'alternative' parameter to ansari (scipy#13650)
  BUG: Reactivate conda environment in init
  MAINT: use dict built-in rather than OrderedDict
  Revert "CI: Add nightly release of NumPy in linux workflows (scipy#13876)" (scipy#13909)
  ...
@josef-pkt
Member

"If one or more frequencies
are less than 5, Fisher's Exact Test can be used with greater statistical
power."

I don't think that's true. Fisher's exact test is in general very conservative and doesn't have much power.
I think Pearson's chi-square and similar tests would over-reject, but I'm not completely sure about chi-square. (The Wald test in 2 by 2 tables strongly over-rejects in small samples.)

Besides, Fisher's exact test is for 2 by 2 tables, while Pearson's chi-square test works the same way for an arbitrary number of categories.
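To illustrate the difference in scope, a sketch with hypothetical counts: `scipy.stats.chisquare` accepts any number of categories, while `scipy.stats.fisher_exact` (at least in the SciPy releases around the time of this PR) expects a 2 x 2 table.

```python
import numpy as np
from scipy import stats

# Pearson's chi-square goodness-of-fit test on k = 5 hypothetical categories;
# with no f_exp given, the expected counts are uniform (20 each here).
observed = np.array([18, 22, 20, 25, 15])
chi2_res = stats.chisquare(observed)
print("chi-square statistic:", chi2_res.statistic, "p-value:", chi2_res.pvalue)

# Fisher's exact test on a hypothetical 2 x 2 table.
table = np.array([[3, 1],
                  [1, 3]])
odds_ratio, fisher_pvalue = stats.fisher_exact(table)
print("odds ratio:", odds_ratio, "p-value:", fisher_pvalue)
```

Here the chi-square statistic is the sum of (observed - expected)^2 / expected over the five categories, i.e. (4 + 4 + 0 + 25 + 25) / 20 = 2.9.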

@tirthasheshpatel
Contributor

tirthasheshpatel commented May 4, 2021

I don't think that's true. Fisher's exact test is in general very conservative and doesn't have much power.
I think Pearson's chi-square and similar tests would over-reject, but I'm not completely sure about chi-square. (The Wald test in 2 by 2 tables strongly over-rejects in small samples.)

I didn't verify the "with greater statistical power" part. Sorry! I think you are right here. It's better not to comment about it unless we are absolutely sure.

Besides, Fisher's exact test is for 2 by 2 tables, while Pearson's chi-square test works the same way for an arbitrary number of categories.

I agree, but would you agree with reformulating the docs to say that Fisher's exact test is better suited to small sample sizes? (I think that is more widely accepted, and it is also stated on the Wikipedia page.)

I can do a partial revert of this in a new PR if you agree. Otherwise, I can do a full revert of that statement. Thanks very much for verifying this! Feel free to share other thoughts that you may have.

@josef-pkt
Member

Fisher's exact test maintains size, that is, its type I error rate is at most alpha (e.g. 0.05). (But because of the discrete sample space, it can be far below the significance level.)

Some references recommend tests that maintain size (rejection rate approximately equal to alpha) on average instead of always. That makes the test less conservative, but it over-rejects in some cases.

I think the chi-square test becomes liberal in small samples, similarly to the Wald test.
But I don't have a reference for the small-sample performance of Pearson's chi-square test. My readings were mostly on hypothesis tests for one-sample and two-sample proportions.

I agree, but would you agree with reformulating the docs to say that Fisher's exact test is better suited to small sample sizes?

It's not clear whether a very conservative test is very useful either, although still better than very liberal.

Maybe make the statement a bit broader and say that "exact tests, such as Fisher's exact test" are recommended in small samples because they do not over-reject.

Barnard's test is an unconditional exact test, and there are a few alternatives that are approximately exact; those could be preferred to Fisher's exact test because they are less conservative in small samples.
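For comparison, SciPy 1.7.0 and later also provide `scipy.stats.barnard_exact` (Barnard's unconditional exact test) alongside `scipy.stats.fisher_exact`. A minimal sketch on a hypothetical 2 x 2 table, assuming one of those releases:

```python
import numpy as np
from scipy import stats

# Hypothetical 2 x 2 table of counts.
table = np.array([[7, 12],
                  [8, 3]])

# Conditional exact test: conditions on both margins of the table.
_, p_fisher = stats.fisher_exact(table)

# Unconditional exact test: by default it uses a pooled-variance statistic
# and takes the p-value as the maximum over the nuisance parameter.
res = stats.barnard_exact(table)
print("Fisher p-value: ", p_fisher)
print("Barnard p-value:", res.pvalue)
```

Which p-value is smaller depends on the table; the comparison only shows that both tests are exact, while Barnard's test does not condition on both margins.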

@nightvision04
Contributor Author

Maybe make the statement a bit broader and say that "exact tests, such as Fisher's exact test" are recommended in small samples because they do not over-reject.

Thank you for this correction. I could have worded it softer and included this motivation.

@tirthasheshpatel
Contributor

Maybe make the statement a bit broader and say that "exact tests, such as Fisher's exact test" are recommended in small samples because they do not over-reject.

Barnard's test is an unconditional exact test, and there are a few alternatives that are approximately exact; those could be preferred to Fisher's exact test because they are less conservative in small samples.

Makes sense. Thanks! I will rephrase this in a new PR unless @nightvision04 wants to take that up.

I could have worded it softer and included this motivation.

If you want, you could submit another PR rephrasing the statement as @josef-pkt said. Would you be willing to do that?

@nightvision04
Contributor Author

Sure can, @tirthasheshpatel. I'll have it up in the next few days.


Labels

Documentation Issues related to the SciPy documentation. Also check https://github.com/scipy/scipy.org scipy.stats

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Pearson's original paper on chi-square test could be referenced.

6 participants