Skip to content

BUG: stats: Fix zipf.pmf and zipfian.pmf for int32 k#20702

Merged
dschmitz89 merged 3 commits intoscipy:mainfrom
dschmitz89:gh20692
May 20, 2024
Merged

BUG: stats: Fix zipf.pmf and zipfian.pmf for int32 k#20702
dschmitz89 merged 3 commits intoscipy:mainfrom
dschmitz89:gh20692

Conversation

@dschmitz89
Copy link
Copy Markdown
Contributor

@dschmitz89 dschmitz89 commented May 12, 2024

Reference issue

Closes gh-20692

In the long run, these issues should hopefully be taken care of by the infrastructure (#15928 ).

@github-actions github-actions Bot added scipy.stats defect A clear bug or issue that prevents SciPy from being installed or used as expected labels May 12, 2024
@dschmitz89 dschmitz89 changed the title BUG: Fix zipf.pmf for integer k BUG: Fix zipf.pmf for int32 k May 12, 2024
@dschmitz89
Copy link
Copy Markdown
Contributor Author

zipfian likely suffers from the same issue:

return 1.0 / _gen_harmonic(n, a) / k**a

I am not familiar with these distributions though and could not cook up a failing example in a quick session, so let me know if this would be important to fix as well.

@tylerjereddy tylerjereddy added the backport-candidate This fix should be ported by a maintainer to previous SciPy versions. label May 13, 2024
@tylerjereddy tylerjereddy added this to the 1.14.0 milestone May 13, 2024
dist = zipf(9)
pmf = dist.pmf(k)
pmf_k_int32 = dist.pmf(k_int32)
assert_equal(pmf, pmf_k_int32) No newline at end of file
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Daniel, I just messed around for a bit and found this case that failed before/passed after a similar patch per your comment on zipfian:

Details
diff --git a/scipy/stats/_discrete_distns.py b/scipy/stats/_discrete_distns.py
index cda7c4e11..2ee610827 100644
--- a/scipy/stats/_discrete_distns.py
+++ b/scipy/stats/_discrete_distns.py
@@ -1412,7 +1412,8 @@ class zipfian_gen(rv_discrete):
         return 1, n
 
     def _pmf(self, k, a, n):
-        return 1.0 / _gen_harmonic(n, a) / k**a
+        k = k.astype(np.float64)
+        return 1.0 / _gen_harmonic(n, a) * k**-a
 
     def _cdf(self, k, a, n):
         return _gen_harmonic(k, a) / _gen_harmonic(n, a)
diff --git a/scipy/stats/tests/test_discrete_distns.py b/scipy/stats/tests/test_discrete_distns.py
index c5993ebde..bbe9085ce 100644
--- a/scipy/stats/tests/test_discrete_distns.py
+++ b/scipy/stats/tests/test_discrete_distns.py
@@ -627,3 +627,12 @@ class TestBetaNBinom:
         #      return float(fourth_moment/var**2 - 3.)
         assert_allclose(betanbinom.stats(n, a, b, moments="k"),
                         ref, rtol=3e-15)
+
+
+def test_zipfian_gh_20692():
+    k = np.arange(0, 1000)
+    k_int32 = k.astype(np.int32)
+    dist = zipfian(111, 22)
+    pmf = dist.pmf(k)
+    pmf_k_int32 = dist.pmf(k_int32)
+    assert_equal(pmf, pmf_k_int32)

You shouldn't trust my judgement on stats stuff, but it does look quite similar in nature as you note. The obtained pmf floats are ridiculously tiny, maybe 111 is a bit much, but anyway I just pushed it to error out for integers. It does run lightning fast at least.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in the last commit.

Co-authored-by: Tyler Reddy <tyler.je.reddy@gmail.com>
@dschmitz89 dschmitz89 changed the title BUG: Fix zipf.pmf for int32 k BUG: Fix zipf.pmf and zipfian.pmf for int32 k May 14, 2024
@dschmitz89 dschmitz89 requested a review from tylerjereddy May 14, 2024 20:48
@lucascolley lucascolley changed the title BUG: Fix zipf.pmf and zipfian.pmf for int32 k BUG: stats: Fix zipf.pmf and zipfian.pmf for int32 k May 17, 2024
@dschmitz89 dschmitz89 merged commit d248b50 into scipy:main May 20, 2024
@dschmitz89
Copy link
Copy Markdown
Contributor Author

I went ahead and merged this since it is a quite simple bug fix.

@dschmitz89 dschmitz89 deleted the gh20692 branch May 20, 2024 10:09
@tylerjereddy tylerjereddy modified the milestones: 1.14.0, 1.13.1 May 22, 2024
tylerjereddy pushed a commit to tylerjereddy/scipy that referenced this pull request May 22, 2024
…) [wheel build]

* BUG: fix zipf.pmf for integer k

* ENH: increase range for zipfian.pmf for integer k

---------

Co-authored-by: Tyler Reddy <tyler.je.reddy@gmail.com>

[wheel build]
@tylerjereddy tylerjereddy removed the backport-candidate This fix should be ported by a maintainer to previous SciPy versions. label May 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.stats

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: stats.zipf: incorrect pmf values

2 participants