Skip to content

[ENH] Student's t-distribution#5050

Merged
fkiraly merged 19 commits into
sktime:mainfrom
alex-gregory-ds:add-t-distribution
Aug 24, 2023
Merged

[ENH] Student's t-distribution#5050
fkiraly merged 19 commits into
sktime:mainfrom
alex-gregory-ds:add-t-distribution

Conversation

@alex-gregory-ds

Copy link
Copy Markdown
Contributor

Reference Issues/PRs

Related to #4518

What does this implement/fix? Explain your changes.

Add t-distribution to the proba subpackage.

Does your contribution introduce a new dependency? If yes, which one?

No

What should a reviewer concentrate their feedback on?

Did you add any tests for the change?

Not yet.

Any other comments?

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the sktime root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR).maintenance - CI, test framework, release.
    See here for full badge reference
  • Optionally, I've added myself and possibly others to the CODEOWNERS file - do this if you want to become the owner or maintainer of an estimator you added.
    See here for further details on the algorithm maintainer role.
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.

Comment thread sktime/proba/t.py Outdated
Comment thread sktime/proba/t.py Outdated
Comment thread sktime/proba/t.py Outdated
Comment thread sktime/proba/t.py
Comment thread sktime/proba/t.py Outdated
@alex-gregory-ds

Copy link
Copy Markdown
Contributor Author

@fkiraly, my apologies, this is basically a copy paste from the normal currently. I have only changed the pdf function at the moment. Still working through the rest. I should have made this clearer.

@fkiraly fkiraly added implementing algorithms Implementing algorithms, estimators, objects native to sktime enhancement Adding new functionality module:probability&simulation probability distributions and simulators labels Aug 8, 2023
@fkiraly

fkiraly commented Aug 8, 2023

Copy link
Copy Markdown
Collaborator

just posting this here in case you're not aware
https://xkcd.com/1347/

@fkiraly fkiraly changed the title Initialise student's t-distribution [ENH] Student's t-distribution Aug 9, 2023
Comment thread sktime/proba/t.py
@alex-gregory-ds

Copy link
Copy Markdown
Contributor Author

@fkiraly, I couldn't find a formula for energy, if you have one I am happy to implement it. Also, from what I have read it sounds like there isn't an analytic solution for probability point function for the t-distribution. Are you aware of an analytic solution for the ppf?

@fkiraly

fkiraly commented Aug 12, 2023

Copy link
Copy Markdown
Collaborator

I couldn't find a formula for energy, if you have one I am happy to implement it.

Thanks! It's nasty though, so don't do it if it feels to unpleasant.

Appendix A.2 of "evaluating forecasts with scoringRules" has a few explicit formulae for the energy, including the t-distribution. The expression is hard to track in implementation, so I would advise comparing against the Monte-Carlo default if you implement it.

Are you aware of an analytic solution for the ppf?

The inverse t cdf tends to appear, here's what Wolfram Alpha says: https://www.wolframalpha.com/input/?i=inverse+cdf+student%27s+t+distribution

@alex-gregory-ds

Copy link
Copy Markdown
Contributor Author

here's what Wolfram Alpha says: https://www.wolframalpha.com/input/?i=inverse+cdf+student%27s+t+distribution

Just added the ppf using the Wolfram Alpha link. Thanks for sharing that link.

@fkiraly

fkiraly commented Aug 20, 2023

Copy link
Copy Markdown
Collaborator

Have you checked that it's indeed inverse to the cdf?

This should probably be part of the tests, too, note to self.

@alex-gregory-ds

alex-gregory-ds commented Aug 20, 2023

Copy link
Copy Markdown
Contributor Author

Have you checked that it's indeed inverse to the cdf?

Here is an example showing the output of ppf is the inverse of cdf.

>>> import pandas as pd
>>> from sktime.proba.t import TDistribution
>>> n = TDistribution(mu=0, sigma=1, df=[[1, 1], [1, 1], [1, 1]])
>>> p = pd.DataFrame({"col1": [0.1, 0.3, 0.5], "col2": [0.7, 0.9, 1.0]})
>>> n.cdf(n.ppf(p))
   col1  col2
0   0.1   0.7
1   0.3   0.9
2   0.5   NaN

@alex-gregory-ds

Copy link
Copy Markdown
Contributor Author

@fkiraly, I am unsure where to start with the energy function. I see the section for student's t-distribution in A.2 of "evaluating forecasts with scoringRules" but, since there isn't a flashing neon sign saying "this is the energy function!" I am unsure of what needs to be implemented.

I assume the continuous ranked probability score (CRPS) is related to energy but it is not obvious to me how.

Would it be possible to get a review and perhaps merge in what is currently implemented and leave energy function for a different PR unless you want to take a look at energy on this PR?

@fkiraly

fkiraly commented Aug 21, 2023

Copy link
Copy Markdown
Collaborator

Would it be possible to get a review and perhaps merge in what is currently implemented and leave energy function for a different PR unless you want to take a look at energy on this PR?

Sure!

Have you checked why the tests are failing?
There's something going on with indexing. Happy to have a more detailed look if you're not sure what is wrong here.

@fkiraly

fkiraly commented Aug 21, 2023

Copy link
Copy Markdown
Collaborator

I assume the continuous ranked probability score (CRPS) is related to energy but it is not obvious to me how.

Yes, the general formula is CRPS(d.cdf, y) = d.energy(y) - 0.5 * d.energy(). Unfortunately the reference only gives CRPS, but the scaling behaviour of the terms (d.energy() does not depend on y and is proportional to d.std()) makes me think that the positive terms are d.energy(y), and then negative term is - 0.5 * d.energy(). If not, d.energy(y) should be the positive terms minus a constant multiple of the negative term (the coefficient should be "simple" and guessable by comparison with the Monte Carlo estimate).

As said, we can leave it for now - no worries. It's unpleasant as I alluded to above (and in addition, unfortunately, not 100% clear from the reference).

@fkiraly

fkiraly commented Aug 21, 2023

Copy link
Copy Markdown
Collaborator

Btw @Alex-JG3, thanks a lot!
What would also be useful is feedback on the implementer experience - e.g., is there boilerplate that is unclear, or interface elements that you think would be better changed?

Comment thread sktime/proba/t.py Outdated
Comment thread sktime/proba/t.py Outdated
@alex-gregory-ds alex-gregory-ds marked this pull request as ready for review August 23, 2023 16:19

@fkiraly fkiraly left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, really nice!

I haven't checked all the formulae, but I did spot checks and it looked good.

Would you be so kind and also open a PR on skpro?
See sktime/skpro#22
Just copy-paste this in the appropriate folder and change the BaseDistribution import location, the test framework will automatically do the rest.

If you're interested, you could:

  • implement more distributions - sktime/skpro#22
  • join on the next phease refinement of the distributions interface sktime/skpro#21
  • contribute probabilistic regressors in skpro - sktime/skpro#7
  • contribute probabilistic forecasters in sktime - #1742

(happy to chat on discord for planning!)

@fkiraly fkiraly merged commit 1826772 into sktime:main Aug 24, 2023
fkiraly pushed a commit to sktime/skpro that referenced this pull request Aug 27, 2023
<!--
Thanks for contributing a pull request! Please ensure you have taken a
look
at our contribution guide:
https://skbase.readthedocs.io/en/latest/contribute.html
-->

#### Reference Issues/PRs
<!--
Example: Fixes #1234. See also #3456.

Please use keywords (e.g., Fixes) to create link to the issues or pull
requests
you resolved, so that they will automatically be closed when your pull
request
is merged. See
https://github.com/blog/1506-closing-issues-via-pull-requests
-->

Mirror of `sktime` sktime/sktime#5050. Towards #22


#### What does this implement/fix? Explain your changes.
<!--
A clear and concise description of what you have implemented. Remember
to implement
unit tests and docstrings if your pull request commits code to the
repository.
-->

Add student's t-distribution.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Adding new functionality implementing algorithms Implementing algorithms, estimators, objects native to sktime module:probability&simulation probability distributions and simulators

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants