Skip to content

Forecasting tutorial rework#972

Merged
fkiraly merged 31 commits intomainfrom
forecasting-tutorial-rework
Jun 23, 2021
Merged

Forecasting tutorial rework#972
fkiraly merged 31 commits intomainfrom
forecasting-tutorial-rework

Conversation

@fkiraly
Copy link
Collaborator

@fkiraly fkiraly commented Jun 18, 2021

This pull request contains a re-work of the forecasting tutorial which should make it more structured, complete, and accessible - from what right now looks like a patchwork piece that has organically proliferated...

Main changes:

  • re-structuring in chapters - basic workflows, selected estimators, composition (reduction, pipelines)
  • expanded explanation of what happens in sections/chapters, added some code for object inspection
  • factored out the "pitfalls" in a separate notebook, also expanded - focus is now on use of sktime primarily

@fkiraly fkiraly added the documentation Documentation & tutorials label Jun 18, 2021
@fkiraly fkiraly requested a review from mloning as a code owner June 18, 2021 00:19
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@fkiraly fkiraly requested review from aiwalter and big-o June 18, 2021 00:19
@fkiraly fkiraly marked this pull request as draft June 18, 2021 00:20
Copy link
Contributor

@aiwalter aiwalter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fkiraly I think its a good idea. My first impression:


Martin Walter martin_friedrich.walter@daimler.com, Mercedes-Benz AG on behalf of Daimler TSS GmbH.
https://github.com/Daimler/daimler-foss/blob/master/LEGAL_IMPRINT.md

@fkiraly
Copy link
Collaborator Author

fkiraly commented Jun 18, 2021

Would be good to see also the outputs of the cells

Don't the precommits eat the outputs away?
And, in general, I prefer not to commit them for "work in progress" notebooks, since they clutter the repo, especially the graphics. We should probably have them displayed for the "final" document though.

We could add a table of contents as described here: https://moonbooks.org/Articles/How-to-create-a-table-of-contents-in-a-jupyter-notebook-/

Yes, I was wondering how you do that - will have a look, thanks!

I think we need also a section for the transformers, this is not covered so far. But also related to forecasting I think

My feeling is, that should perhaps be an entirely separate notebook, since there's some complexity, e.g., the taxonomy on series, panel, etc? Maybe sth for the dev days next week. Might overload the forecasting notebook and break the narrative arc if there's too much in there.

The file 01a_forecasting_sklearn.ipynb is the old one, right?

Yes, I've copied it for easy copy/paste. Ultimately, I'd like to turn it into an appendix on the topic "what happens if you naively use scikit-learn, and why sktime is the right way to do lagging plus supervised prediction", which currently occupies much of the introduction. I found it an interesting topic, but it prevents you from getting to usage early on, which is what most users might be looking fore in a tutorial.

@fkiraly
Copy link
Collaborator Author

fkiraly commented Jun 18, 2021

while I have your attention, @aiwalter: is the code in the "advanced benchmarking" section still up-to-date? Or is there newer material that you would like to see there?

@aiwalter
Copy link
Contributor

which section do you mean?
The one for backtesting or model selection with MultiplexForecaster? Generally, the old or current notebook should be up to date as the code should be tested by the tests?

I would like to see also a more sophisticated tuning part including MultiplexForecaster, TransformedTargetForecaster or the new ForecastingPipeline, some transformers, OptionalPassthrough, TabularToSeriesAdaptor. Maybe this can be a separate page/Notebook on the website then.


Martin Walter martin_friedrich.walter@daimler.com, Mercedes-Benz AG on behalf of Daimler TSS GmbH.
https://github.com/Daimler/daimler-foss/blob/master/LEGAL_IMPRINT.md

@fkiraly
Copy link
Collaborator Author

fkiraly commented Jun 18, 2021

I would like to see also a more sophisticated tuning part including MultiplexForecaster, TransformedTargetForecaster or the new ForecastingPipeline, some transformers, OptionalPassthrough, TabularToSeriesAdaptor. Maybe this can be a separate page/Notebook on the website then.

I think that would be really nice in the "tuning" Section 3.3, as the "expert" sub-section which really shows (or brags about) all the great things we can do with sktime - but I also need to leave something for you to write. no? 😄
(feel free to PR this branch)

Copy link
Contributor

@mloning mloning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good!

A few comments:

  • I wouldn't add set up instructions on top of each notebook, they may change and keeping them up-to-date will be hard, so instead I'd either automatically add more info in the header that's added to all notebooks automatically on the sphinx-built online documentation or simply link to the installation guide, same applies to notes on development status of particular functionality
  • automated table of contents would be good as suggested by @aiwalter
  • for exogenous data, there is the load_longley data if that's helpful
  • instead of "querying predictions" I'd say "generating predictions"
  • overview of forecasters in sktime, I'd use `all_estimators("forecaster") to generate this dynamically
  • typo "obatins"
  • we now also have "dirrec" reduction strategy

The other more general point is that Jupyter notebooks are very hard to maintain in general, due to their incompatibility with git. An alternative may be to convert them into Python files and then use sphinx gallery for docs and an automated conversion for binder as done by scikit-learn.

@fkiraly
Copy link
Collaborator Author

fkiraly commented Jun 19, 2021

The other more general point is that Jupyter notebooks are very hard to maintain in general, due to their incompatibility with git. An alternative may be to convert them into Python files and then use sphinx gallery for docs and an automated conversion for binder as done by scikit-learn.

Yes, but I still think we should use jupyter notebooks.
Why: all users can run them, assuming the right packages are installed. More specifically, no dev IDE set-up is required, anaconda with jupyter suffices.

Binder is error prone (it still is not fixed! see #662), and python raises the barrier to run the code.
My opinion is to keep the entry barrier to the tutorials as low as possible.

@mloning
Copy link
Contributor

mloning commented Jun 19, 2021

My opinion is to keep the entry barrier to the tutorials as low as possible.

Agreed, but sphinx gallery allows you to download Python scripts as Jupyter notebooks (see e.g. https://scikit-learn.org/stable/auto_examples/bicluster/plot_spectral_coclustering.html#sphx-glr-auto-examples-bicluster-plot-spectral-coclustering-py)

@fkiraly
Copy link
Collaborator Author

fkiraly commented Jun 19, 2021

Agreed, but sphinx gallery allows you to download Python scripts as Jupyter notebooks

Hm, does this properly support markdown and cells?
Happy to try it if someone else sets it up.

@fkiraly
Copy link
Collaborator Author

fkiraly commented Jun 19, 2021

due to their incompatibility with git

they are not incompatible with git? Or what do you mean?

@mloning
Copy link
Contributor

mloning commented Jun 19, 2021

I meant that changes to Jupyter notebooks are not line based so it's hard to see what's going on with git to the point where tracking line changes is basically useless (hence tools like ReviewNB)

@fkiraly
Copy link
Collaborator Author

fkiraly commented Jun 19, 2021

I meant that changes to Jupyter notebooks are not line based so it's hard to see what's going on with git to the point where tracking line changes is basically useless (hence tools like ReviewNB)

but we're using reviewnb, no?

@MarcoGorelli MarcoGorelli mentioned this pull request Jun 20, 2021
5 tasks
@fkiraly
Copy link
Collaborator Author

fkiraly commented Jun 20, 2021

Thanks, @MarcoGorelli, for helpful comments!

Hi @fkiraly - flake8 doesn't modify files, it just checks for common linting errors, for better or for worse they need to then be fixed up manually. So the same will happen when running nbqa flake8

Argh - so there's no way around trying to find the exact cell manually?
My current workflow is extremely fiddly, since visual studio code (vanilla python extension) does not display cell numbers, and the numbers when running are different than those returned by the pre-commits, since they don't count markdown cells, I suppose.

Any advice on a better workflow is appreciated...

I've found the pre-commit maintainer to be extremely helpful, do have you tried reporting this to their repo?

Well, I'm not sure whether i'ts a pre-commit problem, a vs code problem, a GitHub desktop problem, a Windows admin rights problem, or a combination thereof. I wouldn't be able to produce a helpful bug report. I could compile the strange error messages though.

@MarcoGorelli
Copy link
Contributor

the numbers when running are different than those returned by the pre-commits, since they don't count markdown cells, I suppose.
Any advice on a better workflow is appreciated...

The numbers from the nbqa output are of the code cells - the ones that are displayed in the cell are the execution count, which unfortunately can't be used in the nbqa output because they might contain duplicates or even be None if they haven't been executed

However, if you restart a notebook and execute it top-to-bottom, then the execution count should match the cell number in nbqa's output

@fkiraly
Copy link
Collaborator Author

fkiraly commented Jun 20, 2021

However, if you restart a notebook and execute it top-to-bottom, then the execution count should match the cell number in nbqa's output

That's what I thought I tried (obviously, if there are duplicats/skips there's no way it could match) and the numbers didn't match, but maybe I did sth wrong

Let me try it again - because if it wouldn't match it would be a bug, right?

@fkiraly
Copy link
Collaborator Author

fkiraly commented Jun 20, 2021

hm, it fits now, so I must have looked at the wrong log or notebook then, or made some other silly mistake. Seems to work as intended.

@MarcoGorelli
Copy link
Contributor

no worries 😄

because if it wouldn't match it would be a bug, right?

yes, exactly - and if you do encounter that, a bug report to nbqa would be appreciated 🙏

@fkiraly
Copy link
Collaborator Author

fkiraly commented Jun 20, 2021

well, it's a bug, but it looks like it was in my head - so, all fine for now
thanks for advice, @MarcoGorelli

@fkiraly
Copy link
Collaborator Author

fkiraly commented Jun 22, 2021

@mloning, @aiwalter, are you happy with the tutorial in its current proposed state?

Comments are either addressed or factored out as issues with major additional work.

@review-notebook-app
Copy link

review-notebook-app bot commented Jun 22, 2021

View / edit / reply to this conversation on ReviewNB

mloning commented on 2021-06-22T15:23:46Z
----------------------------------------------------------------

I would include this here, I listed alternatives in my review above.


fkiraly commented on 2021-06-22T16:42:43Z
----------------------------------------------------------------

sorry, what do you want to include in the instructions? Not sure what you are referring to

Copy link
Contributor

@mloning mloning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fkiraly we probably want to add the to the docs too and make sure that it's displayed there properly with outputs (see https://github.com/alan-turing-institute/sktime/blob/main/docs/source/tutorials.rst)

Copy link
Collaborator Author

fkiraly commented Jun 22, 2021

sorry, what do you want to include in the instructions? Not sure what you are referring to


View entire conversation on ReviewNB

@fkiraly
Copy link
Collaborator Author

fkiraly commented Jun 22, 2021

sorry, what do you want to include in the instructions? Not sure what you are referring to

I've added examples/01a_forecasting_sklearn.ipynb to the tutorials.rst now, is that what you meant?

@fkiraly
Copy link
Collaborator Author

fkiraly commented Jun 23, 2021

I noticed I forgot section 4 - I now added that section, which is basically a step-by-step extension guide

@fkiraly fkiraly requested review from aiwalter and mloning June 23, 2021 11:12
@fkiraly
Copy link
Collaborator Author

fkiraly commented Jun 23, 2021

would you mind reviewing again with intention to merge?

I think suggestions are addressed, suggestions that would be major changes not just refactors are spun out in issues.

Copy link
Contributor

@aiwalter aiwalter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fkiraly few comments:

  • should we maybe add an example anywhere about forecasting with exog data?
  • what do you think about renaming the notebooks to 01a_forecasting.ipynb and 01b_forecasting_sklearn.ipynb or sth? As there might be other notebooks to follow? Just a minor thing.
  • Could we display the cell outputs also before merging?

@fkiraly
Copy link
Collaborator Author

fkiraly commented Jun 23, 2021

should we maybe add an example anywhere about forecasting with exog data?

Yes, but I would have to make it up from scratch - so it would be in the "more than just a refactor" category. Would say, out of scope.

what do you think about renaming the notebooks to 01a_forecasting.ipynb and 01b_forecasting_sklearn.ipynb or sth? As there might be other notebooks to follow? Just a minor thing.

Well, I thought there should be one "main" notebook, and the others are appendices a, b, and so on.

@fkiraly
Copy link
Collaborator Author

fkiraly commented Jun 23, 2021

Could we display the cell outputs also before merging?

Sure, I'll do that precisely when you think it's ready to, to avoid repo clutter.

@fkiraly fkiraly merged commit d3c102b into main Jun 23, 2021
@fkiraly fkiraly deleted the forecasting-tutorial-rework branch June 23, 2021 19:45
@fkiraly
Copy link
Collaborator Author

fkiraly commented Jun 23, 2021

merged with @aiwalter having read through & tested - to facilitate additions of @aiwalter and other smaller additions during dev days

fkiraly added a commit that referenced this pull request Jun 24, 2021
This is the forecasting tutorial with all cell contents, see #972.

No other changes except cell content in the main forecasting tutorial added.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Documentation & tutorials

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants