Skip to content

Deprecation warning for default value of boundary kwarg in map_overlap#8397

Merged
jsignell merged 12 commits intodask:mainfrom
GenevieveBuckley:boundary-default-warning
Nov 29, 2021
Merged

Deprecation warning for default value of boundary kwarg in map_overlap#8397
jsignell merged 12 commits intodask:mainfrom
GenevieveBuckley:boundary-default-warning

Conversation

@GenevieveBuckley
Copy link
Copy Markdown
Contributor

This PR is the first step in updating the default boundary kwarg value in Dask's map_overlap function. The current default value is "reflect", in future this will be changed to "none".

We want to do this primarily for performance reasons, as discussed here: #8391

Questions:

  1. Should we choose a specific version to change this in? Something 6 months or a year away?
  2. Do we need to do this as a two step process? Potentially I could use the disutils LooseVersion to check whether the dask version is above or below a certain number, so this change would turn on automatically.
    • ...although then I guess you'd still need a second step to prune out that stuff at a later date, it's just less important to remember to do it at a specific time
  3. Is FutureWarning correct to use here? This seems the most commonly used in the Dask codebase, although there is also a PendingDeprecationWarning used a handful of times, so I thought I should check.

@github-actions github-actions bot added the array label Nov 18, 2021
@GenevieveBuckley
Copy link
Copy Markdown
Contributor Author

cc @jakirkham

@GenevieveBuckley
Copy link
Copy Markdown
Contributor Author

There are a number of tests that are failing because of the FutureWarning added here. I've updated the tests to explicitly specify what boundary value they need here.

Copy link
Copy Markdown
Member

@jakirkham jakirkham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Genevieve! This looks good 😄

Had a couple comments on the warning message. Otherwise LGTM

# Default boundary value is set in the function named "coerce_boundary"
warnings.warn(
"Default 'boundary' argument value will change from 'reflect' "
"to 'none' in future versions. "
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we specify a version here?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it can be relatively soon. Maybe February?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to specify a version.

February seems too early though, lots of people will be on holiday in December and January, so it's quite likely people won't see this at all before it changes. (Although I'm sure plenty of scientists have code they run infrequently, like once every six months or so on a new dataset, so maybe it's impractical to wait a huge amount of time.)

Mind you, the only visible differences should be at the very outer edges of a Dask array, so the impact of the change is pretty small. Maybe it's ok to change it soon and have people's first introduction be a pleasant surprise things are running faster.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can always change the version if we don't feel ready. But I tend to agree that I don't think this will negatively impact many people. From the responses on this, the FutureWarning won't really hit them either since they set this kwarg. So I'm fine with whatever you choose.

Copy link
Copy Markdown
Contributor Author

@GenevieveBuckley GenevieveBuckley Nov 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll say March then, 2021.03.0
It's an arbitrary choice, but we should pick something.

GenevieveBuckley and others added 2 commits November 22, 2021 19:44
Co-authored-by: jakirkham <jakirkham@gmail.com>
@GenevieveBuckley
Copy link
Copy Markdown
Contributor Author

Ok, latest commits should fix up those doctest errors (doctests were not explicitly specifying boundary= kwarg and then failing when the warning was raised)

@GenevieveBuckley
Copy link
Copy Markdown
Contributor Author

GenevieveBuckley commented Nov 25, 2021

...I don't really understand why the macOS tests are failing here. Something's a bit weird. I'm going to investigate on my personal laptop & see if I can reproduce it.

Locally on a Mac, nothing fails when I run pytest dask/tests/test_layers.py (which is where the CI test failure is happening. But when I try running the whole test suite, I get a pytest error "Too many open files". I thought that might be this problem again, but it doesn't look like it. My ulimit is reported as unlimited, and trying the trick with ulimit -n 2048 doesn't change anything.

I've tried with both python 3.8 and python 3.9 environments, and with the main and boundary-default-warning branches. They don't seem to affect things much.

I see the CI test suite passing for other recent PRs, so I don't think anything is broken on the main branch. I might try merging main into this branch, to see if that helps. I don't see why it would, but worth a shot.

@GenevieveBuckley
Copy link
Copy Markdown
Contributor Author

I should add - the kinds of test errors I'm seeing on my laptop are not the same event loop errors that are happening in the CI.

ERROR dask/array/tests/test_xarray.py - RuntimeError: There is no current eve...
ERROR dask/bytes/tests/test_hdfs.py - RuntimeError: There is no current event...
ERROR dask/tests/test_distributed.py - RuntimeError: There is no current even...
ERROR dask/tests/test_layers.py - RuntimeError: There is no current event loo...
ERROR gw0

Here's hoping those are just a bit of a fluke.

Copy link
Copy Markdown
Member

@jsignell jsignell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Thanks for being so thorough!

@jsignell jsignell merged commit 848388f into dask:main Nov 29, 2021
@GenevieveBuckley
Copy link
Copy Markdown
Contributor Author

I've just scheduled an email that should hopefully remind me to make a follow up PR to this in the last week of February (when we should actually change the kwarg default value). Fingers crossed the reminder works!

@GenevieveBuckley GenevieveBuckley deleted the boundary-default-warning branch November 29, 2021 23:27
@jsignell
Copy link
Copy Markdown
Member

Whoa. I've never thought of doing something like that.

jsignell pushed a commit that referenced this pull request Feb 28, 2022
…8743)

Followup to PR #8397

We've had a FutureWarning up for a few months about an upcoming change to the default 'boundary' kwarg value in `map_overlap`, so now is the time to change it. Previous default was `"reflect"`, new default will be "None".

The reason for this change is that it makes the code run a lot faster, and for most people the overlap depth is sufficient and they should not require additional boundary handling. See #8391 for a full discussion.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants