Skip to content

Warn users that engine='auto' will change in future#8907

Merged
jcrist merged 3 commits intodask:mainfrom
jcrist:warn-parquet-auto
Apr 13, 2022
Merged

Warn users that engine='auto' will change in future#8907
jcrist merged 3 commits intodask:mainfrom
jcrist:warn-parquet-auto

Conversation

@jcrist
Copy link
Member

@jcrist jcrist commented Apr 8, 2022

Adds a warning to read_parquet and to_parquet that the meaning of
engine='auto' will change in the future (switching to using pyarrow
if it is installed, and falling back to fastparquet). This warning is
only raised for users that have both libraries installed and have
engine='auto'. Users without both libraries installed or that already
specify an engine will not see this warning.

First part of #8900.

Adds a warning to `read_parquet` and `to_parquet` that the meaning of
`engine='auto'` will change in the future (switching to using `pyarrow`
if it is installed, and falling back to `fastparquet`). This warning is
only raised for users that have both libraries installed and have
`engine='auto'`. Users without both libraries installed or that already
specify an engine will not see this warning.
@martindurant
Copy link
Member

Should we perhaps come to a consensus in #8900 , or at least take a vote before considering this?

@jcrist
Copy link
Member Author

jcrist commented Apr 8, 2022

Definitely. I'm just putting this up to have something concrete to talk about.

@martindurant
Copy link
Member

OK, got it. You may want to mark this WIP, although the comments we just added should make that clear for any other reviewer than happens to glance here.

@jcrist
Copy link
Member Author

jcrist commented Apr 13, 2022

I've updated this PR to fix the failing test - I believe (based on the conversation in #8900) that this should be good to merge after tests pass. cc @bryanwweber for a quick review to make sure I didn't bungle anything.

@mrocklin
Copy link
Member

Hey, I want to raise a slight concern here (sorry for not doing it earlier). For a user who doesn't care at all about this, they're going to get a confusing warning. Are we ok with that?

My guess is that we'll cause more stress/confusion by raising a warning here than by just silently changing things.

@mrocklin
Copy link
Member

If fastparquet and pyarrow were known to produce different results or something I would totally get wanting to emit something, but they should be drop-in replacements. This seems like being a little bit too loud to me.

Happy to be overruled though.

@jcrist
Copy link
Member Author

jcrist commented Apr 13, 2022

My guess is that we'll cause more stress/confusion by raising a warning here than by just silently changing things.

This warning is only raised for users that have both pyarrow and fastparquet installed and no engine is explicitly specified, which is likely to be rare. There are functional differences between the two apis, so I think warning in this case is worthwhile.

@martindurant
Copy link
Member

that have both pyarrow and fastparquet installed and no engine is explicitly specified

I think the standard Anaconda installer has both

Copy link
Contributor

@bryanwweber bryanwweber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a small style nit, thanks @jcrist!

@jcrist
Copy link
Member Author

jcrist commented Apr 13, 2022

I think the standard Anaconda installer has both

Hmmm, ok. What do you think of the current warning logic Martin? I like it for its simplicity, but if this is likely to be noisy for naive users then we can revisit it. A few other options:

  • What we currently have
  • Warn only for writing (where the differences matter a bit more)
  • Warn only if the user passes in backend-specific kwargs
  • Don't warn at all

If no preference, then I'm likely to keep what we currently have.

@mrocklin
Copy link
Member

I think the standard Anaconda installer has both

Yeah. This is my concern. The naive user probably isn't well served by this choice.

@martindurant
Copy link
Member

Warn only if the user passes in backend-specific kwargs

This would be great, but is it tractable? Otherwise, I could go either way between no warning and warn-on-write.

@mrocklin
Copy link
Member

More broadly, as we change defaults here will we provide warnings? My gut reaction is that we shouldn't. There are probably several changes of this magnitude or greater.

@jcrist
Copy link
Member Author

jcrist commented Apr 13, 2022

This would be great, but is it tractable?

We'd just check if there are any extra kwargs passed to read_parquet/to_parquet (which are always forwarded to the backend), and if so warn then. I'll switch to this.

More broadly, as we change defaults here will we provide warnings? My gut reaction is that we shouldn't.

I think that in cases where we can warn and not have it be obnoxious we should. I thought this was a case where the warning would be unlikely, but it turns out it isn't. Generally I like to let users whose code may break know beforehand that something big is changing.

We now warn if:

- `engine='auto'`
- Both `pyarrow` and `fastparquet` are installed
- Backend specific options are provided
@jcrist
Copy link
Member Author

jcrist commented Apr 13, 2022

Updated. We now only warn if all of the following are true:

  • engine='auto'
  • Both pyarrow and fastparquet are installed
  • Backend specific options are provided

@mrocklin
Copy link
Member

mrocklin commented Apr 13, 2022 via email

@jcrist jcrist merged commit bfc76af into dask:main Apr 13, 2022
@jcrist jcrist deleted the warn-parquet-auto branch April 13, 2022 21:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants