Warn users that engine='auto' will change in future#8907
Conversation
Adds a warning to `read_parquet` and `to_parquet` that the meaning of `engine='auto'` will change in the future (switching to using `pyarrow` if it is installed, and falling back to `fastparquet`). This warning is only raised for users that have both libraries installed and have `engine='auto'`. Users without both libraries installed or that already specify an engine will not see this warning.
|
Should we perhaps come to a consensus in #8900 , or at least take a vote before considering this? |
|
Definitely. I'm just putting this up to have something concrete to talk about. |
|
OK, got it. You may want to mark this WIP, although the comments we just added should make that clear for any other reviewer than happens to glance here. |
|
I've updated this PR to fix the failing test - I believe (based on the conversation in #8900) that this should be good to merge after tests pass. cc @bryanwweber for a quick review to make sure I didn't bungle anything. |
|
Hey, I want to raise a slight concern here (sorry for not doing it earlier). For a user who doesn't care at all about this, they're going to get a confusing warning. Are we ok with that? My guess is that we'll cause more stress/confusion by raising a warning here than by just silently changing things. |
|
If fastparquet and pyarrow were known to produce different results or something I would totally get wanting to emit something, but they should be drop-in replacements. This seems like being a little bit too loud to me. Happy to be overruled though. |
This warning is only raised for users that have both |
I think the standard Anaconda installer has both |
bryanwweber
left a comment
There was a problem hiding this comment.
LGTM, just a small style nit, thanks @jcrist!
Hmmm, ok. What do you think of the current warning logic Martin? I like it for its simplicity, but if this is likely to be noisy for naive users then we can revisit it. A few other options:
If no preference, then I'm likely to keep what we currently have. |
Yeah. This is my concern. The naive user probably isn't well served by this choice. |
This would be great, but is it tractable? Otherwise, I could go either way between no warning and warn-on-write. |
|
More broadly, as we change defaults here will we provide warnings? My gut reaction is that we shouldn't. There are probably several changes of this magnitude or greater. |
We'd just check if there are any extra kwargs passed to
I think that in cases where we can warn and not have it be obnoxious we should. I thought this was a case where the warning would be unlikely, but it turns out it isn't. Generally I like to let users whose code may break know beforehand that something big is changing. |
We now warn if: - `engine='auto'` - Both `pyarrow` and `fastparquet` are installed - Backend specific options are provided
|
Updated. We now only warn if all of the following are true:
|
|
+1
…On Wed, Apr 13, 2022 at 3:33 PM Jim Crist-Harif ***@***.***> wrote:
Updated. We now only warn if all of the following are true:
- engine='auto'
- Both pyarrow and fastparquet are installed
- Backend specific options are provided
—
Reply to this email directly, view it on GitHub
<#8907 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACKZTAJC3E5AUQJJMQMGRLVE4VQ3ANCNFSM5S5VJLQA>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
Adds a warning to
read_parquetandto_parquetthat the meaning ofengine='auto'will change in the future (switching to usingpyarrowif it is installed, and falling back to
fastparquet). This warning isonly raised for users that have both libraries installed and have
engine='auto'. Users without both libraries installed or that alreadyspecify an engine will not see this warning.
First part of #8900.