Adjust the Parquet engine classes to allow more easily subclassing #6211
martindurant merged 4 commits into dask:master
|
Background for this is that I need a place to put a hook to deal with data that has weird fat-fingered Parquet datetimes, which cause issues when converting to pandas. Turns out that having some other similar splitting points in some of the other larger functions might make them a bit less scary. |
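For context on why such datetimes cause trouble: pandas' default `datetime64[ns]` dtype stores timestamps as 64-bit signed nanosecond counts since 1970-01-01, so a mistyped year far in the future cannot be represented. The sketch below uses only the standard library to check whether a timestamp would fit; the helper name `fits_in_int64_nanoseconds` is made up for illustration and is not part of dask, pandas, or pyarrow.

```python
from datetime import datetime

INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)
EPOCH = datetime(1970, 1, 1)


def fits_in_int64_nanoseconds(ts: datetime) -> bool:
    """Return True if `ts` is representable as int64 nanoseconds since the epoch.

    Hypothetical helper for illustration only.
    """
    delta = ts - EPOCH
    # Python ints are arbitrary precision, so this arithmetic cannot overflow.
    nanos = (delta.days * 86_400 + delta.seconds) * 1_000_000_000
    nanos += delta.microseconds * 1_000
    return INT64_MIN <= nanos <= INT64_MAX


# An ordinary timestamp fits; a fat-fingered year 9999 does not.
print(fits_in_int64_nanoseconds(datetime(2020, 1, 1)))
print(fits_in_int64_nanoseconds(datetime(9999, 12, 31)))
```

A value that fails this check is the kind of input that would need special handling (clamping, coercing to null, or a coarser resolution) before or during the Arrow to pandas conversion.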
26583f9 to 29bd77f
|
Can you add a test specifically with your "fat-fingered" timestamp that would have caused pandas overflow? That is your specific use-case here, so it also serves to show why this change is useful. |
without code duplication. Add passthrough for arrow_to_pandas options.
29bd77f to b9db240
|
Given that these methods are only called explicitly from our own code, I have no problem with these changes. @rjzamora, do you have any objections or thoughts? |
|
Thanks for contributing here @mariusvniekerk! I only had a chance to take a quick look on my phone, but I suspect the changes are fine. Since we do use these methods in dask_cudf, I would like to double check that this doesn't break anything there and report back. |
|
Thanks @rjzamora, this should also allow you to be more specific with other fancy stuff you want to do in dask_cudf. |
rjzamora left a comment
These changes seem reasonable to me - Thanks @mariusvniekerk
Reduces code duplication when subclassing and provides a hook for adjusting arrow / pandas conversion behavior more precisely
Add passthrough for arrow_to_pandas options.
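The general shape of such a hook can be sketched without any Parquet dependencies. In the toy example below, a base engine splits reading into a load step and a convert step, forwards extra keyword options to the convert step, and a subclass overrides only the conversion. Every name here (`BaseEngine`, `ClampingEngine`, `_load_piece`, `_convert`, `upper`) is hypothetical and chosen for illustration; this is not dask's actual engine API, and plain lists of `datetime` objects stand in for Arrow tables and pandas frames.

```python
from datetime import datetime

# Assumed upper bound for illustration: a date that fits in int64 nanoseconds.
MAX_TS = datetime(2262, 4, 11)


class BaseEngine:
    """Toy engine: reading is split into small overridable classmethods."""

    @classmethod
    def read_partition(cls, piece, **convert_options):
        raw = cls._load_piece(piece)
        # Options are passed through untouched to the conversion hook.
        return cls._convert(raw, **convert_options)

    @classmethod
    def _load_piece(cls, piece):
        # Stand-in for reading one Parquet piece into an intermediate table.
        return list(piece)

    @classmethod
    def _convert(cls, raw, **convert_options):
        # Stand-in for the table-to-dataframe conversion.
        return raw


class ClampingEngine(BaseEngine):
    """Subclass that overrides only the conversion step."""

    @classmethod
    def _convert(cls, raw, upper=MAX_TS, **convert_options):
        # Clamp out-of-range timestamps before handing off to the base class.
        clamped = [min(ts, upper) for ts in raw]
        return super()._convert(clamped, **convert_options)


piece = [datetime(2020, 1, 1), datetime(9999, 12, 31)]
print(ClampingEngine.read_partition(piece))
print(ClampingEngine.read_partition(piece, upper=datetime(2100, 1, 1)))
```

Because the subclass touches only `_convert`, the rest of the read path is inherited unchanged, which is the duplication saving the commit message describes.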
black dask / flake8 dask