ARROW-1883: [Python] Fix handling of metadata in to_pandas when not all columns are present#1386
Conversation
…ll columns are present
|
The travis failure looks unrelated. |
cpcloud
left a comment
There was a problem hiding this comment.
@jorisvandenbossche Thanks for reporting and fixing! Couple of minor things to address before merging.
python/pyarrow/pandas_compat.py
Outdated
| raw_name = col_meta['name'] | ||
| if i >= n_columns: | ||
| # index columns | ||
| raw_name = pandas_metadata['index_columns'][i - n_columns] |
There was a problem hiding this comment.
can you pull pandas_metadata['index_columns'] out into a variable?
| assert pa.Array.from_pandas(expected, | ||
| type=pa.list_(t())).equals(result) | ||
|
|
||
| def test_table_column_subset_metadata(self): |
There was a problem hiding this comment.
Can you add 2 tests: 1 against a datetime index and 1 against another index that isn't datetime? We need to make sure the i >= ncolumns condition is tested.
You'll need to pass preserve_index=True to pa.Table.from_pandas().
There was a problem hiding this comment.
Yes, I was actually playing with such frames interactively testing it out, but forgot to add to the actual tests.
preserve_index=True is the default, but I can add it for explicitness if you want
|
Thanks for the review, added some more tests. |
|
How does this patch align with #1397? |
|
@wesm I believe @jorisvandenbossche said he was going to rebase on top of #1397 when that's merged |
|
@wesm yes, and I have time to that today (or tomorrow) once the other is merged |
a800b5a to
ea891b2
Compare
|
I updated this now #1397 is merged. But, I could not really simplify the code to just use the new |
|
For reference, the test file is written as: (similar code to construct the dataframe is in the test to create the expected result) |
wesm
left a comment
There was a problem hiding this comment.
+1. We should probably remove the backwards compatibility code after some statute of limitations has passed (1 or 2 major releases) so we aren't maintaining it forever. Having this release available will give people the ability to "fix" their files if needed
To make this cleaner, I could also move all back compat-handling code to a function that takes a metadata object and rewrites it in a new form, so the rest of the code doesn't need to care about it, and it is easier to remove later on. If you would be interested in that, I can do that here, or in a new PR (as that is not release critical) |
|
No need to rock the boat here, I will merge this once the Appveyor build runs |
|
thanks @jorisvandenbossche!! |
|
You're welcome! And happy to have a first patch to arrow :) |
This closes ARROW-1883.
So basically what I did in
_add_any_metadatawas replacingcol = table[i]with:to check that the column is actually present in the schema. However, that involved some more code to get to
raw_name(the name how the column is present in the schema), as this does not always match the name inpandas_metadata['column'][..]['name']. Not sure if there is a better way to get that name.(or if it would be better to filter
pandas_metadataearlier on, instead of checking when actually trying to process the metadata of that column)