[DataFrame] Implemented __setitem__, select_dtypes, and astype#1941
[DataFrame] Implemented __setitem__, select_dtypes, and astype#1941devin-petersohn merged 9 commits intoray-project:masterfrom
Conversation
|
Test PASSed. |
|
Test PASSed. |
|
Test PASSed. |
|
Test PASSed. |
python/ray/dataframe/dataframe.py
Outdated
| "github.com/ray-project/ray.") | ||
| if not columns: | ||
| return DataFrame() | ||
| col_idx = [self.columns.get_loc(columns[i]) |
There was a problem hiding this comment.
This needs to handle cases where some of the passed in values are not in self.columns. Pandas fills this case with NaNs.
python/ray/dataframe/dataframe.py
Outdated
| self._col_partitions) | ||
| columns = self.columns | ||
| index = self.index[key] | ||
| index = self._col_metadata[key].index |
python/ray/dataframe/dataframe.py
Outdated
| raise NotImplementedError( | ||
| "To contribute to Pandas on Ray, please visit " | ||
| "github.com/ray-project/ray.") | ||
| if errors == 'raise': |
There was a problem hiding this comment.
Should not ignore errors when errors == 'raise'
There was a problem hiding this comment.
On errors == 'raise', these exceptions should be raised. On errors == 'ignore', they should be suppressed and self should be returned as per the pandas documentation. Currently errors are suppressed in this if case.
python/ray/dataframe/dataframe.py
Outdated
| "github.com/ray-project/ray.") | ||
| if errors == 'raise': | ||
| try: | ||
| pd.DataFrame().astype(dtype) |
There was a problem hiding this comment.
This check doesn't work on empty dataframes
python/ray/dataframe/dataframe.py
Outdated
| raise NotImplementedError( | ||
| "To contribute to Pandas on Ray, please visit " | ||
| "github.com/ray-project/ray.") | ||
| if not columns: |
There was a problem hiding this comment.
Anytime that columns == None this will return an empty dataframe
| elif not exclude: | ||
| exclude = [] | ||
|
|
||
| sel = tuple(map(set, (include, exclude))) |
There was a problem hiding this comment.
This code (and below) is somewhat hard to read, and can be wrapped up into the if-else cases above
There was a problem hiding this comment.
The code using map in particular?
python/ray/dataframe/dataframe.py
Outdated
|
|
||
| axis = pd.DataFrame()._get_axis_number(axis) if (axis) else 0 | ||
| if axis == 1 or columns: | ||
| def row_helper(df, col_idx): |
There was a problem hiding this comment.
Since the partitions currently have RangeIndex on both rows and columns, this function won't work. Refer to utils._reindex_helper and ask @devin-petersohn for details.
There was a problem hiding this comment.
So I tried to get around that by translating the columns passed in to the helper by using the loc index of the columns desired (col_idx)
There was a problem hiding this comment.
Removed this implementation of reindex for a future PR
|
Test FAILed. |
|
Test PASSed. |
|
Test PASSed. |
|
|
||
| with pytest.raises(NotImplementedError): | ||
| ray_df.reindex() | ||
| ray_df.reindex(None) |
There was a problem hiding this comment.
Please add a sanity check test here.
There was a problem hiding this comment.
Looking through the code, was reindex dropped from this PR?
There was a problem hiding this comment.
yes, reindex did not have full functionality (axes, errors) so I took it out for the PR against master
|
|
||
| with pytest.raises(NotImplementedError): | ||
| ray_df.reindex() | ||
| ray_df.reindex(None) |
|
Test PASSed. |
|
Test PASSed. |
|
Merged, thanks @jaeminkim1324! |
* master: (21 commits) Expand local_dir in Trial init (ray-project#2013) Fixing ascii error for Python2 (ray-project#2009) [DataFrame] Implements df.update (ray-project#1997) [DataFrame] Implements df.as_matrix (ray-project#2001) [DataFrame] Implement quantile (ray-project#1992) [DataFrame] Impement sort_values and sort_index (ray-project#1977) [DataFrame] Implement rank (ray-project#1991) [DataFrame] Implemented prod, product, added test suite (ray-project#1994) [DataFrame] Implemented __setitem__, select_dtypes, and astype (ray-project#1941) [DataFrame] Implement diff (ray-project#1996) [DataFrame] Implemented nunique, skew (ray-project#1995) [DataFrame] Implements filter and dropna (ray-project#1959) [DataFrame] Implements df.pipe (ray-project#1999) [DataFrame] Apply() for Lists and Dicts (ray-project#1973) Clean up syntax for supported Python versions. (ray-project#1963) [DataFrame] Implements mode, to_datetime, and get_dummies (ray-project#1956) [DataFrame] Fix dtypes (ray-project#1930) keep_dims -> keepdims (ray-project#1980) add pthread linking (ray-project#1986) [DataFrame] Add layer of abstraction to allow OID instantiation (ray-project#1984) ...
* master: (25 commits) [DataFrame] Add direct pandas imports for MVP (ray-project#1960) Make ActorHandles pickleable, also make proper ActorHandle and ActorC… (ray-project#2007) Expand local_dir in Trial init (ray-project#2013) Fixing ascii error for Python2 (ray-project#2009) [DataFrame] Implements df.update (ray-project#1997) [DataFrame] Implements df.as_matrix (ray-project#2001) [DataFrame] Implement quantile (ray-project#1992) [DataFrame] Impement sort_values and sort_index (ray-project#1977) [DataFrame] Implement rank (ray-project#1991) [DataFrame] Implemented prod, product, added test suite (ray-project#1994) [DataFrame] Implemented __setitem__, select_dtypes, and astype (ray-project#1941) [DataFrame] Implement diff (ray-project#1996) [DataFrame] Implemented nunique, skew (ray-project#1995) [DataFrame] Implements filter and dropna (ray-project#1959) [DataFrame] Implements df.pipe (ray-project#1999) [DataFrame] Apply() for Lists and Dicts (ray-project#1973) Clean up syntax for supported Python versions. (ray-project#1963) [DataFrame] Implements mode, to_datetime, and get_dummies (ray-project#1956) [DataFrame] Fix dtypes (ray-project#1930) keep_dims -> keepdims (ray-project#1980) ...
* master: [DataFrame] Add direct pandas imports for MVP (ray-project#1960) Make ActorHandles pickleable, also make proper ActorHandle and ActorC… (ray-project#2007) Expand local_dir in Trial init (ray-project#2013) Fixing ascii error for Python2 (ray-project#2009) [DataFrame] Implements df.update (ray-project#1997) [DataFrame] Implements df.as_matrix (ray-project#2001) [DataFrame] Implement quantile (ray-project#1992) [DataFrame] Impement sort_values and sort_index (ray-project#1977) [DataFrame] Implement rank (ray-project#1991) [DataFrame] Implemented prod, product, added test suite (ray-project#1994) [DataFrame] Implemented __setitem__, select_dtypes, and astype (ray-project#1941) [DataFrame] Implement diff (ray-project#1996) [DataFrame] Implemented nunique, skew (ray-project#1995) [DataFrame] Implements filter and dropna (ray-project#1959) [DataFrame] Implements df.pipe (ray-project#1999) [DataFrame] Apply() for Lists and Dicts (ray-project#1973)
What do these changes do?
Adds functionality for astype and select_dtypes methods for Dataframes