[DataFrame] Update architecture to be more flexible and performant#1821
[DataFrame] Update architecture to be more flexible and performant#1821robertnishihara merged 63 commits intoray-project:masterfrom
Conversation
Setting up pandas concordant constructor adding column partitions to Dataframe architecture using copy to create temp ray df resolve merge issues renaming rebased variables to new names rename cols, rows to _col_partitions and _row_partitions Adding in the col_partitions and row_partitions properties Resolving some flake8 issues Shuffle on either axis added WIP comments to shuffle.py Modifications to shuffle actor drop on shuffle axis implemented rows_to_cols WIP implement transpose using col and row partitions Rebuild columns, implement index calculation * zipped index calculations * rename _index and _length to _row* * add _col_index and _col_length (currently broken) * pass index & columns in transpose resolving rebase from ray/master rearranging utils to match ray/master reimplement groupby and update_inplace implemented _rebuild_rows fix import issue update map functions to new architecture cast to df in sum and fix empty partition index adhoc fix for sum
any/all impl Resolving additional functions to architecture change any/all index join impl Implement __delitem__ WIP (untested) fix for when columns are non-duplicate replace columns with _col_index implemented inplace implemented insert, handtested uncomment tests
* Adding fixes * Adding iterrows update
* Fix items * Removing debug code
* changes to min/max dataframes functions * max/min now return a Series - fixed tests to check equality in pandas series objects * added error checking for axis in min and max * updated error checking for axis in min/max
* Update __delitem__ with row/col_partitions * Fix __neg__
…l(), and count() (ray-project#8) * cleanup and fixing eval * fixed eval and ffill dataframe functions * changing _col_index entries during eval * small change to documentation for applymap()
|
Test PASSed. |
python/ray/dataframe/dataframe.py
Outdated
| _row_partitions = property(_get_row_partitions, _set_row_partitions) | ||
|
|
||
| def _get_col_partitions(self): | ||
| @ray.remote |
There was a problem hiding this comment.
avoid defining remote function here
python/ray/dataframe/dataframe.py
Outdated
| @@ -401,7 +661,12 @@ def isnull(self): | |||
| True: cell contains null. | |||
| False: otherwise. | |||
| """ | |||
| return self._map_partitions(lambda df: df.isnull) | |||
| new_blk_partitions = np.array([_map_partitions( | |||
There was a problem hiding this comment.
blk -> block everywhere would be preferable
| zero_filled = test_data.tsframe.fillna(0) | ||
| ray_df = from_pandas(test_data.tsframe, num_partitions).fillna(0) | ||
| print(ray_df) | ||
| print(zero_filled) |
python/ray/dataframe/dataframe.py
Outdated
|
|
||
| if index is not None: | ||
| self.index = index | ||
| dtype : Data type to force. Only a single dtype is allowed. |
python/ray/dataframe/dataframe.py
Outdated
| If axis=None or axis=0, this call applies df.all(axis=1) | ||
| to the transpose of df. | ||
| If axis=None or axis=0, this call applies on the column partitions, | ||
| otherwise operates on row partitions |
There was a problem hiding this comment.
indent by 4 spaces
python/ray/dataframe/dataframe.py
Outdated
| """Updates the current DataFrame inplace | ||
| Note: | ||
| If `columns` or `index` are not supplied, they will revert to | ||
| default columns or index respectively, as this function does not |
There was a problem hiding this comment.
indent by 4 spaces
|
Test PASSed. |
robertnishihara
left a comment
There was a problem hiding this comment.
Thanks for addressing the comments. Currently failing linting, but looks good to me.
|
Test PASSed. |
|
Test FAILed. |
|
Test FAILed. |
|
Test FAILed. |
|
Test PASSed. |
|
Test PASSed. |
|
Test PASSed. |
What do these changes do?
Changed the underlying architecture to be partitioned based on blocks and allow for data to be accessed by either row or column. Also updated many method implementations that needed minor changes to be correct in the new architecture.
Related issue number