[DataFrame] Implement Inter-DataFrame operations #1937
robertnishihara merged 22 commits into ray-project:master from
Test PASSed.
python/ray/dataframe/dataframe.py
Please add a note here (and everywhere else), for the eventual refactor, that this can cause implicit serialization issues if other is a Series.
python/ray/dataframe/dataframe.py
It might be worth adding a note here (as you did in the join code) about joining on the metadata in the future, which would let you pass the metadata objects below.
python/ray/dataframe/dataframe.py
If other is not list-like (i.e. a scalar), it's probably best to perform the operation directly on the block partitions (à la applymap).
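A minimal sketch of the scalar path the reviewer suggests, assuming the frame is stored as a 2D grid (list of lists) of pandas block partitions. The grid layout and the helper name scalar_op_on_blocks are hypothetical, not the PR's code. Because a scalar needs no index alignment, each block can be transformed independently, in the same block-by-block style as applymap:

```python
import pandas as pd


def scalar_op_on_blocks(blocks, func):
    """Apply func to every block partition independently (hypothetical helper).

    blocks is assumed to be a 2D grid (list of lists) of pandas DataFrames.
    A scalar operand needs no alignment, so the grid shape is preserved.
    """
    return [[func(block) for block in row] for row in blocks]


# Hypothetical 2x2 grid of block partitions for a 4x4 DataFrame.
full = pd.DataFrame({c: range(4) for c in "abcd"})
blocks = [
    [full.iloc[0:2, 0:2], full.iloc[0:2, 2:4]],
    [full.iloc[2:4, 0:2], full.iloc[2:4, 2:4]],
]

result_blocks = scalar_op_on_blocks(blocks, lambda df: df.add(10))

# Reassemble the grid: join blocks column-wise within a row, then stack rows.
reassembled = pd.concat(
    [pd.concat(row, axis=1) for row in result_blocks], axis=0
)
print(reassembled.equals(full.add(10)))  # True
```

Reassembling the transformed blocks gives the same frame as applying the operation to the unpartitioned data, which is why no repartitioning is needed for scalars.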
Didn't get to look too deeply overall, but I'll take another look later on. One thing I'll note for all the math functions, though, is that you could reduce the amount of copied code by having one math archetype (similar to
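One way to read the "math archetype" idea, sketched under assumptions: ToyFrame, _map_partitions and _make_math_method are hypothetical names, the frame is modeled as a plain list of pandas row partitions, and only scalar operands are handled (a DataFrame operand would first need copartitioning). Every math method is generated from a single definition that forwards to the pandas method of the same name:

```python
import pandas as pd


class ToyFrame:
    """Hypothetical stand-in for a distributed DataFrame (no Ray involved).

    It wraps a list of row partitions so the method-generation pattern runs locally.
    """

    def __init__(self, partitions):
        self.partitions = partitions

    def _map_partitions(self, func):
        # The single shared "archetype": every generated math method funnels
        # through here instead of duplicating its own loop.
        return ToyFrame([func(part) for part in self.partitions])

    def to_pandas(self):
        return pd.concat(self.partitions)


def _make_math_method(name):
    """Build a method that forwards to the pandas method called name.

    Sketch only: correct for scalar operands, since each partition is
    processed on its own without index alignment.
    """

    def method(self, other, axis="columns", level=None, fill_value=None):
        return self._map_partitions(
            lambda df: getattr(df, name)(
                other, axis=axis, level=level, fill_value=fill_value
            )
        )

    method.__name__ = name
    return method


# Generate the arithmetic methods from one definition instead of copy-pasting.
for _name in ["add", "radd", "sub", "rsub", "mul", "rmul",
              "truediv", "rtruediv", "floordiv", "rfloordiv",
              "mod", "rmod", "pow", "rpow"]:
    setattr(ToyFrame, _name, _make_math_method(_name))

frame = ToyFrame([
    pd.DataFrame({"x": [1, 2]}),
    pd.DataFrame({"x": [3, 4]}, index=[2, 3]),
])
print(frame.rsub(10).to_pandas()["x"].tolist())  # [9, 8, 7, 6]
```

Comparison methods (eq, ge, ...) take no fill_value in pandas, so they would need a second, similar factory.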
Test FAILed.
Test PASSed.
Force-pushed from 2deb1d8 to 71ab6ad.
Test PASSed.
Test FAILed.
Test PASSed.
Test FAILed.
Jenkins, retest this please.
Test PASSed.
python/ray/dataframe/utils.py
@ray.remote
def co_op_helper(func, left_columns, right_columns, left_df_len, *zipped):
Is "co" short for "column"? It may be worth clarifying this. This function could probably use a docstring.
Also, how do you decide which function names in this file should start with _?
"co_op" is short for "copartition operation"; I will clarify that. The leading underscore is something I have been meaning to clean up: this function should start with an underscore.
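A hypothetical sketch of what a docstring and body for the renamed _co_op_helper could look like. It assumes zipped holds the column blocks of one row partition for the left frame (the first left_df_len items) followed by those of the right frame, with both sides already copartitioned. In the PR the function is a @ray.remote task; this local sketch drops the decorator so it runs without Ray:

```python
import pandas as pd


def _co_op_helper(func, left_columns, right_columns, left_df_len, *zipped):
    """Apply a binary op to one copartitioned row partition (sketch).

    "co_op" stands for copartition operation.

    Args:
        func: Binary function taking (left_df, right_df).
        left_columns: Column labels for the reassembled left partition.
        right_columns: Column labels for the reassembled right partition.
        left_df_len: Number of blocks in zipped that belong to the left frame.
        *zipped: Left blocks followed by right blocks, all sharing one row index.

    Returns:
        The result of func on the two reassembled partitions.
    """
    left = pd.concat(zipped[:left_df_len], axis=1)
    left.columns = left_columns
    right = pd.concat(zipped[left_df_len:], axis=1)
    right.columns = right_columns
    return func(left, right)


# Left side split into two single-column blocks, right side in one block.
left_blocks = [pd.DataFrame([[1], [2]]), pd.DataFrame([[3], [4]])]
right_blocks = [pd.DataFrame([[10, 20], [30, 40]])]
out = _co_op_helper(lambda l, r: l.add(r), ["a", "b"], ["a", "b"], 2,
                    *left_blocks, *right_blocks)
print(out["a"].tolist(), out["b"].tolist())  # [11, 32] [23, 44]
```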
python/ray/dataframe/dataframe.py
else:
    return self._single_df_op_helper(
        lambda df: df.eq(other, axis, level),
        other, axis, level)
Instead of this if statement, should we just use _iter_and_single_df_op_helper?
python/ray/dataframe/dataframe.py
else:
    return self._single_df_op_helper(
        lambda df: df.ge(other, axis, level),
        other, axis, level)
Instead of this if statement, should we just use _iter_and_single_df_op_helper?
python/ray/dataframe/dataframe.py
else:
    return self._single_df_op_helper(
        lambda df: df.gt(other, axis, level),
        other, axis, level)
Instead of this if statement, should we just use _iter_and_single_df_op_helper?
The same question applies in a few more places.
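A sketch of the refactor the reviewer is suggesting, with hypothetical names: _dispatch_binary_op stands in for _iter_and_single_df_op_helper (whose real signature isn't shown here), and a frame is modeled as a list of pandas row partitions. The type check lives in one helper, so eq, ge and gt become one-liners. List-like operands are assumed to broadcast along columns (pandas' default axis="columns"), which is what makes the per-partition path valid:

```python
import pandas as pd


def _dispatch_binary_op(partitions, func, other):
    """Hypothetical shared helper: choose the execution path in one place.

    partitions is a list of row partitions (pandas DataFrames) and
    func(df, other) is the pandas operation to run.
    """
    if isinstance(other, pd.DataFrame):
        # Frame vs. frame must align on the index. This sketch simply
        # reassembles the left side; a real implementation would copartition.
        return [func(pd.concat(partitions), other)]
    # Scalars and column-wise list-likes broadcast the same way into every
    # row partition, so each partition is handled independently.
    return [func(part, other) for part in partitions]


# Each comparison method collapses to a one-liner with no if statement.
def eq(partitions, other):
    return _dispatch_binary_op(partitions, lambda df, o: df.eq(o), other)


def ge(partitions, other):
    return _dispatch_binary_op(partitions, lambda df, o: df.ge(o), other)


def gt(partitions, other):
    return _dispatch_binary_op(partitions, lambda df, o: df.gt(o), other)


parts = [pd.DataFrame({"x": [1, 5]}), pd.DataFrame({"x": [3, 7]}, index=[2, 3])]
print(pd.concat(ge(parts, 3))["x"].tolist())  # [False, True, True, True]
```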
    Returns:
        A new DataFrame filled with Booleans.
    """
General question: I thought we were inheriting docstrings from pandas. Does that make these docstrings redundant?
True, but these docs are for internal use. Ideally we wouldn't have to go to the pandas docs each time we want to look at a method.
"To contribute to Pandas on Ray, please visit "
"github.com/ray-project/ray.")
def _copartition(self, other, new_index):
Does this method repartition the two DataFrames so that they have the same partitioning?
Yes, based on the index. I will add more detailed notes here.
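A minimal sketch of copartitioning as described in the reply, under assumptions: both frames are plain pandas DataFrames, new_index is the union of their indexes, and num_partitions is a hypothetical parameter for how many row chunks to produce. After reindexing onto the shared index and cutting at the same row boundaries, partition i of each side lines up row for row:

```python
import pandas as pd


def copartition(left, right, new_index, num_partitions):
    """Hypothetical sketch: give two frames identical row partitioning.

    Both frames are reindexed onto new_index (missing rows become NaN),
    then cut at the same row boundaries.
    """
    left = left.reindex(new_index)
    right = right.reindex(new_index)
    n = len(new_index)
    # Ceiling division so every row lands in some partition.
    chunk = -(-n // num_partitions) if n else 1
    bounds = range(0, n, chunk)
    left_parts = [left.iloc[i:i + chunk] for i in bounds]
    right_parts = [right.iloc[i:i + chunk] for i in bounds]
    return left_parts, right_parts


a = pd.DataFrame({"v": [1, 2, 3]}, index=[0, 1, 2])
b = pd.DataFrame({"v": [10, 20, 30]}, index=[1, 2, 3])
new_index = a.index.union(b.index)  # labels 0, 1, 2, 3
lp, rp = copartition(a, b, new_index, 2)

# Partition-wise addition now matches pandas' own aligned a.add(b).
summed = pd.concat([l.add(r) for l, r in zip(lp, rp)])
print(summed.equals(a.add(b)))  # True
```

Labels missing from one side become NaN after the reindex, which matches how pandas aligns two frames in a binary operation.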
Test FAILed.
Test PASSed.
Could you fix the linting errors?
Test FAILed.
Jenkins, retest this please.
Test PASSed.
* 'master' of https://github.com/ray-project/ray:
  [rllib] Fix broken link in docs (ray-project#1967)
  [DataFrame] Sample implement (ray-project#1954)
  [DataFrame] Implement Inter-DataFrame operations (ray-project#1937)
  remove UniqueIDHasher (ray-project#1957)
  [rllib] Add DDPG documentation, rename DDPG2 <=> DDPG (ray-project#1946)
  updates (ray-project#1958)
  Pin Cython in autoscaler development example. (ray-project#1951)
  Incorporate C++ Buffer management and Seal global threadpool fix from arrow (ray-project#1950)
  [XRay] Add consistency check for protocol between node_manager and local_scheduler_client (ray-project#1944)
  Remove smart_open install. (ray-project#1943)
  [DataFrame] Fully implement append, concat and join (ray-project#1932)
  [DataFrame] Fix for __getitem__ string indexing (ray-project#1939)
  [DataFrame] Implementing write methods (ray-project#1918)
  [rllib] arr[end] was excluded when end is not None (ray-project#1931)
  [DataFrame] Implementing API correct groupby with aggregation methods (ray-project#1914)
Implements the inter-DataFrame and scalar operations:
- add, __add__
- radd, __radd__
- __iadd__
- sub, __sub__, subtract
- rsub, __rsub__
- __isub__
- mul, __mul__, multiply
- rmul, __rmul__
- div, __div__, divide
- floordiv, __floordiv__
- rfloordiv, __rfloordiv__
- __ifloordiv__
- truediv, __truediv__
- rtruediv, __rtruediv__
- __itruediv__
- mod, __mod__
- rmod, __rmod__
- __imod__
- pow, __pow__
- rpow, __rpow__
- __ipow__

Depends on #1932; don't merge until that is merged.
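Since Pandas on Ray mirrors the pandas API, a snippet in plain pandas (not the Ray implementation) illustrates what the arithmetic names listed above are expected to mean: named methods match their operators, the r-prefixed variants swap the operands, and subtract/divide are aliases:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
other = pd.DataFrame({"a": [10, 20], "b": [30, 40]})

# A named method and its operator are interchangeable.
assert df.add(other).equals(df + other)
assert df.floordiv(2).equals(df // 2)
# Reflected ("r") variants swap the operands: df.rsub(x) is x - df.
assert df.rsub(10).equals(10 - df)
assert df.rpow(2).equals(2 ** df)
# subtract and divide are aliases of sub and truediv.
assert df.subtract(other).equals(df.sub(other))
assert df.divide(2).equals(df.truediv(2))
# Augmented assignment (__iadd__) yields the same values as the binary op.
acc = df.copy()
acc += 1
assert acc.equals(df + 1)
```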
Edit: Also adds comparison methods:
- ge, __ge__
- gt, __gt__
- le, __le__
- lt, __lt__
- eq, __eq__
- ne, __ne__
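Likewise, in plain pandas (not the Ray implementation), the comparison methods match their operators and return frames of Booleans:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 5], "b": [3, 3]})
other = pd.DataFrame({"a": [2, 5], "b": [1, 4]})

ge = df.ge(other)
assert ge.equals(df >= other)
print(ge["a"].tolist(), ge["b"].tolist())  # [False, True] [True, False]

# Scalars broadcast to every cell.
assert df.ne(3).equals(df != 3)
# Every result column has a Boolean dtype.
assert all(dtype == bool for dtype in ge.dtypes)
```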