Skip to content

[DataFrame] Implement Inter-DataFrame operations#1937

Merged
robertnishihara merged 22 commits intoray-project:masterfrom
devin-petersohn:df_inter_ops
Apr 30, 2018
Merged

[DataFrame] Implement Inter-DataFrame operations#1937
robertnishihara merged 22 commits intoray-project:masterfrom
devin-petersohn:df_inter_ops

Conversation

@devin-petersohn
Copy link
Copy Markdown
Member

@devin-petersohn devin-petersohn commented Apr 23, 2018

Implements the inter-DataFrame and scalar operations:

  • add, __add__
  • radd, __radd__
  • __iadd__
  • sub, __sub__, subtract
  • rsub, __rsub__
  • __isub__
  • mul, __mul__, multiply
  • rmul, __rmul__
  • div, __div__, divide
  • floordiv, __floordiv__
  • rfloordiv, __rfloordiv
  • __ifloordiv__
  • truediv, __truediv__
  • rtruediv, __rtruediv__
  • __itruediv__
  • mod, __mod__
  • rmod, __rmod__
  • __imod__
  • pow, __pow__
  • rpow, __rpow
  • __ipow__

Depends on #1932, don't merge until after that is merged.

Edit: Also add comparison methods:

  • ge, __ge__
  • gt, __gt__
  • le, __le__
  • lt, __lt__
  • eq, __eq__
  • ne, __ne__

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5035/
Test PASSed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note here (and everywhere else) that this can cause implicit serialization issues if other is a series, for the eventual refactor

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Multilevel" mispelling

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth putting a note here (like you did in the join code) for the future to join on metadatas, enabling you to pass metadatas below.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added below

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's non-list-like (scalar), probably best to perform the action on block partitions (a la applymap)

@p-yang
Copy link
Copy Markdown
Contributor

p-yang commented Apr 23, 2018

Didn't get to look too deep overall, but I'll take another look later on. One thing I'll note for all the math functions though is that you can reduce the amount of copied code by possibly having one math archetype (similar to _arithmetic_helper) that takes in a class function like pandas.DataFrame.add

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5039/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5058/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5057/
Test PASSed.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5063/
Test PASSed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5069/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5065/
Test PASSed.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5066/
Test PASSed.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5070/
Test PASSed.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5077/
Test PASSed.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5079/
Test PASSed.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5088/
Test PASSed.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5090/
Test PASSed.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5098/
Test PASSed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5105/
Test FAILed.

@devin-petersohn
Copy link
Copy Markdown
Member Author

Jenkins, retest this please.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5106/
Test PASSed.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5113/
Test PASSed.



@ray.remote
def co_op_helper(func, left_columns, right_columns, left_df_len, *zipped):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is co short for "column"? May be worth clarifying this. This function could probably use a docstring.

Also how do you determine which function names in this file should start with _?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

co op is short for copartition operation. I will clarify. The note about the leading underscore is something I have been wanting to clean up. This should lead with an underscore.

else:
return self._single_df_op_helper(
lambda df: df.eq(other, axis, level),
other, axis, level)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of this if statement, should we just use _iter_and_single_df_op_helper?

else:
return self._single_df_op_helper(
lambda df: df.ge(other, axis, level),
other, axis, level)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of this if statement, should we just use _iter_and_single_df_op_helper?

else:
return self._single_df_op_helper(
lambda df: df.gt(other, axis, level),
other, axis, level)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of this if statement, should we just use _iter_and_single_df_op_helper?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same question applies in a few more places.


Returns:
A new DataFrame filled with Booleans.
"""
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General question. I thought we were inheriting docstrings from pandas. Does that mean that these docstrings are redundant?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, these docs are internal for us. Ideally we wouldn't have to go to the Pandas docs each time we want to look at a method.

"To contribute to Pandas on Ray, please visit "
"github.com/ray-project/ray.")

def _copartition(self, other, new_index):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method repartitions the two DFs so that they have the same partitioning?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, based on the index. I will add more detailed notes here.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5115/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5116/
Test PASSed.

@robertnishihara
Copy link
Copy Markdown
Collaborator

Could you fix the linting errors?

$ flake8 --exclude=python/ray/core/src/common/flatbuffers_ep-prefix/,python/ray/core/generated/,src/common/format/,doc/source/conf.py,python/ray/cloudpickle/
./python/ray/dataframe/dataframe.py:2193:47: E128 continuation line under-indented for visual indent
./python/ray/dataframe/dataframe.py:2194:47: E128 continuation line under-indented for visual indent
./python/ray/dataframe/dataframe.py:2199:47: E128 continuation line under-indented for visual indent
./python/ray/dataframe/dataframe.py:2200:47: E128 continuation line under-indented for visual indent
./python/ray/dataframe/dataframe.py:2238:47: E128 continuation line under-indented for visual indent
./python/ray/dataframe/dataframe.py:2239:47: E128 continuation line under-indented for visual indent
./python/ray/dataframe/dataframe.py:2245:29: E127 continuation line over-indented for visual indent
./python/ray/dataframe/dataframe.py:4012:30: E127 continuation line over-indented for visual indent
./python/ray/dataframe/dataframe.py:4019:30: E127 continuation line over-indented for visual indent

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5122/
Test FAILed.

@devin-petersohn
Copy link
Copy Markdown
Member Author

Jenkins, retest this please.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5125/
Test PASSed.

@robertnishihara robertnishihara merged commit 0c477fb into ray-project:master Apr 30, 2018
royf added a commit to royf/ray that referenced this pull request Jun 22, 2018
* 'master' of https://github.com/ray-project/ray:
  [rllib] Fix broken link in docs (ray-project#1967)
  [DataFrame] Sample implement (ray-project#1954)
  [DataFrame] Implement Inter-DataFrame operations (ray-project#1937)
  remove UniqueIDHasher (ray-project#1957)
  [rllib] Add DDPG documentation, rename DDPG2 <=> DDPG (ray-project#1946)
  updates (ray-project#1958)
  Pin Cython in autoscaler development example. (ray-project#1951)
  Incorporate C++ Buffer management and Seal global threadpool fix from arrow (ray-project#1950)
  [XRay] Add consistency check for protocol between node_manager and local_scheduler_client (ray-project#1944)
  Remove smart_open install. (ray-project#1943)
  [DataFrame] Fully implement append, concat and join (ray-project#1932)
  [DataFrame] Fix for __getitem__ string indexing (ray-project#1939)
  [DataFrame] Implementing write methods (ray-project#1918)
  [rllib] arr[end] was excluded when end is not None (ray-project#1931)
  [DataFrame] Implementing API correct groupby with aggregation methods (ray-project#1914)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants