Skip to content

[DataFrame] Encapsulate index and lengths into separate class#1849

Merged
devin-petersohn merged 7 commits intoray-project:masterfrom
p-yang:index_integrate
Apr 10, 2018
Merged

[DataFrame] Encapsulate index and lengths into separate class#1849
devin-petersohn merged 7 commits intoray-project:masterfrom
p-yang:index_integrate

Conversation

@p-yang
Copy link
Copy Markdown
Contributor

@p-yang p-yang commented Apr 7, 2018

What do these changes do?

This PR migrates the handling of all indexing and length handling for dataframes to a separate class for encapsulation.

Related issue number

N/A

p-yang added 2 commits April 7, 2018 16:02
* added skeleton for index_df.py

* initial impl index_df

* separate out partition and non-partition impls

* add len function

* drop returns index_df slice of dropped indices

* housecleaning
@p-yang p-yang changed the title Index integrate [DataFrame] Index integrate Apr 7, 2018
@p-yang p-yang changed the title [DataFrame] Index integrate [DataFrame] Encapsulate index and lengths into separate class Apr 7, 2018
@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4727/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4726/
Test PASSed.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4729/
Test PASSed.

Copy link
Copy Markdown
Member

@devin-petersohn devin-petersohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I would add a way to pass in the _IndexMetadata objects to the constructor so we can avoid some of the latency of making duplicate objects? With this, we will likely need a copy method so we don't make changes inplace on the wrong one.

renamed._row_index.rename_axis(mapper, axis=axis, copy=copy,
inplace=True)
renamed.index.name = mapper
# renamed._row_metadata.rename_axis(mapper, axis=axis, copy=copy,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you want this left in for some reason?

else:
renamed._row_index.set_names(name)
renamed.index.set_names(name)
# renamed._row_metadata.set_names(name)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as above, is this needed?


# see if we can slice the rows
indexer = convert_to_index_sliceable(self._row_index, key)
# NOTE(patyang): ?????
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the note.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4782/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4783/
Test PASSed.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4784/
Test PASSed.

@devin-petersohn devin-petersohn merged commit 521b549 into ray-project:master Apr 10, 2018
@devin-petersohn
Copy link
Copy Markdown
Member

Thanks @Veryku

royf added a commit to royf/ray that referenced this pull request Apr 22, 2018
* master: (56 commits)
  [xray] Turn on flushing to the GCS for the lineage cache (ray-project#1907)
  Single Big Object Parallel Transfer. (ray-project#1827)
  Remove num_threads as a parameter. (ray-project#1891)
  Adds Valgrind tests for multi-threaded object manager. (ray-project#1890)
  Pin cython version in docker base dependencies file. (ray-project#1898)
  Update arrow to efficiently serialize more types of numpy arrays. (ray-project#1889)
  updates (ray-project#1896)
  [DataFrame] Inherit documentation from Pandas (ray-project#1727)
  Update arrow and parquet-cpp. (ray-project#1875)
  raylet command line resource configuration plumbing (ray-project#1882)
  use raylet for remote ray nodes (ray-project#1880)
  [rllib] Propagate dim option to deepmind wrappers (ray-project#1876)
  [RLLib] DDPG (ray-project#1685)
  Lint Python files with Yapf (ray-project#1872)
  [DataFrame] Fixed repr, info, and memory_usage (ray-project#1874)
  Fix getattr compat (ray-project#1871)
  check if arrow build dir exists (ray-project#1863)
  [DataFrame] Encapsulate index and lengths into separate class (ray-project#1849)
  [DataFrame] Implemented __getattr__ (ray-project#1753)
  Add better analytics to docs (ray-project#1854)
  ...

# Conflicts:
#	python/ray/rllib/__init__.py
#	python/setup.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants