[DataFrame] Encapsulate index and lengths into separate class #1849
devin-petersohn merged 7 commits into ray-project:master from
* added skeleton for index_df.py
* initial impl index_df
* separate out partition and non-partition impls
* add len function
* drop returns index_df slice of dropped indices
* housecleaning
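The commit list above outlines the new class: a partition-aware view and a plain (non-partition) view of the index, a length function, and a drop that returns the dropped slice. The following is a minimal sketch of that shape, not the code in index_df.py. The class name IndexMetadataSketch, the locate method, and storing labels in a plain list instead of a pandas Index are assumptions made to keep it self-contained. It also assumes drop mutates the metadata in place and returns a new object for the removed rows; the commit message only says drop returns a slice of the dropped indices.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Hashable, List, Sequence, Tuple


class IndexMetadataSketch:
    """Hypothetical index metadata: row labels plus per-partition lengths.

    Labels are kept in a plain list (not a pandas Index) so the sketch
    runs with the standard library alone.
    """

    def __init__(self, labels: Sequence[Hashable], lengths: Sequence[int]):
        if sum(lengths) != len(labels):
            raise ValueError("partition lengths must sum to the number of labels")
        self._labels = list(labels)
        self._lengths = list(lengths)
        self._offsets = self._compute_offsets(self._lengths)

    @staticmethod
    def _compute_offsets(lengths: List[int]) -> List[int]:
        # offsets[i] is the global position of the first row of partition i.
        return list(accumulate(lengths, initial=0))[:-1]

    def __len__(self) -> int:
        # Non-partition view: total number of rows.
        return len(self._labels)

    @property
    def labels(self) -> List[Hashable]:
        return list(self._labels)

    @property
    def lengths(self) -> List[int]:
        # Partition view: number of rows held by each partition.
        return list(self._lengths)

    def locate(self, label: Hashable) -> Tuple[int, int]:
        """Map a label to (partition number, offset inside that partition)."""
        pos = self._labels.index(label)  # raises ValueError if absent
        part = bisect_right(self._offsets, pos) - 1
        return part, pos - self._offsets[part]

    def drop(self, to_drop: Sequence[Hashable]) -> "IndexMetadataSketch":
        """Remove labels in place; return metadata describing the dropped rows."""
        drop_set = set(to_drop)
        missing = drop_set.difference(self._labels)
        if missing:
            raise KeyError(f"labels not found: {sorted(map(str, missing))}")
        kept, dropped = [], []
        kept_lengths, dropped_lengths = [], []
        for start, n in zip(self._offsets, self._lengths):
            chunk = self._labels[start:start + n]
            keep = [lab for lab in chunk if lab not in drop_set]
            gone = [lab for lab in chunk if lab in drop_set]
            kept.extend(keep)
            dropped.extend(gone)
            kept_lengths.append(len(keep))
            dropped_lengths.append(len(gone))
        self._labels, self._lengths = kept, kept_lengths
        self._offsets = self._compute_offsets(kept_lengths)
        return IndexMetadataSketch(dropped, dropped_lengths)


meta = IndexMetadataSketch(["a", "b", "c", "d", "e"], lengths=[2, 3])
print(len(meta), meta.locate("d"))   # 5 (1, 1)
dropped = meta.drop(["b", "c"])
print(meta.lengths, dropped.labels)  # [1, 2] ['b', 'c']
```

Keeping per-partition lengths next to the labels is what lets a label lookup resolve to a specific partition and row offset without touching the partitions themselves.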
Test FAILed.

Test PASSed.

Test PASSed.
devin-petersohn left a comment
This is great! I would add a way to pass existing _IndexMetadata objects to the constructor so we can avoid some of the latency of creating duplicate objects. With this, we will likely need a copy method so we don't make in-place changes to the wrong one.
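A minimal sketch of the reviewer's suggestion, using hypothetical classes AxisMetadata and FrameSketch rather than the PR's _IndexMetadata and DataFrame: the constructor accepts an existing metadata object instead of rebuilding one, and copy() gives the new frame its own metadata so in-place edits cannot leak between frames.

```python
from typing import Hashable, List, Optional, Sequence


class AxisMetadata:
    """Hypothetical stand-in for one axis's metadata: labels + partition lengths."""

    def __init__(self, labels: Sequence[Hashable], lengths: Sequence[int]):
        self.labels: List[Hashable] = list(labels)
        self.lengths: List[int] = list(lengths)

    def copy(self) -> "AxisMetadata":
        # Fresh lists, so in-place edits on the copy never reach the original.
        return AxisMetadata(self.labels, self.lengths)


class FrameSketch:
    """Hypothetical frame whose constructor can reuse existing metadata objects."""

    def __init__(
        self,
        row_labels: Optional[Sequence[Hashable]] = None,
        row_lengths: Optional[Sequence[int]] = None,
        row_metadata: Optional[AxisMetadata] = None,
    ):
        if row_metadata is not None:
            # Reuse the caller's object: no rebuild, but it is now shared.
            self._row_metadata = row_metadata
        elif row_labels is not None and row_lengths is not None:
            self._row_metadata = AxisMetadata(row_labels, row_lengths)
        else:
            raise ValueError("pass row_metadata, or both row_labels and row_lengths")

    def copy(self) -> "FrameSketch":
        # Copy the metadata so the new frame can be modified independently.
        return FrameSketch(row_metadata=self._row_metadata.copy())


base = FrameSketch(row_labels=["a", "b", "c"], row_lengths=[2, 1])
view = FrameSketch(row_metadata=base._row_metadata)  # shares metadata with base
clone = base.copy()                                  # owns its own metadata
clone._row_metadata.labels[0] = "z"
print(base._row_metadata.labels[0])  # a
```

Passing the object in is the cheap path; calling copy() first is what keeps an in-place change on one frame from showing up on another.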
python/ray/dataframe/dataframe.py (outdated)
    renamed._row_index.rename_axis(mapper, axis=axis, copy=copy,
                                   inplace=True)
    renamed.index.name = mapper
    # renamed._row_metadata.rename_axis(mapper, axis=axis, copy=copy,
Did you want this commented-out line left in for some reason?
python/ray/dataframe/dataframe.py (outdated)
    else:
        renamed._row_index.set_names(name)
        renamed.index.set_names(name)
        # renamed._row_metadata.set_names(name)
Same question as above: is this needed?
python/ray/dataframe/dataframe.py (outdated)
    # see if we can slice the rows
    indexer = convert_to_index_sliceable(self._row_index, key)
    # NOTE(patyang): ?????
I don't understand the note.
Test FAILed.

Test PASSed.

Test PASSed.
Thanks @Veryku
* master: (56 commits)
  [xray] Turn on flushing to the GCS for the lineage cache (ray-project#1907)
  Single Big Object Parallel Transfer. (ray-project#1827)
  Remove num_threads as a parameter. (ray-project#1891)
  Adds Valgrind tests for multi-threaded object manager. (ray-project#1890)
  Pin cython version in docker base dependencies file. (ray-project#1898)
  Update arrow to efficiently serialize more types of numpy arrays. (ray-project#1889)
  updates (ray-project#1896)
  [DataFrame] Inherit documentation from Pandas (ray-project#1727)
  Update arrow and parquet-cpp. (ray-project#1875)
  raylet command line resource configuration plumbing (ray-project#1882)
  use raylet for remote ray nodes (ray-project#1880)
  [rllib] Propagate dim option to deepmind wrappers (ray-project#1876)
  [RLLib] DDPG (ray-project#1685)
  Lint Python files with Yapf (ray-project#1872)
  [DataFrame] Fixed repr, info, and memory_usage (ray-project#1874)
  Fix getattr compat (ray-project#1871)
  check if arrow build dir exists (ray-project#1863)
  [DataFrame] Encapsulate index and lengths into separate class (ray-project#1849)
  [DataFrame] Implemented __getattr__ (ray-project#1753)
  Add better analytics to docs (ray-project#1854)
  ...

  # Conflicts:
  #	python/ray/rllib/__init__.py
  #	python/setup.py
What do these changes do?
This PR moves all index and length handling for DataFrames into a separate class for encapsulation.
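As an illustration of what this encapsulation buys, the sketch below shows a frame that keeps no index or length state of its own and answers index, columns, len() and shape by delegating to two metadata objects, one per axis. AxisMetadata and PartitionedFrame are hypothetical names chosen for the example, and labels are plain lists rather than pandas Index objects; the real DataFrame exposes far more than this.

```python
from typing import Hashable, List, Sequence, Tuple


class AxisMetadata:
    """Hypothetical per-axis metadata: labels plus the length of each partition."""

    def __init__(self, labels: Sequence[Hashable], lengths: Sequence[int]):
        if sum(lengths) != len(labels):
            raise ValueError("partition lengths must sum to the number of labels")
        self.labels = list(labels)
        self.lengths = list(lengths)

    def __len__(self) -> int:
        return len(self.labels)


class PartitionedFrame:
    """Hypothetical frame that owns no index/length logic of its own.

    Every question about labels or sizes is forwarded to the row or
    column metadata object.
    """

    def __init__(self, row_metadata: AxisMetadata, col_metadata: AxisMetadata):
        self._row_metadata = row_metadata
        self._col_metadata = col_metadata

    @property
    def index(self) -> List[Hashable]:
        return list(self._row_metadata.labels)

    @property
    def columns(self) -> List[Hashable]:
        return list(self._col_metadata.labels)

    def __len__(self) -> int:
        return len(self._row_metadata)

    @property
    def shape(self) -> Tuple[int, int]:
        return len(self._row_metadata), len(self._col_metadata)


frame = PartitionedFrame(
    AxisMetadata(range(4), lengths=[2, 2]),  # 4 rows split over 2 row partitions
    AxisMetadata(["x", "y"], lengths=[2]),   # 2 columns in 1 column partition
)
print(len(frame), frame.shape)  # 4 (4, 2)
```

With this split, changes to how lengths are tracked (for example after a drop) stay inside the metadata class instead of being repeated across DataFrame methods.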
Related issue number
N/A