[DataFrame] Refactor __delitem__#2080
Conversation
| # np.int64 or pd.Series | ||
| col_parts_to_del = \ | ||
| pd.Series(self._col_metadata[key, 'partition']).unique() | ||
| pd.Series(del_df['partition'].copy()).unique() |
There was a problem hiding this comment.
Copy is necessary because unique will mutate the underlying data (🤔). If the underlying data comes from arrow sealed object, a "Buffer is read-only" exception will be raised.
There was a problem hiding this comment.
A lot of the caching and memoizing that Pandas does mutates the objects. In the future, hopefully we can get rid of the need for the copy.
| self._coord_df.loc[partition_mask, | ||
| 'index_within_partition'] = [ | ||
| p for p in range(sum(partition_mask))] | ||
| 'index_within_partition'] = np.arange( |
There was a problem hiding this comment.
The previous method will prevent consecutive del calls:
del df['col1']
del df['col2'] # Fail because the col_metadata contains float
It seems setting with loc requires extra caution here. That's why I added type casting.
|
Test PASSed. |
| # np.int64 or pd.Series | ||
| col_parts_to_del = \ | ||
| pd.Series(self._col_metadata[key, 'partition']).unique() | ||
| pd.Series(del_df['partition'].copy()).unique() |
There was a problem hiding this comment.
A lot of the caching and memoizing that Pandas does mutates the objects. In the future, hopefully we can get rid of the need for the copy.
|
Merged, thanks @simon-mo! |
* master: (22 commits) [xray] Fix bug in updating actor execution dependencies (ray-project#2064) [DataFrame] Refactor __delitem__ (ray-project#2080) [xray] Better error messaging when pulling from self. (ray-project#2068) Use source code in hash where possible (fix ray-project#2089) (ray-project#2090) Functions for flushing done tasks and evicted objects. (ray-project#2033) Fix compilation error for RAY_USE_NEW_GCS with latest clang. (ray-project#2086) [xray] Corrects Error Handling During Push and Pull. (ray-project#2059) [xray] Sophisticated task dependency management (ray-project#2035) Support calling positional arguments by keyword (fix ray-project#998) (ray-project#2081) [DataFrame] Improve performance of iteration methods (ray-project#2026) [DataFrame] Implement to_csv (ray-project#2014) [xray] Lineage cache only requests notifications about remote parent tasks (ray-project#2066) [rllib] Add magic methods for rollouts (ray-project#2024) [DataFrame] Allows DataFrame constructor to take in another DataFrame (ray-project#2072) Pin Pandas version for Travis to 0.22 (ray-project#2075) Fix python linting (ray-project#2076) [xray] Fix GCS table prefixes (ray-project#2065) Some tests for _submit API. (ray-project#2062) [rllib] Queue lib for python 2.7 (ray-project#2057) [autoscaler] Remove faulty assert that breaks during downscaling, pull configs from env (ray-project#2006) ...
* master: (24 commits) Performance fix (ray-project#2110) Use flake8-comprehensions (ray-project#1976) Improve error message printing and suppression. (ray-project#2104) [rllib] [doc] Broken link in ddpg doc YAPF, take 3 (ray-project#2098) [rllib] rename async -> _async (ray-project#2097) fix unused lambda capture (ray-project#2102) [xray] Use pubsub instead of timeout for ObjectManager Pull. (ray-project#2079) [DataFrame] Update _inherit_docstrings (ray-project#2085) [JavaWorker] Changes to the build system for support java worker (ray-project#2092) [xray] Fix bug in updating actor execution dependencies (ray-project#2064) [DataFrame] Refactor __delitem__ (ray-project#2080) [xray] Better error messaging when pulling from self. (ray-project#2068) Use source code in hash where possible (fix ray-project#2089) (ray-project#2090) Functions for flushing done tasks and evicted objects. (ray-project#2033) Fix compilation error for RAY_USE_NEW_GCS with latest clang. (ray-project#2086) [xray] Corrects Error Handling During Push and Pull. (ray-project#2059) [xray] Sophisticated task dependency management (ray-project#2035) Support calling positional arguments by keyword (fix ray-project#998) (ray-project#2081) [DataFrame] Improve performance of iteration methods (ray-project#2026) ...
* fix-a3c-torch: (37 commits) Add missing channel major Use correct filter size Add TODO Fix shape errors fmt Performance fix (ray-project#2110) Use flake8-comprehensions (ray-project#1976) Improve error message printing and suppression. (ray-project#2104) [rllib] [doc] Broken link in ddpg doc YAPF, take 3 (ray-project#2098) [rllib] rename async -> _async (ray-project#2097) fix unused lambda capture (ray-project#2102) [xray] Use pubsub instead of timeout for ObjectManager Pull. (ray-project#2079) [DataFrame] Update _inherit_docstrings (ray-project#2085) [JavaWorker] Changes to the build system for support java worker (ray-project#2092) [xray] Fix bug in updating actor execution dependencies (ray-project#2064) [DataFrame] Refactor __delitem__ (ray-project#2080) [xray] Better error messaging when pulling from self. (ray-project#2068) Use source code in hash where possible (fix ray-project#2089) (ray-project#2090) Functions for flushing done tasks and evicted objects. (ray-project#2033) ...
What do these changes do?
Refactor
__delitem__for it to operate on block partitions only for efficiency. AlsoRelated issue number
Address issue #2027