[DataFrame] Fix blocking issue on _IndexMetadata passing#1965
[DataFrame] Fix blocking issue on _IndexMetadata passing#1965devin-petersohn merged 8 commits intoray-project:masterfrom
Conversation
|
Test PASSed. |
|
Test FAILed. |
devin-petersohn
left a comment
There was a problem hiding this comment.
It would be great to get numbers on the performance difference here.
python/ray/dataframe/dataframe.py
Outdated
There was a problem hiding this comment.
Are you planning to do this on this PR?
python/ray/dataframe/dataframe.py
Outdated
There was a problem hiding this comment.
What is the bp in bp_length?
python/ray/dataframe/dataframe.py
Outdated
There was a problem hiding this comment.
can we get rid of this line since we only need the metadata now?
There was a problem hiding this comment.
We can't in this case, as discussed today. If we don't pass the columns then the metadata of the new dataframe won't have the column changes reflected. Therefore, we need to either copy the metadata and modify the copy and push the copy, or pass the new columns such that the constructor modifies the metadata object copy on its end.
python/ray/dataframe/dataframe.py
Outdated
There was a problem hiding this comment.
Same as above, can we get rid of this line?
|
Follow-up question: Can we make it so that |
|
Reference on performance numbers (these are off the top of my head, and testing on On
|
|
Test PASSed. |
|
Test PASSed. |
devin-petersohn
left a comment
There was a problem hiding this comment.
These changes look really great! Just a couple of minor quick comments.
python/ray/dataframe/utils.py
Outdated
There was a problem hiding this comment.
Add doc """Compute widths for each partition."""
There was a problem hiding this comment.
Are you planning to add this fix in this PR?
There was a problem hiding this comment.
This was actually silently fixed by the other changes, setting a new DF will implicitly change the index on the metadata object as well. Comment removed.
There was a problem hiding this comment.
Could you write a comment about how to use __getitem__? We have had people using it in incorrect ways and trying to make it work for them, so this way hopefully we can avoid that.
|
Jenkins, retest this please |
|
Test PASSed. |
|
Merged, thanks @Veryku |
* magic-methods: fmt Fix IndentationError Write magic methods for SampleBatch/PartialRollout Clean up syntax for supported Python versions. (ray-project#1963) [DataFrame] Implements mode, to_datetime, and get_dummies (ray-project#1956) [DataFrame] Fix dtypes (ray-project#1930) keep_dims -> keepdims (ray-project#1980) add pthread linking (ray-project#1986) [DataFrame] Add layer of abstraction to allow OID instantiation (ray-project#1984) [DataFrame] Fix blocking issue on _IndexMetadata passing (ray-project#1965)
What do these changes do?
Fixes a performance issue on passing
_IndexMetadataobjects for applymap and related functions that don't mutate indexes.