[Data] Add disable Block Shaping option to BlockOutputBuffer#58757
[Data] Add disable Block Shaping option to BlockOutputBuffer#58757bveeramani merged 2 commits intomasterfrom
Conversation
Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>
There was a problem hiding this comment.
Code Review
This pull request introduces a disable_block_shaping option to BlockOutputBuffer, allowing users to bypass the automatic block sizing logic. This is a useful feature for scenarios where manual control over block structure is desired. The implementation is clean, involving updates to OutputBlockSizeOption and BlockOutputBuffer to respect the new flag. The related refactoring in map_transformer.py to remove the now-obsolete _can_skip_block_sizing method is a good simplification. The accompanying tests are thorough, covering various configurations and ensuring the new functionality works as expected. I've suggested a couple of minor refactorings to improve code conciseness by removing some redundant checks. Overall, this is a solid contribution.
| def _exceeded_buffer_row_limit(self) -> bool: | ||
| if self._output_block_size_option.disable_block_shaping: | ||
| return False | ||
|
|
||
| return ( | ||
| self._max_num_rows_per_block() is not None | ||
| and self._buffer.num_rows() > self._max_num_rows_per_block() | ||
| ) |
There was a problem hiding this comment.
The if self._output_block_size_option.disable_block_shaping: check is redundant. The _max_num_rows_per_block() method already returns None when disable_block_shaping is True, which causes the expression self._max_num_rows_per_block() is not None to evaluate to False. This makes the entire return statement False, achieving the same result as the explicit if block. Removing the explicit check simplifies the code.
def _exceeded_buffer_row_limit(self) -> bool:
return (
self._max_num_rows_per_block() is not None
and self._buffer.num_rows() > self._max_num_rows_per_block()
)| def _exceeded_buffer_size_limit(self) -> bool: | ||
| if self._output_block_size_option.disable_block_shaping: | ||
| return False | ||
|
|
||
| return ( | ||
| self._max_bytes_per_block() is not None | ||
| and self._buffer.get_estimated_memory_usage() > self._max_bytes_per_block() | ||
| ) |
There was a problem hiding this comment.
Similar to _exceeded_buffer_row_limit, the if self._output_block_size_option.disable_block_shaping: check here is redundant. The _max_bytes_per_block() method returns None when disable_block_shaping is True, which will cause this method to correctly return False. Removing the explicit check will make the code more concise.
def _exceeded_buffer_size_limit(self) -> bool:
return (
self._max_bytes_per_block() is not None
and self._buffer.get_estimated_memory_usage() > self._max_bytes_per_block()
)…ject#58757) ## Description In addition to Block shaping by Block Size and Num Rows, add an option to skip Block Shaping altogether in BlockOutputBuffer. Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>
…ject#58757) ## Description In addition to Block shaping by Block Size and Num Rows, add an option to skip Block Shaping altogether in BlockOutputBuffer. Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com> Signed-off-by: YK <1811651+ykdojo@users.noreply.github.com>
…ject#58757) ## Description In addition to Block shaping by Block Size and Num Rows, add an option to skip Block Shaping altogether in BlockOutputBuffer. Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>
…ject#58757) ## Description In addition to Block shaping by Block Size and Num Rows, add an option to skip Block Shaping altogether in BlockOutputBuffer. Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com> Signed-off-by: peterxcli <peterxcli@gmail.com>
Description
[Data] Add disable Block Shaping option to BlockOutputBuffer
In addition to Block shaping by Block Size and Num Rows, add an option to skip Block Shaping altogether in BlockOutputBuffer.
Related issues
Additional information