Avoid deep copying Utf8View array buffers in ScalarValue::to_array_of_size() #18159

@2010YOUY01

Describe the bug

The ScalarValue::to_array_of_size() API repeats a scalar value k times and converts the result into an array of length k. For a ScalarValue::List whose inner list type is Utf8View, it currently deep copies the Utf8View buffers.

Note that copying the inner Utf8View array is unavoidable because of the ListArray encoding, but deep copying the payload buffers can be avoided: only the views need to be copied.
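To make the distinction concrete, here is a minimal, standard-library-only sketch of the two strategies. It is not the DataFusion or Arrow implementation: `ToyViewArray`, `repeat_deep`, and `repeat_shallow` are hypothetical names invented for illustration, and the toy model keeps a single payload buffer and ignores the inlining of short strings that real Utf8View views perform.

```rust
use std::sync::Arc;

/// Hypothetical toy stand-in for a Utf8View array (not an Arrow type).
/// Each view is an (offset, length) pair into one shared payload buffer.
struct ToyViewArray {
    views: Vec<(usize, usize)>,
    payload: Arc<Vec<u8>>,
}

impl ToyViewArray {
    /// Builds the array by appending every string to a single payload buffer.
    fn from_strs(values: &[&str]) -> Self {
        let mut payload = Vec::new();
        let mut views = Vec::with_capacity(values.len());
        for v in values {
            views.push((payload.len(), v.len()));
            payload.extend_from_slice(v.as_bytes());
        }
        Self { views, payload: Arc::new(payload) }
    }

    /// Resolves view `i` to the string bytes it points at.
    fn value(&self, i: usize) -> &str {
        let (off, len) = self.views[i];
        std::str::from_utf8(&self.payload[off..off + len]).unwrap()
    }

    /// Deep repeat: the string bytes are copied `size` times into a new payload.
    fn repeat_deep(&self, size: usize) -> Self {
        let mut payload = Vec::with_capacity(self.payload.len() * size);
        let mut views = Vec::with_capacity(self.views.len() * size);
        for _ in 0..size {
            for &(off, len) in &self.views {
                views.push((payload.len(), len));
                payload.extend_from_slice(&self.payload[off..off + len]);
            }
        }
        Self { views, payload: Arc::new(payload) }
    }

    /// Shallow repeat: only the views are copied; the payload is shared
    /// by bumping the Arc reference count.
    fn repeat_shallow(&self, size: usize) -> Self {
        let mut views = Vec::with_capacity(self.views.len() * size);
        for _ in 0..size {
            views.extend_from_slice(&self.views);
        }
        Self { views, payload: Arc::clone(&self.payload) }
    }
}
```

In this model both methods return an array with `size` times as many views, but only `repeat_deep` grows the payload. After `repeat_shallow`, `Arc::ptr_eq` on the original and repeated payloads returns true, which is the behavior this issue asks for on the List path.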

See the List implementation in

pub fn to_array_of_size(&self, size: usize) -> Result<ArrayRef> {

This caused a performance regression in #18070.

To Reproduce

This is easiest to verify by reading the implementation.

Expected behavior

List elements should be shallow copied: copy the views and share the payload buffers.

Additional context

No response

Labels: bug, performance
