Skip to content

Use NullBufferBuilder instead of BooleanBufferBuilder for creating Null masks #14115

@alamb

Description

@alamb

Is your feature request related to a problem or challenge?

DataFusion uses BooleanBuffer in several places to create Null buffers. I thought there was a clever optimization for handling data with no nulls which I filed in arrow-rs

However, @tustvold pointed out that NullBufferBuilder has exactly the optimization described:

I looked at the DataFusion codebase and found we have several examples of using BooleanBufferBuilder rather than NullBufferBuilder:

https://github.com/search?q=repo%3Aapache%2Fdatafusion%20BooleanBufferBuilder&type=code

It even has a reimplementation of the NullBufferBuilder optimization 🤦 :

/// Builder for an (optional) null mask
///
/// Optimized for avoid creating the bitmask when all values are non-null
#[derive(Debug)]
pub(crate) enum MaybeNullBufferBuilder {
/// seen `row_count` rows but no nulls yet
NoNulls { row_count: usize },
/// have at least one null value
///
/// Note this is an Arrow *VALIDITY* buffer (so it is false for nulls, true
/// for non-nulls)
Nulls(BooleanBufferBuilder),
}

Describe the solution you'd like

I would like to switch DataFusion to using NullBufferBuilder instead of BooleanBufferBuilder as much as possible

Note that until the following PR is availble, this will involve adding an explicit dependency on arrow_buffer

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions