Skip to content

Use BufWriter when writing bloom filters and limit tests (#3318)#3319

Merged
alamb merged 1 commit into
apache:masterfrom
tustvold:disable-bloom-filter-roundtrip
Dec 9, 2022
Merged

Use BufWriter when writing bloom filters and limit tests (#3318)#3319
alamb merged 1 commit into
apache:masterfrom
tustvold:disable-bloom-filter-roundtrip

Conversation

@tustvold

@tustvold tustvold commented Dec 9, 2022

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Closes #3318

Rationale for this change

The tests were taking an exceedingly long time to run, this disables them except for a few select tests that explicitly test the bloom filter.

This also adds a BufWriter when writing bloom filters, it is almost certain that this can be optimised further, but this was a quick win that took i32_column_bloom_filter from taking 17 seconds to 2. This is still terrible, but not catastrophic 😅

What changes are included in this PR?

Are there any user-facing changes?

Disable bloom filters for most tests
max_statistics_size: Option<usize>,
/// bloom filter related properties
bloom_filter_properies: Option<BloomFilterProperties>,
bloom_filter_properties: Option<BloomFilterProperties>,

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a drive by fix of a typo

@github-actions github-actions Bot added the parquet Changes to the parquet crate label Dec 9, 2022
/// Write the bloom filter data (header and then bitset) to the output
pub(crate) fn write<W: Write>(&self, mut writer: W) -> Result<(), ParquetError> {
pub(crate) fn write<W: Write>(&self, writer: W) -> Result<(), ParquetError> {
// Use a BufWriter to avoid costs of writing individual blocks

@tustvold tustvold Dec 9, 2022

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should be using BufWriter at a higher level 🤔 I guess we mostly don't run into this issue as we write large page sized blobs

@tustvold tustvold changed the title Use BufWriter when writing bloom filters (#3318) Use BufWriter when writing bloom filters and limit tests (#3318) Dec 9, 2022
@alamb alamb merged commit c215f49 into apache:master Dec 9, 2022
@alamb

alamb commented Dec 9, 2022

Copy link
Copy Markdown
Contributor

Thank you for the quick turnaround

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet Changes to the parquet crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Parquet arrow-writer tests take very long to run (over 60 seconds)

3 participants