Skip to content

PARQUET-160: avoid wasting 64K per empty buffer.#98

Closed
julienledem wants to merge 18 commits intoapache:masterfrom
julienledem:avoid_wasting_64K_per_empty_buffer
Closed

PARQUET-160: avoid wasting 64K per empty buffer.#98
julienledem wants to merge 18 commits intoapache:masterfrom
julienledem:avoid_wasting_64K_per_empty_buffer

Conversation

@julienledem
Copy link
Copy Markdown
Member

This buffer initializes itself to a default size when instantiated.
This leads to a lot of unused small buffers when there are a lot of empty columns.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should tweak the initialSize here.
levels should get a tiny initial size (100 bytes?) in case they are always null or always defined.

@julienledem
Copy link
Copy Markdown
Member Author

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should 5 be configurable too?

we could also make CapacityByteArrayOutputStream abstract or take as an argument a slab size calculator etc. so that we can plug in different behaviors here. what do you think?

@isnotinvain
Copy link
Copy Markdown
Contributor

Do you want to tweak the initial size here as well?
Do you think we should try this out internally and see if it's an improvement first?

@isnotinvain
Copy link
Copy Markdown
Contributor

@julienledem ping!

@isnotinvain
Copy link
Copy Markdown
Contributor

Sent a PR against this PR here: julienledem#2

@isnotinvain
Copy link
Copy Markdown
Contributor

@tsdeng ok, this PR is now ready to review, it's got both @julienledem's changes and mine as well.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rm this, not used.

…nledem/incubator-parquet-mr into avoid_wasting_64K_per_empty_buffer
Conflicts:
	parquet-hadoop/src/main/java/parquet/hadoop/ColumnChunkPageWriteStore.java
	parquet-hadoop/src/test/java/parquet/hadoop/TestColumnChunkPageWriteStore.java
@isnotinvain
Copy link
Copy Markdown
Contributor

+1, lets merge when the tests are green

@isnotinvain
Copy link
Copy Markdown
Contributor

I'm running these tests here:
https://github.com/isnotinvain/incubator-parquet-mr/pull/2

in case we have to wait a long time for the travis CI apache queue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants