PARQUET-160: avoid wasting 64K per empty buffer. #98
julienledem wants to merge 18 commits into apache:master
We should tweak the initialSize here.
Levels should get a tiny initial size (100 bytes?) in case they are always null or always defined.
The initial size here should be tweaked as well to something smaller:
Should 5 be configurable too?
We could also make CapacityByteArrayOutputStream abstract, or have it take a slab size calculator as an argument, so that we can plug in different behaviors here. What do you think?
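To make the "slab size calculator" idea concrete, here is a minimal sketch. It is not the actual CapacityByteArrayOutputStream code: `SlabSizeCalculator` and `SlabbedBuffer` are hypothetical names invented for illustration, and the buffer is a simplified stand-in that only supports single-byte writes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical strategy interface (an assumption, not a parquet-mr API).
interface SlabSizeCalculator {
    // Size in bytes of the next slab, given how many slabs exist
    // and how many bytes have been written so far.
    int nextSlabSize(int slabsAllocated, int bytesUsedSoFar);
}

// Simplified stand-in for a slab-based output stream.
class SlabbedBuffer {
    private final SlabSizeCalculator calculator;
    private final List<byte[]> slabs = new ArrayList<>();
    private byte[] current;      // slab currently being filled, null until first write
    private int posInCurrent = 0;
    private int size = 0;

    SlabbedBuffer(SlabSizeCalculator calculator) {
        this.calculator = calculator;
    }

    void write(int b) {
        // Allocate a slab only when there is no room left (or none yet).
        if (current == null || posInCurrent == current.length) {
            addSlab();
        }
        current[posInCurrent++] = (byte) b;
        size++;
    }

    private void addSlab() {
        // Guard against a calculator returning zero or a negative size.
        int next = Math.max(1, calculator.nextSlabSize(slabs.size(), size));
        current = new byte[next];
        slabs.add(current);
        posInCurrent = 0;
    }

    int size() { return size; }

    int slabCount() { return slabs.size(); }

    int allocatedBytes() {
        int total = 0;
        for (byte[] slab : slabs) {
            total += slab.length;
        }
        return total;
    }
}
```

With this shape, a fixed-size policy would be `(slabs, used) -> 64 * 1024`, while `(slabs, used) -> Math.max(100, used)` starts at 100 bytes and doubles the total capacity with each new slab. The policy lives outside the buffer, so callers such as level writers could pass a smaller one.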
Do you want to tweak the initial size here as well?
@julienledem ping!
… a simpler heuristic in the column writers instead
Sent a PR against this PR here: julienledem#2
…onaryValuesWriter as well
Updates to PR-98
@tsdeng ok, this PR is now ready for review; it has both @julienledem's changes and mine.
…nledem/incubator-parquet-mr into avoid_wasting_64K_per_empty_buffer
Conflicts:
	parquet-hadoop/src/main/java/parquet/hadoop/ColumnChunkPageWriteStore.java
	parquet-hadoop/src/test/java/parquet/hadoop/TestColumnChunkPageWriteStore.java
+1, let's merge when the tests are green.
I'm running these tests here: in case we have to wait a long time in the Apache Travis CI queue.
This buffer allocates its default size as soon as it is instantiated, whether or not anything is ever written to it.
When many columns are empty, this leaves a lot of allocated but unused buffers.
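One general way to avoid paying for empty buffers is to defer allocation until the first write. The sketch below illustrates that idea only; it is not necessarily what this PR implements, and `LazyByteBuffer` is a hypothetical class name, not part of parquet-mr.

```java
import java.util.Arrays;

// Hypothetical buffer that allocates nothing until the first byte is written.
class LazyByteBuffer {
    private static final byte[] EMPTY = new byte[0];

    private final int initialSize;
    private byte[] buf = EMPTY;  // shared empty array: no per-instance allocation
    private int count = 0;

    LazyByteBuffer(int initialSize) {
        if (initialSize <= 0) {
            throw new IllegalArgumentException("initialSize must be positive");
        }
        this.initialSize = initialSize;
    }

    void write(int b) {
        if (count == buf.length) {
            // First write allocates initialSize; later growth doubles the capacity.
            int newCapacity = (buf.length == 0) ? initialSize : buf.length * 2;
            buf = Arrays.copyOf(buf, newCapacity);
        }
        buf[count++] = (byte) b;
    }

    int size() { return count; }

    int capacity() { return buf.length; }

    byte[] toByteArray() { return Arrays.copyOf(buf, count); }
}
```

A column that never receives a value keeps `capacity()` at 0, so a file with many empty columns no longer reserves the default size for each of them. The trade-off is one extra branch on the write path.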