persist: actually use columnar encoding for unsealed batches. #9239

ruchirK merged 2 commits into MaterializeInc:main
Conversation
Force-pushed from 7c138ff to 33ee4df
Only the last commit is relevant, the first 3 commits are #9165

headed to the airport now so it might take me a bit to get to this one
    let updates = generate_updates();
    let size = get_encoded_len(updates.clone());
    let updates = vec![updates.iter().collect::<ColumnarRecords>()];
let's make generate_updates return a ColumnarRecords instead
I'd rather not do that because generate_updates is also used to generate a vec of updates for calls to bench_writes (which uses a WriteReqBuilder)
Keeping all of the data generating functions tied to the same underlying function helps me make sure that the benchmarks that write unsealed batches and the benchmarks that write data to indexed and then call drain pending are writing the same exact data.
yeah I noticed, but it's just as easy for bench_writes to go ColumnarRecords -> Vec<(...)> as for this to go Vec<(...)> -> ColumnarRecords. and I'd rather the benchmarks that only need a ColumnarRecords not have to go through the allocs of the Vec thing (even if that's only happening in setup code)
took me a bit to find the write! macro to actually generate keys in place, but done!
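The `write!`-into-a-buffer trick mentioned here can be sketched as follows. Note that `fill_key` and the `key-{:010}` format are made up for illustration; the actual key format used in the benchmarks may differ:

```rust
use std::io::Write;

// Hypothetical helper: format the nth key directly into a reusable byte
// buffer, avoiding a fresh String allocation per record. `write!` works
// here because Vec<u8> implements io::Write.
fn fill_key(buf: &mut Vec<u8>, idx: u64) {
    buf.clear();
    write!(buf, "key-{:010}", idx).expect("writing to a Vec cannot fail");
}

fn main() {
    let mut buf = Vec::new();
    fill_key(&mut buf, 42);
    assert_eq!(buf, b"key-0000000042");
    // Reusing the same buffer for the next key allocates nothing new
    // (the earlier write already grew it to the needed capacity).
    fill_key(&mut buf, 7);
    assert_eq!(buf, b"key-0000000007");
}
```

The zero-padded fixed-width format also keeps every key the same length, which is convenient when sizing buffers up front.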
    };
    match b.recv() {
        Ok(b) => {
            // Reverse the updates so we can pop them off the back
why did we lose this behavior (and comment about making sure to filter before clone)?
iirc, because ColumnarRecordsIter doesn't implement DoubleEndedIterator -- I think it could though - let me see if I can whip that up
nope! that was not the right reason. -- added the rev back in, but am not reversing the updates within each ColumnarRecords because they are not sorted in any way
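The setup step being restored here can be sketched like this (with `u32` standing in for the real ColumnarRecords batch type, and `reverse_for_pop` an illustrative name): the Vec of batches is reversed once up front so the hot loop can `pop()` cheaply from the back while still seeing batches in their original order; rows inside each batch are left alone because they carry no sort order.

```rust
// Reverse once in setup so the benchmark loop can pop from the back.
// `Vec::pop` is O(1), while `Vec::remove(0)` would shift every element.
fn reverse_for_pop<T>(mut batches: Vec<T>) -> Vec<T> {
    batches.reverse();
    batches
}

fn main() {
    let mut batches = reverse_for_pop(vec![10u32, 20, 30]);
    let mut drained = Vec::new();
    while let Some(b) = batches.pop() {
        drained.push(b);
    }
    // Popping from the reversed Vec yields the original order.
    assert_eq!(drained, vec![10, 20, 30]);
}
```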
    );

    debug_assert!(!updates.is_empty());
    let updates = vec![updates.iter().collect::<ColumnarRecords>()];
shouldn't this build the ColumnarRecords directly?
I'm sorry I'm not sure I understand the question here
oh i think i see the question -- right now we go through a vec to generate the ColumnarRecords and we could skip that intermediate step. yes. ofc
This method builds a Vec<((Vec<u8>, Vec<u8>), u64, isize)> and then uses that to construct a ColumnarRecords. Instead of let mut updates = vec![] above, we could make that updates variable a ColumnarRecordsBuilder, and then instead of updates.extend we could call updates.push. This line then becomes updates.finish() instead of updates.iter().collect::<ColumnarRecords>()
looks like we raced! yup, that's it exactly
done! implemented it with a FromIterator implementation so it can be reused if needed (I think it's essentially the same builder logic that I would have had to inline). This removes the allocation for the vec, and also removes the allocations for each key and val, so thank you for the suggestion!
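A minimal sketch of the builder-plus-FromIterator shape discussed above, using simplified stand-in types: the real ColumnarRecords in persist packs keys and vals into shared byte buffers with offset arrays, but the field names and builder API here are illustrative, not the actual ones.

```rust
/// Simplified stand-in for a columnar batch: all keys (and all vals)
/// live in one contiguous buffer, with offsets marking boundaries.
#[derive(Debug, Default, PartialEq)]
struct ColumnarRecords {
    len: usize,
    key_data: Vec<u8>,
    key_offsets: Vec<usize>,
    val_data: Vec<u8>,
    val_offsets: Vec<usize>,
    timestamps: Vec<u64>,
    diffs: Vec<isize>,
}

#[derive(Default)]
struct ColumnarRecordsBuilder {
    records: ColumnarRecords,
}

impl ColumnarRecordsBuilder {
    /// Append one record, copying key/val bytes into the shared buffers.
    fn push(&mut self, ((key, val), ts, diff): ((&[u8], &[u8]), u64, isize)) {
        let r = &mut self.records;
        r.key_data.extend_from_slice(key);
        r.key_offsets.push(r.key_data.len());
        r.val_data.extend_from_slice(val);
        r.val_offsets.push(r.val_data.len());
        r.timestamps.push(ts);
        r.diffs.push(diff);
        r.len += 1;
    }

    fn finish(self) -> ColumnarRecords {
        self.records
    }
}

// The FromIterator impl is just the builder loop, so callers can write
// `iter.collect::<ColumnarRecords>()` without an intermediate Vec.
impl<'a> FromIterator<((&'a [u8], &'a [u8]), u64, isize)> for ColumnarRecords {
    fn from_iter<I>(iter: I) -> Self
    where
        I: IntoIterator<Item = ((&'a [u8], &'a [u8]), u64, isize)>,
    {
        let mut builder = ColumnarRecordsBuilder::default();
        for record in iter {
            builder.push(record);
        }
        builder.finish()
    }
}

fn main() {
    let updates = vec![
        ((&b"k1"[..], &b"v1"[..]), 1u64, 1isize),
        ((&b"k2"[..], &b"v2"[..]), 2u64, -1isize),
    ];
    let records: ColumnarRecords = updates.into_iter().collect();
    assert_eq!(records.len, 2);
    assert_eq!(records.key_data, b"k1k2".to_vec());
    assert_eq!(records.diffs, vec![1, -1]);
}
```

Because the collect drives the builder directly, no per-record `Vec<u8>` key/val allocations are needed, which is the saving described in the comment above.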
    pub desc: Description<SeqNo>,
    /// The updates themselves.
-   pub updates: Vec<((Vec<u8>, Vec<u8>), u64, isize)>,
+   pub updates: Vec<ColumnarRecords>,
weird, looks like I never submitted this comment in the first round. sorry about that, it's my major concern and why I didn't just stamp this PR the first time...
why is this still a Vec? should it not just be a bare ColumnarRecords?
Right. That was my expectation when I first started making this change, and I tried to think of ways to make it a bare ColumnarRecords but it seemed that, since each write itself produces a ColumnarRecords, concatenating together N ColumnarRecords into one would introduce a lot more extra allocation / memcpying on the write fast path. I agree that trace batches should just be a bare ColumnarRecords
to be entirely honest though, I didn't do the performance test of what the impact would be of converting all of the ColumnarRecords from various writes into one ColumnarRecords per persisted stream, and that's probably worth doing before we decide on this approach
yeah okay, I see. I'm looking ahead to switching the update blobs to be encoded with arrow where we won't really be able to keep this as a Vec, but I'm fine leaving this as a TODO for then.
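The allocation/memcpy tradeoff being weighed in this thread can be illustrated with plain byte buffers standing in for ColumnarRecords (the function names are made up): keeping a `Vec` of finished batches makes each write an O(1) pointer move, while collapsing everything into one batch re-copies every byte written so far.

```rust
// Appending a finished batch to the Vec is just a pointer move; the
// batch's contents are not copied. This is the write fast path.
fn append_batch(batches: &mut Vec<Vec<u8>>, batch: Vec<u8>) {
    batches.push(batch);
}

// Collapsing N batches into one needs one big allocation plus a memcpy
// of every byte already written -- the cost the thread wants to avoid
// paying on every write.
fn merge_batches(batches: Vec<Vec<u8>>) -> Vec<u8> {
    let total: usize = batches.iter().map(|b| b.len()).sum();
    let mut merged = Vec::with_capacity(total);
    for b in batches {
        merged.extend_from_slice(&b);
    }
    merged
}

fn main() {
    let mut batches = Vec::new();
    append_batch(&mut batches, vec![1u8, 2]);
    append_batch(&mut batches, vec![3]);
    assert_eq!(merge_batches(batches), vec![1u8, 2, 3]);
}
```

Merging once per persisted stream (e.g. when sealing) rather than per write would amortize that copy, which is roughly the experiment the comment above says is still worth running.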
Force-pushed from 96ebab8 to 93e14e9
danhhz left a comment
I do still prefer that generate_updates return a ColumnarRecords (unless I'm still missing something really dumb) but I'm not going to hold this up over it if you really still disagree
        }
    }

    impl<'a> FromIterator<((&'a [u8], &'a [u8]), u64, isize)> for ColumnarRecords {
seems like we could combine these two impls, but I'm happy leaving that as a TODO
This commit teaches unsealed batches to use the new [ColumnarRecords] representation to store records. This change results in a healthy 15 - 20% improvement across the indexed_write_drain benchmarks. These changes are relative to the baseline from (b9436dd) which introduced the columnar representation, and not relative to the temporary regression caused by using ColumnarRecords without changing the encoding.

    blob_cache_set_unsealed_batch/file_unsorted/23000000
        time:   [67.601 ms 72.892 ms 78.702 ms]
        thrpt:  [278.70 MiB/s 300.92 MiB/s 324.47 MiB/s]
        change:
        time:   [-4.0024% +3.6422% +13.959%] (p = 0.44 > 0.05)
        thrpt:  [-12.249% -3.5142% +4.1693%]
        No change in performance detected.

    blob_cache_set_unsealed_batch/mem_unsorted/23000000
        time:   [29.671 ms 30.247 ms 30.843 ms]
        thrpt:  [711.17 MiB/s 725.17 MiB/s 739.26 MiB/s]
        change:
        time:   [-14.040% -12.150% -10.401%] (p = 0.00 < 0.05)
        thrpt:  [+11.608% +13.831% +16.333%]
        Performance has improved.

    indexed_write_drain/mem_sorted/23000000
        time:   [157.88 ms 160.63 ms 164.21 ms]
        thrpt:  [133.57 MiB/s 136.56 MiB/s 138.93 MiB/s]
        change:
        time:   [-21.879% -18.607% -15.460%] (p = 0.00 < 0.05)
        thrpt:  [+18.288% +22.860% +28.007%]
        Performance has improved.
        Found 1 outliers among 10 measurements (10.00%)
        1 (10.00%) high severe

    indexed_write_drain/mem_unsorted/23000000
        time:   [162.87 ms 165.92 ms 168.86 ms]
        thrpt:  [129.90 MiB/s 132.20 MiB/s 134.68 MiB/s]
        change:
        time:   [-24.925% -23.193% -21.583%] (p = 0.00 < 0.05)
        thrpt:  [+27.524% +30.196% +33.201%]
        Performance has improved.

    indexed_write_drain/file_sorted/23000000
        time:   [227.40 ms 232.74 ms 238.26 ms]
        thrpt:  [92.060 MiB/s 94.246 MiB/s 96.458 MiB/s]
        change:
        time:   [-17.466% -15.410% -13.244%] (p = 0.00 < 0.05)
        thrpt:  [+15.266% +18.217% +21.163%]
        Performance has improved.

    indexed_write_drain/file_unsorted/23000000
        time:   [236.30 ms 240.72 ms 245.04 ms]
        thrpt:  [89.513 MiB/s 91.119 MiB/s 92.824 MiB/s]
        change:
        time:   [-16.800% -14.567% -12.476%] (p = 0.00 < 0.05)
        thrpt:  [+14.254% +17.051% +20.192%]
        Performance has improved.

For very small batches this change results in a noticeable increase in the encoded size, but the effect is small and entirely unnoticeable once data volumes exceed 1,000 bytes (measured with inserts of 1.1 KB, 110 KB, and 110 MB; the sizes of unsealed batches produced before and after this change were identical).

Unfortunately, this change doesn't register any improvement in the end-to-end test sending 1 million 110 byte rows via a COPY INTO, because we save on allocations during `write_atomic` but end up having to allocate when sending rows to the listener. However, there are clear positives from this change. Serialization times decrease from 230 ms to 170 ms (out of a total of ~360 - 370 ms CPU time spent in drain_pending). This represents a ~20% improvement that is roughly commensurate with what we observe in the benchmarks. On the coordinator side, we go from spending ~175 ms in `write_atomic` to about 140 ms, which is a smaller than expected change. Curiously, the trace seems to be dominated by `free_tiny`, which leads us to suspect that the cost we're now bottlenecked on is the `drop` of the owned vec that gets passed to `write_atomic`.
…ectly

Previously, the various write benchmarks would generate a large vector of ((key, val), time, diff) records which would then be converted into a `ColumnarRecords` object if needed. This forced the benchmarks which didn't need the underlying vector of records to still pay the allocation cost on benchmark setup. This commit changes the behavior to instead always generate a `ColumnarRecords` and downgrade to a vector of records only as needed.

This change also removes the clone needed to find the encoded length of a set of records. This commit has no significant impact on benchmark results.
Force-pushed from 2667f91 to 4f25522
tftr! merging on green!