persist: actually use columnar encoding for unsealed batches. #9239

ruchirK merged 2 commits into MaterializeInc:main
Conversation
Force-pushed from 7c138ff to 33ee4df
Only the last commit is relevant, the first 3 commits are #9165

headed to the airport now so it might take me a bit to get to this one
    let updates = generate_updates();
    let size = get_encoded_len(updates.clone());
    let updates = vec![updates.iter().collect::<ColumnarRecords>()];
let's make generate_updates return a ColumnarRecords instead
I'd rather not do that because generate_updates is also used to generate a vec of updates for calls to bench_writes (which uses a WriteReqBuilder)
Keeping all of the data generating functions tied to the same underlying function helps me make sure that the benchmarks that write unsealed batches and the benchmarks that write data to indexed and then call drain pending are writing the same exact data.
yeah I noticed, but it's just as easy for bench_writes to go ColumnarRecords -> Vec<(...)> as for this to go Vec<(...)> -> ColumnarRecords. and I'd rather the benchmarks that only need a ColumnarRecords not have to go through the allocs of the Vec thing (even if that's only happening in setup code)
took me a bit to find the write! macro to actually generate keys in place, but done!
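The `write!`-into-a-buffer trick mentioned here can be sketched as follows. Note that `fill_key` and the `key-{:010}` format are made up for illustration; the actual key format used in the benchmarks may differ:

```rust
use std::io::Write;

// Hypothetical helper: format the nth key directly into a reusable byte
// buffer, avoiding a fresh String allocation per record. `write!` works
// here because Vec<u8> implements io::Write.
fn fill_key(buf: &mut Vec<u8>, idx: u64) {
    buf.clear();
    write!(buf, "key-{:010}", idx).expect("writing to a Vec cannot fail");
}

fn main() {
    let mut buf = Vec::new();
    fill_key(&mut buf, 42);
    assert_eq!(buf, b"key-0000000042");
    // Reusing the same buffer for the next key allocates nothing new
    // (the earlier write already grew it to the needed capacity).
    fill_key(&mut buf, 7);
    assert_eq!(buf, b"key-0000000007");
}
```

The zero-padded fixed-width format also keeps every key the same length, which is convenient when sizing buffers up front.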
    };
    match b.recv() {
        Ok(b) => {
            // Reverse the updates so we can pop them off the back
why did we lose this behavior (and comment about making sure to filter before clone)?
iirc, because ColumnarRecordsIter doesn't implement DoubleEndedIterator -- I think it could though - let me see if I can whip that up
nope! that was not the right reason. -- added the rev back in, but am not reversing the updates within each ColumnarRecords because they are not sorted in any way
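The setup step being restored here can be sketched like this (with `u32` standing in for the real ColumnarRecords batch type, and `reverse_for_pop` an illustrative name): the Vec of batches is reversed once up front so the hot loop can `pop()` cheaply from the back while still seeing batches in their original order; rows inside each batch are left alone because they carry no sort order.

```rust
// Reverse once in setup so the benchmark loop can pop from the back.
// `Vec::pop` is O(1), while `Vec::remove(0)` would shift every element.
fn reverse_for_pop<T>(mut batches: Vec<T>) -> Vec<T> {
    batches.reverse();
    batches
}

fn main() {
    let mut batches = reverse_for_pop(vec![10u32, 20, 30]);
    let mut drained = Vec::new();
    while let Some(b) = batches.pop() {
        drained.push(b);
    }
    // Popping from the reversed Vec yields the original order.
    assert_eq!(drained, vec![10, 20, 30]);
}
```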
    );

    debug_assert!(!updates.is_empty());
    let updates = vec![updates.iter().collect::<ColumnarRecords>()];
shouldn't this build the ColumnarRecords directly?
I'm sorry I'm not sure I understand the question here
oh i think i see the question -- right now we go through a vec to generate the ColumnarRecords and we could skip that intermediate step. yes. ofc
This method builds a Vec<((Vec<u8>, Vec<u8>), u64, isize)> and then uses that to construct a ColumnarRecords. Instead of let mut updates = vec![] above, we could make that updates variable a ColumnarRecordsBuilder, and then instead of updates.extend we could call updates.push. This line then becomes updates.finish() instead of updates.iter().collect::<ColumnarRecords>()
looks like we raced! yup, that's it exactly
done! implemented it with a FromIterator implementation so it can be reused if needed (I think it's essentially the same builder logic that I would have had to inline). This removes the allocation for the vec, and also removes the allocations for each key and val, so thank you for the suggestion!
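A minimal sketch of the builder-plus-FromIterator shape discussed above, using simplified stand-in types: the real ColumnarRecords in persist packs keys and vals into shared byte buffers with offset arrays, but the field names and builder API here are illustrative, not the actual ones.

```rust
/// Simplified stand-in for a columnar batch: all keys (and all vals)
/// live in one contiguous buffer, with offsets marking boundaries.
#[derive(Debug, Default, PartialEq)]
struct ColumnarRecords {
    len: usize,
    key_data: Vec<u8>,
    key_offsets: Vec<usize>,
    val_data: Vec<u8>,
    val_offsets: Vec<usize>,
    timestamps: Vec<u64>,
    diffs: Vec<isize>,
}

#[derive(Default)]
struct ColumnarRecordsBuilder {
    records: ColumnarRecords,
}

impl ColumnarRecordsBuilder {
    /// Append one record, copying key/val bytes into the shared buffers.
    fn push(&mut self, ((key, val), ts, diff): ((&[u8], &[u8]), u64, isize)) {
        let r = &mut self.records;
        r.key_data.extend_from_slice(key);
        r.key_offsets.push(r.key_data.len());
        r.val_data.extend_from_slice(val);
        r.val_offsets.push(r.val_data.len());
        r.timestamps.push(ts);
        r.diffs.push(diff);
        r.len += 1;
    }

    fn finish(self) -> ColumnarRecords {
        self.records
    }
}

// The FromIterator impl is just the builder loop, so callers can write
// `iter.collect::<ColumnarRecords>()` without an intermediate Vec.
impl<'a> FromIterator<((&'a [u8], &'a [u8]), u64, isize)> for ColumnarRecords {
    fn from_iter<I>(iter: I) -> Self
    where
        I: IntoIterator<Item = ((&'a [u8], &'a [u8]), u64, isize)>,
    {
        let mut builder = ColumnarRecordsBuilder::default();
        for record in iter {
            builder.push(record);
        }
        builder.finish()
    }
}

fn main() {
    let updates = vec![
        ((&b"k1"[..], &b"v1"[..]), 1u64, 1isize),
        ((&b"k2"[..], &b"v2"[..]), 2u64, -1isize),
    ];
    let records: ColumnarRecords = updates.into_iter().collect();
    assert_eq!(records.len, 2);
    assert_eq!(records.key_data, b"k1k2".to_vec());
    assert_eq!(records.diffs, vec![1, -1]);
}
```

Because the collect drives the builder directly, no per-record `Vec<u8>` key/val allocations are needed, which is the saving described in the comment above.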
    pub desc: Description<SeqNo>,
    /// The updates themselves.
-   pub updates: Vec<((Vec<u8>, Vec<u8>), u64, isize)>,
+   pub updates: Vec<ColumnarRecords>,
weird, looks like I never submitted this comment in the first round. sorry about that, it's my major concern and why I didn't just stamp this PR the first time...
why is this still a Vec? should it not just be a bare ColumnarRecords?
Right. That was my expectation when I first started making this change, and I tried to think of ways to make it a bare ColumnarRecords but it seemed that, since each write itself produces a ColumnarRecords, concatenating together N ColumnarRecords into one would introduce a lot more extra allocation / memcpying on the write fast path. I agree that trace batches should just be a bare ColumnarRecords
to be entirely honest though, I didn't do the performance test of what the impact would be of converting all of the ColumnarRecords from various writes into one ColumnarRecords per persisted stream, and that's probably worth doing before we decide on this approach
yeah okay, I see. I'm looking ahead to switching the update blobs to be encoded with arrow where we won't really be able to keep this as a Vec, but I'm fine leaving this as a TODO for then.
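The allocation/memcpy tradeoff being weighed in this thread can be illustrated with plain byte buffers standing in for ColumnarRecords (the function names are made up): keeping a `Vec` of finished batches makes each write an O(1) pointer move, while collapsing everything into one batch re-copies every byte written so far.

```rust
// Appending a finished batch to the Vec is just a pointer move; the
// batch's contents are not copied. This is the write fast path.
fn append_batch(batches: &mut Vec<Vec<u8>>, batch: Vec<u8>) {
    batches.push(batch);
}

// Collapsing N batches into one needs one big allocation plus a memcpy
// of every byte already written -- the cost the thread wants to avoid
// paying on every write.
fn merge_batches(batches: Vec<Vec<u8>>) -> Vec<u8> {
    let total: usize = batches.iter().map(|b| b.len()).sum();
    let mut merged = Vec::with_capacity(total);
    for b in batches {
        merged.extend_from_slice(&b);
    }
    merged
}

fn main() {
    let mut batches = Vec::new();
    append_batch(&mut batches, vec![1u8, 2]);
    append_batch(&mut batches, vec![3]);
    assert_eq!(merge_batches(batches), vec![1u8, 2, 3]);
}
```

Merging once per persisted stream (e.g. when sealing) rather than per write would amortize that copy, which is roughly the experiment the comment above says is still worth running.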
Force-pushed from 96ebab8 to 93e14e9
danhhz left a comment
I do still prefer that generate_updates return a ColumnarRecords (unless I'm still missing something really dumb) but I'm not going to hold this up over it if you really still disagree
        }
    }

    impl<'a> FromIterator<((&'a [u8], &'a [u8]), u64, isize)> for ColumnarRecords {
seems like we could combine these two impls, but I'm happy leaving that as a TODO
This commit teaches unsealed batches to use the new [ColumnarRecords] representation to store records. This change results in a healthy 15 - 20% improvement across the indexed_write_drain benchmarks. These changes are relative to the baseline from (b9436dd) which introduced the columnar representation, and not relative to the temporary regression caused by using ColumnarRecords without changing the encoding.

    blob_cache_set_unsealed_batch/file_unsorted/23000000
        time:   [67.601 ms 72.892 ms 78.702 ms]
        thrpt:  [278.70 MiB/s 300.92 MiB/s 324.47 MiB/s]
        change:
        time:   [-4.0024% +3.6422% +13.959%] (p = 0.44 > 0.05)
        thrpt:  [-12.249% -3.5142% +4.1693%]
        No change in performance detected.

    blob_cache_set_unsealed_batch/mem_unsorted/23000000
        time:   [29.671 ms 30.247 ms 30.843 ms]
        thrpt:  [711.17 MiB/s 725.17 MiB/s 739.26 MiB/s]
        change:
        time:   [-14.040% -12.150% -10.401%] (p = 0.00 < 0.05)
        thrpt:  [+11.608% +13.831% +16.333%]
        Performance has improved.

    indexed_write_drain/mem_sorted/23000000
        time:   [157.88 ms 160.63 ms 164.21 ms]
        thrpt:  [133.57 MiB/s 136.56 MiB/s 138.93 MiB/s]
        change:
        time:   [-21.879% -18.607% -15.460%] (p = 0.00 < 0.05)
        thrpt:  [+18.288% +22.860% +28.007%]
        Performance has improved.
        Found 1 outliers among 10 measurements (10.00%)
        1 (10.00%) high severe

    indexed_write_drain/mem_unsorted/23000000
        time:   [162.87 ms 165.92 ms 168.86 ms]
        thrpt:  [129.90 MiB/s 132.20 MiB/s 134.68 MiB/s]
        change:
        time:   [-24.925% -23.193% -21.583%] (p = 0.00 < 0.05)
        thrpt:  [+27.524% +30.196% +33.201%]
        Performance has improved.

    indexed_write_drain/file_sorted/23000000
        time:   [227.40 ms 232.74 ms 238.26 ms]
        thrpt:  [92.060 MiB/s 94.246 MiB/s 96.458 MiB/s]
        change:
        time:   [-17.466% -15.410% -13.244%] (p = 0.00 < 0.05)
        thrpt:  [+15.266% +18.217% +21.163%]
        Performance has improved.

    indexed_write_drain/file_unsorted/23000000
        time:   [236.30 ms 240.72 ms 245.04 ms]
        thrpt:  [89.513 MiB/s 91.119 MiB/s 92.824 MiB/s]
        change:
        time:   [-16.800% -14.567% -12.476%] (p = 0.00 < 0.05)
        thrpt:  [+14.254% +17.051% +20.192%]
        Performance has improved.

For very small batches this change results in a noticeable increase in the encoded size, but the effect is small and entirely unnoticeable once data volumes exceed 1,000 bytes (measured with inserts of 1.1 KB, 110 KB, and 110 MB; the sizes of unsealed batches produced before and after this change were identical).

Unfortunately, this change doesn't register any improvement in the end-to-end test sending 1 million 110 byte rows via a COPY INTO, because we save on allocations during `write_atomic` but end up having to allocate when sending rows to the listener. However, there are clear positives from this change. Serialization times decrease from 230 ms to 170 ms (out of a total of ~360 - 370 ms CPU time spent in drain_pending). This represents a ~20% improvement that is roughly commensurate with what we observe in the benchmarks. On the coordinator side, we go from spending ~175 ms in `write_atomic` to about 140 ms, which is a smaller than expected change. Curiously, the trace seems to be dominated by `free_tiny`, which leads us to suspect that the cost we're now bottlenecked on is the `drop` of the owned vec that gets passed to `write_atomic`.
…ectly

Previously, the various write benchmarks would generate a large vector of ((key, val), time, diff) records which would then be converted into a `ColumnarRecords` object if needed. This forced the benchmarks which didn't need the underlying vector of records to still pay the allocation cost on benchmark setup. This commit changes the behavior to instead always generate a `ColumnarRecords` and downgrade to a vector of records only as needed.

This change also removes the clone needed to find the encoded length of a set of records. This commit has no significant impact on benchmark results.
Force-pushed from 2667f91 to 4f25522
tftr! merging on green!