persist: actually use columnar encoding for unsealed batches.#9239

Merged
ruchirK merged 2 commits into MaterializeInc:main from ruchirK:persist-columnar-2
Dec 1, 2021

Conversation

@ruchirK
Contributor

@ruchirK ruchirK commented Nov 23, 2021

Motivation

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR adds a release note for any
    user-facing behavior changes.

@ruchirK
Contributor Author

ruchirK commented Nov 23, 2021

Only the last commit is relevant; the first 3 commits are from #9165

@danhhz
Contributor

danhhz commented Nov 23, 2021

headed to the airport now so it might take me a bit to get to this one


let updates = generate_updates();
let size = get_encoded_len(updates.clone());
let updates = vec![updates.iter().collect::<ColumnarRecords>()];
Contributor

let's make generate_updates return a ColumnarRecords instead

Contributor Author

I'd rather not do that because generate_updates is also used to generate a vec of updates for calls to bench_writes (which uses a WriteReqBuilder)

Keeping all of the data-generating functions tied to the same underlying function helps me make sure that the benchmarks that write unsealed batches and the benchmarks that write data to indexed and then call drain pending are writing exactly the same data.

Contributor

yeah I noticed, but it's just as easy for bench_writes to go ColumnarRecords -> Vec<(...)> as for this to go Vec<(...)> -> ColumnarRecords. and I'd rather the benchmarks that only need a ColumnarRecords don't have to go through the allocs of the Vec thing (even if that's only happening in setup code)

Contributor Author

took me a bit to find the write! macro to actually generate keys in place, but done!
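For illustration, the resolved shape might look something like the sketch below. ColumnarRecords and generate_updates here are simplified stand-ins for the actual persist types, not the real code: keys are formatted in place with the write! macro into a reused buffer and pushed straight into columnar byte buffers, skipping the per-record Vec allocations.

```rust
use std::fmt::Write;

// Simplified stand-in for persist's columnar storage: all keys and vals are
// packed into contiguous byte buffers, with offsets marking record boundaries.
#[derive(Default)]
struct ColumnarRecords {
    key_data: Vec<u8>,
    key_offsets: Vec<usize>,
    val_data: Vec<u8>,
    val_offsets: Vec<usize>,
    timestamps: Vec<u64>,
    diffs: Vec<isize>,
}

impl ColumnarRecords {
    fn len(&self) -> usize {
        self.timestamps.len()
    }

    fn push(&mut self, key: &[u8], val: &[u8], ts: u64, diff: isize) {
        self.key_data.extend_from_slice(key);
        self.key_offsets.push(self.key_data.len());
        self.val_data.extend_from_slice(val);
        self.val_offsets.push(self.val_data.len());
        self.timestamps.push(ts);
        self.diffs.push(diff);
    }
}

// Generate n synthetic updates directly into columnar form. The write! macro
// formats each key in place into a reused String, so there is no per-key
// String allocation and no intermediate Vec of (key, val) pairs.
fn generate_updates(n: usize) -> ColumnarRecords {
    let mut records = ColumnarRecords::default();
    let mut key_buf = String::new();
    for i in 0..n {
        key_buf.clear();
        write!(&mut key_buf, "key-{:08}", i).expect("writing to a String cannot fail");
        records.push(key_buf.as_bytes(), b"val", i as u64, 1);
    }
    records
}

fn main() {
    let records = generate_updates(3);
    assert_eq!(records.len(), 3);
    // Three 12-byte keys ("key-00000000" etc.) packed densely.
    assert_eq!(records.key_data.len(), 36);
}
```

A benchmark that still needs a Vec of owned records can downgrade from the columnar form in its own setup code, keeping the extra allocations out of the common path.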

Comment thread src/persist/src/indexed/unsealed.rs Outdated
};
match b.recv() {
Ok(b) => {
// Reverse the updates so we can pop them off the back
Contributor

why did we lose this behavior (and comment about making sure to filter before clone)?

Contributor Author

iirc, because ColumnarRecordsIter doesn't implement DoubleEndedIterator -- I think it could though - let me see if I can whip that up

Contributor Author

nope! that was not the right reason. -- added the rev back in, but am not reversing the updates within each ColumnarRecords because they are not sorted in any way
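A toy sketch of the restored iteration order (illustrative names, not the persist code): batches are visited in reverse so updates can be popped off the back, while the unsorted records inside each batch keep their original order.

```rust
// Flatten a slice of batches, reversing across batches only. Records inside
// each batch stay in their original (unsorted) order, matching the behavior
// described above. `Vec<T>` stands in for a ColumnarRecords batch here.
fn reversed_batches<T: Clone>(batches: &[Vec<T>]) -> Vec<T> {
    let mut out = Vec::new();
    // .rev() reverses the order of the batches, not of the records within.
    for batch in batches.iter().rev() {
        out.extend(batch.iter().cloned());
    }
    out
}

fn main() {
    let batches = vec![vec![1, 2], vec![3, 4]];
    // The second batch comes first, but 3 still precedes 4 within it.
    assert_eq!(reversed_batches(&batches), vec![3, 4, 1, 2]);
}
```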

Comment thread src/persist/src/indexed/unsealed.rs Outdated
);

debug_assert!(!updates.is_empty());
let updates = vec![updates.iter().collect::<ColumnarRecords>()];
Contributor

shouldn't this build the ColumnarRecords directly?

Contributor Author

I'm sorry I'm not sure I understand the question here

Contributor Author

oh i think i see the question -- right now we go through a vec to generate the ColumnarRecords and we could skip that intermediate step. yes. ofc

Contributor

This method builds a Vec<((Vec<u8>, Vec<u8>), u64, isize)> and then uses that to construct a ColumnarRecords. Instead of let mut updates = vec![] above, we could make that updates variable a ColumnarRecordsBuilder, and then instead of updates.extend we could call updates.push. This line then becomes updates.finish() instead of updates.iter().collect::<ColumnarRecords>().

Contributor

looks like we raced! yup, that's it exactly

Contributor Author

done! implemented it with a FromIterator implementation so it can be reused if needed (I think it's essentially the same builder logic that I would have had to inline). This removes the allocation for the vec, and also removes the allocations for each key and val, so thank you for the suggestion!
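A rough sketch of what such a FromIterator implementation could look like, using a simplified stand-in for ColumnarRecords rather than the real persist struct: borrowed ((key, val), time, diff) triples are collected straight into columnar buffers, so callers can write iter.collect::<ColumnarRecords>() without building an intermediate Vec of owned keys and vals.

```rust
// Simplified stand-in for persist's columnar storage.
#[derive(Default, Debug)]
struct ColumnarRecords {
    key_data: Vec<u8>,
    key_offsets: Vec<usize>,
    val_data: Vec<u8>,
    val_offsets: Vec<usize>,
    timestamps: Vec<u64>,
    diffs: Vec<isize>,
}

impl<'a> FromIterator<((&'a [u8], &'a [u8]), u64, isize)> for ColumnarRecords {
    fn from_iter<T: IntoIterator<Item = ((&'a [u8], &'a [u8]), u64, isize)>>(iter: T) -> Self {
        let mut records = ColumnarRecords::default();
        for ((key, val), ts, diff) in iter {
            // Copy bytes into the packed buffers; no per-record Vec<u8> pair
            // is ever allocated.
            records.key_data.extend_from_slice(key);
            records.key_offsets.push(records.key_data.len());
            records.val_data.extend_from_slice(val);
            records.val_offsets.push(records.val_data.len());
            records.timestamps.push(ts);
            records.diffs.push(diff);
        }
        records
    }
}

fn main() {
    let updates: Vec<((Vec<u8>, Vec<u8>), u64, isize)> = vec![
        ((b"k1".to_vec(), b"v1".to_vec()), 0, 1),
        ((b"k2".to_vec(), b"v2".to_vec()), 1, -1),
    ];
    // Borrow the owned records into the columnar form.
    let records: ColumnarRecords = updates
        .iter()
        .map(|((k, v), t, d)| ((k.as_slice(), v.as_slice()), *t, *d))
        .collect();
    assert_eq!(records.timestamps, vec![0, 1]);
    assert_eq!(records.diffs, vec![1, -1]);
}
```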

pub desc: Description<SeqNo>,
/// The updates themselves.
pub updates: Vec<((Vec<u8>, Vec<u8>), u64, isize)>,
pub updates: Vec<ColumnarRecords>,
Contributor

weird, looks like I never submitted this comment in the first round. sorry about that, it's my major concern and why I didn't just stamp this PR the first time...

why is this still a Vec? should it not just be a bare ColumnarRecords?

Contributor Author

Right. That was my expectation when I first started making this change, and I tried to think of ways to make it a bare ColumnarRecords but it seemed that, since each write itself produces a ColumnarRecords, concatenating together N ColumnarRecords into one would introduce a lot more extra allocation / memcpying on the write fast path. I agree that trace batches should just be a bare ColumnarRecords

to be entirely honest though, I didn't do the performance test of what the impact would be of converting all of the ColumnarRecords from various writes into one ColumnarRecords per persisted stream, and that's probably worth doing before we decide on this approach

Contributor

yeah okay, I see. I'm looking ahead to switching the update blobs to be encoded with arrow where we won't really be able to keep this as a Vec, but I'm fine leaving this as a TODO for then.

Contributor Author

added a todo here.
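To illustrate the trade-off being discussed (with a deliberately minimal stand-in type, not the persist code): concatenating N per-write batches into one re-copies every byte, which is the cost the Vec<ColumnarRecords> representation avoids on the write fast path.

```rust
// Illustrative stand-in: a batch is reduced to just its packed key bytes.
struct Batch {
    key_data: Vec<u8>,
}

// Merging N batches into one means a fresh allocation plus a memcpy of every
// batch's buffer -- acceptable for sealed trace batches, but extra work if
// done on every write.
fn concat(batches: &[Batch]) -> Batch {
    let total: usize = batches.iter().map(|b| b.key_data.len()).sum();
    let mut key_data = Vec::with_capacity(total);
    for b in batches {
        key_data.extend_from_slice(&b.key_data); // one copy per batch
    }
    Batch { key_data }
}

fn main() {
    let batches = vec![
        Batch { key_data: b"ab".to_vec() },
        Batch { key_data: b"cd".to_vec() },
    ];
    assert_eq!(concat(&batches).key_data, b"abcd".to_vec());
}
```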

@ruchirK ruchirK force-pushed the persist-columnar-2 branch 2 times, most recently from 96ebab8 to 93e14e9 on November 30, 2021 20:24
@ruchirK ruchirK requested a review from danhhz November 30, 2021 21:59
Contributor

@danhhz danhhz left a comment

I do still prefer that generate_updates return a ColumnarRecords (unless I'm still missing something really dumb) but I'm not going to hold this up over it if you really still disagree

}
}

impl<'a> FromIterator<((&'a [u8], &'a [u8]), u64, isize)> for ColumnarRecords {
Contributor

seems like we could combine these two impls, but I'm happy leaving that as a TODO

Contributor Author

added a todo

Comment thread src/persist/src/indexed/encoding.rs
This commit teaches unsealed batches to use the new [ColumnarRecords] representation
to store records. This change results in a healthy 15 - 20% improvement across the
indexed_write_drain benchmarks.

These changes are relative to the baseline from (b9436dd), which introduced the
columnar representation, and not relative to the temporary regression caused by
using ColumnarRecords without changing the encoding.

blob_cache_set_unsealed_batch/file_unsorted/23000000
                        time:   [67.601 ms 72.892 ms 78.702 ms]
                        thrpt:  [278.70 MiB/s 300.92 MiB/s 324.47 MiB/s]
                 change:
                        time:   [-4.0024% +3.6422% +13.959%] (p = 0.44 > 0.05)
                        thrpt:  [-12.249% -3.5142% +4.1693%]
                        No change in performance detected.
blob_cache_set_unsealed_batch/mem_unsorted/23000000
                        time:   [29.671 ms 30.247 ms 30.843 ms]
                        thrpt:  [711.17 MiB/s 725.17 MiB/s 739.26 MiB/s]
                 change:
                        time:   [-14.040% -12.150% -10.401%] (p = 0.00 < 0.05)
                        thrpt:  [+11.608% +13.831% +16.333%]
                        Performance has improved.

indexed_write_drain/mem_sorted/23000000
                        time:   [157.88 ms 160.63 ms 164.21 ms]
                        thrpt:  [133.57 MiB/s 136.56 MiB/s 138.93 MiB/s]
                 change:
                        time:   [-21.879% -18.607% -15.460%] (p = 0.00 < 0.05)
                        thrpt:  [+18.288% +22.860% +28.007%]
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
indexed_write_drain/mem_unsorted/23000000
                        time:   [162.87 ms 165.92 ms 168.86 ms]
                        thrpt:  [129.90 MiB/s 132.20 MiB/s 134.68 MiB/s]
                 change:
                        time:   [-24.925% -23.193% -21.583%] (p = 0.00 < 0.05)
                        thrpt:  [+27.524% +30.196% +33.201%]
                        Performance has improved.
indexed_write_drain/file_sorted/23000000
                        time:   [227.40 ms 232.74 ms 238.26 ms]
                        thrpt:  [92.060 MiB/s 94.246 MiB/s 96.458 MiB/s]
                 change:
                        time:   [-17.466% -15.410% -13.244%] (p = 0.00 < 0.05)
                        thrpt:  [+15.266% +18.217% +21.163%]
                        Performance has improved.
indexed_write_drain/file_unsorted/23000000
                        time:   [236.30 ms 240.72 ms 245.04 ms]
                        thrpt:  [89.513 MiB/s 91.119 MiB/s 92.824 MiB/s]
                 change:
                        time:   [-16.800% -14.567% -12.476%] (p = 0.00 < 0.05)
                        thrpt:  [+14.254% +17.051% +20.192%]
                        Performance has improved.

For very small batches this change results in a noticeable increase in the encoded
size but the effect is small, and entirely unnoticeable once data volumes exceed
1,000 bytes (measured with inserts of 1.1KB, 110 KB, and 110 MB, and the sizes of
unsealed batches produced before and after this change were identical).

Unfortunately, this change doesn't register any improvement in the end-to-end test
sending 1 million 110 byte rows via a COPY INTO, because we save on allocations
during `write_atomic`, but end up having to allocate when sending rows to the listener.
However, there's clear positives from this change. Serialization times decrease
from 230 ms to 170 ms (out of a total of ~360 - 370 ms CPU time spent in drain_pending).
This represents a ~20% improvement that is roughly commensurate with what we observe
in the benchmarks. On the coordinator side, we go from spending ~175 ms in
`write_atomic` to spending about 140 ms in `write_atomic`, a smaller-than-expected
change. Curiously, the trace seems to be dominated by `free_tiny`, which
leads us to suspect that perhaps the cost we're now bottlenecked on is the `drop`
on the owned vec that gets passed to `write_atomic`.
…ectly

Previously, the various write benchmarks would generate a large vector of
((key, val), time, diff) records which would then be converted into a `ColumnarRecords`
object if needed. This forced the benchmarks which didn't need the underlying
vector of records to still pay the allocation cost on benchmark setup.

This commit changes the behavior to instead always generate a `ColumnarRecords`
and downgrade to a vector of records only as needed. This change also removes
the clone needed to find the encoded length of a set of records.

This commit has no significant impact on benchmark results.
@ruchirK ruchirK force-pushed the persist-columnar-2 branch from 2667f91 to 4f25522 on December 1, 2021 18:09
@ruchirK ruchirK enabled auto-merge December 1, 2021 18:11
@ruchirK
Contributor Author

ruchirK commented Dec 1, 2021

tftr! merging on green!

@ruchirK ruchirK merged commit 1dc5a8f into MaterializeInc:main Dec 1, 2021