Chunked (large) item support by dormando · Pull Request #181 · memcached/memcached

dormando · 2016-07-07T08:14:14Z

Work in progress PR for chunked item support. As of this writing it is largely functional, but it is incomplete and may need a refactor.

What is large item support?

Items have always required a single chunk large enough to hold their header, key, and data. This is done via a slab allocator, which is split into many slab classes of different sizes. The "largest available slab" became the largest item you could store into memcached. This is where the 1 megabyte limit came from.
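As a rough illustration of how the largest slab class caps item size, here is a minimal C sketch of a geometric slab class table. The names `build_classes` and `class_for`, the 96-byte starting size, the 1.25 growth factor, and the 8-byte alignment are assumptions chosen for the example, not memcached's actual code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CLASSES 64

/* Chunk size for each slab class, smallest first (illustrative table). */
static size_t class_size[MAX_CLASSES];
static int n_classes;

/* Fill the table with sizes starting at `min`, each roughly `factor` times
 * the previous and rounded up to 8 bytes, then add a final class of exactly
 * `max` bytes. */
static void build_classes(size_t min, double factor, size_t max) {
    assert(min > 0 && factor > 1.0 && min <= max);
    size_t size = min;
    n_classes = 0;
    while (n_classes < MAX_CLASSES - 1 && size < max) {
        class_size[n_classes++] = size;
        size_t next = (size_t)((double)size * factor);
        next = (next + 7) & ~(size_t)7;      /* round up to 8-byte alignment */
        if (next <= size)
            next = size + 8;                 /* always make progress */
        size = next;
    }
    class_size[n_classes++] = max;           /* largest class caps item size */
}

/* Index of the smallest class whose chunk can hold `total` bytes (header,
 * key, and data together), or -1 if no class is large enough. */
static int class_for(size_t total) {
    for (int i = 0; i < n_classes; i++)
        if (total <= class_size[i])
            return i;
    return -1;
}
```

With the table capped at 1 MB, `class_for(1024 * 1024 + 1)` returns -1: an item one byte larger than the biggest class has nowhere to go, which is the limit described above.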

The larger the item size limit, the less efficient the slab allocator is, as there is more space between slab classes.
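To put numbers on the spacing problem, the sketch below assumes the allocator has a fixed budget of slab classes (63 here, an assumption) and searches for the smallest growth factor that still reaches the largest chunk size. `classes_needed` and `min_factor_pct` are hypothetical helpers.

```c
#include <assert.h>

/* Number of geometric slab classes needed to go from `min` up to at least
 * `max` when each class is `factor` times the previous one. */
static int classes_needed(double min, double factor, double max) {
    assert(min > 0 && factor > 1.0);
    int count = 1;
    double size = min;
    while (size < max) {
        size *= factor;
        count++;
    }
    return count;
}

/* Smallest growth factor, in hundredths (e.g. 117 means 1.17), that spans
 * min..max using at most `limit` classes. */
static int min_factor_pct(double min, double max, int limit) {
    assert(limit >= 2);
    for (int h = 101; ; h++) {
        if (classes_needed(min, h / 100.0, max) <= limit)
            return h;
    }
}
```

Under these assumptions, spanning 96 bytes to 1 MB needs a factor of about 1.17, while stopping at 16 KB allows about 1.09. Since an item just over one class boundary lands in the next class, worst-case waste per item is roughly 1 - 1/factor: about 15% versus about 8%.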

With this change, the max slab chunk size is decoupled from the max item size, and items larger than the max chunk size now consist of multiple chunks chained together.
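A minimal sketch of what "multiple chunks chained together" can look like, assuming a simple singly linked list of fixed-capacity chunks allocated with `malloc`. The `chunk` struct and the `chain_*` functions are hypothetical and much simpler than memcached's item headers, which also carry fields such as refcounts and slab class ids.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical chunk: a fixed-capacity data area plus a link to the next. */
typedef struct chunk {
    struct chunk *next;
    size_t size;   /* capacity of data[] */
    size_t used;   /* bytes of data[] actually filled */
    char data[];   /* flexible array member */
} chunk;

static void chain_free(chunk *c) {
    while (c != NULL) {
        chunk *next = c->next;
        free(c);
        c = next;
    }
}

/* Split `len` bytes of `src` across new chunks of `chunk_size` bytes each.
 * Returns the head of the chain, or NULL if an allocation fails. */
static chunk *chain_store(const char *src, size_t len, size_t chunk_size) {
    assert(chunk_size > 0);
    chunk *head = NULL;
    chunk **tail = &head;
    size_t off = 0;
    do {
        size_t n = len - off < chunk_size ? len - off : chunk_size;
        chunk *c = malloc(sizeof(chunk) + chunk_size);
        if (c == NULL) {
            chain_free(head);
            return NULL;
        }
        c->next = NULL;
        c->size = chunk_size;
        c->used = n;
        memcpy(c->data, src + off, n);
        *tail = c;              /* link at the end of the chain */
        tail = &c->next;
        off += n;
    } while (off < len);
    return head;
}

/* Walk the chain and copy each chunk's used bytes into `dst` in order.
 * Returns the total number of bytes copied. */
static size_t chain_read(const chunk *c, char *dst) {
    size_t total = 0;
    for (; c != NULL; c = c->next) {
        memcpy(dst + total, c->data, c->used);
        total += c->used;
    }
    return total;
}

static int chain_length(const chunk *c) {
    int n = 0;
    for (; c != NULL; c = c->next)
        n++;
    return n;
}
```

Reading a chunked value means walking the chain and copying (or sending) each chunk's `used` bytes in order; in this sketch a small value is simply a chain of length one.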

This has a number of benefits:

Performance is nearly identical, as larger items saturate network bandwidth long before they saturate CPU.
The max slab class size can be much smaller (I've been settling toward 16k after testing), down from 1 MB. This means the rest of the slab classes can be closer together.
Since larger items are allocated at a much smaller granularity (i.e., 16k), they have less memory overhead than before.
The item size max can be hundreds of megabytes without any major impact.
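The overhead claim can be checked with simple arithmetic. This sketch compares the bytes wasted when an item must fit in one slab chunk against splitting it into fixed 16 KB chunks; it ignores per-chunk headers and assumes the final chunk is a full-size chunk, both simplifications. `chunked_waste` and `single_chunk_waste` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes left unused when an item of `total` bytes is split into fixed-size
 * chunks of `chunk` bytes (per-chunk headers ignored). */
static size_t chunked_waste(size_t total, size_t chunk) {
    assert(chunk > 0);
    size_t n = (total + chunk - 1) / chunk;   /* ceiling division */
    return n * chunk - total;
}

/* Bytes left unused when the item must fit in one chunk from classes that
 * start at `min` bytes and grow by `factor` (no alignment, no size cap). */
static size_t single_chunk_waste(size_t total, size_t min, double factor) {
    assert(min > 0 && factor > 1.0);
    double size = (double)min;
    while (size < (double)total)
        size *= factor;                       /* smallest class >= total */
    return (size_t)size - total;
}
```

For a 600,000-byte item, 16 KB chunks waste 6,208 bytes (37 chunks), while a single chunk from classes spaced by 1.25 starting at 96 bytes wastes roughly 120 KB.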

There are some downsides:

It complicates the slab allocator a bit and adds a few extra code paths in reading/writing items.
Freeing a single item no longer guarantees enough memory to store a new single item. Pulling chunked items to free memory will need some careful tuning. This may not be awful, considering that very small items will continue to be 1:1 and that the number of loops necessary to free memory for large items is reasonable given the bandwidth required to store them.
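A small simulation of why freeing one item is no longer enough. It assumes the allocator knows how many chunks each item at the LRU tail holds and keeps evicting from the tail until enough chunks are free; the function and its inputs are hypothetical, and real eviction would also have to deal with locking and items that are in use.

```c
#include <assert.h>
#include <stddef.h>

/* `tail_chunks[i]` is the number of chunks held by the i-th item from the
 * LRU tail. Evict from the tail until at least `need` chunks are free.
 * Returns the number of items evicted, or -1 if evicting all `n` items
 * still would not free enough. */
static int evict_until_free(const size_t *tail_chunks, int n,
                            size_t free_chunks, size_t need) {
    assert(tail_chunks != NULL || n == 0);
    int evicted = 0;
    while (free_chunks < need) {
        if (evicted == n)
            return -1;
        free_chunks += tail_chunks[evicted++];
    }
    return evicted;
}
```

With two chunks already free and tail items holding 1, 1, 3, and 5 chunks, a 10-chunk item forces four evictions, whereas a one-chunk item needs at most one.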

The current status

This largely works. It's been bench tested but not torture tested. Some tests have been written and I've fiddled with it manually for a while. There's a decent amount of code left to write, and I feel like it might need a little rewriting or simplification. This implementation is making the deep slab allocator more aware of how items are structured, which feels like a layer violation: items.c should instead ask the slab allocator for a specific number of chunks, but then we might have to loop through them twice, or pass a callback function into the slab allocator to initialize the chunks. Moving more of the implementation up to the item layer would also allow it to garbage collect more intelligently when freeing up chunks for a new chunked item.
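The alternative floated above, where items.c asks the slab allocator for a specific number of chunks and passes a callback to initialize them, might look roughly like this. The `pool`, `pchunk`, and `pool_alloc_chunks` names and the all-or-nothing behavior are assumptions for illustration; this is not the API the PR implements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size chunk; only the fields this sketch needs. */
typedef struct pchunk {
    struct pchunk *next;
    int index;                 /* position within its item, set by the callback */
} pchunk;

/* Hypothetical pool of free chunks kept on a singly linked list. */
typedef struct {
    pchunk *free_list;
    size_t free_count;
} pool;

/* Caller-supplied initializer; must not modify c->next. */
typedef void (*chunk_init_fn)(pchunk *c, int index, void *ctx);

/* Take exactly `n` chunks from the pool in one pass, calling `init` on each.
 * All-or-nothing: returns NULL and takes nothing if fewer than `n` are free. */
static pchunk *pool_alloc_chunks(pool *p, size_t n, chunk_init_fn init, void *ctx) {
    assert(n > 0);
    if (p->free_count < n)
        return NULL;
    pchunk *head = p->free_list, *c = head, *last = NULL;
    for (size_t i = 0; i < n; i++) {
        init(c, (int)i, ctx);
        last = c;
        c = c->next;
    }
    last->next = NULL;         /* detach the item's chunks from the free list */
    p->free_list = c;
    p->free_count -= n;
    return head;
}

/* Example initializer: numbers the chunks and counts how often it ran. */
static void number_chunk(pchunk *c, int index, void *ctx) {
    c->index = index;
    (*(int *)ctx)++;
}
```

Because the callback runs while the allocator walks the free list, the chunks are visited once instead of twice, and the allocator never needs to know what an item header looks like.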

Could use some feedback on any obvious bugs I've left in.

Work remaining:

Write more tests (+ fix more existing tests + tests for 00-startup.t)
Protect binary sasl auth from getting a chunked item
Implement chunked-aware append/prepend commands
Implement chunked-aware slab page mover. This is also complicated.
Run burn-in test
Clean out debug code

Punting or doing later:

Test using readv to reduce number of read() syscalls on store.

~~Run burn-in tests on very large items (hundreds of megabytes)~~
~~Any useful stats that could be added? Something to indicate how many large items have been seen, at least.~~
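For the readv idea, a sketch of filling several chunk buffers with one system call. It uses the POSIX `readv` call with one `struct iovec` per chunk; the 4-byte chunk size, the 16-entry cap, and `read_into_chunks` are assumptions, and a real store path would also loop on short reads and respect `IOV_MAX`.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define CHUNK_SIZE 4   /* tiny on purpose so the example spans chunks */
#define MAX_IOV 16

/* Fill up to `nchunks` fixed-size buffers from `fd` with a single readv()
 * instead of one read() per chunk. Returns bytes read, or -1 on error. */
static ssize_t read_into_chunks(int fd, char chunks[][CHUNK_SIZE], int nchunks) {
    assert(nchunks > 0 && nchunks <= MAX_IOV);
    struct iovec iov[MAX_IOV];
    for (int i = 0; i < nchunks; i++) {
        iov[i].iov_base = chunks[i];
        iov[i].iov_len = CHUNK_SIZE;
    }
    return readv(fd, iov, nchunks);
}
```

When the data is already buffered, one `readv` replaces `nchunks` separate `read` calls; the kernel fills each buffer in order before moving to the next.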

dormando · 2016-07-12T10:40:43Z

It's been passing the burn-in tests, finally.

Just did some cleanup and pushed the rebalancer updates, which were very challenging. Normally when updating the slab rebalancer I spend several hours drawing out interactions; I did not do that this time and suffered for it.

Most of the pain was preserving the (recent!) ability of the slab rebalancer to avoid evicting items when it has some free chunks in the slab class. The magic trick now extends to being able to rescue individual chunks from within an item: If a large, chunked item has a chunk that spans into a slab page that needs to be moved, the chunk can be swapped out with free memory without evicting or rewriting the entire item. If no free memory is available the entire item gets removed, of course.
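A simplified sketch of rescuing one chunk from a page that is being moved: copy its contents into a free chunk elsewhere and splice the copy into the item's chain in place of the original. The doubly linked `rchunk` layout and `rescue_chunk` are hypothetical, and the sketch ignores the locking and reference counting a real rebalancer has to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CDATA 16

/* Hypothetical doubly linked chunk that records which slab page holds it. */
typedef struct rchunk {
    struct rchunk *prev, *next;
    int page;
    size_t used;
    char data[CDATA];
} rchunk;

/* Copy `victim` (in the page being moved) into `spare` (a free chunk on
 * another page) and splice `spare` into the victim's place in the chain.
 * Returns 0 on success; -1 if no spare chunk is available, in which case
 * the caller would remove the whole item instead. */
static int rescue_chunk(rchunk **head, rchunk *victim, rchunk *spare) {
    if (spare == NULL)
        return -1;
    assert(spare->page != victim->page);
    memcpy(spare->data, victim->data, victim->used);
    spare->used = victim->used;
    spare->prev = victim->prev;
    spare->next = victim->next;
    if (victim->prev != NULL)
        victim->prev->next = spare;
    else
        *head = spare;                    /* victim was the first chunk */
    if (victim->next != NULL)
        victim->next->prev = spare;
    victim->prev = victim->next = NULL;   /* victim's memory can be reclaimed */
    return 0;
}
```

Only the one chunk is copied; the rest of the item stays where it is, which is what lets the mover avoid rewriting or evicting the entire item.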

This is going to need some refactoring once it's had time to sink in: there are too many layer violations between memcached.c, items.c, and slabs.c, but thinking through all of this at a higher level has given me ideas for the future.

More test cleanup is needed, then several runs through the buildbots and a looped test run to kick out some flaky tests. I'm low on TODOs, so this should be very close to ready.

has spent some time under performance testing. For larger items there's less than 5% extra CPU usage; however, when using large items you run out of bandwidth at 1/10th or less of the max usable CPU. Mixed small/large items will still balance out. comments out debugging (which must be removed for release). restores defaults and ensures only t/chunked-items.t is affected. dyn-maxbytes and item_size_max tests still fail. append/prepend aren't implemented, sasl needs to be guarded. slab mover needs to be updated.

also fixes the new LRU algorithm to balance by total bytes used rather than total chunks used, since total chunks used isn't tracked for multi-chunk items. also fixes a bug where the lru limit wasn't being utilized for HOT_LRU. also some cleanup from previous commits.
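To show why counting chunks can mislead the LRU balancer, the sketch below compares the HOT segment's share of bytes with its share of tracked chunks, where a multi-chunk item registers as a single chunk. The `lru_stats` struct, both `hot_over_limit_*` functions, and the percentage limit are hypothetical names and values for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical LRU accounting. `chunks` mirrors the gap described above:
 * a multi-chunk item is counted as a single chunk here. */
typedef struct {
    size_t chunks;
    size_t bytes;
} lru_stats;

/* Chunk-count check: is HOT's share of tracked chunks above `pct` percent? */
static int hot_over_limit_by_chunks(lru_stats hot, lru_stats total, unsigned pct) {
    assert(pct <= 100);
    return hot.chunks * 100 > total.chunks * pct;
}

/* Byte-based check: is HOT's share of bytes above `pct` percent? */
static int hot_over_limit_by_bytes(lru_stats hot, lru_stats total, unsigned pct) {
    assert(pct <= 100);
    return hot.bytes * 100 > total.bytes * pct;
}
```

A single 4 MB chunked item is 1% of the tracked chunks in this example but half of the bytes, so only the byte-based check notices that HOT is over a 20% limit.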

dormando added WIP needs review/testing labels Jul 7, 2016

dormando self-assigned this Jul 7, 2016

dormando force-pushed the chunked_items branch from 3c6c78a to 51f1fdb July 12, 2016 10:32

dormando added 9 commits July 12, 2016 18:42

chunked items checkpoint commit

0567967

can set and store large items via asciiprot. gets/append/prepend/binprot not implemented yet.

chunked item second checkpoint

b05653f

can actually fetch items now, and fixed a few bugs with storage/freeing. added fetching for binprot. added some basic tests. many tests still fail for various reasons, and append/prepend isn't fixed yet.

protect binary sasl from chunked items.

3e10a71

not entirely sure how to test this or guarantee people don't set the chunk low enough to cause problems. might circle back to add better tests.

chunked append/prepend now work.

5978cf7

This is also the only actual tests of append and prepend in the codebase :/ t/binary.t had some simple tests but no others were ever written.
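One way to make append/prepend chunk-aware is to build a fresh chain containing the old value plus the new bytes, rather than editing the old chain in place, so anyone still reading the old item is unaffected. This is a hedged sketch: the 8-byte chunk size, the `writer` helper, and `chain_pend` are assumptions, not the code from this commit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ACHUNK_SIZE 8   /* tiny on purpose so the example spans chunks */

typedef struct achunk {
    struct achunk *next;
    size_t used;
    char data[ACHUNK_SIZE];
} achunk;

/* Appends bytes to the end of a chain under construction. */
typedef struct {
    achunk *head, *tail;
} writer;

static int writer_put(writer *w, const char *src, size_t len) {
    while (len > 0) {
        if (w->tail == NULL || w->tail->used == ACHUNK_SIZE) {
            achunk *c = malloc(sizeof *c);
            if (c == NULL)
                return -1;
            c->next = NULL;
            c->used = 0;
            if (w->tail != NULL) w->tail->next = c; else w->head = c;
            w->tail = c;
        }
        size_t room = ACHUNK_SIZE - w->tail->used;   /* fill the tail first */
        size_t n = len < room ? len : room;
        memcpy(w->tail->data + w->tail->used, src, n);
        w->tail->used += n;
        src += n;
        len -= n;
    }
    return 0;
}

static int writer_put_chain(writer *w, const achunk *c) {
    for (; c != NULL; c = c->next)
        if (writer_put(w, c->data, c->used) != 0)
            return -1;
    return 0;
}

static void achain_free(achunk *c) {
    while (c != NULL) {
        achunk *next = c->next;
        free(c);
        c = next;
    }
}

static size_t achain_read(const achunk *c, char *dst) {
    size_t total = 0;
    for (; c != NULL; c = c->next) {
        memcpy(dst + total, c->data, c->used);
        total += c->used;
    }
    return total;
}

/* Build a new chain holding old+data (append) or data+old (prepend).
 * The old chain is not modified. Returns NULL on allocation failure. */
static achunk *chain_pend(const achunk *old, const char *data, size_t len, int prepend) {
    assert(len > 0);
    writer w = { NULL, NULL };
    int failed = prepend
        ? (writer_put(&w, data, len) != 0 || writer_put_chain(&w, old) != 0)
        : (writer_put_chain(&w, old) != 0 || writer_put(&w, data, len) != 0);
    if (failed) {
        achain_free(w.head);   /* discard the partial copy */
        return NULL;
    }
    return w.head;
}
```

The trade-off is a full copy of the old value on every append or prepend; in exchange, a failed allocation simply frees the partial new chain and leaves the original item intact.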

CRLF in binprot was inverted.

6975235

now extends data in the chunk rather than overwriting the end.

update item_size_max tests

9c92c3a

more rules around item sizes now. slab chunks automatically adjust if requesting memory > 1M

fix test RAM bloat and more 32bit offsets.

5627405

when doing tests with long strings the results are buffered... so doing thousands of tests with long strings was using more than a gig of ram. now we just summarize if any were different. also fixes more offset stuff.

dormando force-pushed the chunked_items branch from 51f1fdb to 5627405 July 13, 2016 05:55

dormando merged commit 5627405 into memcached:next Jul 13, 2016

dormando added merged/fixed for next and removed needs review/testing WIP labels Jul 13, 2016

dormando merged 9 commits into memcached:next from dormando:chunked_items
