Chunked (large) item support #181

Merged
dormando merged 9 commits into
memcached:next from
dormando:chunked_items
Jul 13, 2016


dormando commented Jul 7, 2016


Work-in-progress PR for chunked item support. As of this writing it is largely functional, but it is incomplete and may need a refactor.


  • What is large item support?

An item has always required a single chunk large enough to hold its header, key, and data. Chunks come from a slab allocator, which is split into many slab classes of different chunk sizes. The "largest available slab" became the largest item you could store in memcached. This is where the 1 megabyte limit came from.

The larger the item size limit, the less efficient the slab allocator is, as there is more space between slab classes.

With this change, the max slab chunk size is decoupled from the max item size, and items larger than the max chunk size now consist of multiple chunks chained together.
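To make the "multiple chunks chained together" idea concrete, here is a minimal sketch. It is not the code in this PR: the `chunk` struct, `chain_store`, `chain_read`, and the use of `malloc` in place of the slab allocator are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical chunk layout; memcached's real structs differ. */
typedef struct chunk {
    struct chunk *next;  /* next chunk in the chain, or NULL at the end */
    size_t used;         /* payload bytes stored in this chunk */
    char data[];         /* payload storage */
} chunk;

/* How many chunks of `capacity` payload bytes are needed for `nbytes`. */
static size_t chunks_needed(size_t nbytes, size_t capacity) {
    assert(capacity > 0);
    return nbytes / capacity + (nbytes % capacity != 0);
}

static void chain_free(chunk *head) {
    while (head != NULL) {
        chunk *next = head->next;
        free(head);
        head = next;
    }
}

static size_t chain_length(const chunk *head) {
    size_t n = 0;
    for (; head != NULL; head = head->next) n++;
    return n;
}

/* Split `value` across a chain of chunks. malloc stands in for the slab
 * allocator (an assumption). Returns NULL if nbytes is 0 or on failure. */
static chunk *chain_store(const char *value, size_t nbytes, size_t capacity) {
    chunk *head = NULL;
    chunk **link = &head;
    size_t off = 0;
    assert(capacity > 0);
    while (off < nbytes) {
        size_t n = nbytes - off < capacity ? nbytes - off : capacity;
        chunk *c = malloc(sizeof(*c) + capacity);
        if (c == NULL) {
            chain_free(head);
            return NULL;
        }
        c->next = NULL;
        c->used = n;
        memcpy(c->data, value + off, n);
        *link = c;          /* append to the end of the chain */
        link = &c->next;
        off += n;
    }
    return head;
}

/* Copy the chained payload into a flat buffer. Returns the number of bytes
 * copied, or 0 if the buffer is too small. */
static size_t chain_read(const chunk *head, char *out, size_t out_size) {
    size_t off = 0;
    for (const chunk *c = head; c != NULL; c = c->next) {
        if (c->used > out_size - off) return 0;
        memcpy(out + off, c->data, c->used);
        off += c->used;
    }
    return off;
}
```

With a 16k chunk capacity, a 1MB value would be stored as 64 chunks under this sketch, and a reader walks the chain instead of doing a single copy.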

This has a number of benefits:

  • Performance is nearly identical, as larger items will fill bandwidth much earlier than small items.
  • The max slab class size can be much smaller (I've been settling toward 16k after testing), down from 1MB. This means the rest of the slab classes can be closer together.
  • Since larger items use memory at a much smaller granularity (i.e. 16k), they have less memory overhead than before.
  • The item size max can be hundreds of megabytes without any major impact.
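The overhead argument can be checked with some rough arithmetic. The sketch below is a simplified model, not memcached's real sizing code: `class_size_for` assumes slab classes grow by a fixed factor of 1.25 from a smallest chunk size and ignores alignment and item headers, and `chunked_footprint` assumes every chunk of a large item is a full max-size chunk. Both function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: slab classes grow by a factor of 1.25 (as 5/4 in integer
 * math) from `smallest` up to `largest`. Returns the chunk size of the first
 * class that fits `nbytes`, or 0 if nothing fits. */
static size_t class_size_for(size_t nbytes, size_t smallest, size_t largest) {
    size_t s = smallest;
    assert(smallest >= 4 && smallest <= largest);
    if (nbytes > largest) return 0;
    while (s < nbytes) {
        size_t next = s + s / 4;
        s = next < largest ? next : largest;  /* final class is the max size */
    }
    return s;
}

/* Memory used when `nbytes` is split into fixed-size chunks. */
static size_t chunked_footprint(size_t nbytes, size_t chunk_size) {
    assert(chunk_size > 0);
    size_t n = nbytes / chunk_size + (nbytes % chunk_size != 0);
    return n * chunk_size;
}
```

Under this model, a single-chunk item can waste up to roughly a quarter of its chunk, which grows with the item size, while a chunked item wastes less than one chunk (under 16k) no matter how large it is. For example, a 600KB item split into 16k chunks occupies 38 chunks and wastes 8KB.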

There are some downsides:

  • It complicates the slab allocator a bit and adds a few extra code paths in reading/writing items.
  • Freeing a single item no longer guarantees enough memory to store a single new item. Evicting chunked items to free memory will need some careful tuning. This may not be awful: very small items will continue to map 1:1 to chunks, and the number of loops necessary to free memory for a large item is reasonable given the bandwidth required to store it.
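The second downside amounts to an eviction loop rather than a single eviction. A rough sketch of that loop follows, assuming a hypothetical `lru_item` with a chunk count and a `pool` that tracks free chunks; none of these names come from the PR, and the retry bound is an assumption about how the tuning might look.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical LRU entry: only what the loop needs. */
typedef struct lru_item {
    struct lru_item *prev;  /* next-oldest item, toward the LRU head */
    size_t nchunks;         /* chunks released if this item is evicted */
} lru_item;

typedef struct {
    lru_item *tail;         /* oldest item */
    size_t free_chunks;     /* chunks currently available */
} pool;

/* Evict from the LRU tail until `needed` chunks are free. Gives up after
 * `max_tries` evictions so one large store cannot loop without bound.
 * Returns the number of evictions, or -1 if not enough memory was freed. */
static int make_room(pool *p, size_t needed, int max_tries) {
    int evicted = 0;
    assert(p != NULL);
    while (p->free_chunks < needed) {
        lru_item *victim = p->tail;
        if (victim == NULL || evicted >= max_tries) return -1;
        p->tail = victim->prev;      /* unlink the oldest item */
        victim->prev = NULL;
        p->free_chunks += victim->nchunks;
        evicted++;
    }
    return evicted;
}
```

A store of a small item usually exits after zero or one evictions; a store of a many-chunk item may need several passes when the tail is full of small items, which is the case the tuning has to cover.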

  • The current status

This largely works. It's been bench tested but not torture tested. Some tests have been written and I've fiddled with it manually for a while. There's a decent amount of code left to write, and I feel like it might need a little rewriting or simplification. This implementation makes the low-level slab allocator more aware of how items are structured, which feels like a layer violation. items.c should instead ask the slab allocator for a specific number of chunks, but then we might have to loop through them twice, or pass a callback function into the slab allocator to initialize the chunks. Moving more of the implementation up to the item layer would also allow it to garbage collect more intelligently when freeing up chunks for a new chunked item.
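For discussion, the callback alternative could look roughly like the sketch below. Everything here is hypothetical: `slab_pool`, `raw_chunk`, `slab_alloc_chunks`, and `tag_chunk` are illustrative names, not functions in slabs.c or items.c. The point is only that the allocator walks the chunks once and the item layer decides how each one is initialized.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical raw chunk as the allocator sees it. */
typedef struct raw_chunk {
    struct raw_chunk *next;
    size_t tag;             /* stand-in for item-layer header fields */
} raw_chunk;

/* Item-layer callback, called once per chunk during allocation. */
typedef void (*chunk_init_fn)(raw_chunk *c, size_t index, void *ctx);

typedef struct {
    raw_chunk *free_list;
    size_t free_count;
} slab_pool;

/* Take `n` chunks off the free list in a single pass, letting the caller
 * initialize each one. All-or-nothing: returns NULL if fewer than `n` chunks
 * are free. The allocator relinks each chunk after the callback, so the
 * callback cannot break the chain. */
static raw_chunk *slab_alloc_chunks(slab_pool *p, size_t n,
                                    chunk_init_fn init, void *ctx) {
    assert(p != NULL);
    if (n == 0 || p->free_count < n) return NULL;
    raw_chunk *head = p->free_list;
    raw_chunk *c = head;
    for (size_t i = 0; i < n; i++) {
        raw_chunk *next = c->next;   /* read before the callback runs */
        if (init != NULL) init(c, i, ctx);
        c->next = (i + 1 < n) ? next : NULL;
        c = next;
    }
    p->free_list = c;
    p->free_count -= n;
    return head;
}

/* Example callback: record the chunk's position and count the calls. */
static void tag_chunk(raw_chunk *c, size_t index, void *ctx) {
    c->tag = index;
    (*(size_t *)ctx)++;
}
```

This keeps knowledge of item layout out of the allocator at the cost of an indirect call per chunk, and it avoids the second loop over the chain.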

Could use some feedback on any obvious bugs I've left in.

Work remaining:

  • Write more tests (+ fix more existing tests + tests for 00-startup.t)
  • Protect binary sasl auth from getting a chunked item
  • Implement chunked-aware append/prepend commands
  • Implement chunked-aware slab page mover. This is also complicated.
  • Run burn-in test
  • Clean out debug code

Punting or doing later:

  • Test using readv to reduce number of read() syscalls on store.

  • Run burn-in tests on very large items (hundreds of megabytes)
  • Any useful stats that could be added? Something to indicate how many large items have been seen, at least.

@dormando


It's been passing the burn-in tests, finally.

Just did some cleanup and pushed the rebalancer updates, which were very challenging. Normally when updating the slab rebalancer I spend several hours drawing out interactions; I did not do that this time and suffered for it.

Most of the pain was preserving the (recent!) ability of the slab rebalancer to avoid evicting items when it has some free chunks in the slab class. The magic trick now extends to being able to rescue individual chunks from within an item: If a large, chunked item has a chunk that spans into a slab page that needs to be moved, the chunk can be swapped out with free memory without evicting or rewriting the entire item. If no free memory is available the entire item gets removed, of course.
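The chunk rescue described above can be pictured as a swap inside a linked chain. This is only a sketch under simplifying assumptions: a singly linked chain, a fixed 64-byte payload, and no locking or refcounts, all of which the real rebalancer has to deal with. `mchunk` and `rescue_chunk` are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MCHUNK_DATA 64

/* Hypothetical chunk with a small fixed payload. */
typedef struct mchunk {
    struct mchunk *next;
    size_t used;
    char data[MCHUNK_DATA];
} mchunk;

/* Swap `victim` (living in the page being moved) for `spare` (free memory
 * elsewhere) without touching the rest of the item. Returns 0 on success,
 * or -1 if there is no spare chunk or the victim is not in the chain; in
 * that case the caller would fall back to removing the whole item. */
static int rescue_chunk(mchunk **head, mchunk *victim, mchunk *spare) {
    assert(head != NULL && victim != NULL);
    if (spare == NULL) return -1;
    for (mchunk **link = head; *link != NULL; link = &(*link)->next) {
        if (*link == victim) {
            assert(victim->used <= MCHUNK_DATA);
            memcpy(spare->data, victim->data, victim->used);
            spare->used = victim->used;
            spare->next = victim->next;
            *link = spare;          /* predecessor now points at the spare */
            victim->next = NULL;
            victim->used = 0;
            return 0;
        }
    }
    return -1;
}
```

Only one chunk's worth of data is copied, which is why this is cheaper than evicting or rewriting the entire item.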

This is going to need some refactoring once it's had time to sink in: there are too many layer violations between memcached.c, items.c, and slabs.c. Still, thinking through all of this at a higher level has given me ideas for the future.

More test cleanup is needed, then several runs through the buildbots and a looped test run to kick out some flaky tests. I'm low on TODOs so this should be very close to ready.

dormando added 9 commits July 12, 2016 18:42
can set and store large items via asciiprot. gets/append/prepend/binprot not
implemented yet.
can actually fetch items now, and fixed a few bugs with storage/freeing.

added fetching for binprot.
added some basic tests.

many tests still fail for various reasons, and append/prepend isn't fixed yet.
has spent some time under performance testing. For larger items there's less
than 5% extra CPU usage, however the max usable CPU when using large items is
1/10th or less before you run out of bandwidth. Mixed small/large items will
still balance out.

comments out debugging (which must be removed for release).

restores defaults and ensures only t/chunked-items.t is affected.

dyn-maxbytes and item_size_max tests still fail.

append/prepend aren't implemented, sasl needs to be guarded.

slab mover needs to be updated.
not entirely sure how to test this or guarantee people don't set the chunk
low enough to cause problems. might circle back to add better tests.
This is also the only actual tests of append and prepend in the codebase :/
t/binary.t had some simple tests but no others were ever written.
now extends data in the chunk rather than overwriting the end.
also fixes the new LRU algorithm to balance by total bytes used rather than
total chunks used, since total chunks used isn't tracked for multi-chunk
items.

also fixes a bug where the lru limit wasn't being utilized for HOT_LRU

also some cleanup from previous commits.
more rules around item sizes now. slab chunks automatically adjust if
requesting memory > 1M
when doing tests with long strings the results are buffered... so doing
thousands of tests with long strings was using more than a gig of ram. now we
just summarize if any were different.

also fixes more offset stuff.