Chunked (large) item support #181
It's finally passing the burn-in tests. I just did some cleanup and pushed the rebalancer updates, which were very challenging. Normally when updating the slab rebalancer I spend several hours drawing out interactions; I did not do that this time and suffered for it. Most of the pain was preserving the (recent!) ability of the slab rebalancer to avoid evicting items when it has some free chunks in the slab class. The magic trick now extends to rescuing individual chunks from within an item: if a large, chunked item has a chunk that spans into a slab page that needs to be moved, the chunk can be swapped out with free memory without evicting or rewriting the entire item. If no free memory is available, the entire item gets removed, of course.

This is going to need some refactoring once it's had time to sink in: there are too many layer violations between memcached.c, items.c, and slabs.c, but thinking through all of this on a higher level has given me ideas for the future. More test cleanup is needed, then several runs through the buildbots and a looped test run to kick out some flaky tests. I'm low on TODOs, so this should be very close to ready.
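The chunk rescue described above can be sketched roughly as follows. This is a minimal illustration, not the actual rebalancer code: the `chunk` struct, the `PAYLOAD` size, and `rescue_chunk` are all hypothetical names, and it assumes chunks within an item form a doubly linked chain.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAYLOAD 16  /* tiny payload size, chosen only to keep the example small */

/* Hypothetical chunk layout; not memcached's real struct. */
typedef struct chunk {
    struct chunk *prev;   /* previous chunk in the item, NULL at the head */
    struct chunk *next;   /* next chunk in the item, NULL at the tail */
    size_t used;          /* bytes of payload in use */
    char data[PAYLOAD];
} chunk;

/*
 * Move the contents of `old` (which lives in a page being moved) into
 * `fresh` (a free chunk from elsewhere) and splice `fresh` into the chain.
 * Returns 0 on success, or -1 when no free chunk was available, in which
 * case the caller would have to evict the whole item instead.
 * Assumption: if `old` is the first chunk, the caller also updates the
 * item header's pointer to the head of the chain.
 */
int rescue_chunk(chunk *old, chunk *fresh) {
    assert(old != NULL);
    if (fresh == NULL)
        return -1;

    memcpy(fresh->data, old->data, old->used);
    fresh->used = old->used;
    fresh->prev = old->prev;
    fresh->next = old->next;

    /* Point the neighbours at the replacement chunk. */
    if (old->prev != NULL)
        old->prev->next = fresh;
    if (old->next != NULL)
        old->next->prev = fresh;

    /* The old chunk is now detached and can move with its page. */
    old->prev = NULL;
    old->next = NULL;
    old->used = 0;
    return 0;
}
```

Only the one chunk is copied; the rest of the item stays where it is, which is what avoids rewriting or evicting the entire item.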
can set and store large items via asciiprot. gets/append/prepend/binprot not implemented yet.
can actually fetch items now, and fixed a few bugs with storage/freeing. added fetching for binprot. added some basic tests. many tests still fail for various reasons, and append/prepend isn't fixed yet.
has spent some time under performance testing. For larger items there's less than 5% extra CPU usage, however the max usable CPU when using large items is 1/10th or less before you run out of bandwidth. Mixed small/large items will still balance out. comments out debugging (which must be removed for release). restores defaults and ensures only t/chunked-items.t is affected. dyn-maxbytes and item_size_max tests still fail. append/prepend aren't implemented, sasl needs to be guarded. slab mover needs to be updated.
not entirely sure how to test this or guarantee people don't set the chunk low enough to cause problems. might circle back to add better tests.
These are also the only actual tests of append and prepend in the codebase :/ t/binary.t had some simple tests, but no others were ever written.
now extends data in the chunk rather than overwriting the end.
also fixes the new LRU algorithm to balance by total bytes used rather than total chunks used, since total chunks used isn't tracked for multi-chunk items. also fixes a bug where the lru limit wasn't being utilized for HOT_LRU. also some cleanup from previous commits.
more rules around item sizes now. slab chunks automatically adjust if requesting memory > 1M
when doing tests with long strings the results are buffered... so doing thousands of tests with long strings was using more than a gig of ram. now we just summarize if any were different. also fixes more offset stuff.
Work-in-progress PR for chunked item support. As of this writing it is largely functional, but it is incomplete and may need a refactor.
An item has always required a single chunk large enough to hold its header, key, and data. This is done via a slab allocator, which is split into slab classes of many sizes. The largest available slab chunk became the largest item you could store in memcached. This is where the 1 megabyte limit came from.
The larger the item size limit, the less efficient the slab allocator is, as there is more space between slab classes.
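To make the "more space between slab classes" point concrete, here is a rough sketch of how the gap between adjacent classes grows with chunk size. It assumes classes grow geometrically by a factor of 1.25 (memcached's default `-f` value, written here as 5/4) and are rounded up to 8 bytes; the function names and the starting sizes are illustrative only, not taken from slabs.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: size of the next slab class when classes grow by
 * num/den, rounded up to an 8-byte boundary (alignment is an assumption).
 */
size_t next_class_size(size_t size, size_t num, size_t den) {
    assert(den > 0 && num > den);
    size_t next = size * num / den;
    return (next + 7) & ~(size_t)7;
}

/*
 * Worst-case bytes wasted by an item that is one byte too large for a
 * class of `size` bytes and therefore lands in the next class up.
 */
size_t worst_case_waste(size_t size, size_t num, size_t den) {
    return next_class_size(size, num, den) - (size + 1);
}
```

With a factor of 5/4, a 96-byte class is followed by a 120-byte class (at most 23 bytes wasted per item), while an 800000-byte class is followed by a 1000000-byte class (up to 199999 bytes wasted per item).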
With this change, the max slab chunk size is decoupled from the max item size, and items larger than the max chunk size now consist of multiple chunks chained together.
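A minimal sketch of the chaining idea, under stated assumptions: the `chunk` struct and the helper names below are hypothetical and much simpler than the real item and chunk headers, and `calloc` stands in for the slab allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical chunk header; the payload would follow it in a real allocator. */
typedef struct chunk {
    struct chunk *next;  /* next chunk in the item, NULL at the tail */
    size_t size;         /* payload capacity of this chunk */
    size_t used;         /* payload bytes actually stored here */
} chunk;

/* Chunks needed to hold total_bytes, rounding up. */
size_t chunks_needed(size_t total_bytes, size_t chunk_payload) {
    assert(chunk_payload > 0);
    return (total_bytes + chunk_payload - 1) / chunk_payload;
}

/* Release every chunk in a chain. */
void chain_free(chunk *head) {
    while (head != NULL) {
        chunk *next = head->next;
        free(head);
        head = next;
    }
}

/* Build a chain sized for total_bytes; returns NULL if allocation fails. */
chunk *chain_alloc(size_t total_bytes, size_t chunk_payload) {
    size_t n = chunks_needed(total_bytes, chunk_payload);
    size_t remaining = total_bytes;
    chunk *head = NULL;
    chunk **tail = &head;

    for (size_t i = 0; i < n; i++) {
        chunk *c = calloc(1, sizeof(*c));
        if (c == NULL) {
            chain_free(head);
            return NULL;
        }
        c->size = chunk_payload;
        c->used = remaining < chunk_payload ? remaining : chunk_payload;
        remaining -= c->used;
        *tail = c;          /* append to the end of the chain */
        tail = &c->next;
    }
    return head;
}

/* Total payload bytes stored across the chain. */
size_t chain_bytes(const chunk *head) {
    size_t total = 0;
    for (; head != NULL; head = head->next)
        total += head->used;
    return total;
}
```

For example, a 2500-byte value with 1024-byte chunk payloads needs three chunks, the last one only partly filled.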
This has a number of benefits:
There are some downsides:
This largely works. It's been bench tested but not torture tested. Some tests have been written and I've fiddled with it manually for a while. There's a decent amount of code left to write, and I feel like it might need a little rewriting or simplification. This implementation makes the deep slab allocator more aware of how items are structured, which feels like a layer violation: items.c should instead ask the slab allocator for a specific number of chunks, but then we might have to loop through them twice, or pass a callback function into the slab allocator to initialize the chunks. Moving more of the implementation up to the item layer would also allow it to garbage collect more intelligently when freeing up chunks for a new chunked item.
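The callback alternative mentioned above might look something like this. It is only a sketch of the design idea: `alloc_chunks_with_init`, `chunk_init_fn`, `init_ctx`, and the struct fields are hypothetical names, and `calloc` stands in for pulling chunks from a slab class.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical chunk header for illustration. */
typedef struct chunk {
    struct chunk *next;
    size_t index;     /* position of the chunk within its item */
    int owner_id;     /* stand-in for a back-pointer to the owning item */
} chunk;

/* Callback supplied by the item layer to initialize each chunk. */
typedef void (*chunk_init_fn)(chunk *c, size_t index, void *ctx);

void free_chunks(chunk *head) {
    while (head != NULL) {
        chunk *next = head->next;
        free(head);
        head = next;
    }
}

/*
 * Allocator side: hands out `count` chunks as a chain and calls `init`
 * on each one as it is allocated, so the chain is walked only once.
 * The allocator knows nothing about item structure.
 */
chunk *alloc_chunks_with_init(size_t count, chunk_init_fn init, void *ctx) {
    chunk *head = NULL;
    chunk **tail = &head;

    for (size_t i = 0; i < count; i++) {
        chunk *c = calloc(1, sizeof(*c));
        if (c == NULL) {
            free_chunks(head);
            return NULL;
        }
        if (init != NULL)
            init(c, i, ctx);
        *tail = c;
        tail = &c->next;
    }
    return head;
}

/* Item side: example context and callback. */
typedef struct init_ctx {
    int owner_id;
    size_t initialized;  /* how many chunks the callback has touched */
} init_ctx;

void item_chunk_init(chunk *c, size_t index, void *ctx) {
    init_ctx *ic = ctx;
    c->index = index;
    c->owner_id = ic->owner_id;
    ic->initialized++;
}
```

The trade-off is that the allocator stays ignorant of item layout at the cost of an indirect call per chunk.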
Could use some feedback on any obvious bugs I've left in.
Work remaining:
Punting or doing later:
Run burn-in tests on very large items (hundreds of megabytes)
Any useful stats that could be added? Something to indicate how many large items have been seen, at least.