Slab rebalancer and slab automover improvements #113
Closed
dormando wants to merge 14 commits into
The test is a port of a golang test submitted by Scott Mansfield. There used to be an "angry birds mode" for slabs_automove, which attempts to force a slab move from "any" slab class into the one which just had an eviction. This is an imperfect but fast way of responding to shifts in memory requirements. This change adds it back in, plus a test which very quickly attempts to set data via noreply. This isn't the end of improvements here; this commit is a starting point.
During a slab page move, items are typically ejected regardless of their validity. Now, if an item is valid and free chunks are available in the same slab class, the item is copied into a free chunk and the copy replaces the original. It's up to external systems to try to ensure free chunks are available before moving a slab page. If there is no free memory, the items are simply evicted as before. Also adds counters so we can finally tell how often these cases happen.
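The rescue-or-evict decision described above can be sketched as follows. This is a minimal illustration, not the code from this branch: `struct chunk`, `struct move_stats`, `find_free` and `clear_chunk` are hypothetical names, and the real mover also has to deal with locking, refcounts and LRU links.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counters; the real stat names may differ. */
struct move_stats {
    unsigned long rescued;  /* valid items copied into a free chunk */
    unsigned long evicted;  /* valid items dropped: no free chunk left */
    unsigned long skipped;  /* items that were already invalid */
};

/* Illustrative fixed-size chunk, not memcached's item struct. */
struct chunk {
    bool in_use;
    bool valid;             /* not expired or flushed */
    char data[32];
};

/* Find a free chunk outside the page being cleared; NULL if none. */
static struct chunk *find_free(struct chunk *pool, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (!pool[i].in_use) return &pool[i];
    }
    return NULL;
}

/* Clear one chunk of the page being moved, rescuing it if possible. */
static void clear_chunk(struct chunk *it, struct chunk *pool, size_t n,
                        struct move_stats *st) {
    assert(it != NULL && st != NULL);
    if (!it->in_use) return;
    if (!it->valid) {
        st->skipped++;
    } else {
        struct chunk *dst = find_free(pool, n);
        if (dst != NULL) {
            memcpy(dst, it, sizeof(*dst)); /* copy carries in_use = true */
            st->rescued++;
        } else {
            st->evicted++;                 /* no memory: evict as before */
        }
    }
    it->in_use = false;
}
```

Here `pool` stands for the free chunks of the same slab class that live outside the page being cleared.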
The page mover used to take the newest page of the page list and put it in the oldest page's slot, so only the first page we moved out of a slab class was actually "old". Instead, actually burn the slight CPU to shuffle all of the pointers down one. Now we always chew the oldest page.
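A minimal sketch of "shuffle all of the pointers down one", assuming the page list is a plain array of page pointers ordered oldest first. The function name and signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Remove the oldest page (index 0) and keep the remaining pages in age
 * order by sliding every pointer down one slot. */
static void *pop_oldest_page(void **page_list, size_t *page_count) {
    assert(page_list != NULL && page_count != NULL && *page_count > 0);
    void *oldest = page_list[0];
    memmove(&page_list[0], &page_list[1],
            (*page_count - 1) * sizeof(void *));
    (*page_count)--;
    page_list[*page_count] = NULL; /* clear the now-unused tail slot */
    return oldest;
}
```

The cost is one `memmove` over the pointer array per page moved, which is small next to the work of clearing the page itself.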
If any slab class has more than two pages' worth of free chunks, attempt to free one page back to a global pool. This creates a new concept of a slab page move destination of "0", which is the global page pool. Pages can be re-assigned out of that pool during allocation. Combined with item rescuing from the previous patch, we can safely shuffle pages back to the reassignment pool as chunks free up naturally. This should be a safe default going forward. Users should be able to decide to free or move pages based on eviction pressure as well. This is coming up in another commit. This also fixes a calculation of the NOEXP LRU size, and completely removes the old slab automover thread. Slab automove decisions will now be part of the lru maintainer thread.
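The "more than two pages' worth of free chunks" rule can be sketched like this. The struct, the function name and the convention that index 0 of the array is unused (so that 0 can stand for the global pool) are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define GLOBAL_PAGE_POOL 0 /* move destination "0": the global page pool */

/* Hypothetical per-class summary; not memcached's slabclass_t. */
struct class_info {
    size_t free_chunks;
    size_t chunks_per_page;
};

/* Return the id of the first class holding more than two pages' worth
 * of free chunks, or -1 if none qualifies. Index 0 is skipped because
 * it is reserved for the global pool in this sketch. */
static int pick_class_to_shrink(const struct class_info *classes,
                                int nclasses) {
    assert(classes != NULL && nclasses >= 1);
    for (int id = 1; id < nclasses; id++) {
        if (classes[id].chunks_per_page > 0 &&
            classes[id].free_chunks > 2 * classes[id].chunks_per_page) {
            return id;
        }
    }
    return -1;
}
```

A caller would then request a page move from the returned class to `GLOBAL_PAGE_POOL`.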
Some new variables and a change to the '1' mode. A little sad nobody noticed I'd accidentally removed the '2' mode for a few versions.
Thanks Devon :)
If an item has neither the ITEM_SLABBED bit nor the ITEM_LINKED bit set, the logic was falling through and defaulting to MOVE_PASS. An item that has had storage allocated via item_alloc() but hasn't completed the data upload sits in this state. With MOVE_PASS for an item in this state, if no other items trip the busy re-scan of the page, the mover will consider the page completely wiped even with the outstanding item. The hilarious bit is I'd clearly thought this through: the top comment states the "if this, then this, or that" logic, with the "or that" branch completely missing from the code. Add one line of code and it survived a 5 hour torture test, where before it crashed after 30-60 minutes. Leaves some handy debug code #ifdef'ed out. Also moves the memset wipe on page move completion so it only happens if the page isn't being returned to the global page pool, as the page allocator already does a memset and chunk-split. Thanks to Scott Mansfield for the initial information eventually leading to this discovery.
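The missing "or that" branch can be illustrated with a simplified classifier. `ITEM_LINKED`, `ITEM_SLABBED`, `MOVE_PASS` and the busy re-scan come from the text above; the numeric flag values, the `MOVE_FROM_SLAB` and `MOVE_FROM_LRU` names and both function names are assumptions, and the real code also checks refcounts and locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values assumed for this sketch. */
#define ITEM_LINKED  1
#define ITEM_SLABBED 4

enum move_status { MOVE_PASS = 0, MOVE_FROM_SLAB, MOVE_FROM_LRU, MOVE_BUSY };

/* Decide what to do with one chunk of the page being moved. */
static enum move_status classify_chunk(unsigned flags) {
    if (flags & ITEM_SLABBED) return MOVE_FROM_SLAB; /* already free */
    if (flags & ITEM_LINKED)  return MOVE_FROM_LRU;  /* live item */
    /* Neither bit set: allocated but not linked yet, for example an
     * upload still in progress. This is the branch that was missing. */
    return MOVE_BUSY;
}

/* The page must be scanned again if any chunk is still busy. */
static bool page_needs_rescan(const unsigned *chunk_flags, size_t n) {
    assert(chunk_flags != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (classify_chunk(chunk_flags[i]) == MOVE_BUSY) return true;
    }
    return false;
}
```

Without the final `return MOVE_BUSY`, a chunk with no flags set would be treated as already handled and the page could be declared wiped while a client was still writing into it.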
During an item rescue, the item size was being added to the slab class total when the new chunk was requested, and then not removed from the total again if the item was successfully rescued. Now just always remove it from the total.
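A small sketch of the accounting, assuming a per-class "requested bytes" counter. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-class counter of bytes requested by live items. */
struct slab_class {
    size_t requested;
};

/* Allocating a chunk for the rescued copy adds its size to the total. */
static void chunk_alloc(struct slab_class *c, size_t item_size) {
    assert(c != NULL);
    c->requested += item_size;
}

/* Releasing the old chunk always subtracts its size, whether the item
 * was rescued or evicted. Skipping this on a successful rescue is what
 * left the item counted twice. */
static void chunk_release(struct slab_class *c, size_t item_size) {
    assert(c != NULL && c->requested >= item_size);
    c->requested -= item_size;
}
```

For a 100 byte item, a rescue calls `chunk_alloc` for the copy and `chunk_release` for the original, so the total ends where it started.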
Uses the slab_rebal struct to accumulate stats, grabbing the global lock only occasionally to fill them in.
If we decide to move pages right on the chunk boundary, it's too easy to cause flapping.
Class 255 is now a legitimate class, used by the NOEXP LRU when the expirezero_does_not_evict flag is enabled, so it can no longer double as a marker for the page mover. Instead, we now force a single ITEM_SLABBED bit when a chunk is returned to the slabber, and ITEM_SLABBED|ITEM_FETCHED means the chunk has been cleared for a page move. item_alloc overwrites the chunk's flags on set. The only weirdness was slab_free |='ing in the ITEM_SLABBED bit. I tracked that down to a commit in 2003 titled "more debugging" and can't come up with a good enough excuse for preserving an item's flags when it's been returned to the free memory pool. So now we overload the flag meaning.
Gross oversight: two conditions were being put into the same variable. Now we can tell whether we're evicting because we're hitting the bottom of the free memory pool, or because we keep trying to rescue items into the same page as the one being cleared.
Previously the slab mover would evict items if the new chunk was within the slab page being moved. Now it does an inline reclaim of the chunk and keeps trying until it runs out of memory.
mem_alloced was getting increased every time a page was assigned out of either malloc or the global page pool. This meant total_malloced would inflate forever as pages were reused, and once limit_maxbytes was surpassed the allocator would stop attempting to malloc more memory. The result was that we would stop malloc'ing new memory too early if page reclaim happened before the whole thing filled. The test already caused this condition, so adding the extra checks was trivial.
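A minimal simulation of the corrected accounting, assuming a byte limit and a count of pages sitting in the global pool. All names here are hypothetical and no real memory is allocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical allocator state for illustration. */
struct page_allocator {
    size_t limit_bytes;    /* memory limit */
    size_t malloced_bytes; /* bytes obtained from malloc so far */
    size_t pool_pages;     /* pages waiting in the global page pool */
    size_t page_size;
};

/* Hand out one page. Reusing a pooled page does not touch the malloc
 * total; only a fresh malloc counts against the limit. */
static bool get_page(struct page_allocator *a) {
    assert(a != NULL && a->page_size > 0);
    if (a->pool_pages > 0) {
        a->pool_pages--;
        return true;
    }
    if (a->malloced_bytes + a->page_size > a->limit_bytes) {
        return false; /* limit reached, nothing to reuse */
    }
    a->malloced_bytes += a->page_size;
    return true;
}

/* A page reclaimed from a slab class goes back to the global pool. */
static void return_page(struct page_allocator *a) {
    assert(a != NULL);
    a->pool_pages++;
}
```

If `get_page` also bumped `malloced_bytes` on the pooled path, every reuse would push the total toward the limit even though no new memory had been taken from the system.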
This is now merged to master. Release will take a little while as I have to divorce from google code.
This PR is release-candidate ready
Use start options like:
-o slab_reassign,slab_automove,lru_crawler,lru_maintainer
The slab rebalancer and automover have sat as a first-pass and proof of concept for several years.
This changeset aims to improve both to the point where they can be enabled by default in future versions, as well as generally be more useful to people. The old automover would only move one page every 10-60 seconds, and did so very conservatively. Without an automover, long-running instances of memcached can utilize memory poorly if the average size of items changes after the memory is full.
Some work remains for this branch:
automove=3
Will likely punt on this for now. I cannot determine the value of an object being evicted, so it is difficult to write an algorithm to generically move pages between slab classes when all memory is full and there is nothing but evictions. The right choice there is very dependent on the use case. I'll discuss some options below.
pull pages in from other classes if free chunks are below a watermark (i.e., half a page)
Weighted shuffling of pages between slab classes to amortize evictions
If you need to have memory always evicting but still rebalance the slab pages, there is still the option of manually running the reassign command. Centralized or per-host daemons can monitor the various stats commands once per N seconds and reassign pages in a way that fits the needs of the particular usage scenario.
The other changes in this branch related to rescuing items when possible should make it less traumatic for the hit ratio to arbitrarily move pages. What would improve this even more is a way to signal to the system to evict items from the tail before moving a slab page, if the slab class is full in the first place. Then items could always be rescued, forcing evictions at the tail, rather than simply being evicted when no free chunks are available.
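As an illustration of the external-daemon approach mentioned above, the sketch below turns one polling interval's worth of per-class numbers into a `slabs reassign <source class> <dest class>` command. The policy (take a page from the class with the most free chunks, give it to the class with the most new evictions) is only an example and is not something this branch implements. The struct and function names are hypothetical; the field names are meant to mirror `free_chunks` and `chunks_per_page` from `stats slabs` and the change in `evicted` from `stats items` between two polls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One slab class as seen by a hypothetical monitoring daemon. */
struct class_stats {
    int id;
    unsigned long free_chunks;
    unsigned long chunks_per_page;
    unsigned long evicted_delta; /* evictions since the previous poll */
};

/* Write a reassign command into buf. Returns the command length, or 0
 * if no move is warranted or the buffer is too small. */
static int build_reassign(const struct class_stats *s, size_t n,
                          char *buf, size_t buflen) {
    assert(s != NULL && buf != NULL && buflen > 0);
    const struct class_stats *src = NULL;
    const struct class_stats *dst = NULL;
    for (size_t i = 0; i < n; i++) {
        /* Source: at least one full page of free chunks, most free wins. */
        if (s[i].chunks_per_page > 0 &&
            s[i].free_chunks >= s[i].chunks_per_page &&
            (src == NULL || s[i].free_chunks > src->free_chunks)) {
            src = &s[i];
        }
        /* Destination: the class with the most new evictions. */
        if (s[i].evicted_delta > 0 &&
            (dst == NULL || s[i].evicted_delta > dst->evicted_delta)) {
            dst = &s[i];
        }
    }
    if (src == NULL || dst == NULL || src->id == dst->id) return 0;
    int len = snprintf(buf, buflen, "slabs reassign %d %d\r\n",
                       src->id, dst->id);
    if (len < 0 || (size_t)len >= buflen) return 0;
    return len;
}
```

The daemon would send the resulting line over its existing connection to the server and then wait for the next interval before deciding again, which keeps the policy entirely outside memcached.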