Slab rebalancer and slab automover improvements by dormando · Pull Request #113 · memcached/memcached

dormando · 2015-09-29T20:52:38Z

This PR is release-candidate ready

Use start options like: -o slab_reassign,slab_automove,lru_crawler,lru_maintainer

The slab rebalancer and automover have sat as a first-pass and proof of concept for several years.

This changeset aims to improve both to the point where they can be enabled by default in future versions, and to make them more useful to people in general. The old automover would only move one page every 10-60 seconds, and did so very conservatively. Without an automover, long-running instances of memcached can use memory poorly if the average size of items changes after the memory is full.

Slab rebalancer is improved to attempt to "rescue" items while moving a page. If free memory is available in the source slab class, copy and re-link items which are still valid, instead of silently evicting them.
New automover default mode will aggressively return pages to a "global pool" if there are more than two pages' worth of free chunks available in a slab class. Pages will then be distributed as needed back into any slab class.
Restored an experimental automove=2 mode, which aggressively requests random memory be assigned to a slab class on any eviction.
One more forthcoming patch to optionally make automove decisions even if all memory is full (based on eviction pressure, most likely).

Some work remains for this branch:

automove=3 mode + tunables for automoving while classes are full. [may not do this, see below]
Review for cleanups, extra counters, or verbose logging. [partly done]
Documentation of new tunables and variables.

automove=3

Will likely punt on this for now. I cannot determine the value of an object being evicted, so it is difficult to write an algorithm to generically move pages between slab classes when all memory is full and there is nothing but evictions. The right choice there is very dependent on the use case. I'll discuss some options below.

pull pages in from other classes if free chunks are below a watermark (i.e., half a page)

Still need to decide how to pull pages from other classes most effectively. Could do it randomly, or pull from the one with the most pages at the moment.

Weighted shuffling of pages between slab classes to amortize evictions

If class 2 has 5 evictions per second, and class 1 has 2 evictions per second, slowly move pages from class 1 to 2.
If class 2 has 5 evictions per second and class 1 has 0 evictions per second, slowly move pages from class 1 to 2.
If class 2 has evictions where the item has been fetched before, and class 1 has either 0 evictions, or evictions where the items have not been fetched before, slowly move pages from 1 to 2.
- Weakness: if pages are moving through class 1 so quickly they never get a chance to be fetched. Could slew against the number of overall sets or evictions.
If class 2 has evictions where the item has been fetched before, and class 1 has either 0 evictions, evictions where the item has not been fetched before, and/or the last accessed time on class 2 is significantly lower than class 1, slowly move pages from 1 to 2.
- Similar problem as the option above.
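
The first three options above can be sketched as a single comparison. This is only an illustration of the idea, not code from this branch: the struct, its field names, and `pick_donor` are all hypothetical, and the sketch ignores the weakness noted above about items that cycle through a class too quickly to be fetched.

```c
#include <stdint.h>

/* Hypothetical per-class sample over one observation window.
 * Field names are illustrative and not taken from memcached. */
struct class_sample {
    uint64_t evictions;         /* all evictions in the window */
    uint64_t evictions_fetched; /* evictions of items fetched at least once */
};

/* Returns 0 if class `a` should slowly give pages to `b`,
 * 1 if `b` should give pages to `a`, or -1 if no move is suggested. */
int pick_donor(const struct class_sample *a, const struct class_sample *b) {
    /* Losing previously fetched items is assumed to hurt the hit ratio most. */
    if (a->evictions_fetched != b->evictions_fetched)
        return a->evictions_fetched < b->evictions_fetched ? 0 : 1;
    /* Otherwise fall back to raw eviction counts. */
    if (a->evictions != b->evictions)
        return a->evictions < b->evictions ? 0 : 1;
    return -1;
}
```

A real policy would presumably also rate-limit the moves ("slowly move pages"), which this sketch leaves to the caller.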

If you need to have memory always evicting but still rebalance the slab pages, there is still the option of manually running the reassign command. Centralized or per-host daemons can monitor the various stats commands once per N seconds and reassign pages in a way that fits the needs of the particular usage scenario.
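
As a rough sketch of what such a daemon might send once it has chosen a source and destination class, the helper below formats a `slabs reassign` line for the text protocol. The function name and error convention are hypothetical; how the classes are chosen, and the socket handling, are left out.

```c
#include <stdio.h>
#include <stddef.h>

/* Writes "slabs reassign <src> <dst>\r\n" into buf.
 * Returns the number of bytes written (excluding the NUL),
 * or -1 if the buffer is too small. Hypothetical helper. */
int format_reassign(char *buf, size_t len, int src, int dst) {
    int n = snprintf(buf, len, "slabs reassign %d %d\r\n", src, dst);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

The caller would write the resulting bytes to an open connection and read back the one-line response before deciding on the next move.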

The other changes in this branch related to rescuing items when possible should make arbitrary page moves less traumatic for the hit ratio. What would improve this even more is a way to signal to the system to evict items from the tail before moving a slab page, if the slab class is full in the first place. Then items in the page could always be rescued, with evictions forced at the tail, rather than simply being evicted when no free chunks are available.

Test is a port of a golang test submitted by Scott Mansfield. There used to be an "angry birds mode" to slabs_automove, which attempts to force a slab move from "any" slab into the one which just had an eviction. This is an imperfect but fast way of responding to shifts in memory requirements. This change adds it back in plus a test which very quickly attempts to set data in via noreply. This isn't the end of improvements here. This commit is a starting point.

During a slab page move items are typically ejected regardless of their validity. Now, if an item is valid and free chunks are available in the same slab class, copy the item over and replace it. It's up to external systems to try to ensure free chunks are available before moving a slab page. If there is no memory it will simply evict them as normal. Also adds counters so we can finally tell how often these cases happen.
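
A minimal sketch of that per-item decision, under the assumption that validity and the free-chunk count are already known to the caller. The enum, struct, and function names are hypothetical and do not match the counters actually added by the commit.

```c
#include <stdbool.h>
#include <stdint.h>

enum move_action { ACTION_DROP, ACTION_RESCUE, ACTION_EVICT };

/* Hypothetical counters; the real stat names are not shown here. */
struct move_counters {
    uint64_t rescues;
    uint64_t evictions;
};

/* Decide what happens to one item found in the page being moved.
 * `free_chunks` counts free chunks available in the same slab class. */
enum move_action handle_item(bool valid, uint32_t *free_chunks,
                             struct move_counters *c) {
    if (!valid)
        return ACTION_DROP;       /* expired or flushed: nothing to save */
    if (*free_chunks > 0) {
        (*free_chunks)--;         /* copy the item into a free chunk */
        c->rescues++;
        return ACTION_RESCUE;
    }
    c->evictions++;               /* no memory: evict as before */
    return ACTION_EVICT;
}
```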

The page mover used to take the newest page of the page list and replace the oldest page with it, so only the first page moved from a slab class would actually be "old". Instead, actually burn the slight CPU to shuffle all of the pointers down one. Now we always chew the oldest page.
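
The difference between the two approaches can be shown on a plain array of page pointers ordered oldest first. This is a sketch with hypothetical function names, not the code from the commit.

```c
#include <stddef.h>
#include <string.h>

/* Old behavior: fill the hole at index 0 with the newest page.
 * After the first call, index 0 no longer holds the oldest page. */
void *remove_oldest_swap(void **pages, size_t *count) {
    if (*count == 0)
        return NULL;
    void *oldest = pages[0];
    pages[0] = pages[*count - 1];
    (*count)--;
    return oldest;
}

/* New behavior: shift every pointer down one slot so the list
 * stays ordered and index 0 is always the oldest remaining page. */
void *remove_oldest_shift(void **pages, size_t *count) {
    if (*count == 0)
        return NULL;
    void *oldest = pages[0];
    memmove(&pages[0], &pages[1], (*count - 1) * sizeof(pages[0]));
    (*count)--;
    return oldest;
}
```

The shift costs O(n) pointer copies per page move, which is small next to the work of clearing a page.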

If any slab classes have more than two pages worth of free chunks, attempt to free one page back to a global pool. Create new concept of a slab page move destination of "0", which is a global page pool. Pages can be re-assigned out of that pool during allocation. Combined with item rescuing from the previous patch, we can safely shuffle pages back to the reassignment pool as chunks free up naturally. This should be a safe default going forward. Users should be able to decide to free or move pages based on eviction pressure as well. This is coming up in another commit. This also fixes a calculation of the NOEXP LRU size, and completely removes the old slab automover thread. Slab automove decisions will now be part of the lru maintainer thread.
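
The free-page check described here reduces to one comparison. The sketch below uses a hypothetical function name and takes the threshold as a parameter, since a later commit in this PR tunes it from 2 pages to 2.5 pages of free chunks.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a slab class holds more free chunks than `ratio` pages' worth,
 * in which case one page could be returned to the global pool.
 * Hypothetical helper; names are illustrative. */
bool should_release_page(uint32_t free_chunks, uint32_t chunks_per_page,
                         double ratio) {
    return (double)free_chunks > (double)chunks_per_page * ratio;
}
```

With 10 chunks per page, 21 free chunks trips a ratio of 2.0 but not 2.5, which leaves more headroom before a page is pulled.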

some new variables and change to the '1' mode. little sad nobody noticed I'd accidentally removed the '2' mode for a few versions.

Thanks Devon :)

If an item has neither the ITEM_SLABBED bit nor the ITEM_LINKED bit, the logic was falling through and defaulting to MOVE_PASS. If an item has had storage allocated via item_alloc() but hasn't completed the data upload, it will sit in this state. With MOVE_PASS for an item in this state, if no other items trip the busy re-scan of the page, the mover will consider the page completely wiped even with the outstanding item. The hilarious bit is I'd clearly thought this through: the top comment states the "if this, then this, or that" cases, with the "or that" logic completely missing. Add one line of code and it survived a 5 hour torture test, where before it crashed after 30-60 minutes. Leaves some handy debug code #ifdef'ed out. Also moves the memset wipe on page move completion to only happen if the page isn't being returned to the global page pool, as the page allocator does a memset and chunk-split. Thanks to Scott Mansfield for the initial information eventually leading to this discovery.
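
A simplified sketch of the classification with the missing branch filled in. Only MOVE_PASS, ITEM_SLABBED, and ITEM_LINKED come from the text above; the other status names, the flag values, and the `already_cleared` parameter are assumptions for illustration, and locking and refcount checks are ignored.

```c
#include <stdbool.h>

/* Status names other than MOVE_PASS are assumed for this sketch. */
enum move_status { MOVE_PASS, MOVE_FROM_SLAB, MOVE_FROM_LRU, MOVE_BUSY };

/* Flag values are illustrative only. */
#define ITEM_LINKED  1
#define ITEM_SLABBED 4

enum move_status classify(unsigned flags, bool already_cleared) {
    if (already_cleared)
        return MOVE_PASS;        /* chunk was wiped on an earlier pass */
    if (flags & ITEM_SLABBED)
        return MOVE_FROM_SLAB;   /* sitting on the free list */
    if (flags & ITEM_LINKED)
        return MOVE_FROM_LRU;    /* live item in the LRU */
    /* Neither bit set: allocated but still being uploaded.
     * Previously this case fell through to MOVE_PASS. */
    return MOVE_BUSY;
}
```

Returning a busy status here forces the page to be re-scanned instead of being declared empty while a client still holds the chunk.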

During an item rescue, item size was being added to the slab class when the new chunk was requested, and then not removed again from the total if the item was successfully rescued. Now just always remove from the total.
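
The accounting rule can be sketched as follows, with a hypothetical struct and function names: a rescue adds the size for the new chunk, and the old chunk's size is always subtracted, so a rescued item nets to zero.

```c
#include <stddef.h>

/* Hypothetical per-class accounting; stands in for "mem_requested". */
struct class_acct {
    size_t requested;
};

/* Account for one item leaving the page being moved. */
void account_moved_item(struct class_acct *c, size_t item_size, int rescued) {
    if (rescued)
        c->requested += item_size; /* new chunk now holds the item */
    c->requested -= item_size;     /* old chunk is gone either way */
}
```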

uses the slab_rebal struct to summarize stats, more occasionally grabbing the global lock to fill them in, instead.

if we're deciding to move pages right on the chunk boundary it's too easy to cause flapping.

class 255 is now a legitimate class, used by the NOEXP LRU when the expirezero_does_not_evict flag is enabled. Instead, we now force a single bit ITEM_SLABBED when a chunk is returned to the slabber, and ITEM_SLABBED|ITEM_FETCHED means it's been cleared for a page move. item_alloc overwrites the chunk's flags on set. The only weirdness was slab_free |='ing in the ITEM_SLABBED bit. I tracked that down to a commit in 2003 titled "more debugging" and can't come up with a good enough excuse for preserving an item's flags when it's been returned to the free memory pool. So now we overload the flag meaning.
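
The overloaded flag meaning can be sketched like this. ITEM_SLABBED and ITEM_FETCHED are named in the text; the bit values and helper names below are illustrative assumptions.

```c
/* Flag values are illustrative only. */
#define ITEM_SLABBED 4
#define ITEM_FETCHED 8

/* Returning a chunk to the slabber forces the flags to a single bit,
 * instead of OR-ing ITEM_SLABBED into whatever was there before. */
void chunk_mark_free(unsigned *flags) {
    *flags = ITEM_SLABBED;
}

/* The page mover marks a chunk it has finished with. */
void chunk_mark_cleared(unsigned *flags) {
    *flags = ITEM_SLABBED | ITEM_FETCHED;
}

/* Non-zero if the chunk has been cleared for a page move. */
int chunk_is_cleared(unsigned flags) {
    return flags == (ITEM_SLABBED | ITEM_FETCHED);
}
```

Because a free chunk never legitimately carries ITEM_FETCHED, the combination is unambiguous without reserving a special class id.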

gross oversight putting two conditions into the same variable. now can tell if we're evicting because we're hitting the bottom of the free memory pool, or if we keep trying to rescue items into the same page as the one being cleared.

previously the slab mover would evict items if the new chunk was within the slab page being moved. now it will do an inline reclaim of the chunk and try until it runs out of memory.
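
A sketch of the retry loop, assuming a free list modeled as a simple stack of chunk pointers. All names are hypothetical; the real allocator and its inline reclaim are more involved.

```c
#include <stddef.h>
#include <stdint.h>

/* Pop chunks from a free-list stack until one lies outside the page
 * being moved. Chunks inside that page are dropped (reclaimed inline)
 * and counted in *samepage. Returns NULL when memory runs out. */
void *alloc_outside_page(void **freelist, size_t *nfree,
                         const void *page_start, size_t page_size,
                         unsigned *samepage) {
    uintptr_t lo = (uintptr_t)page_start;
    uintptr_t hi = lo + page_size;
    while (*nfree > 0) {
        void *chunk = freelist[--(*nfree)];
        uintptr_t addr = (uintptr_t)chunk;
        if (addr >= lo && addr < hi) {
            (*samepage)++;   /* chunk belongs to the page being cleared */
            continue;        /* try the next free chunk */
        }
        return chunk;
    }
    return NULL;             /* out of memory: caller must evict */
}
```

Keeping a separate count for same-page hits versus a NULL result mirrors the split between the two eviction causes described above.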

mem_alloced was getting increased every time a page was assigned out of either malloc or the global page pool. This means total_malloced will inflate forever as pages are reused, and once limit_maxbytes is surpassed it will stop attempting to malloc more memory. The result is we would stop malloc'ing new memory too early if page reclaim happens before the whole thing fills. The test already caused this condition, so adding the extra checks was trivial.
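
The corrected accounting can be sketched as below: the malloc total grows only when a page really comes from malloc, never when it is reused from the global pool. The struct, its fields, and the function names are hypothetical stand-ins for the real counters.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical allocator state. */
struct mem_state {
    size_t limit;      /* memory limit in bytes */
    size_t malloced;   /* bytes actually obtained from malloc */
    size_t pool_pages; /* pages waiting in the global pool */
};

/* Returns true if a page could be handed to a slab class. */
bool take_page(struct mem_state *m, size_t page_size) {
    if (m->pool_pages > 0) {
        m->pool_pages--;          /* reuse: total must not grow */
        return true;
    }
    if (m->malloced + page_size > m->limit)
        return false;             /* at the limit: no new malloc */
    m->malloced += page_size;     /* count only real mallocs */
    return true;
}

/* A page is returned to the global pool by the page mover. */
void give_back_page(struct mem_state *m) {
    m->pool_pages++;
}
```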

dormando · 2015-11-19T07:21:03Z

This is now merged to master. Release will take a little while as I have to divorce from google code.

dormando self-assigned this Sep 29, 2015

dormando mentioned this pull request Sep 29, 2015

Slab rebalancer and slab automover improvements #112

Closed

dormando force-pushed the slab_rebal_next branch from 09543d5 to 4da02f7 Compare September 29, 2015 22:38

dormando mentioned this pull request Sep 30, 2015

Release memory back to system #93

Closed

dormando force-pushed the slab_rebal_next branch from adf0308 to d92875c Compare October 11, 2015 10:51

dormando added 14 commits October 26, 2015 15:51

documentation for slab rebal updates

cb1951f

some new variables and change to the '1' mode. little sad nobody noticed I'd accidentally removed the '2' mode for a few versions.

fix off by one in slab shuffling

9b94d72

Thanks Devon :)

"mem_requested" from "stats slabs" is now accurate

9fe5b80

During an item rescue, item size was being added to the slab class when the new chunk was requested, and then not removed again from the total if the item was successfully rescued. Now just always remove from the total.

call STATS_LOCK() less in slab mover.

374f2a9

uses the slab_rebal struct to summarize stats, more occasionally grabbing the global lock to fill them in, instead.

tune automove to require 2.5 pages of free chunks

86f4193

if we're deciding to move pages right on the chunk boundary it's too easy to cause flapping.

split rebal_evictions into _nomem and _samepage

4fa8164

gross oversight putting two conditions into the same variable. now can tell if we're evicting because we're hitting the bottom of the free memory pool, or if we keep trying to rescue items into the same page as the one being cleared.

try harder to save items

dc58230

previously the slab mover would evict items if the new chunk was within the slab page being moved. now it will do an inline reclaim of the chunk and try until it runs out of memory.

dormando force-pushed the slab_rebal_next branch from 7c52061 to bd9bc1c Compare October 26, 2015 22:51

dormando mentioned this pull request Nov 11, 2015

Fix eviction accumulation in item_stats_evictions(). #116

Closed

dormando closed this Nov 19, 2015

mckelvin mentioned this pull request Jan 18, 2016

Evictions rate increased 3x after enabling slab_automove and lru_maintainer at the same time #137

Closed

dormando wants to merge 14 commits into memcached:next from dormando:slab_rebal_next
