Slab rebalancer and slab automover improvements #113

Closed
dormando wants to merge 14 commits into memcached:next from dormando:slab_rebal_next

@dormando

Member

This PR is release-candidate ready

Use start options like: -o slab_reassign,slab_automove,lru_crawler,lru_maintainer

The slab rebalancer and automover have sat as a first-pass and proof of concept for several years.

This changeset aims to improve both to the point where they can be enabled by default in future versions, as well as generally be more useful to people. The old automover would only move one page every 10-60 seconds, and did so very conservatively. Without an automover, long running instances of memcached can utilize memory poorly if the average size of items changes after the memory is full.

  1. Slab rebalancer is improved to attempt to "rescue" items while moving a page. If free memory is available in the source slab class, copy and re-link items which are still valid, instead of silently evicting them.
  2. New automover default mode will aggressively return pages to a "global pool" if more than two pages' worth of free chunks are available in a slab class. Pages are then redistributed as needed to any slab class.
  3. Restored the experimental automove=2 mode, which aggressively requests that a page from any slab class be moved into the class which just had an eviction.
  4. One more forthcoming patch to optionally make automove decisions even if all memory is full (based on eviction pressure, most likely).
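
The default automover rule in item 2 can be sketched roughly as follows. This is a minimal illustration rather than the code in this branch: `SlabClass`, `pick_page_to_return`, the `free_pages_threshold` parameter, and the guard against taking a class's last page are all assumptions made for the example.

```python
from dataclasses import dataclass

GLOBAL_POOL = 0  # destination id "0" stands for the global page pool


@dataclass
class SlabClass:
    clsid: int
    chunk_size: int    # bytes per chunk in this class
    free_chunks: int   # chunks currently unused
    total_pages: int   # pages assigned to this class


def pick_page_to_return(classes, page_size=1024 * 1024, free_pages_threshold=2.0):
    """Return (source_clsid, GLOBAL_POOL) for the first class holding more than
    `free_pages_threshold` pages' worth of free chunks, or None if none does."""
    for cls in classes:
        chunks_per_page = page_size // cls.chunk_size
        if cls.total_pages < 2:
            continue  # hypothetical guard: never take a class's last page
        if cls.free_chunks > chunks_per_page * free_pages_threshold:
            return (cls.clsid, GLOBAL_POOL)
    return None
```

Raising `free_pages_threshold` slightly above 2 is one way to avoid deciding right on a page boundary.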

Some work remains for this branch:

  • automove=3 mode + tunables for automoving while classes are full. [may not do this, see below]
  • Review for cleanups, extra counters, or verbose logging. [partly done]
  • Documentation of new tunables and variables.

automove=3

Will likely punt on this for now. I cannot determine the value of an object being evicted, so it is difficult to write an algorithm to generically move pages between slab classes when all memory is full and there is nothing but evictions. The right choice there is very dependent on the use case. I'll discuss some options below.

Pull pages in from other classes if free chunks are below a watermark (i.e., half a page)

  • Still need to decide how to pull pages from other classes most effectively. Could do it randomly, or pull from the one with the most pages at the moment.

Weighted shuffling of pages between slab classes to amortize evictions

  • If class 2 has 5 evictions per second, and class 1 has 2 evictions per second, slowly move pages from class 1 to 2.
  • If class 2 has 5 evictions per second and class 1 has 0 evictions per second, slowly move pages from class 1 to 2.
  • If class 2 has evictions where the item has been fetched before, and class 1 has either 0 evictions, or evictions where the items have not been fetched before, slowly move pages from 1 to 2.
    • Weakness: items may move through class 1 so quickly that they never get a chance to be fetched. Could slew against the number of overall sets or evictions.
  • If class 2 has evictions where the item has been fetched before, and class 1 has either 0 evictions, evictions where the item has not been fetched before, and/or the last accessed time on class 2 is significantly lower than class 1, slowly move pages from 1 to 2.
    • Similar problem as the option above.
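
The simplest of the options above, comparing raw eviction rates, might look like the sketch below. It is only an illustration of the idea: `pick_move` and the `min_gap` parameter are hypothetical, and it ignores the "fetched before" and last-access refinements.

```python
def pick_move(evictions_per_sec, min_gap=1.0):
    """Given {clsid: evictions per second}, return (source, dest) so that a page
    moves from the least-evicting class to the most-evicting one, or None when
    the rates are too close for a move to be worthwhile."""
    if len(evictions_per_sec) < 2:
        return None
    dest = max(evictions_per_sec, key=evictions_per_sec.get)
    source = min(evictions_per_sec, key=evictions_per_sec.get)
    if evictions_per_sec[dest] - evictions_per_sec[source] < min_gap:
        return None  # hypothetical damping so near-equal classes do not trade pages
    return (source, dest)
```

Running this once per interval and moving a single page each time would give the "slowly move pages" behaviour described in the bullets.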

If you need to have memory always evicting but still rebalance the slab pages, there is still the option of manually running the reassign command. Centralized or per-host daemons can monitor the various stats commands once per N seconds and reassign pages in a way that fits the needs of the particular usage scenario.
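
As a rough sketch of such an external monitor, the snippet below compares two samples of `stats items` output and builds a `slabs reassign <source> <dest>` command. The helper names and the "most evictions wins" policy are assumptions for illustration; sending the command over a socket and sampling every N seconds are left out.

```python
import re

# Matches lines such as "STAT items:2:evicted 50" from `stats items`.
_EVICTED = re.compile(r"^STAT items:(\d+):evicted (\d+)$")


def parse_evicted(stats_items_text):
    """Extract {clsid: evicted counter} from raw `stats items` output."""
    counts = {}
    for line in stats_items_text.splitlines():
        match = _EVICTED.match(line.strip())
        if match:
            counts[int(match.group(1))] = int(match.group(2))
    return counts


def plan_reassign(previous, current):
    """Compare two samples; return the command bytes to send, or None."""
    deltas = {c: current[c] - previous.get(c, 0) for c in current}
    if len(deltas) < 2:
        return None
    dest = max(deltas, key=deltas.get)
    source = min(deltas, key=deltas.get)
    if deltas[dest] <= deltas[source]:
        return None  # no class is evicting more than the others
    return f"slabs reassign {source} {dest}\r\n".encode("ascii")
```

A real daemon would also want to check that the source class has pages to spare before asking for a move.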

The other changes in this branch related to rescuing items when possible should make it less traumatic for the hit ratio to arbitrarily move pages. What would improve this even more is a way to signal to the system to evict items from the tail before moving a slab page, if the slab class is full in the first place. Then items could always be rescued, forcing evictions at the tail, rather than simply being evicted when no free chunks are available.

Test is a port of a golang test submitted by Scott Mansfield.

There used to be an "angry birds mode" to slabs_automove, which attempts to
force a slab move from "any" slab into the one which just had an eviction.
This is an imperfect but fast way of responding to shifts in memory
requirements.

This change adds it back in plus a test which very quickly attempts to set
data in via noreply. This isn't the end of improvements here. This commit is a
starting point.

During a slab page move, items are typically ejected regardless of their
validity. Now, if an item is valid and free chunks are available in the same
slab class, copy the item over and replace it.

It's up to external systems to try to ensure free chunks are available before
moving a slab page. If there is no memory it will simply evict them as normal.
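
The rescue decision described above can be modelled in a few lines. This is a simplified sketch, not the mover's actual code: the `Outcome` names, `move_item`, and `move_page` are made up for the example, and "valid" is reduced to a boolean.

```python
from enum import Enum


class Outcome(Enum):
    RESCUED = "rescued"   # copied into a free chunk of the same slab class
    EVICTED = "evicted"   # valid, but no free chunk was available
    DROPPED = "dropped"   # already invalid (e.g. expired), nothing to save


def move_item(item_is_valid, free_chunks):
    """Decide the fate of one item; return (outcome, remaining free chunks)."""
    if not item_is_valid:
        return Outcome.DROPPED, free_chunks
    if free_chunks > 0:
        return Outcome.RESCUED, free_chunks - 1
    return Outcome.EVICTED, free_chunks


def move_page(items_valid, free_chunks):
    """Walk every item in the page being moved and count the outcomes."""
    counters = {outcome: 0 for outcome in Outcome}
    for valid in items_valid:
        outcome, free_chunks = move_item(valid, free_chunks)
        counters[outcome] += 1
    return counters
```

With two free chunks and three valid items in the page, two items are rescued and the third is evicted as before.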

Also adds counters so we can finally tell how often these cases happen.

Page moves used to take the newest page of the page list and replace the oldest
page with it, so only the first page we move from a slab class would actually
be "old". Instead, actually burn the slight CPU to shuffle all of the pointers
down one. Now we always chew the oldest page.

If any slab classes have more than two pages' worth of free chunks, attempt to
free one page back to a global pool.

Create new concept of a slab page move destination of "0", which is a global
page pool. Pages can be re-assigned out of that pool during allocation.
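
A toy model of the "destination 0" idea, assuming a per-class list of pages. `SlabAllocator` and its methods are hypothetical names used only to show the flow of a page into the pool and back out.

```python
class SlabAllocator:
    GLOBAL_POOL = 0  # class id 0 holds pages not owned by any slab class

    def __init__(self, num_classes):
        # pages[0] is the global pool; pages[1..num_classes] are slab classes
        self.pages = {clsid: [] for clsid in range(num_classes + 1)}

    def add_page(self, clsid, page):
        self.pages[clsid].append(page)

    def move_page(self, source, dest):
        """Move the oldest page of `source` to `dest` (0 means the global pool)."""
        if not self.pages[source]:
            return False
        self.pages[dest].append(self.pages[source].pop(0))
        return True

    def grab_page(self, clsid):
        """Called when `clsid` needs another page: take one from the pool if
        possible. On False the caller would fall back to malloc or eviction."""
        if self.pages[self.GLOBAL_POOL]:
            return self.move_page(self.GLOBAL_POOL, clsid)
        return False
```

A page freed from class 1 can then be picked up by class 2 the next time it needs memory.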

Combined with item rescuing from the previous patch, we can safely shuffle
pages back to the reassignment pool as chunks free up naturally. This should
be a safe default going forward. Users should be able to decide to free or
move pages based on eviction pressure as well. This is coming up in another
commit.

This also fixes a calculation of the NOEXP LRU size, and completely removes
the old slab automover thread. Slab automove decisions will now be part of the
lru maintainer thread.

Some new variables and a change to the '1' mode. A little sad nobody noticed
I'd accidentally removed the '2' mode for a few versions.

If an item has neither the ITEM_SLABBED bit nor the ITEM_LINKED bit, the logic
was falling through, defaulting to MOVE_PASS. If an item has had storage
allocated via item_alloc(), but hasn't completed the data upload, it will sit
in this state. With MOVE_PASS for an item in this state, if no other items trip
the busy re-scan of the page, the mover will consider the page completely wiped
even with the outstanding item.

The hilarious bit is I'd clearly thought this through: the top comment states
"if this, then this, or that", with the "or that" logic completely missing.
Adding one line of code let it survive a 5 hour torture test, where before it
crashed after 30-60 minutes.

Leaves some handy debug code #ifdef'ed out. Also moves the memset wipe on page
move completion to only happen if the page isn't being returned to the global
page pool, as the page allocator does a memset and chunk-split.

Thanks to Scott Mansfield for the initial information eventually leading to
this discovery.

During an item rescue, item size was being added to the slab class when the
new chunk was requested, and then not removed again from the total if the item
was successfully rescued. Now just always remove from the total.

Uses the slab_rebal struct to summarize stats, only occasionally grabbing the
global lock to fill them in.

If we're deciding to move pages right on the chunk boundary it's too easy to
cause flapping.

Class 255 is now a legitimate class, used by the NOEXP LRU when the
expirezero_does_not_evict flag is enabled, so it can no longer double as the
marker for a chunk cleared by the page mover. Instead, we now force a single bit
ITEM_SLABBED when a chunk is returned to the slabber, and
ITEM_SLABBED|ITEM_FETCHED means it's been cleared for a page move.
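
A small sketch of the overloaded flag meaning, with illustrative bit values and hypothetical helper names. It also shows why OR-ing ITEM_SLABBED into whatever flags an item already carried would be ambiguous under this scheme.

```python
# Illustrative bit values; the flag names are the ones used in the text.
ITEM_LINKED = 1
ITEM_SLABBED = 4
ITEM_FETCHED = 8

CLEARED_FOR_MOVE = ITEM_SLABBED | ITEM_FETCHED


def flags_after_free(old_flags):
    """Chunk returned to the slabber: force the single ITEM_SLABBED bit."""
    return ITEM_SLABBED  # old flags are deliberately discarded


def is_cleared_for_move(flags):
    """True only for the exact combination the page mover sets."""
    return flags == CLEARED_FOR_MOVE


# An item that had been fetched, freed with the old "|=" behaviour, would be
# indistinguishable from a chunk the page mover has already cleared.
old_style = ITEM_FETCHED | ITEM_SLABBED
```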

item_alloc overwrites the chunk's flags on set. The only weirdness was
slab_free |='ing in the ITEM_SLABBED bit. I tracked that down to a commit in
2003 titled "more debugging" and can't come up with a good enough excuse for
preserving an item's flags when it's been returned to the free memory pool. So
now we overload the flag meaning.

Gross oversight putting two conditions into the same variable. Now we can tell
if we're evicting because we're hitting the bottom of the free memory pool, or
if we keep trying to rescue items into the same page as the one being cleared.

Previously the slab mover would evict items if the new chunk was within the
slab page being moved. Now it will do an inline reclaim of the chunk and try
until it runs out of memory.

mem_alloced was getting increased every time a page was assigned out of either
malloc or the global page pool. This means total_malloced will inflate forever
as pages are reused, and once limit_maxbytes is surpassed it will stop
attempting to malloc more memory.
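
The inflation can be reproduced with a toy simulation that counts in whole pages instead of bytes. `simulate`, its operations, and the `count_pool_reuse` switch are assumptions for illustration only.

```python
def simulate(ops, limit_pages, count_pool_reuse):
    """Replay "need" (a class wants a page) and "return" (a page goes back to
    the global pool). Returns (pages malloc'ed, requests refused)."""
    mem_alloced = 0  # the counter compared against the memory limit
    pool = 0         # pages sitting in the global pool
    malloced = 0
    refused = 0
    for op in ops:
        if op == "return":
            pool += 1
        elif op == "need":
            if pool > 0:
                pool -= 1
                if count_pool_reuse:
                    mem_alloced += 1  # old behaviour: reuse counted as new memory
            elif mem_alloced < limit_pages:
                malloced += 1
                mem_alloced += 1
            else:
                refused += 1
    return malloced, refused


OPS = ["need", "need", "return", "need", "return", "need", "need", "need"]
```

With a four-page limit, `simulate(OPS, 4, True)` stops after two real allocations, while `simulate(OPS, 4, False)` goes on to malloc all four pages.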

The result is we would stop malloc'ing new memory too early if page reclaim
happens before the whole thing fills. The test already caused this condition,
so adding the extra checks was trivial.

@dormando

Member Author

This is now merged to master. Release will take a little while as I have to divorce from Google Code.
