
LRU crawler "metadump" command and refactor #193

Closed
dormando wants to merge 12 commits into memcached:next from Netflix-Skunkworks:crawler_new


@dormando dormando commented Aug 11, 2016

Member

Finally! You can dump the cache metadata.


  • New command: lru_crawler metadump [x,y,z|all]

e.g.: lru_crawler metadump all

Dumps output in the form key=foo cas=1\n etc from the bottom of the cache up to the top.
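
As an illustration of the command syntax above, here is a minimal Python sketch that builds the request line. The helper name build_metadump_command is hypothetical and not part of memcached or any client library. The only assumptions are the syntax shown in this PR and the usual \r\n terminator of the memcached text protocol.

```python
def build_metadump_command(classes="all"):
    """Return the text-protocol line for an LRU crawler metadump request.

    `classes` is either the string "all" or an iterable of slab class ids.
    This helper is illustrative only; it is not part of any memcached client.
    """
    if classes == "all":
        target = "all"
    else:
        ids = [int(c) for c in classes]
        if not ids:
            raise ValueError("need at least one slab class id")
        target = ",".join(str(i) for i in ids)
    # memcached text-protocol commands are terminated by \r\n
    return f"lru_crawler metadump {target}\r\n".encode("ascii")
```

The resulting bytes could be written to an ordinary TCP connection to the server, for example one opened with socket.create_connection.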

Notes:

  • This is not a multi-versioned consistent snapshot. This is a live walk, which means it's possible to get dupes if items shift around or are overwritten between when the LRU crawler first sees them at the bottom and when it gets to the top of the LRU. You can use the CAS values from the dump to avoid processing duplicates of the same exact key. Keys with different CAS IDs signify different values.
  • Only one dump can run at a time. This is a necessary restriction to avoid performance problems.
  • The value isn't included with the dump (hence meta[data]dump), which allows clients to fetch values in parallel while processing the dump feed. That is the fastest method of pulling all the data out of the cache.
  • This is highly performant and does not hold locks for long. It functions via a single side thread which holds locks for short periods of time and releases them between each item examined. There are also short default sleeps between every 1000 items examined. It can use up to one extra core of CPU while the dump is being processed.
  • E.g.: my Intel NUC can dump 50 million items in under a minute. It isn't that fast a machine.
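
The CAS-based deduplication mentioned in the notes could be sketched roughly as follows in Python. This assumes each dump line is a run of space-separated name=value fields containing at least key and cas, as in the key=foo cas=1 example above. The END terminator line and the function names are assumptions for illustration, not something this PR specifies.

```python
def parse_metadump_line(line):
    """Split one dump line of space-separated name=value fields into a dict."""
    fields = {}
    for token in line.strip().split(" "):
        if "=" not in token:
            continue  # ignore anything that is not a name=value pair
        name, _, value = token.partition("=")
        fields[name] = value
    return fields


def unique_items(lines):
    """Yield each (key, cas) pair once, skipping exact duplicates.

    The same key with a different cas is treated as a different value
    and is yielded again.
    """
    seen = set()
    for line in lines:
        line = line.strip()
        if line == "END":  # assumed terminator line
            break
        if not line:
            continue
        item = parse_metadump_line(line)
        if "key" not in item:
            continue
        ident = (item["key"], item.get("cas"))
        if ident in seen:
            continue
        seen.add(ident)
        yield item
```

Note that the seen set grows with the number of distinct items, so for very large dumps a more compact structure may be preferable.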

Other improvements:

  • The internal LRU crawler auto-run code for clearing expired items has been improved significantly. Should burn a lot less excess CPU. Still hoping for more feedback on how that should operate.
  • A number of bugs were fixed along the way, some of which will likely be backported before this is merged.
  • The LRU crawler is now its own unique file and has an internal plugin-type structure. This means adding further LRU crawler functions is trivial.
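
To make the plugin-type structure and the short-lock walk more concrete, here is a loose Python analogy. The real implementation is in C and its interface is not shown in this PR, so every name here (CountingModule, eval, done_class, crawl) is hypothetical. It only shows the shape of the idea: a walker that takes the lock per item, pauses after each batch, and hands each item to a pluggable module.

```python
import threading
import time


class CountingModule:
    """Hypothetical crawler module that counts items per slab class."""

    def __init__(self):
        self.counts = {}

    def eval(self, class_id, item):
        # Called once per item while the walker holds the lock.
        self.counts[class_id] = self.counts.get(class_id, 0) + 1

    def done_class(self, class_id):
        # Called after the last item of a slab class; nothing to do here.
        pass


def crawl(classes, module, lock, batch=1000, pause=0.0):
    """Walk every item, holding `lock` only while one item is examined."""
    examined = 0
    for class_id, items in classes.items():
        for item in items:
            with lock:  # released again before the next item
                module.eval(class_id, item)
            examined += 1
            if examined % batch == 0:
                time.sleep(pause)  # short pause between batches
        module.done_class(class_id)
    return examined
```

A different crawler function would then be a different module object passed to the same walker, for example crawl({1: ["a", "b"]}, CountingModule(), threading.Lock()).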

This feature and these fixes have been sponsored by Netflix. Many thanks go out to them for investing the time to do this feature correctly and ensure it is thoroughly tested.

As of this writing the feature is still being tested but the PR is ready to be opened and staged while we wait. Although there's more I'd like to do, I can do that on my own in future releases :) At this point the code is stabilized and we're only looking for bugs and some minor additions to the dump output.

dormando added 12 commits July 18, 2016 14:42
is_listen_thread() was removed from service after the new listen sockets were
added. this removes the rest of the code.
Functionality is nearly all there. A handful of FIXME's and TODO's to address.
From there it needs to be refactored into something proper.
it's useful to see when this happens in realtime right now.
... instead of reltime.
not 100% sure if it should do this in all cases, or only if the current crawl
type requested it.

this is difficult to test. while using a print in the early return condition I
could occasionally get it to fire once. this may not be correct yet.
also fixes a bug where metadump was closing the client connection after a
single slab class.

not ported to the logger yet.
now has internal module system for the LRU crawler.

autocrawl checker should be a bit better now. doesn't
constantly re-run the histogram calcs.

metadump works as a module now. ended up generalizing the client case outside
of the module system since it looks reusable. Cut the amount of functions
required for metadump specifically to nothing.

still need to bug hunt, a few more smaller refactors, and see about pulling
this out into its own file.
-I 2m would still allocate 2mb pages, then only use 1mb of it, halving memory
capacity.
~600 lines gone from items.c makes it a lot more manageable.

this change is almost purely moving code around and renaming functions. very
little logic has changed.
@floatingatoll


Is "does hold locks" meant to read "does not hold locks"?

@dormando

Member Author

Yes, thank you! Text updated.

@dormando

Member Author

merged

@dormando dormando closed this Aug 20, 2016
@guandongyue


memcached-1.4.31
libmemcached-1.0.18 memcached-2.2.0 PHP 5.6.17

$m = new Memcached();
// addServers() expects an array of server entries, each array(host, port, weight)
$m->addServers( array( array('127.0.0.1', 11211, 100) ) );
$m->getAllKeys();
echo $m->getResultCode().":".$m->getResultMessage();

result:
9:CLIENT ERROR

Which version do you recommend?
