Skip to content

idle connection kicker using 'stats conns' and event notifier frameworks#80

Closed
elfchief wants to merge 2 commits into
memcached:masterfrom
elfchief:idlekick
Closed

idle connection kicker using 'stats conns' and event notifier frameworks#80
elfchief wants to merge 2 commits into
memcached:masterfrom
elfchief:idlekick

Conversation

@elfchief

@elfchief elfchief commented Sep 4, 2014

Copy link
Copy Markdown
Contributor

Still needs some cleanup and a stats counter, but submitting now for review. Could you please let me know if I'm missing anything obvious and that I'm following the suggested path for making this happen?

Possible problems/issues/observations:

  1. it wasn't real obvious how many connections to do per run through the thread, or how long to sleep. I ended up picking an arbitrary number (100) and sleeping for enough time that it takes roughly 1 second to cycle through all the possible connections. This may be a bit heavyweight (we could probably get away with once per 10 seconds or once per minute)
  2. That number is just a #define right now. I don't really see much value in making it tunable at runtime, though.
  3. There might be a more efficient way to doing this by figuring out which connection would time out next, and then scanning the entire connection list to find a new one when that connection expires/updates, but that requires doing a full connection scan and probably a mutex'd variable that's checked on every command, plus is way more complex, so probably isn't worth it.
  4. Only works on idle (rather than stuck-in-transmission) connections. Not sure how hard it would be to make it kill 'active' connections too.

@zwChan

zwChan commented Oct 28, 2014

Copy link
Copy Markdown
Contributor

I am interested in this pull request, but in conn_timeout_thread, it not thread safety to get a connection without any lock. It may be free in other thread.

@elfchief

Copy link
Copy Markdown
Contributor Author

This is, surprisingly, actually okay. The connection array is statically allocated, so the worst that will happen is we'll get a corrupt read on the last command time -- but that's okay, because all it does is trigger a write to the notify fd for the thread that owns the connection, which re-verifies that the connection is actually idle.

Worst case, that notification happens when it shouldn't, the connection's thread gets the notification, discovers that the connection isn't actually idle, and nothing happens.

Look at the implementation for 'stats conns' ... the connection array is actually intended to be used this way.

@zwChan

zwChan commented Oct 29, 2014

Copy link
Copy Markdown
Contributor

You are right, I just overlook the truth that conn_free(conn *c) is not called when connect is closed.

There is also another concern about this feature: Some of the memcached client(such as java memcached client on github) use the socket pool to connect to memcached server, and the socket pool are keep some minimum number connects(It is a practical method to lower the latency. e.g. we keep 5 connection at lease as default). If kicker close the connection, the client will create new connection for keeping the minimum available connection. We also encounter the problem that the server's connection too many(more than 10k, as there are lots of clients), but most of time the cause is that some developers trend to use a high minimum number of connection(It can be configured), they think it will make the application more "safe" :)

@dormando

dormando commented Jan 1, 2015

Copy link
Copy Markdown
Member

This is more or less the right approach. Feels weird putting it into memcached.c though - but I guess the rest of the conn structures are in there already.

There're some whitespace changes that make the code changes harder to see; please submit those as a separate PR. There's also a TODO or two in here? And the docs files need updating.

If you want to get fancy, you can adjust the scan time based on the given idle timeout and maybe a histogram. On a scan, populate some buckets for how many connections have been idle for so long. If the idle_timeout is 60, and 0 connections were idle when you scanned, there's no point to waking up until 60s later.

If 10,000 connections are open, idle_timeout is 60, and the last scan had 100 conns idle for at most 50s, you can wake up in 10s and scan again. Probably no point in optimizing it further, but you could in a few ways.

@dormando

dormando commented Jan 1, 2015

Copy link
Copy Markdown
Member

Oh, and use the per-thread stats for the stats counter. Very few things should be under the global stats lock.

@dormando

Copy link
Copy Markdown
Member

Any interest in finishing this thing? :P I think this might be closer than the other one (#62). I'd like to get this into a release after 1.4.25. I'll end up cleaning it up myself but it'd be nice if someone else did too.

@elfchief

Copy link
Copy Markdown
Contributor Author

I will look into finishing this up in the next week or two!

@dormando

Copy link
Copy Markdown
Member

perfect! Remember to rebase to master/next instead of merging it in. Let me know if you need a more thorough code review to go on as well. My old notes are still accurate as well.

@dormando

Copy link
Copy Markdown
Member

:'(

@elfchief

elfchief commented Jun 8, 2016

Copy link
Copy Markdown
Contributor Author

Yes, I am a horrible person. :( Life has gotten in the way much more than I'd really like. I do still intend to finish this up, unless you get to it first.

(I could also use more time in the day. I really don't understand why nobody has invented a time machine yet.)

@dormando

dormando commented Jun 8, 2016

Copy link
Copy Markdown
Member

:)

@elfchief elfchief force-pushed the idlekick branch 5 times, most recently from 8919ab1 to c65a413 Compare June 9, 2016 23:42
@elfchief

elfchief commented Jun 9, 2016

Copy link
Copy Markdown
Contributor Author

Okay, I updated this (finally).

Changes from last time:

  • Stats updates are now thread-local
  • Find the longest idle we're not in the process of booting, and sleep for long enough that we'll kick that one on-time when we wake next
  • Protocol docs updated
  • Added "stats settings" line for config setting

Still to do:

  • Split that whitespace change (I am apparently saving that for last)

@dormando

Copy link
Copy Markdown
Member

:))) Noting this again just in case: #62 - anything from their approach useful/etc. Otherwise I'll close it out in favor of this one.

@elfchief

Copy link
Copy Markdown
Contributor Author

Okay, there, split that whitespace change. ;)

I just flipped through #62, it's a valid approach, but I'm not sure it's a better approach, partially because it seems harder to follow what's going on. I think this one (mine) will also make it easier to add a timeout for "data pending for this client but no progress made in a long time", which I want to do in a future PR.

(Which I'm open to opinions on how best to add. I think adding most of the logic would be straightforward, but I'm not sure how to handle shutting down the socket correctly.)

And to be fair, #62 is probably more CPU-efficient, though I'd not expect the difference to be particularly noticeable.

@elfchief elfchief force-pushed the idlekick branch 2 times, most recently from 228d2bd to c4c7c41 Compare June 17, 2016 20:44
Comment thread memcached.c Outdated
int sleep_time;
useconds_t timeslice = 1000000 / (max_fds / CONNS_PER_SLICE);


Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extraneous newlines

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are hand-crafted artisan newlines! :)

@elfchief

Copy link
Copy Markdown
Contributor Author

Huh. I did a pass that caught most of that, must have gotten lost when I was rebasing stuff. Will fix.

Comment thread memcached.c Outdated
c->msgcurr = 0;
c->msgused = 0;
c->authenticated = false;
c->last_cmd_time = current_time;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a comment would be nice here: initializing for the idle kicker.

@dormando

Copy link
Copy Markdown
Member

left some pretty minor style comments. read over the other PR again, and I think this might end up being less resource intensive. Managing events in libevent adds more overhead than you'd expect: I've benchmarked just adding one firing once per second in the main loop and it dropped more than 5% of overall performance.

So I like this one better? it has docs and this and that.

@elfchief

Copy link
Copy Markdown
Contributor Author

Ok, should have all those fixed up.

@dormando

Copy link
Copy Markdown
Member

Thanks! can you fix the one minor issue on the other PR please?

@dormando

Copy link
Copy Markdown
Member

(looks like you did fix it here, you just need to push it to the other branch)

@dormando

Copy link
Copy Markdown
Member

merged into next, thanks!

could I interest you in any other small tasks? :)

@elfchief

Copy link
Copy Markdown
Contributor Author

You could! Whatcha got?

@dormando

Copy link
Copy Markdown
Member

shoot me an e-mail directly and we'll discuss :)

@dormando dormando closed this Jun 24, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants