idle connection kicker using 'stats conns' and event notifier frameworks#80
idle connection kicker using 'stats conns' and event notifier frameworks#80elfchief wants to merge 2 commits into
Conversation
|
I am interested in this pull request, but in conn_timeout_thread, it not thread safety to get a connection without any lock. It may be free in other thread. |
|
This is, surprisingly, actually okay. The connection array is statically allocated, so the worst that will happen is we'll get a corrupt read on the last command time -- but that's okay, because all it does is trigger a write to the notify fd for the thread that owns the connection, which re-verifies that the connection is actually idle. Worst case, that notification happens when it shouldn't, the connection's thread gets the notification, discovers that the connection isn't actually idle, and nothing happens. Look at the implementation for 'stats conns' ... the connection array is actually intended to be used this way. |
|
You are right, I just overlook the truth that conn_free(conn *c) is not called when connect is closed. There is also another concern about this feature: Some of the memcached client(such as java memcached client on github) use the socket pool to connect to memcached server, and the socket pool are keep some minimum number connects(It is a practical method to lower the latency. e.g. we keep 5 connection at lease as default). If kicker close the connection, the client will create new connection for keeping the minimum available connection. We also encounter the problem that the server's connection too many(more than 10k, as there are lots of clients), but most of time the cause is that some developers trend to use a high minimum number of connection(It can be configured), they think it will make the application more "safe" :) |
|
This is more or less the right approach. Feels weird putting it into memcached.c though - but I guess the rest of the conn structures are in there already. There're some whitespace changes that make the code changes harder to see; please submit those as a separate PR. There's also a TODO or two in here? And the docs files need updating. If you want to get fancy, you can adjust the scan time based on the given idle timeout and maybe a histogram. On a scan, populate some buckets for how many connections have been idle for so long. If the idle_timeout is 60, and 0 connections were idle when you scanned, there's no point to waking up until 60s later. If 10,000 connections are open, idle_timeout is 60, and the last scan had 100 conns idle for at most 50s, you can wake up in 10s and scan again. Probably no point in optimizing it further, but you could in a few ways. |
|
Oh, and use the per-thread stats for the stats counter. Very few things should be under the global stats lock. |
|
Any interest in finishing this thing? :P I think this might be closer than the other one (#62). I'd like to get this into a release after 1.4.25. I'll end up cleaning it up myself but it'd be nice if someone else did too. |
|
I will look into finishing this up in the next week or two! |
|
perfect! Remember to rebase to master/next instead of merging it in. Let me know if you need a more thorough code review to go on as well. My old notes are still accurate as well. |
|
:'( |
|
Yes, I am a horrible person. :( Life has gotten in the way much more than I'd really like. I do still intend to finish this up, unless you get to it first. (I could also use more time in the day. I really don't understand why nobody has invented a time machine yet.) |
|
:) |
8919ab1 to
c65a413
Compare
|
Okay, I updated this (finally). Changes from last time:
Still to do:
|
|
:))) Noting this again just in case: #62 - anything from their approach useful/etc. Otherwise I'll close it out in favor of this one. |
|
Okay, there, split that whitespace change. ;) I just flipped through #62, it's a valid approach, but I'm not sure it's a better approach, partially because it seems harder to follow what's going on. I think this one (mine) will also make it easier to add a timeout for "data pending for this client but no progress made in a long time", which I want to do in a future PR. (Which I'm open to opinions on how best to add. I think adding most of the logic would be straightforward, but I'm not sure how to handle shutting down the socket correctly.) And to be fair, #62 is probably more CPU-efficient, though I'd not expect the difference to be particularly noticeable. |
228d2bd to
c4c7c41
Compare
| int sleep_time; | ||
| useconds_t timeslice = 1000000 / (max_fds / CONNS_PER_SLICE); | ||
|
|
||
|
|
There was a problem hiding this comment.
These are hand-crafted artisan newlines! :)
|
Huh. I did a pass that caught most of that, must have gotten lost when I was rebasing stuff. Will fix. |
| c->msgcurr = 0; | ||
| c->msgused = 0; | ||
| c->authenticated = false; | ||
| c->last_cmd_time = current_time; |
There was a problem hiding this comment.
a comment would be nice here: initializing for the idle kicker.
|
left some pretty minor style comments. read over the other PR again, and I think this might end up being less resource intensive. Managing events in libevent adds more overhead than you'd expect: I've benchmarked just adding one firing once per second in the main loop and it dropped more than 5% of overall performance. So I like this one better? it has docs and this and that. |
|
Ok, should have all those fixed up. |
|
Thanks! can you fix the one minor issue on the other PR please? |
|
(looks like you did fix it here, you just need to push it to the other branch) |
|
merged into next, thanks! could I interest you in any other small tasks? :) |
|
You could! Whatcha got? |
|
shoot me an e-mail directly and we'll discuss :) |
Still needs some cleanup and a stats counter, but submitting now for review. Could you please let me know if I'm missing anything obvious and that I'm following the suggested path for making this happen?
Possible problems/issues/observations: