hacked in some basic idle timeout functionality#62
Conversation
…high load. timeout code can now enabled via autoconf with --enable-idle-timeouts a new stat, "idle_disconnected_connections" will count the total idle disconnects over time. verbose mode will emit timeout event info
|
I have a feeling there's a less intensive way to do this on top of a PR steven grimm has open right now. On top of steven's patch you could enable a background thread that slowly scans the connection structures for idlers. |
|
The "stats conns" stuff went into 1.4.18. Would you have any interest in trying for a more efficient implementation? A background thread could do dirty reads against the connections array: if it looks like it's timed out, issue a command to the owner thread's notify pipe. The owner thread would then read the notification, do a clean verification, and close it in correct context. Avoids having to add locks to the whole mess and is potentially much more efficient than using libevent timers. |
|
Sure, I'll take a look at 1.4.18. |
|
Question w.r.t. libevent, backward compatibility/portability and threading. Is it considered safe to call event_del(), event_add(), and friends from a thread other than the one that is running the event loop for a given event_base? I know "newer" libevent provides event_active(), evthread_make_base_notifiable() and even evthread_use_pthreads() but I'm not sure what level of backwards compatibility it is necessary to ensure. |
|
No, don't do that. That's why I was suggesting using the thread notifyer fd to pull the context into the correct thread/event base. People run this thing on pretty ancient systems, and using "fancy" libevent features under load has lead to tears in the past anyway. |
|
Okay. I was just trying to keep it really clean and touch as little pre-existing source as possible, but I can add another case in the switch statement in thread.c/thread_libevent_process() to dispatch to a timeout handler. |
|
It should still (for this independent feature) be a lot smaller than your original patch (I hope) since the "last accessed time" bits were added with seven's change. but yeah the uh... if you want to do event_del or _add you need to protect the base with a mutex I'm pretty sure. You also would have to protect the connections with a mutex in order to do a clean read of the timeout data.. so you're sort of forced to check it again from the right thread context anyway. |
|
I don't know that it will be technically "smaller" -- although not necessarily "bigger" either. But certainly it will be much more centralized, contained and performant. I'm not going to wrestle with synchronizing event_* calls on events "owned" by other thread's event bases as that seems to be over-engineering for what is a fairly simple feature. I'm a very slight bit perturbed by accessing last_cmd_time without synchronization, even if just for read purposes. Granted it will be double checked by the thread actually owning the connection so the worst thing that could happen is a timeout could get missed due to really old esoteric hardware, misaligning compilers, pipeline optimizing compilers or some combination of these. Perhaps alter the |
|
volatile doesn't help as much as you'd think, and could potentially slow down general access to it. If it's on a 4-byte or 8-byte-aligned (x86-64/etc) boundary the whole thing is an atomic read on most platforms, so you can't get corrupt numbers, just potentially old ones. Old by nanoseconds or microseconds won't matter for much given that in order to trip the checker the value can't have been modified for a long period of time. (minutes/hours). That narrows the race to "it was old enough to get reaped and is suddenly not too old". yeah it's a bit weird since you're using side effects, and you should still check the alignment of the read, because if the structure is something like: Since misreads are only a mild cpu burn compared to a constant cpu burn of wrapping everything properly I'm okay with it. This is a bit weird and I'd still consider good feedback tho :) |
|
Bit of a snag. The only solution that comes to mind immediately would be setting |
|
That seems fine. Just initialize it when something is connected... that's probably a legit bug anyway. |
|
sad we never finished it. maybe I can drag it over the line... needs tests and validation and etc. |
|
going to close this out as we're going with the other version. This was highly educational as for what the needs are though. Thanks for your time on this (a whopping two years ago...) I've also done some benchmarking with adding libevent timers to an event loop and there was a nontrivial amount of loss somehow. :/ |
Idle timeout support:
The following PR provides basic idle timeout support for the 1.4 tree. I realize this isn't the "active development" branch, but it is the one I was required to work with so this is what I'm sending back upstream.
Timeouts are configured globally, in seconds, via the
-Tcommand line argument. Default timeout is 0 -- no timeout. Each connection's timer is, of course, distinct. Each connection's idle timer is reset to the global initial value during any I/O and cleared when a connection is finalized. Idle timeouts are not meaningful with regard to UDP sockets and thus are not applied to them. Timeouts are handled synchronously, via libevent, and will perform a clean disconnect if/when they fire on a valid active connection.Idle timeout support is enabled via
./configure --enable-idle-timouts, but one must still supply-Ton the command line to activate at run-time.A new stat is provided to track idle disconnections -
idle_disconnected_connections. This counter can be viewed just like the rest of the general statistics.Use of this feature is not recommended in extremely high connection count environments as the overhead could begin to take its toll when there are many thousands of concurrent connections.