Modern age logging (commands, evictions, etc)#127
Conversation
|
I'm working on improving memcached usage and this PR seems the best solution for troubleshooting. |
|
annnnd, I just ruined this PR by having it opened against master instead of 'next' :) oh well. |
|
Quite a bit further along now. It's somewhat inspectable, even. Next big chunk of work could be another cleanup, which might just require renaming and comments. Need to make obvious what code is part of the bg thread and what are the internal interfaces. Also clean up random FIXME's/TODO's. Ideally remove more of the globals as well. Then it's formatting and logging endpoints. It's more or less ready for user review while I do this though. |
Logs are written to per-thread buffers. A new background thread aggregates the logs, further processes them, then writes them to any "watchers". Logs can have the time added to them, and all have a GID so they can be put back into strict order. This is an early preview. Code needs refactoring and a more complete set of options. All watchers are also stuck viewing the global feed of logs, even if they asked for different data. As of this commit there's no way to toggle the "stderr" watcher.
very temporary user control. allows to watch either fetchers or evictions, but not both, and always with timestamps.
can't compare bit fields en-masse, which makes things too difficult.
Stop doing a multi-reader circular buffer structure on top of a circular buffer. Also adds individualized streams based off of the central buffer. Sadly this requires managing iovecs and dealing with partial writes into said iovecs. That makes things very complicated. Since it's not clear to me how to simplify it too much (as of this writing), one of the next commits should remove iovecs and instead give each watcher its own circular buffer. The parser thread will copy watched events into each buffer. The above would only be slower for the case of many watchers watching the same event streams. Given all of the extra loops required for managing the iovecs, and the more complicated syscall handling, it might even be the same speed to manage multiple buffers anyway. I completed this intermediary change since it simplifies the surrounding code and was educational to fiddle with iovecs again.
after the previous commit's exercise, this greatly simplifies the writing process. The buffers are large but can be tuned with performance testing. Lines are now rendered into scratch space, and if event flags (eflags) match for a watcher, copy the bytes into that watcher's specific buffer. In most realistic cases, watchers will be watching different streams of information. While this does require a minimum of one memcpy instead of directly writing into the bipbuf as before, it removes all of the loops and management of iovecs required for filtering events on the write-to-socket end.
When lines are skipped to a watcher, a [skipped: n] is printed before resuming. Also polls full watcher directly once, and will wait until the outer loop to re-poll. Think the routine can be simplified... it works so will leave it and revisit later.
moves to its own function, respods "OK\r\n" to client, allows multiple arguments for multiple flags. Needs more thought before adding sampling.
four obvious ones. have a handy place to do the summarization when the logger wakes up for its run, avoiding the locks.
put header sections for utils/bgthread/api, edit function names slightly so it's clear what's only called from the bgthread. remaining FIXME/TODO's appear noncritical. might do those in the future.
makes a map on start of bgthread to speed up conversion a little.
add LOGGER_LOG() macro wrapper for callers. eviction logging is now using the "more final" method, where data is copied into structs embedded into the per-worker circular buffer. This moves the expensive snprintf calls to the background thread, and also allows making the logging format switchable. also switched the format to key=value. it's still largely readable but much easier to parse.
swapping "RAWCMDS" for internal hooks on when items are fetched or stored. This doesn't map 1:1 with commands, ie: a store is internally a fetch then store, so two log lines are generated. An ascii multiget one make one log line per key fetched. It's a good place to start. Need to come back and refactor parts of logger.c again, then convert all prints in the codebase to log entries.
tests pass now.
and a note for missing information.
|
This is as far as I'm going to take this for this release. The overhead should be none when not in use, but when in use performance hasn't been extensively examined. Worst case there's a very minor slowdown in workers and one core can be lost to the background thread processing results to watchers. If overloaded it simply drops logs. |
|
merged (with a small squash), closing PR. |
This is a work in progress branch to add high-ish speed and remote-monitoring logging capabilities to memcached.
Features
New logging will allow remote connections to monitor events, or subset of events.
Monitor incoming commands, or details of items which are being evicted. Further work on the branch will add more options.
Infrastructure in place for different types of watchers: "log to stderr", "log to networked client", or "log to syslog" should all be possible.
The final code should allow monitoring subsets of logs as well: "show me 1 out of every 4 evictions"
Logging code cannot block internally. If the logging thread gets behind, new entries are not written from the worker threads until it catches back up.
Demo
As of this writing:
Connect again:
First connection will output:
watch evictionscan show details about evicted items:(ttl: time remaining before expiration. la: last access time. fetch: whether or not it was ever read)
Implementation
Each thread in the system gets a local "logger" object. Logger objects contain a circular buffer and flags for which log events to write. The flags and buffer are allocated within each thread, to help allow NUMA aware accesses. Loggers are also pinned to a thread via pthread_setspecific. getspecific|setspecific are slower than a direct reference, so the logger reference should be brought into a function via argument passing or from fetching from another structure where possible.
Callers of the logging code are able to hide as much complexity as possible into the logging code. See the eviction logger for an example of this: one line added to lru_pull_tail() instead of formatting inline.
A background thread is created which iterates over every active logger and writes found log entries (flattening timestamp and GID) into scratch space, then copying it into the buffers for any subscribed watchers. keys (and arbitrary strings) are "URI encoded" before sending to watchers. (ie: %NN replaces odd characters).
Watchers can be remote clients. They can also be stderr, or other things as implementations are added.
Remote clients are removed from a worker thread and sent to the logger thread to be managed (via a poll() loop). The only way to stop receiving logs is to disconnect.
When no watchers are active, should have no performance impact compared to the current "verbose logging to stderr".
Work In Progress
This is ready for a first release. Only a handful of logging endpoints are done to start with, and more refactoring is needed in the code. In adhering to "early and often", it's already useful and should evolve with time.
The full implementation should have a lot more logging points. Able to subscribe to system event type logs, as well as detailed logs for fetches and sets (if a get worked, or if it missed because the object was expired, flushed, etc). The
watchcommand will allow fine grained access to which events you wish to subscribe to.Not all current stderr logs will be converted to the new system. Many will be left to stderr only as debugging prints. IE: dumps of binprot formats, connection state transitions, etc. Logging entries are targeted toward end users more than developers.
An unordered list of remaining TODO items are below:
watchcommand (done until extensions are needed)(doing a "key=value key=value" + uriencoded keys format, but not hard to allow multiple formats)
Some optional tasks (confirmed for another PR):