Threadpool stats API#1528
Conversation
|
comments, folks? In the absence of this, its not possible to know that the threadpool is the cause of perf problems, a consistent problem in node, see nodejs/node#8436 (comment) |
|
|
||
| It is possible to request statistics on the threadpool activity to observe the | ||
| number of idle threads, and the number of queued work items waiting for an idle | ||
| thread to do the work. This maybe be used to detect performance problems, or to |
|
|
||
| Start calling stats callbacks on threadpool activity. | ||
|
|
||
| .. c:function:: int uv_stats_start(uv_queue_stats_t* stats) |
|
|
||
| #include <sys/types.h> | ||
| #include <sys/syscall.h> | ||
| #include <unistd.h> |
There was a problem hiding this comment.
Just curious, Windows doesn’t usually have these, right? Does it still work?
There was a problem hiding this comment.
apologies, remnants of debug code, now removed.
| * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING | ||
| * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS | ||
| * IN THE SOFTWARE. | ||
| */ |
There was a problem hiding this comment.
Again, mostly just curious: Does libuv have rules similar to Node for this (i.e. no copyright header for new files required)?
There was a problem hiding this comment.
We do add this head to new files, but I think we go with Copyright libuv project contributors.
| .. c:type:: (*uv_queue_stats_cb)(int queued, int idle_threads, void* data) | ||
|
|
||
| `queued` is the number of submitted items not yet started by a thread. | ||
| `idle_threads` is the number of threads waiting for a item to start. |
| `idle_threads` is the number of threads waiting for a item to start. | ||
| `data` is the corresponding value from the uv_queue_stats_t structure. | ||
|
|
||
| *Note*: No uv API functions may be called in these callbacks other than |
| static void report(enum estage stage, int lock) { | ||
| QUEUE* q; | ||
| uv_queue_stats_t* s; | ||
| int length_; |
There was a problem hiding this comment.
Could you name these just length and threads.
There was a problem hiding this comment.
I was trying to avoid shadowing the file-scope var called threads, but gcc doesn't seem to care, so I renamed.
| * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING | ||
| * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS | ||
| * IN THE SOFTWARE. | ||
| */ |
There was a problem hiding this comment.
We do add this head to new files, but I think we go with Copyright libuv project contributors.
| static QUEUE wq; | ||
| static int wq_length; | ||
| static volatile int initialized; | ||
| QUEUE stats; |
| Thread pool statistics | ||
| =========================== | ||
|
|
||
| libuv provides a threadpool which can be used to run user code and get notified |
|
|
||
| `queued` is the number of submitted items not yet started by a thread. | ||
| `idle_threads` is the number of threads waiting for a item to start. | ||
| `data` is the corresponding value from the uv_queue_stats_t structure. |
There was a problem hiding this comment.
Can you linkify this? I believe the syntax is this:
:c:type:`uv_queue_stats_t`
Likewise for uv_async_send() below (except it's :c:function, IIRC.)
| switch(stage) { | ||
| case SUBMIT: s->submit_cb(length_, threads_, s->data); break; | ||
| case START: s->start_cb(length_, threads_, s->data); break; | ||
| case DONE: s->done_cb(length_, threads_, s->data); break; |
There was a problem hiding this comment.
The callbacks shouldn't be made with the lock held, should they?
Stylistic: libuv doesn't use everything-on-one-line style. (Same goes for the if statements.)
There was a problem hiding this comment.
The only reason to release would be to allow the user to make calls that mutate the state of the locked queue, but if if the queue being iterated is mutated, it will break iteration.
Isn't this an issue? Is there a better way to deal with this? I wanted to allow libuv to be reentrant, and it was in the first version that only allowed a single callback but with the more complex data structure that doesn't seem to be possible any more.
|
|
||
| QUEUE_FOREACH(q, &stats) { | ||
| s = QUEUE_DATA(q, struct uv_queue_stats_s, q); | ||
| switch(stage) { |
|
|
||
| for (i = 0; i < ARRAY_SIZE(pause_reqs); i += 1) | ||
| uv_sem_post(pause_sems + i); | ||
| } |
There was a problem hiding this comment.
Try sharing this with test-threadpool-cancel.c.
| int done; | ||
| int idle_max; | ||
| int idle_min = 2 * ARRAY_SIZE(pause_reqs); | ||
| int queued_max; |
|
|
||
| TEST_IMPL(threadpool_stats) { | ||
| /* uv_loop_t* loop; */ | ||
| /* uv_work_t req; */ |
There was a problem hiding this comment.
Don't keep commented out code.
|
re: #1528 (comment) @bnoordhuis PTAL at last commit You will notice I removed the UV_THREADPOOL_SIZE env var setting. It didn't work, AFAICT, the pool is a global singleton across the entire test runner, and can only have its size initialized once, the tests work because the default 4 is also the default threadpool size. It definitely didn't work for stats, I had to pull the env var setting out into a different part of the code. |
| case DONE: | ||
| s->done_cb(length, threads, s->data); | ||
| break; | ||
| default: abort(); |
There was a problem hiding this comment.
abort() should go on a new line.
|
|
||
| if (lock) | ||
| uv_mutex_unlock(&mutex); | ||
| } |
There was a problem hiding this comment.
This function still invokes the callbacks with the lock held.
Also, programmer economics: you can't call uv_queue_stats_stop() from a callback now (although for that to work properly, you can't use QUEUE_FOREACH().)
|
|
||
| void threadpool_unblock(void) { | ||
| unblock_threadpool(); | ||
| } |
There was a problem hiding this comment.
Why these wrapper functions? Why not expose them directly?
There was a problem hiding this comment.
The underlying function is called a dozen times through the file, I was looking to avoid churn where the existing functions were already in use. I renamed them: b7761de
| } | ||
|
|
||
| static void stats_submit_cb(int queued, int idle, void* data) { | ||
| fprintf(stderr, "submit: q %d i %d\n", queued, idle); |
There was a problem hiding this comment.
As a rule of thumb, tests should be silent (i.e., no printf.)
There was a problem hiding this comment.
Are you sure? The current tests use lots of fprintf(stderr - and the test runner does the right thing, it suppresses all output unless the test fails, then it shows the output so the test failure can be debugged. I don't mind deleting it, but it looked to me like this was the pattern in libuv's tests.
| #define TEST_THREADPOOL_H | ||
|
|
||
| void threadpool_saturate(unsigned start); | ||
| void threadpool_unblock(void); |
There was a problem hiding this comment.
You can just put these in test/task.h, no need for a separate header file. If you want to keep this header, add it to uv.gyp.
| w = container_of(q, struct uv__work, wq); | ||
| err = (w->work == uv__cancelled) ? UV_ECANCELED : 0; | ||
| w->done(w, err); | ||
| report(DONE, 1); |
There was a problem hiding this comment.
This incurs the overhead of locking and unlocking the mutex for every work item, even when there are no observers.
I'd cache the result of QUEUE_EMPTY(&stats) in the critical section a few lines up and only call report() if it's not empty.
That would introduce a small delay when a done callback calls uv_queue_stats_start() but that seems like an acceptable trade-off to me; it just needs to be documented.
As well, that would let you drop the second report() parameter because you can simply make it a rule that it should only be called with the lock held.
(We should review whether it's always safe to drop and reacquire the lock when invoking the callback though.)
|
|
||
| UV_EXTERN int uv_cancel(uv_req_t* req); | ||
|
|
||
| typedef void (*uv_queue_stats_cb)(int queued, int idle_threads, void* data); |
There was a problem hiding this comment.
I'd like to change this to to typedef void (*)(uv_queue_stats_t*, int, int), that way the struct can be embedded in another data structure a la container_of().
Side question: is the use of int (vs. for example unsigned) intentional?
There was a problem hiding this comment.
int is a bit friendlier to users, IMO, but unsigned reflects better that it can't be negative. went back and forth on this a bit in my head when writing it. Now its unsigned.
|
re: the locking, I think my comment #1528 (comment) got lost when I touched the code. If the queue is not locked for the entire iteration it would be possible to call If its allowed to call arbitrary libuv functions from a stats callback, that would include functions that queue work either explicitly or as a side effect ( Is it OK to call |
You The one issue is that you might invoke the callback just as another thread is unregistering it. Not really a problem for libuv (just cache the struct pointer and function pointer in local variables before releasing the lock) but it might be for the stats collector.
I don't have a problem with that. That's on the head of the stats collector.
That doesn't seem like a good idea to me. For one, you might run afoul of compiler optimizations. |
|
@bnoordhuis OK, see 3bead2b, though I don't think it has much useful effect other than allowing The
For It might be simpler to keep the mutex locked during calls to |
|
@bnoordhuis see last comment |
| structure. | ||
|
|
||
| .. warning:: | ||
| These callback functions are not threadsafe. No libuv API functions may be |
There was a problem hiding this comment.
Small typo: s/threadsafe/thread-safe/
| loop = container_of(handle, uv_loop_t, wq_async); | ||
| uv_mutex_lock(&loop->wq_mutex); | ||
| QUEUE_MOVE(&loop->wq, &wq); | ||
| QUEUE_MOVE(&loop->wq, &wq); /* XXX */ |
| while (!QUEUE_EMPTY(&sq)) { | ||
| q = QUEUE_HEAD(&sq); | ||
| QUEUE_REMOVE(q); | ||
| s = QUEUE_DATA(q, struct uv_queue_stats_s, q); |
There was a problem hiding this comment.
It would be better to cache s->submit_cb and friends in local variables. (Think: another thread concurrently freeing s.)
For the same reason you need to QUEUE_INSERT_TAIL(&stats, q) before you release the lock.
There was a problem hiding this comment.
I see your point about INSERT_TAIL(). I'm not seeing the point about s being freed, its not allowed to do that until _stop() is called, in which case we wouldn't have gotten to this point. Also, s is passed as an argument to the callback, so it must be valid memory at point the cb is called, so it seems safe to deref for us, as well as the cb receiving s as an argument.
I made both changes: see 0517509
There was a problem hiding this comment.
The logic w.r.t. to s is when another thread calls uv_queue_stats_stop(s) and free(s) between this thread's calls to uv_mutex_unlock() and s->submit_cb(s, ...).
| "UV_THREADPOOL_SIZE=%lu", | ||
| (unsigned long)ARRAY_SIZE(pause_reqs)); | ||
| putenv(buf); | ||
| if (start) { |
There was a problem hiding this comment.
Minor thing but can you write this as start > 0 to make it clear it's an integral and not a boolean?
| uv_queue_stats_t stats; | ||
| stats.submit_cb = stats_submit_cb; | ||
| stats.start_cb = stats_start_cb; | ||
| stats.done_cb = stats_done_cb; |
There was a problem hiding this comment.
Can you also assign a canary to stats.data and check that in the callbacks?
bnoordhuis
left a comment
There was a problem hiding this comment.
LGTM. CI: https://ci.nodejs.org/job/libuv-test-commit/503/
Might be good to get at least one more pair of eyeballs.
| while (!QUEUE_EMPTY(&sq)) { | ||
| q = QUEUE_HEAD(&sq); | ||
| QUEUE_REMOVE(q); | ||
| s = QUEUE_DATA(q, struct uv_queue_stats_s, q); |
There was a problem hiding this comment.
The logic w.r.t. to s is when another thread calls uv_queue_stats_stop(s) and free(s) between this thread's calls to uv_mutex_unlock() and s->submit_cb(s, ...).
| =========================== | ||
|
|
||
| Libuv provides a threadpool which can be used to run user code and get notified | ||
| in the loop thread. It is also used internally by libuv. The threadpool is |
There was a problem hiding this comment.
“get notified of completion”?
|
|
||
| .. warning:: | ||
| These callback functions are not thread-safe. No libuv API functions may | ||
| be called in these callbacks other than :c:func:`uv_async_send()`. |
There was a problem hiding this comment.
These two sentences sound contradictory to me – isn’t it the case that callback function needs to be thread-safe since it may be called from any thread?
There was a problem hiding this comment.
Agreed. The first sentence "These callback functions are not thread-safe" seems redundant and a bit confusing. I think the second sentence says it rightly.
| static int wq_length; | ||
| static volatile int initialized; | ||
| static QUEUE stats; | ||
| enum estage { SUBMIT, START, DONE }; |
There was a problem hiding this comment.
Just a question – the e prefix is for enum?
|
Build failures on the Windows and MacOS buildbots and test failures on the Linux buildbots. The build failures can probably be fixed by adding the test to uv.gyp. The test failures will need more investigation, though. |
|
Ping @sam-github. There are a couple outstanding questions/comments, and build failures. |
|
sorry, had other things to do lately, but this is getting to towards the top of my list, its not forgotten |
| } | ||
|
|
||
|
|
||
| static void report(enum estage stage) { |
There was a problem hiding this comment.
Worth a comment saying that mutex should be held by the caller?
A slight reworking of libuv#1528 to have a consistent API approach with loop stats
A slight reworking of libuv#1528 to have a consistent API approach with loop stats
A slight reworking of libuv#1528 to have a consistent API approach with loop stats
|
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Fix #1144
In the process of describing which node.js APIs use the threadpool (nodejs/node#14995) I came up against the difficulty of knowing whether the threadpool is out of idle threads, or accumulating work items that can't be processed. The threadpool is private, there was no way to access or observe it, even from C, so this PR introduces APIs to observe the pool.