Skip to content

Threadpool stats API#1528

Closed
sam-github wants to merge 10 commits into
libuv:v1.xfrom
sam-github:threadpool-stats
Closed

Threadpool stats API#1528
sam-github wants to merge 10 commits into
libuv:v1.xfrom
sam-github:threadpool-stats

Conversation

@sam-github

Copy link
Copy Markdown
Contributor

Fix #1144

In the process of describing which node.js APIs use the threadpool (nodejs/node#14995) I came up against the difficulty of knowing whether the threadpool is out of idle threads, or accumulating work items that can't be processed. The threadpool is private, there was no way to access or observe it, even from C, so this PR introduces APIs to observe the pool.

@sam-github

Copy link
Copy Markdown
Contributor Author

comments, folks? In the absence of this, its not possible to know that the threadpool is the cause of perf problems, a consistent problem in node, see nodejs/node#8436 (comment)

Comment thread docs/src/threadpool_stats.rst Outdated

It is possible to request statistics on the threadpool activity to observe the
number of idle threads, and the number of queued work items waiting for an idle
thread to do the work. This maybe be used to detect performance problems, or to

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: maybe bemay be?

Comment thread docs/src/threadpool_stats.rst Outdated

Start calling stats callbacks on threadpool activity.

.. c:function:: int uv_stats_start(uv_queue_stats_t* stats)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: startstop?

Comment thread src/threadpool.c Outdated

#include <sys/types.h>
#include <sys/syscall.h>
#include <unistd.h>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, Windows doesn’t usually have these, right? Does it still work?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apologies, remnants of debug code, now removed.

* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*/

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, mostly just curious: Does libuv have rules similar to Node for this (i.e. no copyright header for new files required)?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do add this head to new files, but I think we go with Copyright libuv project contributors.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment thread docs/src/threadpool_stats.rst Outdated
.. c:type:: (*uv_queue_stats_cb)(int queued, int idle_threads, void* data)

`queued` is the number of submitted items not yet started by a thread.
`idle_threads` is the number of threads waiting for a item to start.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a item -> an item

Comment thread docs/src/threadpool_stats.rst Outdated
`idle_threads` is the number of threads waiting for a item to start.
`data` is the corresponding value from the uv_queue_stats_t structure.

*Note*: No uv API functions may be called in these callbacks other than

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uv -> libuv

Comment thread src/threadpool.c Outdated
static void report(enum estage stage, int lock) {
QUEUE* q;
uv_queue_stats_t* s;
int length_;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you name these just length and threads.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to avoid shadowing the file-scope var called threads, but gcc doesn't seem to care, so I renamed.

* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*/

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do add this head to new files, but I think we go with Copyright libuv project contributors.

Comment thread src/threadpool.c Outdated
static QUEUE wq;
static int wq_length;
static volatile int initialized;
QUEUE stats;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static

Comment thread docs/src/threadpool_stats.rst Outdated
Thread pool statistics
===========================

libuv provides a threadpool which can be used to run user code and get notified

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Libuv

Comment thread docs/src/threadpool_stats.rst Outdated

`queued` is the number of submitted items not yet started by a thread.
`idle_threads` is the number of threads waiting for a item to start.
`data` is the corresponding value from the uv_queue_stats_t structure.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you linkify this? I believe the syntax is this:

:c:type:`uv_queue_stats_t`

Likewise for uv_async_send() below (except it's :c:function, IIRC.)

Comment thread src/threadpool.c Outdated
switch(stage) {
case SUBMIT: s->submit_cb(length_, threads_, s->data); break;
case START: s->start_cb(length_, threads_, s->data); break;
case DONE: s->done_cb(length_, threads_, s->data); break;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The callbacks shouldn't be made with the lock held, should they?

Stylistic: libuv doesn't use everything-on-one-line style. (Same goes for the if statements.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason to release would be to allow the user to make calls that mutate the state of the locked queue, but if if the queue being iterated is mutated, it will break iteration.

Isn't this an issue? Is there a better way to deal with this? I wanted to allow libuv to be reentrant, and it was in the first version that only allowed a single callback but with the more complex data structure that doesn't seem to be possible any more.

Comment thread src/threadpool.c Outdated

QUEUE_FOREACH(q, &stats) {
s = QUEUE_DATA(q, struct uv_queue_stats_s, q);
switch(stage) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space before (.

Comment thread test/test-threadpool-stats.c Outdated

for (i = 0; i < ARRAY_SIZE(pause_reqs); i += 1)
uv_sem_post(pause_sems + i);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try sharing this with test-threadpool-cancel.c.

Comment thread test/test-threadpool-stats.c Outdated
int done;
int idle_max;
int idle_min = 2 * ARRAY_SIZE(pause_reqs);
int queued_max;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static

Comment thread test/test-threadpool-stats.c Outdated

TEST_IMPL(threadpool_stats) {
/* uv_loop_t* loop; */
/* uv_work_t req; */

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't keep commented out code.

@sam-github

Copy link
Copy Markdown
Contributor Author

re: #1528 (comment)

@bnoordhuis PTAL at last commit

You will notice I removed the UV_THREADPOOL_SIZE env var setting. It didn't work, AFAICT, the pool is a global singleton across the entire test runner, and can only have its size initialized once, the tests work because the default 4 is also the default threadpool size. It definitely didn't work for stats, I had to pull the env var setting out into a different part of the code.

Comment thread src/threadpool.c Outdated
case DONE:
s->done_cb(length, threads, s->data);
break;
default: abort();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

abort() should go on a new line.

Comment thread src/threadpool.c

if (lock)
uv_mutex_unlock(&mutex);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function still invokes the callbacks with the lock held.

Also, programmer economics: you can't call uv_queue_stats_stop() from a callback now (although for that to work properly, you can't use QUEUE_FOREACH().)

Comment thread test/test-threadpool-cancel.c Outdated

void threadpool_unblock(void) {
unblock_threadpool();
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why these wrapper functions? Why not expose them directly?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The underlying function is called a dozen times through the file, I was looking to avoid churn where the existing functions were already in use. I renamed them: b7761de

}

static void stats_submit_cb(int queued, int idle, void* data) {
fprintf(stderr, "submit: q %d i %d\n", queued, idle);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a rule of thumb, tests should be silent (i.e., no printf.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure? The current tests use lots of fprintf(stderr - and the test runner does the right thing, it suppresses all output unless the test fails, then it shows the output so the test failure can be debugged. I don't mind deleting it, but it looked to me like this was the pattern in libuv's tests.

Comment thread test/test-threadpool.h Outdated
#define TEST_THREADPOOL_H

void threadpool_saturate(unsigned start);
void threadpool_unblock(void);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just put these in test/task.h, no need for a separate header file. If you want to keep this header, add it to uv.gyp.

Comment thread src/threadpool.c Outdated
w = container_of(q, struct uv__work, wq);
err = (w->work == uv__cancelled) ? UV_ECANCELED : 0;
w->done(w, err);
report(DONE, 1);

@bnoordhuis bnoordhuis Sep 19, 2017

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This incurs the overhead of locking and unlocking the mutex for every work item, even when there are no observers.

I'd cache the result of QUEUE_EMPTY(&stats) in the critical section a few lines up and only call report() if it's not empty.

That would introduce a small delay when a done callback calls uv_queue_stats_start() but that seems like an acceptable trade-off to me; it just needs to be documented.

As well, that would let you drop the second report() parameter because you can simply make it a rule that it should only be called with the lock held.

(We should review whether it's always safe to drop and reacquire the lock when invoking the callback though.)

Comment thread include/uv.h Outdated

UV_EXTERN int uv_cancel(uv_req_t* req);

typedef void (*uv_queue_stats_cb)(int queued, int idle_threads, void* data);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to change this to to typedef void (*)(uv_queue_stats_t*, int, int), that way the struct can be embedded in another data structure a la container_of().

Side question: is the use of int (vs. for example unsigned) intentional?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int is a bit friendlier to users, IMO, but unsigned reflects better that it can't be negative. went back and forth on this a bit in my head when writing it. Now its unsigned.

@sam-github

Copy link
Copy Markdown
Contributor Author

re: the locking, I think my comment #1528 (comment) got lost when I touched the code.

If the queue is not locked for the entire iteration it would be possible to call stats_stop() in a callback, as you say, which is nice, except that it corrupts the queue iteration. You mention not using FOREACH, how? Use of macro aside, if the data structure is modified while iterating, its hard to keep iterating.

If its allowed to call arbitrary libuv functions from a stats callback, that would include functions that queue work either explicitly or as a side effect (uv_getaddrinfo()), which invalidates the stats as they are being reported, and also would trigger a new queue iteration to report a new work item being submitted. Its a bit limiting to only be allowed to call uv_async_send(), but I can't think of a way around that, suggestions? Also, for this particular API, it leaves it up to the user to figure out how/if they want to accumulate stats before sending it the main loop, if they even want to.

Is it OK to call !QUEUE_EMPTY(&stats) && report() everywhere to avoid the funcall overhead when there are no stats listeners? The empty check isn't thread safe, it can return false positives or negatives, but I don't think that matters, because it will only happen occaisonally, and report() will safely reconfirm the state of the queue internally while it is locked.

@bnoordhuis

Copy link
Copy Markdown
Member

You mention not using FOREACH, how?

You QUEUE_MOVE() the global queue to a local variable, then you pop off the items one by one and add them back to the global queue (all the while making sure to hold the lock at appropriate times, of course.)

The one issue is that you might invoke the callback just as another thread is unregistering it. Not really a problem for libuv (just cache the struct pointer and function pointer in local variables before releasing the lock) but it might be for the stats collector.

functions that queue work either explicitly or as a side effect (uv_getaddrinfo()), which invalidates the stats as they are being reported

I don't have a problem with that. That's on the head of the stats collector.

Is it OK to call !QUEUE_EMPTY(&stats) && report() everywhere to avoid the funcall overhead when there are no stats listeners?

That doesn't seem like a good idea to me. For one, you might run afoul of compiler optimizations.

@sam-github

Copy link
Copy Markdown
Contributor Author

@bnoordhuis OK, see 3bead2b, though I don't think it has much useful effect other than allowing uv_threadpool_stats_start/stop() to be called from the the stats callbacks themselves, fwiw, seems like start/stop would be done based on some input from a loop, signal, IPC, etc.

The start() cb is called from the threadpool, it definitely can't be used to call anything other than uv_async_send() so I'm not sure its useful to do all the queue moving and mutex unlocking for it. I could change the code so it doesn't unlock for this specific cb.

submit() and done() are called from a loop thread, so superficially it seems like libuv API functions can be called from them, except that which livuv loop's thread are they being called from? Could be any of the multiple ones, and libuv APIs on a loop can only be called from that loop's thread.

For submit and for done(), the loop is known, I could include the uv_loop_t* as info in the callback for these two callbacks, then if the person was careful, they could use libuv API functions.

It might be simpler to keep the mutex locked during calls to report(), and make clear that only uv_async_send() can be called from the cb, and leave any threading concerns up to the user.

@sam-github

Copy link
Copy Markdown
Contributor Author

@bnoordhuis see last comment

@bnoordhuis bnoordhuis left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nearing completion.

Comment thread docs/src/threadpool_stats.rst Outdated
structure.

.. warning::
These callback functions are not threadsafe. No libuv API functions may be

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small typo: s/threadsafe/thread-safe/

Comment thread src/threadpool.c Outdated
loop = container_of(handle, uv_loop_t, wq_async);
uv_mutex_lock(&loop->wq_mutex);
QUEUE_MOVE(&loop->wq, &wq);
QUEUE_MOVE(&loop->wq, &wq); /* XXX */

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Que?

Comment thread src/threadpool.c
while (!QUEUE_EMPTY(&sq)) {
q = QUEUE_HEAD(&sq);
QUEUE_REMOVE(q);
s = QUEUE_DATA(q, struct uv_queue_stats_s, q);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to cache s->submit_cb and friends in local variables. (Think: another thread concurrently freeing s.)

For the same reason you need to QUEUE_INSERT_TAIL(&stats, q) before you release the lock.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point about INSERT_TAIL(). I'm not seeing the point about s being freed, its not allowed to do that until _stop() is called, in which case we wouldn't have gotten to this point. Also, s is passed as an argument to the callback, so it must be valid memory at point the cb is called, so it seems safe to deref for us, as well as the cb receiving s as an argument.

I made both changes: see 0517509

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic w.r.t. to s is when another thread calls uv_queue_stats_stop(s) and free(s) between this thread's calls to uv_mutex_unlock() and s->submit_cb(s, ...).

Comment thread test/test-threadpool-cancel.c Outdated
"UV_THREADPOOL_SIZE=%lu",
(unsigned long)ARRAY_SIZE(pause_reqs));
putenv(buf);
if (start) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor thing but can you write this as start > 0 to make it clear it's an integral and not a boolean?

uv_queue_stats_t stats;
stats.submit_cb = stats_submit_cb;
stats.start_cb = stats_start_cb;
stats.done_cb = stats_done_cb;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also assign a canary to stats.data and check that in the callbacks?

@bnoordhuis bnoordhuis left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. CI: https://ci.nodejs.org/job/libuv-test-commit/503/

Might be good to get at least one more pair of eyeballs.

Comment thread src/threadpool.c
while (!QUEUE_EMPTY(&sq)) {
q = QUEUE_HEAD(&sq);
QUEUE_REMOVE(q);
s = QUEUE_DATA(q, struct uv_queue_stats_s, q);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic w.r.t. to s is when another thread calls uv_queue_stats_stop(s) and free(s) between this thread's calls to uv_mutex_unlock() and s->submit_cb(s, ...).

===========================

Libuv provides a threadpool which can be used to run user code and get notified
in the loop thread. It is also used internally by libuv. The threadpool is

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

“get notified of completion”?


.. warning::
These callback functions are not thread-safe. No libuv API functions may
be called in these callbacks other than :c:func:`uv_async_send()`.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two sentences sound contradictory to me – isn’t it the case that callback function needs to be thread-safe since it may be called from any thread?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. The first sentence "These callback functions are not thread-safe" seems redundant and a bit confusing. I think the second sentence says it rightly.

Comment thread src/threadpool.c
static int wq_length;
static volatile int initialized;
static QUEUE stats;
enum estage { SUBMIT, START, DONE };

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question – the e prefix is for enum?

@bnoordhuis

Copy link
Copy Markdown
Member

Build failures on the Windows and MacOS buildbots and test failures on the Linux buildbots.

run-tests.obj : error LNK2001: unresolved external symbol _run_test_threadpool_stats
Undefined symbols for architecture x86_64:
  "_run_test_threadpool_stats", referenced from:
      _TASKS in run-tests.o

The build failures can probably be fixed by adding the test to uv.gyp. The test failures will need more investigation, though.

@cjihrig

cjihrig commented Nov 1, 2017

Copy link
Copy Markdown
Contributor

Ping @sam-github. There are a couple outstanding questions/comments, and build failures.

@sam-github

Copy link
Copy Markdown
Contributor Author

sorry, had other things to do lately, but this is getting to towards the top of my list, its not forgotten

Comment thread src/threadpool.c
}


static void report(enum estage stage) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth a comment saying that mutex should be held by the caller?

jasnell added a commit to jasnell/libuv that referenced this pull request Feb 28, 2018
A slight reworking of libuv#1528 to
have a consistent API approach with loop stats
jasnell added a commit to jasnell/libuv that referenced this pull request Feb 28, 2018
A slight reworking of libuv#1528 to
have a consistent API approach with loop stats
jasnell added a commit to jasnell/libuv that referenced this pull request Mar 4, 2018
A slight reworking of libuv#1528 to
have a consistent API approach with loop stats
@davisjam davisjam mentioned this pull request Aug 27, 2018
@stale

stale Bot commented Sep 30, 2019

Copy link
Copy Markdown

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale Bot added the stale label Sep 30, 2019
@stale stale Bot closed this Oct 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feature request: enable observation and troubleshooting of the work queue/thread pool usage

5 participants