
Conversation

@HowHsu commented Jun 6, 2025

A fixed MAX_SCRIPTCHECK_THREADS value does not let users fully leverage their CPU resources, and a value larger than nCores - 1 doesn't make sense since it only adds context-switch overhead. Set it to nCores - 1. Assumption: a user who sets the number of script verification workers is aware of how this affects system performance; otherwise they leave it at the default (which is 0).
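A minimal sketch of the intended change, assuming `GetNumCores()` (declared in src/common/system.h on master) as the source of the core count; the exact integration point is left to review:

```cpp
// Sketch only: derive the cap from the detected core count instead of a
// hard-coded constant. GetNumCores() reports logical concurrency; the -1
// leaves a core for the main thread, and the max() keeps single-core
// machines at zero workers.
#include <common/system.h> // GetNumCores() (assumed header location)

#include <algorithm>

static int MaxScriptCheckThreads()
{
    return std::max(GetNumCores() - 1, 0);
}
```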

@DrahtBot (Contributor) commented Jun 6, 2025

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32692.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

Conflicts

No conflicts as of last run.

@sipa (Member) commented Jun 6, 2025

I think up to nCores makes sense, and we need to be sure that that value is an actual reflection of hardware concurrency on all platforms. It may be worth benchmarking to see whether there is still a gain above certain values (because thread contention will become more and more noticeable, and may at some point overtake the gains from more parallelism).

@DrahtBot (Contributor) commented Jun 6, 2025

🚧 At least one of the CI tasks failed.
Task no wallet, libbitcoinkernel: https://github.com/bitcoin/bitcoin/runs/43620218125
LLM reason (✨ experimental): Linker error due to undefined reference to GetNumCores() causing build failure.

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

@HowHsu (Author) commented Jun 6, 2025

> I think up to nCores makes sense, and we need to be sure that that value is an actual reflection of hardware concurrency on all platforms. It may be worth benchmarking to see whether there is still a gain above certain values (because thread contention will become more and more noticeable, and may at some point overtake the gains from more parallelism).

There is a master worker, plus nCores-1 normal workers, so setting it to nCores actually allows nCores+1 worker threads.
I now think maybe it should even be smaller than nCores-1, since there may be some important bitcoind threads running (as a newbie, I can't point them out clearly).
About thread contention, I assume users know how many and which cores are idle enough to run these workers, and they will bind workers to them (for example, using taskset on Linux). Maybe more workers bring more lock contention; that's an issue to be investigated.

> be sure that that value is an actual reflection of hardware concurrency on all platforms

I'll look into it.

@sipa (Member) commented Jun 6, 2025

> There is a master worker, plus nCores-1 normal workers, so setting it to nCores actually allows nCores+1 worker threads.

That is incorrect. With -par=N, there is 1 master plus N-1 workers, which both perform script validation work, but the master only joins the other threads when it is done with other work. If you have N perfectly parallel cores, and nothing else running on your machine, par=N is expected, so the software should allow up to that, if there is no reason to assume that this would result in degraded performance.

> About thread contention, I assume users know which cores are idle enough to run these workers and they will bind workers to them (for example, using taskset on Linux).

Thread contention is the overhead inside bitcoind caused by coordination between the threads, and is unrelated to what the hardware provides or how loaded it is. The reason for the existing limit of 15 is that benchmarks showed that even on an otherwise unloaded machine with more cores than that, adding more script validation threads was net slower after about that many. That benchmark dates from 2013, so there is no reason to assume it still holds, as both hardware and typical blockchain script usage have changed significantly, as well as other parts of the codebase. But it would still be worth investigating whether there isn't just some number above which it makes no sense.

Review comment from a Member on src/validation.h (outdated):
Don't do this. Upper-case variable names are for constants, and this is no longer a constant. If there is reason for it to be a variable, the constant and its name should be removed, and the code using it should be adapted to compute it on the fly.

@HowHsu (Author) commented Jun 6, 2025

>> There is a master worker, plus nCores-1 normal workers, so setting it to nCores actually allows nCores+1 worker threads.

> That is incorrect. With -par=N, there is 1 master plus N-1 workers, which both perform script validation work, but the master only joins the other threads when it is done with other work. If you have N perfectly parallel cores, and nothing else running on your machine, par=N is expected, so the software should allow up to that, if there is no reason to assume that this would result in degraded performance.

From the code I read in checkqueue.h ( https://raw.githubusercontent.com/bitcoin/bitcoin/refs/heads/master/src/checkqueue.h#:~:text=std%3A%3Aoptional%3CR%3E%20Complete()%20EXCLUSIVE_LOCKS_REQUIRED(!m_mutex) ), I think par=N means N workers, plus one master worker which is actually the main bitcoind thread itself. And I ran the benchmark in bench/connectblock.cpp to debug it, which shows the same thing.

>> About thread contention, I assume users know which cores are idle enough to run these workers and they will bind workers to them (for example, using taskset on Linux).

> Thread contention is the overhead inside bitcoind caused by coordination between the threads, and is unrelated to what the hardware provides or how loaded it is. The reason for the existing limit of 15 is that benchmarks showed that even on an otherwise unloaded machine with more cores than that, adding more script validation threads was net slower after about that many. That benchmark dates from 2013, so there is no reason to assume it still holds, as both hardware and typical blockchain script usage have changed significantly, as well as other parts of the codebase. But it would still be worth investigating whether there isn't just some number above which it makes no sense.

I'll try to find a machine with enough CPU resources to test it.

@sipa (Member) commented Jun 6, 2025

> I think par=N means N workers, plus one master worker which is actually the main bitcoind thread itself.

That's not correct. The number of worker threads is one less than the argument to -par: https://github.com/bitcoin/bitcoin/blob/v29.0/src/node/chainstatemanager_args.cpp#L62.
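
From memory, the linked logic amounts to roughly the following (a paraphrase, not the exact code; see the linked file for the authoritative version):

```cpp
#include <algorithm>

// Paraphrase of the -par handling: the calling thread counts toward -par,
// hence the "- 1". -par <= 0 means "use all cores minus that many".
int WorkerThreadsFromPar(int par, int num_cores, int max_scriptcheck_threads)
{
    if (par <= 0) par += num_cores;
    return std::clamp(par - 1, 0, max_scriptcheck_threads);
}
```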

@HowHsu (Author) commented Jun 6, 2025

>> I think par=N means N workers, plus one master worker which is actually the main bitcoind thread itself.

> That's not correct. The number of worker threads is one less than the argument to -par: https://github.com/bitcoin/bitcoin/blob/v29.0/src/node/chainstatemanager_args.cpp#L62.

I see, sorry for missing this part, but MAX_SCRIPTCHECK_THREADS limits opts.worker_threads_num, so it should still be nCores-1.

@sipa (Member) commented Jun 6, 2025

@HowHsu I am so sorry! You are right.

@HowHsu (Author) commented Jun 8, 2025

Hi @sipa,

> The reason for the existing limit of 15 is that benchmarks showed that even on an otherwise unloaded machine with more cores than that, adding more script validation threads was net slower after about that many.

Is there any doc about this test?

> That benchmark dates from 2013

Is the benchmark you mentioned bench/connectblock.cpp?

I've done some investigation and found that the bottleneck may be the ECC calculation itself, not lock contention or something else.
I did some testing on my Intel i5-13600KF (13th Gen) PC, which has 6 P-cores and 8 E-cores (6 * 2 + 8 = 20 logical cores), with ./bench/connectblock.cpp. The producer adds tasks to the queue quickly, and lock contention is low according to the perf report. But the TPS doesn't go higher after thread_num=14 or 15, which is equal to the number of physical cores.
From the internet, I learned that ECC computation heavily uses the ALU, FPU, etc., and the logical CPU threads on one core share these units, which means ECC cannot benefit from SMT. So I guess a user can set MAX_SCRIPTCHECK_THREADS to nPhysicalCores - 1, but this still needs more research. Unfortunately I don't have a test environment with a large number of CPUs (it would be super great if someone could run this test on machines with different numbers of physical cores).
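
As a side note, standard C++ only exposes the logical thread count, which is what GetNumCores() ultimately reports; getting the physical core count requires OS-specific queries. A quick way to see what the runtime reports:

```cpp
#include <iostream>
#include <thread>

int main()
{
    // Prints the number of *logical* hardware threads: 20 on the
    // i5-13600KF above (6 P-cores with SMT + 8 E-cores), not the 14
    // physical cores. Physical-core detection needs platform APIs
    // (e.g. /sys/devices/system/cpu on Linux).
    std::cout << std::thread::hardware_concurrency() << " logical threads\n";
}
```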


@HowHsu (Author) commented Jun 22, 2025

I'm going to close this one, since I found that under the current scriptcheck queue implementation, going beyond 14 or 15 threads may degrade performance.

@HowHsu closed this Jun 22, 2025
@luke-jr (Member) commented Jun 24, 2025

Maybe this is something best solved by runtime benchmarking? (Or perhaps just at startup, since early blocks may not give us the parallelization we need for it)

@HowHsu (Author) commented Jun 29, 2025

I posted a new scriptcheck pool implementation, #32791, which uses atomic variables rather than a mutex to reduce thread synchronization contention. It achieves a very good performance improvement. But the idea relies on the adding period being very short compared to the script verification period, so it has to go together with PR #31132. @sipa and others, welcome to take a look.
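
To illustrate the direction (a generic lock-free sketch, not the actual code in #32791): once all checks are added up front, workers can claim the next item with a single atomic fetch-add instead of locking a mutex per batch. This is also why the short adding phase from #31132 matters here: the scheme assumes the task list is complete before the workers race through it.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Generic sketch: each thread claims the next unprocessed index with one
// relaxed atomic RMW per check. Check is any callable returning bool.
template <typename Check>
bool RunChecks(const std::vector<Check>& checks, std::atomic<std::size_t>& next)
{
    bool ok{true};
    for (;;) {
        const std::size_t i{next.fetch_add(1, std::memory_order_relaxed)};
        if (i >= checks.size()) break;
        if (!checks[i]()) ok = false; // real code would propagate failure early
    }
    return ok;
}
```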

@HowHsu reopened this Jun 29, 2025
@HowHsu (Author) commented Jun 29, 2025

> Maybe this is something best solved by runtime benchmarking? (Or perhaps just at startup, since early blocks may not give us the parallelization we need for it)

I think we can do it that way: binary-search for the best worker thread count by benchmarking on some standard test data.
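
A rough shape of what that calibration could look like (hypothetical bench_verify helper; a galloping search rather than a strict binary search, since it only assumes throughput rises and then flattens or falls as threads are added, which fjahr's table further down also suggests):

```cpp
#include <algorithm>
#include <functional>

// Hypothetical: bench_verify(n) replays standard test data with n worker
// threads and returns blocks/sec. Doubling pass to bracket the knee, then
// a linear refinement around the best point found.
int PickWorkerThreads(int max_threads, const std::function<double(int)>& bench_verify)
{
    int best{1};
    double best_tps{bench_verify(1)};
    for (int n = 2; n <= max_threads; n *= 2) {
        const double tps{bench_verify(n)};
        if (tps <= best_tps) break;
        best = n;
        best_tps = tps;
    }
    for (int n = best + 1; n <= std::min(best * 2, max_threads); ++n) {
        const double tps{bench_verify(n)};
        if (tps <= best_tps) break;
        best = n;
        best_tps = tps;
    }
    return best;
}
```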

@maflcko (Member) commented Jul 3, 2025

Maybe turn this into a draft while the CI is red and it's still a WIP?

@fanquake marked this pull request as draft July 3, 2025 08:22
A fixed MAX_SCRIPTCHECK_THREADS value is not flexible for users to leverage their
CPU resources, and a value larger than nCores - 1 doesn't make sense since it only
adds context-switch overhead. Set it to nCores - 1.
Assumption: a user who sets the number of script verification workers is aware of
how this affects system performance; otherwise they leave it at the default
(which is 0).

Signed-off-by: Hao Xu <hao.xu@linux.dev>
@achow101 (Member) commented

Are you still working on this?

@achow101 requested a review from fjahr October 22, 2025 15:27
@HowHsu (Author) commented Oct 23, 2025

> Are you still working on this?

No

@HowHsu closed this Oct 23, 2025
@fjahr (Contributor) commented Oct 29, 2025

> The reason for the existing limit of 15 is that benchmarks showed that even on an otherwise unloaded machine with more cores than that, adding more script validation threads was net slower after about that many. That benchmark dates from 2013, so there is no reason to assume it still holds, as both hardware and typical blockchain script usage have changed significantly, as well as other parts of the codebase. But it would still be worth investigating whether there isn't just some number above which it makes no sense.

I also checked this using the ConnectBlock benchmark as well as #33740, plus modifying MAX_SCRIPTCHECK_THREADS, of course. The assumption still appears to hold based on these results. If anything, based on my results (which I reran several times), we could potentially reduce MAX_SCRIPTCHECK_THREADS to 13. EDIT: After some time I started to see the best results with 14 and 15, so maybe this was nothing.

$ build/bin/bench_bitcoin -filter=ConnectBlockAllSchnorr -min-time=1000 -scale-threads
Running benchmarks with worker threads from 1 to 32

|            ns/block |             block/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|       43,184,145.50 |               23.16 |    0.1% |      0.99 | `ConnectBlockAllSchnorr_1_threads`
|       29,091,010.25 |               34.37 |    0.4% |      1.10 | `ConnectBlockAllSchnorr_2_threads`
|       22,236,218.75 |               44.97 |    0.1% |      1.09 | `ConnectBlockAllSchnorr_3_threads`
|       17,901,305.67 |               55.86 |    0.3% |      1.09 | `ConnectBlockAllSchnorr_4_threads`
|       15,134,386.86 |               66.07 |    0.2% |      1.08 | `ConnectBlockAllSchnorr_5_threads`
|       13,131,000.00 |               76.16 |    0.1% |      1.08 | `ConnectBlockAllSchnorr_6_threads`
|       11,635,203.67 |               85.95 |    0.1% |      1.07 | `ConnectBlockAllSchnorr_7_threads`
|       10,472,699.00 |               95.49 |    0.1% |      1.07 | `ConnectBlockAllSchnorr_8_threads`
|        9,571,375.00 |              104.48 |    0.1% |      1.08 | `ConnectBlockAllSchnorr_9_threads`
|        9,643,621.18 |              103.70 |    0.7% |      1.09 | `ConnectBlockAllSchnorr_10_threads`
|        9,545,537.50 |              104.76 |    0.6% |      1.09 | `ConnectBlockAllSchnorr_11_threads`
|        9,536,983.30 |              104.85 |    0.8% |      1.07 | `ConnectBlockAllSchnorr_12_threads`
|        9,313,783.30 |              107.37 |    0.7% |      1.08 | `ConnectBlockAllSchnorr_13_threads`
|        9,630,087.50 |              103.84 |    1.8% |      1.09 | `ConnectBlockAllSchnorr_14_threads`
|        9,893,344.73 |              101.08 |    1.5% |      1.10 | `ConnectBlockAllSchnorr_15_threads`
|        9,840,655.36 |              101.62 |    1.3% |      1.12 | `ConnectBlockAllSchnorr_16_threads`
|        9,914,708.27 |              100.86 |    3.3% |      1.07 | `ConnectBlockAllSchnorr_17_threads`
|       10,038,008.40 |               99.62 |    1.0% |      1.09 | `ConnectBlockAllSchnorr_18_threads`
|       10,126,620.90 |               98.75 |    1.0% |      1.09 | `ConnectBlockAllSchnorr_19_threads`
|       10,279,966.60 |               97.28 |    2.2% |      1.12 | `ConnectBlockAllSchnorr_20_threads`
|       10,334,737.50 |               96.76 |    1.8% |      1.10 | `ConnectBlockAllSchnorr_21_threads`
|       10,016,070.80 |               99.84 |    3.2% |      1.12 | `ConnectBlockAllSchnorr_22_threads`
|       10,447,277.78 |               95.72 |    1.5% |      1.08 | `ConnectBlockAllSchnorr_23_threads`
|       10,244,954.55 |               97.61 |    1.4% |      1.12 | `ConnectBlockAllSchnorr_24_threads`
|        9,975,325.73 |              100.25 |    1.4% |      1.09 | `ConnectBlockAllSchnorr_25_threads`
|       10,181,458.30 |               98.22 |    1.1% |      1.11 | `ConnectBlockAllSchnorr_26_threads`
|       10,263,129.20 |               97.44 |    0.8% |      1.10 | `ConnectBlockAllSchnorr_27_threads`
|       10,168,634.33 |               98.34 |    1.0% |      1.02 | `ConnectBlockAllSchnorr_28_threads`
|        9,781,129.20 |              102.24 |    1.9% |      1.11 | `ConnectBlockAllSchnorr_29_threads`
|       10,461,203.12 |               95.59 |    4.9% |      1.00 | `ConnectBlockAllSchnorr_30_threads`
|       10,292,564.89 |               97.16 |    2.0% |      0.99 | `ConnectBlockAllSchnorr_31_threads`
|       10,371,725.00 |               96.42 |    0.6% |      1.11 | `ConnectBlockAllSchnorr_32_threads`
