More precise std::hardware_concurrency#594
More precise std::hardware_concurrency#594AlexGuteniev wants to merge 8 commits intomicrosoft:masterfrom
Conversation
stl/src/cthread.cpp
Outdated
| return info.dwNumberOfProcessors; | ||
| DWORD_PTR process_affinity; | ||
| DWORD_PTR system_affinity; | ||
| if (GetProcessAffinityMask(GetCurrentProcess(), &process_affinity, &system_affinity)) |
There was a problem hiding this comment.
GetProcessAffinityMask documentation says:
On a system with more than 64 processors, if the threads of the calling process are in a single processor group, the function sets the variables pointed to by
lpProcessAffinityMaskandlpSystemAffinityMaskto the process affinity mask and the processor mask of active logical processors for that group. If the calling process contains threads in multiple groups, the function returns zero for both affinity masks.
dwNumberOfProcessors documentation says:
The number of logical processors in the current group.
Is there a behavioral difference here? If so, is it desirable? (These are not rhetorical questions; I am far from a WinAPI expert.)
There was a problem hiding this comment.
Counting GetProcessAffinityMask is documented to return zero affinity for cases when process belongs to multiple processor groups.
As dwNumberOfProcessors documentation does not say anything about cases when process belongs to multiple processor groups. It could be assumed, that it still the number of logical processors in the current group. (It might be however lack of documentation for such cases).
So there's no difference regarding maximum return until process belongs to multiple processor groups.
And there is a possible difference, when process belongs to multiple processor groups.
Process would belong to multiple processor groups after one of its thread is explicitly specified to run in different processor group (by a call to SetThreadGroupAffinity or as creation parameter). This is done manually for each thread, and std::thread will not do this by default. So zero result of std::hardware_concurrency() is legitimate for this case (the value is not well defined, it depends on users further calls to SetThreadGroupAffinity). Per-group value returned by dwNumberOfProcessors, which is supposedly 64 for multi-group cases, if it really still returns nonzero for such cases (system will try to pack 64 processors in one group) -- is also suitable, as std::hardware_concurrency() is a hint. Once user took control by SetThreadGroupAffinity call, they took responsibility about effective hardware concurrency.
This change primarily targets other case, when this or other process calls SetProcessAffinityMask for this process. This technique can be used to test concurrent algorithms for performance on fewer CPUs (remove some other cores, other SMTs or both), or for fixing concurrency bugs (Application Compatibility can do this, for example).
For multi-group cases - I'm not sure. There is a possibility to enumerate all processor groups by obtaining GROUP_RELATIONSHIP from GetLogicalProcessorInformationEx. But I do not know if this is the right thing to do, as bigger concurrency value obtained by such means is only available for those who will call SetThreadGroupAffinity.
There was a problem hiding this comment.
And there is a possible difference, when process belongs to multiple processor groups.
I missed that. Sadly to test this we need a machine with more than 64 threads...
(the value is not well defined, it depends on users further calls to SetThreadGroupAffinity).
I disagree, the value is defined for all portable uses of std::thread, since it uses the default group.
There is a possibility to enumerate all processor groups
Right, we should not do that because std::thread can't reach the other groups.
…essary comment removal
BillyONeal
left a comment
There was a problem hiding this comment.
We need to confirm that the behavior is unchanged for multi-group systems. (There might be nothing @AlexGutenev can do about this yet)
|
We believe the compiler back-end team has an 80-core machine; we should look into borrowing it for testing. |
I asked around on win32prg and got a volunteer. Just need to prepare the test for them... |
Try this, in x64 primarily, and in x86 for more data: |
|
I think the right behavior is to do this change, then fall back to the old method if the result is 0. I'll do some experimentation tomorrow; Windows folks pointed me to this knob for testing: https://docs.microsoft.com/en-us/windows-hardware/drivers/devtest/boot-parameters-to-test-drivers-for-multiple-processor-group-support (A 3970X that the system thinks has 4 sockets?) |
|
I am too curious to avoid trying myself. So, my results show that even after I successfully called Maybe it is not covered by simulation knob. Maybe documentation for Another source of concern is truncation of masks in 32 bit process, and there are even two cases:
|
|
If it is possible to propose Windows API functions, I would propose the following:
For pull request, maybe it should be closed, if full understanding is not reachable. |
I believe the right place for something like that is the Windows feedback tool. Of course even if they add such a thing it'll be some time before we can use it here :) |
|
OK I did some testing and I think this breaks 32 bit programs, since the status quo is that systems with more than 32 CPUs will do the right thing for them: As a result I think we should leave this unchanged. |
|
On a good part, |
I'm referring to the opposite. On my 32c/64t system, |
|
I understand that the change is breaking this case. |

Description
Mostly for debugging purposes for code that pretends single CPU
Checklist
Be sure you've read README.md and understand the scope of this repo.
If you're unsure about a box, leave it unchecked. A maintainer will help you.
_Uglyas perhttps://eel.is/c++draft/lex.name#3.1 or there are no product code changes.
verified by an STL maintainer before automated testing is enabled on GitHub,
leave this unchecked for initial submission).
members, adding virtual functions, changing whether a type is an aggregate
or trivially copyable, etc.).
the C++ Working Draft (including any cited standards), other WG21 papers
(excluding reference implementations outside of proposed standard wording),
and LWG issues as reference material. If they were derived from a project
that's already listed in NOTICE.txt, that's fine, but please mention it.
If they were derived from any other project (including Boost and libc++,
which are not yet listed in NOTICE.txt), you must mention it here,
so we can determine whether the license is compatible and what else needs
to be done.