<thread>: std::thread::hardware_concurrency limited to 64, even on Windows 11 #5453

@JoostHouben

Describe the bug

On machines with more than 64 logical processors, the value returned by std::thread::hardware_concurrency() is capped at 64.

Microsoft's documentation for this function states:

hardware_concurrency returns the number of logical processors, which corresponds to the number of hardware threads that can execute simultaneously. It takes into account the number of physical processors, the number of cores in each physical processor, and simultaneous multithreading on each single core.

Before Windows 11 and Windows Server 2022, applications were limited by default to a single processor group, having at most 64 logical processors. This limited the number of concurrently executing threads to 64. For more information, see Processor Groups.

Starting with Windows 11 and Windows Server 2022, processes and their threads have processor affinities that by default span all processors in the system and across multiple groups on machines with more than 64 processors. The limit on the number of concurrent threads is now the total number of logical processors in the system.

This suggests that on Windows 11 (and Windows Server 2022 and 2025), hardware_concurrency should return the total number of logical processors in the system. But it does not. On my Threadripper 7980X system, it returns 64 instead of 128.
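To see how the logical processors are split across processor groups on a given machine, a small diagnostic along these lines may help. This is only a sketch: `processors_per_group` is a hypothetical helper name, it relies on the Win32 functions `GetActiveProcessorGroupCount` and `GetActiveProcessorCount`, and the non-Windows branch is an assumption added purely so the snippet compiles elsewhere.

```cpp
#include <thread>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: returns the number of active logical processors in
// each processor group. On non-Windows platforms (an assumption made for
// portability) it reports a single "group" holding hardware_concurrency().
std::vector<unsigned> processors_per_group() {
    std::vector<unsigned> counts;
#ifdef _WIN32
    const WORD groups = GetActiveProcessorGroupCount();
    for (WORD g = 0; g < groups; ++g) {
        // GetActiveProcessorCount takes a group number and returns the
        // count of active logical processors in that group.
        counts.push_back(static_cast<unsigned>(GetActiveProcessorCount(g)));
    }
#else
    counts.push_back(std::thread::hardware_concurrency());
#endif
    return counts;
}
```

Printing the returned vector next to the value of std::thread::hardware_concurrency() shows whether the STL is reporting a single group or the whole system.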

Command-line test case

**********************************************************************
** Visual Studio 2022 Developer Command Prompt v17.13.3
** Copyright (c) 2022 Microsoft Corporation
**********************************************************************

C:\Temp>type hardware_concurrency.cpp
#include <thread>
#include <iostream>

int main() {
    std::cout << std::thread::hardware_concurrency() << std::endl;
}
C:\Temp>cl /EHsc /W4 /WX /std:c++latest .\hardware_concurrency.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.43.34809 for x86
Copyright (C) Microsoft Corporation.  All rights reserved.

/std:c++latest is provided as a preview of language features from the latest C++
working draft, and we're eager to hear about bugs and suggestions for improvements.
However, note that these features are provided as-is without support, and subject
to changes or removal as the working draft evolves. See
https://go.microsoft.com/fwlink/?linkid=2045807 for details.

hardware_concurrency.cpp
Microsoft (R) Incremental Linker Version 14.43.34809.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:hardware_concurrency.exe
hardware_concurrency.obj

C:\Temp>hardware_concurrency.exe
64

C:\Temp>echo %NUMBER_OF_PROCESSORS%
128

C:\Temp>systeminfo

...
OS Name:                       Microsoft Windows 11 Pro
OS Version:                    10.0.26100 N/A Build 26100
...

Expected behavior

std::thread::hardware_concurrency() should return the number of logical CPUs available on the system, in this case 128.

STL version

Visual Studio version:

Microsoft Visual Studio Community 2022 (64-bit) - Current
Version 17.13.3

Additional context

This issue was previously reported as #1230. It was closed with this comment (#1230 (comment)):

Like the above commentators have pointed out std::hardware_concurrency is only a hint.
The reason it makes sense to return the number of threads in the processor group is for the reason that @sylveon gives. Without using Windows API calls there isn't a way to use all the threads you expect to be able to use if we returned the true number of processors on the system.

The comment by sylveon that was referred to was (#1230 (comment)):

A problem I see with making it process group aware is that if you use thread::hardware_concurrency as a reference for any kind of concurrency, without using the Windows API to configure processor groups you'll be running your 128 threads on 64 logical processors only.

Sure, making the STL aware of processor groups might be doable, but consider the case where you use thread::hardware_concurrency without using std::thread (arguably, this is bad but unfortunately fairly common, just searching for hardware_concurrency on GitHub gives you plenty of examples where this is done)

I don't think it's a standards violation, as the standard says the value should only be considered as a hint. In this case, you're being hinted at how much threads are supported by default without additional configuration.

Note that this explanation no longer holds. As described in Microsoft's official STL documentation quoted above, as well as in the Windows API documentation, starting with Windows 11 all logical processors across all processor groups can be used without making any special Windows API calls. Moreover, multithreaded STL code already does use more than 64 logical processors. For example, if I use a parallel execution policy with an algorithm from <algorithm>, all 128 logical processors of my machine are fully utilized.

Another similar issue was #2099. That issue was closed last September by updating the documentation, including the passage quoted above. However, as explained in this issue, the behavior of hardware_concurrency is not consistent with the updated documentation, and so hardware_concurrency should be updated.

The STL implementation of hardware_concurrency currently delegates to _Thrd_hardware_concurrency, which is:

_CRTIMP2_PURE unsigned int __cdecl _Thrd_hardware_concurrency() noexcept { // return number of processors
    SYSTEM_INFO info;
    GetNativeSystemInfo(&info);
    return info.dwNumberOfProcessors;
}

On Windows 11 and above this could be updated to use GetLogicalProcessorInformationEx. Here is a very crude implementation that counts the logical processors in a single physical package, though perhaps a different RelationshipType would be more appropriate.

#include <windows.h>

#include <bit>
#include <iostream>
#include <memory>

unsigned my_Thrd_hardware_concurrency() noexcept { // return number of processors in the first physical package
    DWORD byte_length = 0;

    // The first call fails with ERROR_INSUFFICIENT_BUFFER and reports the required buffer size.
    GetLogicalProcessorInformationEx(RelationProcessorPackage, nullptr, &byte_length);

    std::unique_ptr<char[]> buffer(new char[byte_length]);
    auto ptr = reinterpret_cast<PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX>(buffer.get());

    if (!GetLogicalProcessorInformationEx(RelationProcessorPackage, ptr, &byte_length)) {
        std::cerr << "GetLogicalProcessorInformationEx failed." << std::endl;
        return 0; // do not read an unfilled buffer; 0 means "unknown"
    }

    // Only the first returned record (the first package) is inspected.
    unsigned num_processors = 0;
    for (WORD group = 0; group < ptr->Processor.GroupCount; ++group) {
        num_processors += std::popcount(ptr->Processor.GroupMask[group].Mask);
    }

    return num_processors;
}
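As a shorter alternative to walking the GetLogicalProcessorInformationEx records, the total across all groups can also be queried with the Win32 function `GetActiveProcessorCount` and the `ALL_PROCESSOR_GROUPS` constant. The following is a sketch under stated assumptions, not the change the STL made: `total_logical_processors` is a hypothetical name, and the fallback to std::thread::hardware_concurrency() (used on non-Windows platforms and when the Win32 call reports 0) is my own choice.

```cpp
#include <thread>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: number of active logical processors in the whole
// system, summed over every processor group.
unsigned total_logical_processors() noexcept {
#ifdef _WIN32
    // ALL_PROCESSOR_GROUPS asks for the count across all groups;
    // a return value of 0 indicates failure.
    const DWORD count = GetActiveProcessorCount(ALL_PROCESSOR_GROUPS);
    if (count != 0) {
        return static_cast<unsigned>(count);
    }
#endif
    // Assumed fallback: defer to the standard library's hint.
    return std::thread::hardware_concurrency();
}
```

Note that this counts processors in the system, not those in the calling process's affinity, so a process that has been restricted to a subset of processors would still see the system-wide total.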

Labels: bug (Something isn't working), fixed (Something works now, yay!)
