Skip to content

Fix cpu core count detection#5625

Merged
rainersigwald merged 16 commits intodotnet:masterfrom
mfkl:core-count-detection
Sep 18, 2020
Merged

Fix cpu core count detection#5625
rainersigwald merged 16 commits intodotnet:masterfrom
mfkl:core-count-detection

Conversation

@mfkl
Copy link
Contributor

@mfkl mfkl commented Aug 7, 2020

Attempt to fix #435

  • I'm not sure whether similar code should be needed for Unix support.
  • I don't have a machine with > 32 cores to actually test this.

Potentially relevant:

Copy link
Contributor

@Forgind Forgind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this almost works—thanks! I imagine we do care about linux, but I don't know whether the same API works. @ladipro, do you know?

IntPtr ptr = IntPtr.Zero;
try
{
ptr = Marshal.AllocHGlobal((int)ReturnLength);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to make this part of the original definition? If it fails, you don't have to free it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually put Marshal.AllocHGlobal inside try/catch statements to catch potential OOM exceptions. Calling free with IntPtr.Zero is safe. Can change it, if you prefer.

{
count++;
}
pos += Marshal.ReadInt32(ptr, pos + 4);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:
Can you add this to the for statement?

Also, is this right? Maybe I'm misunderstanding, but it looks like you're increasing it by whatever 32-bit integer happens to be four bytes along? Is that the size of the current struct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See new version.

int count = 0;
for(int pos = 0; pos < ReturnLength; )
{
LOGICAL_PROCESSOR_RELATIONSHIP Type = (LOGICAL_PROCESSOR_RELATIONSHIP)Marshal.ReadInt16(ptr, pos);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

LOGICAL_PROCESSOR_RELATIONSHIP Type = (LOGICAL_PROCESSOR_RELATIONSHIP)Marshal.ReadInt16(ptr, pos);
if(Type == LOGICAL_PROCESSOR_RELATIONSHIP.RelationProcessorCore)
{
count++;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the documentation:
The specified logical processors share a single processor core. The ProcessorCore member contains additional information.

I switched this to run whenever numberOfCpus exceeded 1 and printed out the number of cpus it calculated. It printed 6. I have 6 dual cores. I think we can get a more accurate count if we use the ProcessorCore to get processors/core as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, reused previous corefx code, count should be more accurate now.

@mfkl
Copy link
Contributor Author

mfkl commented Aug 8, 2020

I imagine we do care about linux, but I don't know whether the same API works.

Don't think so as GetLogicalProcessorInformationEx is a win32 API. There are several options for Linux/macOS, unsure which one you'd prefer.

Also, shouldn't this cpu checking code be located in https://github.com/dotnet/runtime/blob/96f178d32b7ba62485917ac46ef1edcfd3c2d10d/src/coreclr/src/classlibnative/bcltype/system.cpp#L327 instead?

Copy link
Member

@ladipro ladipro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine we do care about linux, but I don't know whether the same API works. @ladipro, do you know?

Isn't the issue Windows specific? I, too, wish I had a computer with >32 cores 😃


internal unsafe struct GROUP_RELATIONSHIP
{
private byte MaximumGroupCount;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This is defined as

WORD                 MaximumGroupCount;

in the unmanaged world so should be an ushort here.

}

internal enum LOGICAL_PROCESSOR_RELATIONSHIP
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Bad indentation.

}
}

public unsafe static int GetPhysicalCoreCount()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a comment with a short <summary>?

int groupCount = current->Group.ActiveGroupCount;
for (int i = 0; i < groupCount; i++)
{
processorCount += (groupInfo + i)->ActiveProcessorCount;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a check that this dereference is not overrunning the containing buffer, just in case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure about the kind of check you have in mind here?

byte* ptr = bufferPtr, endPtr = bufferPtr + len;
while (ptr < endPtr)

This seems correct from what I can tell.
What check would you add to processorCount += (groupInfo + i)->ActiveProcessorCount;?

Copy link
Member

@ladipro ladipro Aug 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If groupCount is a bogus number, (groupInfo + i) could be pointing outside of the bufferPtr <-> endPtr block. I believe it's a good security practice to not trust the data and check that all memory accesses are within the bounds when using unsafe C# code.

Strictly speaking, this should also include current->Relationship and current->Group.ActiveGroupCount because ptr being less than endPtr is not sufficient for these reads to be from within the buffer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this code was directly taken from dotnet/runtime unmodified, and has been pushed to production, I'm inclined to believe it works and does not need additional checks. I'll add the checks since you prefer though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some checks 4223cc2, please feel free to let me know if that's not what you had in mind.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These checks will now just ignore invalid data which is, to our understanding, logically impossible. Should they not be asserting or throwing?

String.Equals(switchName, "maxcpucount", StringComparison.OrdinalIgnoreCase))
{
int numberOfCpus = Environment.ProcessorCount;
if (numberOfCpus == 32 && NativeMethodsShared.IsWindows) // 32-bit process, 32-bit Windows had a 32-core limit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better if either:

  • This code ran always, i.e. remove the numberOfCpus == 32 condition. It would help test the new code on more machines. Unless calling the Win32 API has a measurable perf impact.
  • Or this code ran only if the process is actually 32-bit. Users may choose to run 64-bit msbuild.exe and my understanding is that this fix is not needed there.

@benvillalobos
Copy link
Member

Closing to reopen and trigger a CI build. See #5646


internal unsafe struct GROUP_RELATIONSHIP
{
private byte MaximumGroupCount;
Copy link
Contributor

@elachlan elachlan Aug 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a build error related to the MaximumGroupCount:
error CS0169: The field 'NativeMethodsShared.GROUP_RELATIONSHIP.MaximumGroupCount' is never used

and ActiveGroupCount:
error CS0649: Field 'NativeMethodsShared.GROUP_RELATIONSHIP.ActiveGroupCount' is never assigned to, and will always have its default value 0

and SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX.Size:
error CS0649: Field 'NativeMethodsShared.SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX.Size' is never assigned to, and will always have its default value 0

and SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX.Relationship:
error CS0649: Field 'NativeMethodsShared.SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX.Relationship' is never assigned to, and will always have its default value

@mfkl mfkl force-pushed the core-count-detection branch from 475840d to ff27678 Compare August 11, 2020 16:56
Copy link
Contributor

@Forgind Forgind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this works—thank you!

String.Equals(switchName, "maxcpucount", StringComparison.OrdinalIgnoreCase))
{
int numberOfCpus = Environment.ProcessorCount;
if (IntPtr.Size == 4 && NativeMethodsShared.IsWindows) // 32-bit process, 32-bit Windows had a 32-core limit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a more intuitive way to get that it's 32-bit? If not, this is fine. I was only confused until I read the comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there isn't another way, can we comment with a link to https://docs.microsoft.com/en-us/dotnet/api/system.intptr.size?view=netcore-3.1, or the excerpt from that page
The size of a pointer or handle in this process, measured in bytes. The value of this property is 4 in a 32-bit process, and 8 in a 64-bit process. You can define the process type by setting the /platform switch when you compile your code with the C# and Visual Basic compilers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, I understood immediately after reading the comment. Although I'd count this as somewhat subtle but definitely simple and functional.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe extract it to a readonly static property is32bit? Or use an enum?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! You reminded me of https://github.com/dotnet/msbuild/blob/master/src/Shared/EnvironmentUtilities.cs#L11

We should follow this model.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use this but test for the negation?

Copy link
Contributor

@elachlan elachlan Aug 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way works, but I'd make a new Is32Bit version. We probably don't have to worry about 16-bit machines, and we don't currently have to worry about 128-bit machines, but just in case this code lasts a while...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EnvironmentUtilities is internal to this assembly thus cannot be referenced without adding an InternalsVisibleTo which I don't think we want to do. Used !Environment.Is64BitProcess instead.

// Walk each SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX in the buffer, where the Size of each dictates how
// much space it's consuming. For each group relation, count the number of active processors in each of its group infos.
int processorCount = 0;
byte* ptr = bufferPtr, endPtr = bufferPtr + len;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:
split onto two lines

I also kinda liked your for loop better, but this is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

split onto two lines

done

RelationGroup,
RelationAll = 0xffff
}
internal struct SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question for myself. I noticed this struct has more members: https://docs.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-system_logical_processor_information_ex.

Can you simply not define all members and it will populate only what you have defined without raising some sort of error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you simply not define all members and it will populate only what you have defined without raising some sort of error?

Fair question. I think it depends on what you need to marshal. In case of union, this way of doing things seems to be fine and the original writer had a comment about it: https://github.com/dotnet/runtime/blob/3ab9c0b79300a5b3d954421aa4da15a9c2ca1271/src/libraries/Common/src/Interop/Windows/kernel32/Interop.GetLogicalProcessorInformationEx.cs#L27

/// Useful for getting the exact core count in 32 bits processes,
/// as Environment.ProcessorCount has a 32-core limit in that case.
/// </summary>
public unsafe static int GetPhysicalCoreCount()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this should be copy pasted code from a trustworthy repo (dotnet runtime, corefx, etc) and we just copy paste it and add a comment citing the source. If that's the case, can you please cite the source? (github file line permalink)

Well, ideally ideally it should be in a shared library of some sorts, but let's not get too ideal :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 70d41ae

if (groupInfo != null)
{
int groupCount = current->Group.ActiveGroupCount;
for (int i = 0; i < groupCount; i++)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may have misunderstood, but I thought @ladipro was thinking something more like this?

Suggested change
for (int i = 0; i < groupCount; i++)
for (int i = 0; i < groupCount && groupInfo + i < endPtr; i++)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, yes, I was thinking something like this, not the null checks added in 4223cc2. It would have to be:

for (int i = 0; i < groupCount && groupInfo + i + 1 <= endPtr; i++)

i.e. the entire structure pointed to by (groupInfo + i) fits in the buffer.

I mentioned this to @danmosemsft offline and he doesn't consider this kind of hardening necessary when calling a trusted API. @mfkl, feel free to revert the 4223cc2. Apologies for randomizing you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, clearly we don't do it in the code this came from in dotnet/runtime and I think it would be simply "testing the OS has no bugs". But if there's a security context, don't take my word for it - we should ask others.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good, reverted.

String.Equals(switchName, "maxcpucount", StringComparison.OrdinalIgnoreCase))
{
int numberOfCpus = Environment.ProcessorCount;
if (!Environment.Is64BitProcess && NativeMethodsShared.IsWindows) // 32-bit process, 32-bit Windows had a 32-core limit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

64-bit Windows had a 64-core limit, too. I think this should be called when we're running on .NET Framework regardless of bitness (on .NET Core/.NET 5, we can rely on Environment.ProcessorCount to do the trick for us, right?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

64-bit Windows had a 64-core limit, too. I think this should be called when we're running on .NET Framework regardless of bitness (on .NET Core/.NET 5, we can rely on Environment.ProcessorCount to do the trick for us, right?).

.NET 5 Environment.ProcessorCount calls the native GetProcessorCount function. So I think its all good.
https://github.com/dotnet/runtime/blob/ea2b09beea535abf960651c59359e40f99f2fecf/src/libraries/System.Private.CoreLib/src/System/Environment.cs#L13

https://github.com/dotnet/runtime/blob/60eff3f3766631bd4e7ee256ca17619fec90e9e6/src/coreclr/src/System.Private.CoreLib/src/System/Environment.CoreCLR.cs#L83-L84

https://github.com/dotnet/runtime/blob/96f178d32b7ba62485917ac46ef1edcfd3c2d10d/src/coreclr/src/classlibnative/bcltype/system.cpp#L327-L364

Copy link
Contributor

@Forgind Forgind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks how I imagined it, thanks! I don't know whether Environment.ProcessorCount works properly on .NET Core/.NET 5+.

Copy link
Member

@rainersigwald rainersigwald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good to me. I'm trying to pull some strings internally to borrow a machine big enough to test it on (looks like Azure will charge me merely $50,000/month for a big enough VM!).

mfkl and others added 2 commits August 19, 2020 13:27
Co-authored-by: Rainer Sigwald <raines@microsoft.com>
Co-authored-by: Rainer Sigwald <raines@microsoft.com>
@stackedsax
Copy link

I think this looks good to me. I'm trying to pull some strings internally to borrow a machine big enough to test it on (looks like Azure will charge me merely $50,000/month for a big enough VM!).

@rainersigwald I reached out to some people within the dotnet foundation to see if there might be funds for something like this. I'll let you know if I hear anything.

@elachlan
Copy link
Contributor

@rainersigwald I emailed AMD and they have offered to test on their 64-core ThreadRipper 3990X processor.

@rainersigwald
Copy link
Member

Current status:

I got access to a Standard_F72s_v2 VM, installed a private Visual Studio with this fix and tested it out.

It almost works! In 64-bit everything looks great. In 32-bit, the new code correctly identifies 2 processor groups (2 NUMA nodes), but Windows lies to us again and claims only 32 cores in each, when really there are 36--so we're artificially limited to 64 MSBuild nodes.

This is in the docs:

In addition, the sum of all per-group ActiveProcessorCount and MaximumProcessorCount values reported in PROCESSOR_GROUP_INFO structures may exclude some active logical processors.

I'll start a thread up with some Windows folks internally to see if this is the best we can do.

Works around win32 limitation--a group will report only up to 32 processors on a 32-bit process.
@rainersigwald
Copy link
Member

Latest push seems to work, at least until AMD blows things up.

@rainersigwald
Copy link
Member

However, depending on how dotnet/runtime#41902 is resolved, we may need to do this on .NET Core/Windows too.

@elachlan
Copy link
Contributor

elachlan commented Sep 4, 2020

@rainersigwald "AMD's Milan CPUs for the third generation of Epyc products are unlikely to support quadruple smt."
https://www.techpowerup.com/forums/threads/amd-zen-3-milan-cpus-not-getting-4-threads-per-core.259833/

I imagine in the future it might happen, but lucky not so soon.

Copy link
Member

@rainersigwald rainersigwald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested my latest commit on the 72-core machine in dotnet build and msbuild.exe scenarios and both successfully started and used 72 nodes. Looks good to me!

@rainersigwald rainersigwald merged commit 580f1bb into dotnet:master Sep 18, 2020
@rainersigwald
Copy link
Member

Thank you @mfkl! This should be available starting in 16.8 preview 4.

@mfkl
Copy link
Contributor Author

mfkl commented Sep 18, 2020

Thank you for picking it up and finishing this :-)

@mfkl mfkl deleted the core-count-detection branch September 18, 2020 17:31
@rainersigwald
Copy link
Member

By the way, GitHub somehow mangled the squash/merge commit--I did not intentionally erase you from the commit history. Sorry about that!

@mfkl
Copy link
Contributor Author

mfkl commented Sep 18, 2020

No worries.

Forgind added a commit that referenced this pull request Sep 25, 2020
Makes three PRs (#5669, #5625, and #5653) use Change Waves (#5710) to opt out as a unit. Also added a test to ensure that change waves worked properly for one of the three PRs and corrected a minor issue with tests for changes under a change wave that assume it is enabled when there exist tests that disable the relevant change wave without resetting it to empty at the end.
@dmex
Copy link
Contributor

dmex commented Sep 5, 2022

@rainersigwald @mfkl

This PR for the MSBuild cpu core count detection has an issue with debugging processor groups. I've created a new issue here and included some extra information for debugging without physical hardware: #7943

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MSBuild detects wrong core count on multi-proc machines