Skip to content

Enable new win-arm64 platform#82

Merged
malfet merged 1 commit intopytorch:masterfrom
gaborkertesz-linaro:win-arm64
Jul 5, 2022
Merged

Enable new win-arm64 platform#82
malfet merged 1 commit intopytorch:masterfrom
gaborkertesz-linaro:win-arm64

Conversation

@gaborkertesz-linaro
Copy link
Copy Markdown
Contributor

@gaborkertesz-linaro gaborkertesz-linaro commented Mar 11, 2022

This PR enables the new Windows on Arm (win-arm64) platform, by implementing the required APIs, reading topology information via Windows API.

@gaborkertesz-linaro
Copy link
Copy Markdown
Contributor Author

@gaborkertesz-linaro
Copy link
Copy Markdown
Contributor Author

gaborkertesz-linaro commented Mar 11, 2022

Build on win-arm64
cmake . -G "Visual Studio 16 2019" -A ARM64
cmake --build .

break;
}
/* Reset the lowest bit in affinity mask */
group_processors_mask &= (group_processors_mask - 1);
Copy link
Copy Markdown

@marcpems marcpems Mar 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This resets the most significant bits, not the lowest bit.
Would it be clearer to shift right and test for 1?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've inherited this code from x86 init: https://github.com/pytorch/cpuinfo/blob/master/src/x86/windows/init.c#L229
I mean it's absolutely not an excuse, I'll check that again.

Copy link
Copy Markdown
Contributor Author

@gaborkertesz-linaro gaborkertesz-linaro Mar 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it really clears all bits, that are lower than the least set bit.

ff 11111111
fe 11111110
fc 11111100
f8 11111000
f0 11110000
e0 11100000
c0 11000000
80 10000000
0 00000000

0x10
0x0

0x20
0x0

0x40
0x0

0x80
0x0

}

/* 2. Read topology information via MSDN API: packages, cores and caches*/
nr_of_packages = read_packages_for_processors(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are many things that can go wrong in these calls, but there is no way to chain the error and exit the function on error. That's not great for defensive code.
Consider adding throw/catch if permitted.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I see, there are not throw/catch in the current codebase and I'd think that's intentional, so I'd try to cleanup the error paths and make sure it's going to be safe.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes C code mostly

Copy link
Copy Markdown

@marcpems marcpems left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general principals of querying CPU details and assigning to the cpuinfo relevant structures looks ok to me. There are some issues with the error handling here that might need addressing.

@gaborkertesz-linaro
Copy link
Copy Markdown
Contributor Author

@malfet Could you review it? Thanks!

@gaborkertesz-linaro
Copy link
Copy Markdown
Contributor Author

related PR:
pytorch/pytorch#72424

Copy link
Copy Markdown
Contributor

@malfet malfet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good to me, but please explain why ARM64 needs a special treatment in ELSEIF(NOT CPUINFO_TARGET_PROCESSOR_MATCHES
And use bool instead of BOOL whenever possible (i.e. in callback its probably fine, but otherwise should not be used)

@gaborkertesz-linaro
Copy link
Copy Markdown
Contributor Author

@malfet Can I do anything else to close this and the related PR?

@strega-nil-ms
Copy link
Copy Markdown

any updates on this?

@gaborkertesz-linaro
Copy link
Copy Markdown
Contributor Author

any updates on this?

Not on my side, but I'd happily close this!

@niyas-sait
Copy link
Copy Markdown

Can this be merged, please?

@kalaskarsanket
Copy link
Copy Markdown

Any plans on merging this PR ?

Copy link
Copy Markdown
Contributor

@Maratyszcza Maratyszcza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use dash - to separate filename parts rather than underscore _ for consistency with other files in cpuinfo.

};

/* Please add new SoC/chip info here! */
static struct woa_chip_info woa_chips[] = {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This covers literally two devices. We need a more generic detection method that doesn't require hard-coding every SoC.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I absolutely agree that this solution is not ideal, but we haven't found any better and still robust option considering the existing Windows API. Also new SoCs won't turn up frequently, so even though it's a limited solution, it can be maintained.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a similar concern but there doesn't seem to be any better way to do it with the help of Windows API.

Ampere SoC also runs Windows/Arm64 - Does that need to be added?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice, but I can't really test that, so I'd ask to do as a follow-up commit.

CMakeLists.txt Outdated
LIST(APPEND CPUINFO_SRCS
src/arm/android/properties.c)
ENDIF()
ELSEIF(CMAKE_SYSTEM_NAME STREQUAL "Windows" AND CPUINFO_TARGET_PROCESSOR STREQUAL "ARM64")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong indentation?

Copy link
Copy Markdown

@kaadam kaadam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Gabor, thanks for working on this. It will help to solve our build issue in WoA Chromium. I managed to backport your PR to Chromium to check how the build works.
I left some comments about my findings. Basically lot of them are warnings reported by Clang-cl which treat as errors. Thanks.

static void set_cpuinfo_isa_fields();
static bool get_system_info_from_registry(
struct woa_chip_info** chip_info,
enum cpu_info_vendor* vendor);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: cpu_info_vendor -> cpuinfo_vendor


struct vendor_info {
char vendor_name[VENDOR_NAME_MAX];
enum cpu_info_vendor cpu_info_vendor;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

PINIT_ONCE init_once, PVOID parameter, PVOID* context)
{
struct woa_chip_info *chip_info = NULL;
enum cpu_info_vendor vendor = cpuinfo_vendor_unknown;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: cpu_info_vendor -> cpuinfo_vendor


static bool get_system_info_from_registry(
struct woa_chip_info** chip_info,
enum cpu_info_vendor* vendor)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@@ -0,0 +1,32 @@
#pragma once

bool get_core_uarch_for_efficiency(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move these functions down to avoid 'forward references to enum' reported by Clang-cl.

bool result = false;
char* textBuffer = NULL;
LPCTSTR cpu0_subkey =
"HARDWARE\\DESCRIPTION\\System\\CentralProcessor\\0";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please mark these strings as wide-character literals by using 'L'.
L"HARDWARE\DESCRIPTION\System\CentralProcessor\0"

Copy link
Copy Markdown
Contributor Author

@gaborkertesz-linaro gaborkertesz-linaro Jul 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly the MSVC compiler throws the following error to 'L' , even though I'd expected your proposal is the correct one:

warning C4133: 'initializing': incompatible types - from 'unsigned short [47]' to 'LPCTSTR'

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh okay, seems using (LPCTSTR) cast works fine.

break;
case CacheData:
processors[processor_global_index].cache.l1d = current_cache;
break;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here as above

case CacheData:
current_cache = l1d_base + l1d_counter;
l1d_counter++;
break;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here as above

break;
case CacheData:
numbers_of_caches[cpuinfo_cache_level_1d]++;
break;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clang-cl complains about the missing other two enum values:
"error: enumeration values 'CacheUnified' and 'CacheTrace' not handled in switch [-Werror,-Wswitch]"
Could you add these values as empty cases as well?

/* Please add new SoC/chip info here! */
static struct woa_chip_info woa_chips[] = {
/* Microsoft SQ1 Kryo 495 4 + 4 cores (3 GHz + 1.80 GHz) */
"Microsoft SQ1",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coud you add some extra braces to each struct elements?
../../third_party/cpuinfo/src/src/arm/windows/init.c(39,2): error: suggest braces around initialization of subobject [-Werror,-Wmissing-braces] "Microsoft SQ1", ^~~~~~~~~~~~~~~~ { ../../third_party/cpuinfo/src/src/arm/windows/init.c(52,2): error: suggest braces around initialization of subobject [-Werror,-Wmissing-braces] "Microsoft SQ2", ^~~~~~~~~~~~~~~~ {

@niyas-sait
Copy link
Copy Markdown

(@gaborkertesz-linaro is on holiday now and planning to address the comments in couple of weeks)

aarongable pushed a commit to chromium/chromium that referenced this pull request Jun 28, 2022
At the moment the Chrome build is broken for Windows on ARM,
since ruy build relies on cpuinfo third_party library which
doesn't support Windows on ARM port yet.
There is a pull request which implements the required functions
for WoA platfrom [1]. After it is landed we can re-enable cpuinfo build again.

[1] pytorch/cpuinfo#82

Bug: 1329774
Change-Id: Idb619eaa3180fac5068a939704151da0f653ee89
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/3721652
Reviewed-by: Tarun Bansal <tbansal@chromium.org>
Commit-Queue: Tarun Bansal <tbansal@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1018705}
@gaborkertesz-linaro gaborkertesz-linaro force-pushed the win-arm64 branch 2 times, most recently from f15be14 to 6551262 Compare July 5, 2022 08:58
@gaborkertesz-linaro
Copy link
Copy Markdown
Contributor Author

I hope I've fixed all of your findings, please let me know if I missed any of them!
Unfortunetaly I wasn't able to setup building with LLVM by CMake, but I've built with '-Wall' by MSVC compiler.

@kaadam
Copy link
Copy Markdown

kaadam commented Jul 5, 2022

I hope I've fixed all of your findings, please let me know if I missed any of them! Unfortunetaly I wasn't able to setup building with LLVM by CMake, but I've built with '-Wall' by MSVC compiler.

Gabor, thanks for updating it. Checking on my side, I get back to you asap.

This patch implements the required APIs for the new
win-arm64 platform by reading topology information via
Windows API.
Build config: cmake . -A ARM64
Copy link
Copy Markdown

@kaadam kaadam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks. I'm not a reviewer, but I give you an informal LGTM.

@gaborkertesz-linaro
Copy link
Copy Markdown
Contributor Author

@malfet What do you think?

@malfet malfet merged commit 1baac2b into pytorch:master Jul 5, 2022
megengine-bot pushed a commit to MegEngine/cpuinfo that referenced this pull request Sep 5, 2022
This patch implements the required APIs for the new
win-arm64 platform by reading topology information via
Windows API.
Build config: cmake . -A ARM64

GitOrigin-RevId: 1baac2b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants