Skip to content

Fix sparse windows on CPU with MKL#102604

Closed
mantaionut wants to merge 1 commit intopytorch:mainfrom
JaneaSystems:Fix_sparse_windows
Closed

Fix sparse windows on CPU with MKL#102604
mantaionut wants to merge 1 commit intopytorch:mainfrom
JaneaSystems:Fix_sparse_windows

Conversation

@mantaionut
Copy link
Copy Markdown
Contributor

@mantaionut mantaionut commented May 31, 2023

Fix #97352.
This PR changes the way the linking to intel MKL is done and updating MKL on Windows to mkl-2021.4.0 .
There are for both conda and pip packages MKL version with which you can link dynamically. mkl-devel contains the static versions of the dlls and MKL contains the needed dlls for the runtime. MKL dlls and static libs starting with 2021.4.0 have the version in their names( for MKL 2023 we have mkl_core.2.dll and for 2021.4.0 we have mkl_core.1.dll) so its possible to have multiple versions installed and it will work properly.
For the wheel build, I added dependency for whell MKL and on conda a dependecy for the conda MKL and on libtorch I copied the MKL binaries in libtorch.
In order to test this PR I have to use custom builder pytorch/builder#1467

cc @alexsamardzic @nikitaved @pearu @cpuhrsch @amjames @bhosmer @jcaip @peterjc123 @mszhanyi @skyline75489 @nbcsm @vladimir-aubrecht @iremyux @Blackhex @cristianPanaite

@pytorch-bot pytorch-bot bot added the release notes: releng release notes category label May 31, 2023
@pytorch-bot
Copy link
Copy Markdown

pytorch-bot bot commented May 31, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/102604

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 4156083 with merge base 6049998 (image):

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@mantaionut mantaionut force-pushed the Fix_sparse_windows branch 4 times, most recently from 06a2d02 to 987b456 Compare June 5, 2023 09:37
@mantaionut mantaionut added the ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR label Jun 5, 2023
@mantaionut mantaionut force-pushed the Fix_sparse_windows branch 2 times, most recently from 4eb772e to b387cd9 Compare June 6, 2023 05:09
@mantaionut mantaionut added the ciflow/binaries Trigger all binary build and upload jobs on the PR label Jun 6, 2023
@mantaionut mantaionut force-pushed the Fix_sparse_windows branch 2 times, most recently from 3769714 to 097ea78 Compare June 11, 2023 17:00
@mantaionut mantaionut force-pushed the Fix_sparse_windows branch 2 times, most recently from f22be71 to fc8b4da Compare June 26, 2023 06:55
@mantaionut mantaionut force-pushed the Fix_sparse_windows branch 4 times, most recently from 85b363a to aaf3d35 Compare June 30, 2023 06:08
@mantaionut mantaionut force-pushed the Fix_sparse_windows branch 11 times, most recently from b2a7af2 to b72a468 Compare August 6, 2023 09:32
@mantaionut mantaionut force-pushed the Fix_sparse_windows branch 4 times, most recently from 8e5d591 to 1139d36 Compare October 16, 2023 05:17
@mantaionut mantaionut marked this pull request as ready for review October 16, 2023 09:41
Copy link
Copy Markdown
Collaborator

@IvanYashchuk IvanYashchuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes to the aten/ directory look good. Exciting update!

@IvanYashchuk IvanYashchuk requested review from malfet and removed request for lezcano and nikitaved October 16, 2023 14:13
@IvanYashchuk IvanYashchuk added the module: sparse Related to torch.sparse label Oct 16, 2023
@lezcano
Copy link
Copy Markdown
Collaborator

lezcano commented Oct 16, 2023

cc @amjames for visibility

Copy link
Copy Markdown
Collaborator

@amjames amjames left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly asking for clarification on a couple of things.

setup.py Outdated
"fsspec",
]
if IS_WINDOWS:
install_requires.append("mkl==2021.4.0")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is an an exact pin needed here? I don't see why can't it be 2021.4.0 or later?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MKL has the version in its name. Since we still link with the static version of the dlls it will search for dlls with 1 in their name. So if we install MKL 2023.1.0 it will have those libraries but we 2 in their names and it will fail to load. So based on my tests the only thing compatible MKL versions are 2021.1.1-2021.4.0.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I figured you had a good reason here.

@cpuhrsch
Copy link
Copy Markdown
Contributor

@malfet - Does this work for you?

@malfet
Copy link
Copy Markdown
Contributor

malfet commented Oct 23, 2023

@cpuhrsch Change can not go as is as it depends on builder change that hasn't merged yet.

Copy link
Copy Markdown
Contributor

@malfet malfet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make suggested changes (and rebase your builder PR first, but looks ok to me)

!{{ set_runner_specific_vars() }}
!{{ common.checkout(deep_clone=False, directory="pytorch") }}
!{{ common.checkout(deep_clone=False, directory="builder", repository=common.builder_repo, branch=common.builder_branch) }}
!{{ common.checkout(deep_clone=False, directory="builder", repository="mantaionut/builder", branch="copy_mkl_windows") }}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be updated when builder PR is merged

Comment on lines 1118 to +1097
]
if IS_WINDOWS:
install_requires.append("mkl>=2021.1.1,<=2021.4.0")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, why build windows with MKL-2021.x, while Linux is build with MKL-2022?

Also, in order to place nice with poetry, one needs to add it unconditionally but add arch + platform constraint

Suggested change
]
if IS_WINDOWS:
install_requires.append("mkl>=2021.1.1,<=2021.4.0")
"mkl>=2021.1.1,<=2021.4.0; platform_system == \"Windows\" and platform_machine == \"x86_64\"",
]

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking first to have this implementation since previously we were using MKL 2020. See if everything works well and then in the future will be easier to just change the version to 2022.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mantaionut do you still want to update it to 2022? Also, are you sure that we are not double packing mkl into the wheel, but keeping dependency at the same time?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@malfet the binaries of mkl are not copied in the wheel, only in libtorch they are copied. I created a draft PR #118200 for updating to 2022. However I see for conda it might not be possible since numpy 1.19 -> mkl[version='>=2019.4,<2021.0a0|>=2021.4.0,<2022.0a0|>=2023.1.0,<2024.0a0'].
So based on this we could update to 2022 however on conda we should use 2023 instead. Let me know if you think this will work for you.

@cpuhrsch
Copy link
Copy Markdown
Contributor

cpuhrsch commented Jan 9, 2024

@mantaionut @malfet - Do we still want to move forward with this?

@mantaionut
Copy link
Copy Markdown
Contributor Author

@mantaionut @malfet - Do we still want to move forward with this?

I would like to move forward. But it also depends on pytorch/builder#1467 for which i had to make additional changes after rebasing.

@malfet
Copy link
Copy Markdown
Contributor

malfet commented Jan 22, 2024

@mantaionut @malfet - Do we still want to move forward with this?

I would like to move forward. But it also depends on pytorch/builder#1467 for which i had to make additional changes after rebasing.

Merged builder PR, please remove temporary changes from this PR and merge it

Added support for intel Sparse on Windows
@mantaionut
Copy link
Copy Markdown
Contributor Author

@pytorchbot merge

@pytorchmergebot
Copy link
Copy Markdown
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/binaries Trigger all binary build and upload jobs on the PR ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR ciflow/trunk Trigger trunk jobs on your pull request Merged module: mkl Related to our MKL support module: sparse Related to torch.sparse module: windows Windows support for PyTorch open source release notes: releng release notes category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Sparse is not available on Windows

8 participants