Fix for AMD unit tests #2047

mrwyattii · 2022-06-23T17:52:34Z

Add asserts to prevent FP16 with CPUAdam on AMD CPUs (not supported)
AMD runners now use a venv, removed sudo from pip install
Add skips to unit tests for CPUAdam
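
For illustration, the FP16 guard and the test skips listed above could both be driven by a CPU vendor check along the following lines. This is a minimal sketch, not the code from this PR: `cpu_vendor_from_cpuinfo`, `is_amd_cpu`, and `fp16_cpu_adam_allowed` are hypothetical names, dtypes are passed as plain strings so the example runs without torch, and the vendor is assumed to come from `/proc/cpuinfo`-style text.

```python
# Hypothetical helpers for illustration only; none of these are DeepSpeed APIs.


def cpu_vendor_from_cpuinfo(cpuinfo_text):
    """Return the lower-cased vendor_id from /proc/cpuinfo-style text, or ''."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "vendor_id":
            return value.strip().lower()
    return ""


def is_amd_cpu(vendor):
    """AMD x86 CPUs report the vendor string 'AuthenticAMD'."""
    return "amd" in vendor.lower()


def fp16_cpu_adam_allowed(vendor, param_dtypes):
    """Return False when any parameter is half precision and the CPU is AMD."""
    has_fp16 = any(d in ("float16", "half") for d in param_dtypes)
    return not (has_fp16 and is_amd_cpu(vendor))


# Sample text standing in for the contents of /proc/cpuinfo.
sample = "processor\t: 0\nvendor_id\t: AuthenticAMD\nmodel name\t: AMD EPYC\n"
vendor = cpu_vendor_from_cpuinfo(sample)
print(vendor, fp16_cpu_adam_allowed(vendor, ["float32", "float16"]))
```

In a unit test, the same predicate could feed a `pytest.mark.skipif(...)` condition, while the optimizer constructor would use it to raise or warn.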

deepspeed/ops/adam/cpu_adam.py

jithunnair-amd · 2023-11-21T20:46:07Z

+                for param_id, p in enumerate(group['params']):
+                    if p.dtype == torch.half:
+                        logger.warning(
+                            "FP16 params for CPUAdam may not work on AMD CPUs")


@mrwyattii @RezaYazdaniAminabadi Can you please elaborate a bit more on what error/issue you observed when you added this warning?

w.r.t #4698
cc: @loadams

Hi @rraminen, Hi @jithunnair-amd

I think CPU-Adam should be supported to some extent, but some overlapping features for copying data from CPU to GPU may not work properly. Also, I remember that I could not find the corresponding vector instructions for handling the fp16 data-type. So, we may be able to remove this and let it run safely only for restricted cases. Or, if you can help verify that the functionality we need for CPU-Adam is supported, we can have another PR to add this?
Thanks,
Reza

@RezaYazdaniAminabadi You mentioned: "I could not find the corresponding vector instructions for handling the fp16 data-type." Would that be the ones used in SIMD_STORE2 and SIMD_LOAD2 here: https://github.com/microsoft/DeepSpeed/blob/master/csrc/includes/simd.h#L55?

@RezaYazdaniAminabadi,
Below are the only references to FP16 instructions in simd.h:

https://github.com/microsoft/DeepSpeed/blob/a8b821535aa0b254efa681d51b4951734ca021cc/csrc/includes/simd.h#L36
https://github.com/microsoft/DeepSpeed/blob/a8b821535aa0b254efa681d51b4951734ca021cc/csrc/includes/simd.h#L58

Those conversion instructions are part of AVX512F, which Zen 4 supports, since it is a very basic AVX512 feature. Do you want to qualify that this code works fine on AMD Zen 4 and newer CPUs?
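
To make the "qualify by architecture" idea concrete, one option is to key the warning on CPU feature flags rather than on the vendor string. The sketch below is an assumption-laden illustration, not DeepSpeed code: `cpu_flags_from_cpuinfo` and `fp16_conversion_path` are hypothetical names, the flag names (`avx512f`, `avx2`, `f16c`) are the ones Linux prints in `/proc/cpuinfo`, and the mapping assumes the 512-bit half/float conversions need AVX512F while the 256-bit ones need F16C.

```python
# Hypothetical helpers for illustration only; none of these are DeepSpeed APIs.


def cpu_flags_from_cpuinfo(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def fp16_conversion_path(flags):
    """Pick the widest half<->float conversion path the flags allow.

    The returned labels are illustrative, not build options of any library.
    """
    if "avx512f" in flags:
        return "avx512"
    if "avx2" in flags and "f16c" in flags:
        return "avx256"
    return "scalar"


# Sample flags line standing in for a CPU that reports AVX512F.
sample_flags = "flags\t\t: fpu sse2 avx avx2 f16c avx512f avx512bw\n"
print(fp16_conversion_path(cpu_flags_from_cpuinfo(sample_flags)))
```

A warning could then be limited to the `"scalar"` case, which would cover older architectures without singling out a vendor.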

Hi @rraminen @jithunnair-amd

Sorry for the delay.
@jithunnair-amd, I think that, apart from the loading part, some casting operations were used that are kind of hacky; I had to do that to work around loading the data correctly. Honestly, I have not tested this part of the code recently, but I would be happy to change it based on your suggestion (and remove the warning, or only warn for older architectures, as @rraminen mentioned). Unfortunately, I don't have AMD CPUs to try this on, so I would need your help to verify it.
Thanks,
Reza
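
One way to verify the FP16 path on AMD hardware would be to compare the kernel's output against a slow scalar reference. The following is a rough sketch of such a reference under stated assumptions: it assumes the half-precision path upcasts parameters on load and rounds back to FP16 on store, it uses the textbook Adam update with bias correction, and it does the arithmetic in Python floats (double precision), so a comparison against an FP32 kernel would need a tolerance. `f16` and `adam_step_reference` are hypothetical names.

```python
import math
import struct


def f16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def adam_step_reference(param, grad, exp_avg, exp_avg_sq, step,
                        lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update for a single parameter stored as FP16.

    The parameter and gradient are treated as FP16 values on input, the
    update is computed in double precision, and the new parameter is
    rounded back to FP16 to mimic a half-precision store.
    """
    p = f16(param)
    g = f16(grad)
    exp_avg = beta1 * exp_avg + (1.0 - beta1) * g
    exp_avg_sq = beta2 * exp_avg_sq + (1.0 - beta2) * g * g
    bias1 = 1.0 - beta1 ** step
    bias2 = 1.0 - beta2 ** step
    denom = math.sqrt(exp_avg_sq / bias2) + eps
    p = p - lr * (exp_avg / bias1) / denom
    return f16(p), exp_avg, exp_avg_sq


new_p, m, v = adam_step_reference(1.0, 1.0, 0.0, 0.0, step=1)
print(new_p)
```

Running the optimizer on the same inputs and checking that each parameter matches the reference within a small tolerance would show whether the conversion and casting path behaves correctly on a given CPU.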

assert no FP16 with AMD CPUs

3ab656f

mrwyattii requested review from RezaYazdaniAminabadi, ShadenSmith, arashb, awan-10, cli99, conglongli, duli2012, eltonzheng, jeffra, minjiaz, samyam, tjruwase, xiaoxiawu-microsoft and yaozhewei as code owners June 23, 2022 17:52

mrwyattii added 2 commits June 23, 2022 10:57

add unit test for AMD assert error

762eff9

missing import

b30e68d

RezaYazdaniAminabadi reviewed Jun 23, 2022

deepspeed/ops/adam/cpu_adam.py (outdated)

downgrade assert to warning

6d37860

RezaYazdaniAminabadi approved these changes Jun 23, 2022

mrwyattii merged commit 7bae53d into deepspeedai:master Jun 23, 2022

jithunnair-amd reviewed Nov 21, 2023

jithunnair-amd mentioned this pull request Apr 11, 2024

[WARNING] [cpu_adam.py:84:__init__] FP16 params for CPUAdam may not work on AMD CPUs #4698

Open

jithunnair-amd Nov 21, 2023 (edited)

rraminen Nov 27, 2023

RezaYazdaniAminabadi Nov 28, 2023

jithunnair-amd Apr 11, 2024 (edited)

rraminen Apr 15, 2024

sfc-gh-reyazda Apr 18, 2024

5 participants
