Describe the bug
The elapsed time that flops_profiler gets from `_get_elapsed_msec` is now in milliseconds, which has introduced many time-unit inconsistencies. #2090 (fixed by #2095) and #1934 have resolved some of these misuses.
I mainly examined the MoE code and found the remaining inconsistencies listed below.
- The flops profiler log prints the duration in ms:
  https://github.com/microsoft/DeepSpeed/blob/fda63432ba67643d2f54885e45f0b2058fa2b937/deepspeed/runtime/engine.py#L1714-L1716
  However, the MoE code captures this duration and manually multiplies it by 1000, even though the value is already in ms:
  https://github.com/microsoft/DeepSpeed/blob/fda63432ba67643d2f54885e45f0b2058fa2b937/deepspeed/moe/sharded_moe.py#L434
- The time breakdown log expects values in ms here:
  https://github.com/microsoft/DeepSpeed/blob/fda63432ba67643d2f54885e45f0b2058fa2b937/deepspeed/runtime/engine.py#L2095-L2097
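To make the first item concrete, here is a minimal sketch of the double-scaling pattern, assuming a timer that (like the profiler timing described above) already reports elapsed time in milliseconds. `MsecTimer`, `elapsed_msec`, `duration_ms_fixed`, and `duration_ms_double_scaled` are hypothetical names for illustration only, not DeepSpeed APIs; the injectable `clock` just makes the timing deterministic.

```python
import time
from typing import Callable, Optional


class MsecTimer:
    """Hypothetical timer whose elapsed_msec() returns MILLISECONDS.

    Not DeepSpeed's timer; it only mimics the unit convention in this issue.
    """

    def __init__(self, clock: Callable[[], float] = time.perf_counter):
        self._clock = clock  # clock() returns seconds
        self._start: Optional[float] = None
        self._elapsed_sec = 0.0

    def start(self) -> None:
        self._start = self._clock()

    def stop(self) -> None:
        if self._start is None:
            raise RuntimeError("stop() called before start()")
        self._elapsed_sec += self._clock() - self._start
        self._start = None

    def elapsed_msec(self, reset: bool = True) -> float:
        # Convert the accumulated seconds to milliseconds exactly once, here.
        value_ms = self._elapsed_sec * 1000.0
        if reset:
            self._elapsed_sec = 0.0
        return value_ms


def duration_ms_double_scaled(timer: MsecTimer) -> float:
    # The misuse: the value is already in ms, so "* 1000" inflates it 1000x.
    return timer.elapsed_msec(reset=False) * 1000


def duration_ms_fixed(timer: MsecTimer) -> float:
    # Correct: pass the millisecond value through unchanged.
    return timer.elapsed_msec(reset=False)


if __name__ == "__main__":
    ticks = iter([0.0, 0.25])  # fake clock: 0.25 s between start() and stop()
    t = MsecTimer(clock=lambda: next(ticks))
    t.start()
    t.stop()
    print(duration_ms_fixed(t))          # 250.0
    print(duration_ms_double_scaled(t))  # 250000.0
```

Keeping the seconds-to-milliseconds conversion in a single place (the timer) and having every consumer treat the result as ms avoids this class of bug.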