Conversation

@ShadenSmith
Contributor

No description provided.

@ShadenSmith ShadenSmith requested a review from tjruwase February 9, 2020 04:11
@ShadenSmith ShadenSmith added the documentation Improvements or additions to documentation label Feb 9, 2020
@ShadenSmith ShadenSmith merged commit 92514ac into master Feb 10, 2020
@ShadenSmith ShadenSmith deleted the shaden/lrrt_tut branch February 10, 2020 19:04
kouml pushed a commit to kouml/DeepSpeed that referenced this pull request Apr 3, 2020
jeffra added a commit that referenced this pull request May 19, 2020
Co-authored-by: yuxionghe <yuxhe@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
rraminen added a commit to rraminen/DeepSpeed that referenced this pull request Nov 18, 2021
delock referenced this pull request in delock/DeepSpeedSYCLSupport Sep 21, 2022
liamcli pushed a commit to determined-ai/DeepSpeed that referenced this pull request May 8, 2023
* Add SLURM launcher

Signed-off-by: Dashiell Stander <dash.stander@gmail.com>

* Need to import SlurmRunner

Signed-off-by: Dashiell Stander <dashiell@ip-172-31-45-20.ec2.internal>

* Clean up the config JSON

Signed-off-by: Dashiell Stander <dashiell@ip-172-31-45-20.ec2.internal>

* Properly clean up json configs

Signed-off-by: Dashiell Stander <dashiell@ip-172-31-45-20.ec2.internal>

* runner

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Switch to using an argument

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Pre-commit

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Prevent clean-up when using slurm, add in hostfile

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Pass launcher in to autotuning jobs

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Pass slurm comment in

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Add a comment argument to DeepSpeed runner

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Switch slurm_comment to just comment

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Switch slurm_comment to just comment

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Use SLURM --nodelist instead of --include

Co-authored-by: Quentin Anthony <anthony.301@osu.edu>
Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Use SLURM --nodelist instead of --include

Co-authored-by: Quentin Anthony <anthony.301@osu.edu>

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Launcher args

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Debug print statement...

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Debug print statements...

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Debug print statements...

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Debug print statements...

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Debug print statements...

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* user_config bug

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* user_config bug

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Fix config dict

* Pydantic to dict

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Pydantic to dict

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Will it work now?

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Just make it a dict immediately

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Exclude unset things

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Add dilation to pooling flops profiler

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Adding return_indices...

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Do cleanup with SLURM.

Co-authored-by: Quentin Anthony <anthony.301@osu.edu>

* Do cleanup with SLURM.

Co-authored-by: Quentin Anthony <anthony.301@osu.edu>

* Horrific hack to get metrics.json

* Push pipeline grad tail fix

* No longer hardcode path

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Also pass in no_ssh_check

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Also pass in no_ssh_check

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Also pass in master_addr

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Stop hardcoding number of steps....

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* detailed flops breakdown

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Fix autotuning reporting bug

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Fix autotuning reporting bug

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Actually off by a million, not a thousand

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Clean up debugging stuff

Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Add JSRunner for summit launching on multiple nodes

* import JSRUN_LAUNCHER from constants

* Fix jsrun typo

* Update multinode_runner.py (deepspeedai#45)

* add CUDA_VISIBLE_DEVICES to jsrunner

---------

Signed-off-by: Dashiell Stander <dash.stander@gmail.com>
Signed-off-by: Dashiell Stander <dashiell@ip-172-31-45-20.ec2.internal>
Signed-off-by: Dashiell Stander <dstander@protonmail.com>
Co-authored-by: Dashiell Stander <dash.stander@gmail.com>
Co-authored-by: Dashiell Stander <dashiell@ip-172-31-45-20.ec2.internal>
Co-authored-by: Dashiell Stander <dstander@protonmail.com>
Co-authored-by: Quentin TastyRice <quentin@ip-172-31-47-203.ec2.internal>
Co-authored-by: Dashiell Stander <dashiell@ip-172-31-47-203.ec2.internal>
Co-authored-by: MLRichter <matrichter@uos.de>
Co-authored-by: Stella Biderman <stellabiderman@gmail.com>

Labels

documentation Improvements or additions to documentation

3 participants