LRRT tutorial #45
Merged
tjruwase requested changes on Feb 10, 2020
ac53563 to 9701f05
tjruwase approved these changes on Feb 10, 2020
kouml pushed a commit to kouml/DeepSpeed that referenced this pull request on Apr 3, 2020
jeffra added a commit that referenced this pull request on May 19, 2020
Co-authored-by: yuxionghe <yuxhe@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
rraminen added a commit to rraminen/DeepSpeed that referenced this pull request on Nov 18, 2021
…: 3.6.13 not in '>=3.7' during cupy build. (deepspeedai#45)
delock referenced this pull request in delock/DeepSpeedSYCLSupport on Sep 21, 2022
liamcli pushed a commit to determined-ai/DeepSpeed that referenced this pull request on May 8, 2023
* Add SLURM launcher Signed-off-by: Dashiell Stander <dash.stander@gmail.com>
* Need to import SlurmRunner Signed-off-by: Dashiell Stander <dashiell@ip-172-31-45-20.ec2.internal>
* Clean up the config JSON Signed-off-by: Dashiell Stander <dashiell@ip-172-31-45-20.ec2.internal>
* Properly clean up json configs Signed-off-by: Dashiell Stander <dashiell@ip-172-31-45-20.ec2.internal>
* runner Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Switch to using an argument Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Pre-commit Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Prevent clean-up when using slurm, add in hostfile Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Pass launcher in to autotuning jobs Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Pass slurm comment in Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Add a comment argument to DeepSpeed runner Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Switch slurm_comment to just comment Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Switch slurm_comment to just comment Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Use SLURM --nodelist instead of --include Co-authored-by: Quentin Anthony <anthony.301@osu.edu> Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Use SLURM --nodelist instead of --include Co-authored-by: Quentin Anthony <anthony.301@osu.edu> Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Launcher args Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Debug print statement... Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Debug print statements... Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Debug print statements... Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Debug print statements... Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Debug print statements... Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* user_config bug Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* user_config bug Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Fix config dict
* Pydantic to dict Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Pydantic to dict Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Will it work now? Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Just make it a dict immediately Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Exclude unset things Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Add dilation to pooling flops profiler Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Adding return_indices... Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Do cleanup with SLURM. Co-authored-by: Quentin Anthony <anthony.301@osu.edu>
* Do cleanup with SLURM. Co-authored-by: Quentin Anthony <anthony.301@osu.edu>
* Horrific hack to get metrics.json
* Push pipeline grad tail fix
* No longer hardcode path Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Also pass in no_ssh_check Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Also pass in no_ssh_check Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Also pass in master_addr Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Stop hardcoding number of steps.... Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* detailed flops breakdown Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Fix autotuning reporting bug Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Fix autotuning reporting bug Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Actually off by a million, not a thousand Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Clean up debugging stuff Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Add JSRunner for summit launching on multiple nodes
* import JSRUN_LAUNCHER from constants
* Fix jsrun typo
* Update multinode_runner.py (deepspeedai#45)
* add CUDA_VISIBLE_DEVICES to jsrunner
---------
Signed-off-by: Dashiell Stander <dash.stander@gmail.com>
Signed-off-by: Dashiell Stander <dashiell@ip-172-31-45-20.ec2.internal>
Signed-off-by: Dashiell Stander <dstander@protonmail.com>
Co-authored-by: Dashiell Stander <dash.stander@gmail.com>
Co-authored-by: Dashiell Stander <dashiell@ip-172-31-45-20.ec2.internal>
Co-authored-by: Dashiell Stander <dstander@protonmail.com>
Co-authored-by: Quentin TastyRice <quentin@ip-172-31-47-203.ec2.internal>
Co-authored-by: Dashiell Stander <dashiell@ip-172-31-47-203.ec2.internal>
Co-authored-by: MLRichter <matrichter@uos.de>
Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
No description provided.