Conversation
@jeffra jeffra commented Sep 9, 2020

No description provided.

jeffra and others added 30 commits September 2, 2020 10:59
* update DSE to point to ZeRO-Offload staging

* ZeRO-2 enable CPU offload (#313)

* cpu-offload

* update

* deleted:    deepspeed/pt/deepspeed_zero_optimizer_cpuoffload.py
	modified:   deepspeed/pt/fp16_unfused_optimizer.py
	new file:   install_output.txt
	modified:   tests/unit/test_dynamic_loss_scale.py

* modified:   deepspeed/pt/deepspeed_zero_optimizer.py

* update

* modified:   deepspeed/pt/deepspeed_cpu_adam.py
	modified:   deepspeed/pt/deepspeed_zero_optimizer.py
	modified:   tests/unit/test_checkpointing.py
	modified:   tests/unit/test_fp16.py

* deleted:    install_output.txt

* modified:   deepspeed/pt/fp16_unfused_optimizer.py
	modified:   tests/unit/test_dynamic_loss_scale.py

* modified:   deepspeed/pt/deepspeed_cpu_adam.py

* modified:   deepspeed/pt/deepspeed_zero_optimizer.py

* modified:   deepspeed/pt/deepspeed_cpu_adam.py
	modified:   deepspeed/pt/deepspeed_zero_optimizer.py

* deleted:    deepspeed_cpu_adam.py
	modified:   deepspeed_light.py
	modified:   deepspeed_zero_optimizer.py
	../../deepspeed_zero_optimizer_cpu_offload.py

* modified:   deepspeed/pt/deepspeed_light.py

* modified:   deepspeed/pt/deepspeed_light.py
	modified:   deepspeed/pt/deepspeed_zero_optimizer.py
	modified:   deepspeed/pt/deepspeed_zero_utils.py
	modified:   tests/unit/test_fp16.py

* modified:   deepspeed/pt/deepspeed_config.py
	modified:   deepspeed/pt/deepspeed_light.py
	modified:   deepspeed/pt/deepspeed_zero_optimizer.py
	modified:   tests/unit/test_checkpointing.py
	modified:   tests/unit/test_fp16.py

* modified:   deepspeed/pt/deepspeed_checkpointing.py

* update DSE to ZeRO-Offload commit

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* Enable ZeRO checkpointing for ZeRO-Offload (#337)

* Enable ZeRO checkpointing for ZeRO-Offload
Fix unit tests
Bump DSE to 33b9fb77c8cecdb49118188890f662526d8e9397

* Fix accidental revert

* Add ZeRO-Offload checkpointing model tests (#344)

* Enable ZeRO checkpointing for ZeRO-Offload
Fix unit tests
Bump DSE to 33b9fb77c8cecdb49118188890f662526d8e9397

* Fix accidental revert

* Fix ZeRO-Offload checkpointing bug when changing GPU count
Add checkpointing model tests for ZeRO-Offload
Remove optimizer key from Megatron model tests
Use different deepspeed master port for Megatron model tests

Co-authored-by: Jie <37380896+jren73@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
* adding link to Sparse Attention in Navigation page
* Update test_sparse_attention.py

* jren changes

* Merge with correctness/perf fixes

* Formatting fixes

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* add cpu adam optimizer

* run precommit

* clean adam_test

* add accuracy test for adam
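The CPU Adam commits above implement the Adam optimizer update on the host. As a reference for what the vectorized (AVX) kernel computes per element, here is a minimal scalar sketch; the function name and defaults are illustrative, not the DeepSpeedCPUAdam API:

```python
import math

def adam_step(p, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter.

    Returns the new (param, m, v). A CPU Adam kernel applies this
    element-wise over flat parameter arrays, typically vectorized
    with AVX2/AVX512 intrinsics.
    """
    m = beta1 * m + (1 - beta1) * g          # first moment estimate
    v = beta2 * v + (1 - beta2) * g * g      # second moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v
```

An accuracy test like the one referenced above would compare this reference against the kernel's output over random params and grads for many steps.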
* fixing gradient accumulation for zero offload

* Bug fixes. ZeRO Stage 1,2 and Offload all produce the same loss with gradient accumulation step of 2
* use relative imports and add support for conditional op imports

* formatting and llvm command check change

* fix remaining absolute import

* hide the installed ops var

* fix unit tests

Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
…PU (#360)

* Allocating CPU memory directly on CPU without transferring it from GPU

* formatting fixes
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
* Improve test for ZeRO supported optimizers

* Rename test function

* Format fixes

* Add model tests that wraps client FusedAdam with fused fp16 optimizer

* Format fixes
* fixing the cpu_adam API and add deepspeed_adam flag in config.py

* run precommit
* cpu_offload enables overlap_comm and contiguous_gradients
Remove non-portable tensor.mul_()

* Model functionality tests now passing

* Move to perf tests folder
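The commit above ties `cpu_offload` to `overlap_comm` and `contiguous_gradients`. A sketch of what the corresponding DeepSpeed config could look like; the values are illustrative and the schema should be checked against the DeepSpeed configuration docs:

```python
# Illustrative ZeRO stage-2 config with CPU offload enabled.
# Per the commit above, enabling cpu_offload implies overlap_comm
# and contiguous_gradients, so setting them explicitly is redundant.
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "cpu_offload": True,           # keep optimizer states on CPU
        "overlap_comm": True,          # forced on by cpu_offload
        "contiguous_gradients": True,  # forced on by cpu_offload
    },
}
```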
…367)

* fixing adam copy fp16-param-add more compile flags for cpu_adam

* run precommit

* fix variance indexes

* fix array-sizes

* move adam_test

* rename perf test
tjruwase and others added 10 commits September 5, 2020 22:46
* Various correctness fixes

* Format fixes
* adding Bing SQuAD e2e test

* updating the draft test; bring final step under try section

* finalizing test for base deepspeed and deepspeed with ZeRO

* applying the comment (thanks Jeff); fixed formatting

* update Sparse Attention Tutorial

* fixed a few issues and applied comments for better organization and readability

* updated the sparse attention tutorial, making the "how to use" section incremental; applied more comments

Co-authored-by: arashashari <arashashari@ArashMSLaptop.redmond.corp.microsoft.com>
* fixing corner cases

* revert to the previous perf for adam

* adam high performance

* run precommit
* Add ZeRO-Offload model tests
Restrict optimizer update+copy to DeepSpeedCPUAdam

* Format fixes

* Increase bucket size scaler
* fixing the compilation error for AVX2 architecture

* running precommit

* adding cpufeature to requirements

* Update install.sh

* Update install.sh

* include cpu-adam in the features

* update features

* update features

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
jeffra and others added 2 commits September 9, 2020 10:22
* add DS_BUILD_AVX512 flag and update the feature part accordingly

* run precommit
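The `DS_BUILD_AVX512` commits select SIMD compile flags for the CPU Adam extension based on detected CPU features (hence the `cpufeature` requirement added earlier). A minimal sketch of the selection logic; the exact flag strings a build script passes to the compiler are illustrative:

```python
import os

def simd_flags(has_avx512: bool, has_avx2: bool) -> list:
    """Pick SIMD compile flags for a CPU optimizer extension.

    Mirrors the idea behind the DS_BUILD_AVX512 switch: prefer AVX512
    when forced via env or detected, fall back to AVX2, else scalar.
    """
    if os.environ.get("DS_BUILD_AVX512") == "1" or has_avx512:
        return ["-march=skylake-avx512"]
    if has_avx2:
        return ["-mavx2", "-mfma"]
    return []
```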
@jeffra jeffra changed the base branch from master to staging-zero-dual-v5 September 9, 2020 21:44
@jeffra jeffra changed the title from "ZeRO-Offload" to "ZeRO-Offload (squash)" Sep 9, 2020
@jeffra jeffra merged this pull request into staging-zero-dual-v5 Sep 9, 2020
jeffra added a commit that referenced this pull request Sep 9, 2020
* ZeRO-Offload v1 (squash) (#345)

* update DSE to point to ZeRO-Offload staging

* ZeRO-2 enable CPU offload (#313)

* cpu-offload

* update

* deleted:    deepspeed/pt/deepspeed_zero_optimizer_cpuoffload.py
	modified:   deepspeed/pt/fp16_unfused_optimizer.py
	new file:   install_output.txt
	modified:   tests/unit/test_dynamic_loss_scale.py

* modified:   deepspeed/pt/deepspeed_zero_optimizer.py

* update

* modified:   deepspeed/pt/deepspeed_cpu_adam.py
	modified:   deepspeed/pt/deepspeed_zero_optimizer.py
	modified:   tests/unit/test_checkpointing.py
	modified:   tests/unit/test_fp16.py

* deleted:    install_output.txt

* modified:   deepspeed/pt/fp16_unfused_optimizer.py
	modified:   tests/unit/test_dynamic_loss_scale.py

* modified:   deepspeed/pt/deepspeed_cpu_adam.py

* modified:   deepspeed/pt/deepspeed_zero_optimizer.py

* modified:   deepspeed/pt/deepspeed_cpu_adam.py
	modified:   deepspeed/pt/deepspeed_zero_optimizer.py

* deleted:    deepspeed_cpu_adam.py
	modified:   deepspeed_light.py
	modified:   deepspeed_zero_optimizer.py
	../../deepspeed_zero_optimizer_cpu_offload.py

* modified:   deepspeed/pt/deepspeed_light.py

* modified:   deepspeed/pt/deepspeed_light.py
	modified:   deepspeed/pt/deepspeed_zero_optimizer.py
	modified:   deepspeed/pt/deepspeed_zero_utils.py
	modified:   tests/unit/test_fp16.py

* modified:   deepspeed/pt/deepspeed_config.py
	modified:   deepspeed/pt/deepspeed_light.py
	modified:   deepspeed/pt/deepspeed_zero_optimizer.py
	modified:   tests/unit/test_checkpointing.py
	modified:   tests/unit/test_fp16.py

* modified:   deepspeed/pt/deepspeed_checkpointing.py

* update DSE to ZeRO-Offload commit

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* Enable ZeRO checkpointing for ZeRO-Offload (#337)

* Enable ZeRO checkpointing for ZeRO-Offload
Fix unit tests
Bump DSE to 33b9fb77c8cecdb49118188890f662526d8e9397

* Fix accidental revert

* Add ZeRO-Offload checkpointing model tests (#344)

* Enable ZeRO checkpointing for ZeRO-Offload
Fix unit tests
Bump DSE to 33b9fb77c8cecdb49118188890f662526d8e9397

* Fix accidental revert

* Fix ZeRO-Offload checkpointing bug when changing GPU count
Add checkpointing model tests for ZeRO-Offload
Remove optimizer key from Megatron model tests
Use different deepspeed master port for Megatron model tests

Co-authored-by: Jie <37380896+jren73@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* update DSE to staging for zero-dual

* Update test_sparse_attention.py

* Assert ZeRO-Offload+gradient accumulation (#347)

* Adding link to Sparse Attention in Navigation page (#355)

* adding link to Sparse Attention in Navigation page

* Correctness and perf fixes (#354)

* Update test_sparse_attention.py

* jren changes

* Merge with correctness/perf fixes

* Formatting fixes

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* add cpu adam optimizer (#356)

* add cpu adam optimizer

* run precommit

* clean adam_test

* add accuracy test for adam

* make the adam unit test work with random params and grads and for more steps

* Samyamr/zero offload correctness (#359)

* fixing gradient accumulation for zero offload

* Bug fixes. ZeRO Stage 1,2 and Offload all produce the same loss with gradient accumulation step of 2

* Import path fixes + conditional imports (#358)

* use relative imports and add support for conditional op imports

* formatting and llvm command check change

* fix remaining absolute import

* hide the installed ops var

* fix unit tests

Co-authored-by: Reza Yazdani <reyazda@microsoft.com>

* Enable contiguous gradients for cpu_offload

* Allocating CPU memory directly on CPU without transferring it from GPU (#360)

* Allocating CPU memory directly on CPU without transferring it from GPU

* formatting fixes

* change gpt2 pretrain to have DeepSpeed adam (#361)

Co-authored-by: Reza Yazdani <reyazda@microsoft.com>

* Jekyll installation instructions (#351)

* Generalize detection of ZeRO supported optimizers (#349)

* Improve test for ZeRO supported optimizers

* Rename test function

* Format fixes

* Add model tests that wraps client FusedAdam with fused fp16 optimizer

* Format fixes

* everything is working

* fixing the cpu_adam API and add deepspeed_adam flag in config.py (#365)

* fixing the cpu_adam API and add deepspeed_adam flag in config.py

* run precommit

* fixing adam copy fp16-param-add more compile flags for cpu_adam

* run precommit

* fix variance indexes

* fix array-sizes

* ZeRO-Offload passing model functionality tests (#366)

* cpu_offload enables overlap_comm and contiguous_gradients
Remove non-portable tensor.mul_()

* Model functionality tests now passing

* Move to perf tests folder

* move adam_test

* rename perf test

* fixing adam copy fp16-param and add more compile flags for cpu_adam (#367)

* fixing adam copy fp16-param-add more compile flags for cpu_adam

* run precommit

* fix variance indexes

* fix array-sizes

* move adam_test

* rename perf test

* Perf tests

* Bump DSE

* fixed a typo; this was fixed before but seems like it has been lost in the refactor (#364)

* Move code quality tests to Azure-hosted agents. (#368)

* add casting kernel

* run precommit

* revert changes

* revert changes

* ZeRO-Offload: Integration code fixes (#370)

* Various correctness fixes

* Format fixes

* Update installation instructions (#362)

* Update Sparse Attention Tutorial (#357)

* adding Bing SQuAD e2e test

* updating the draft test; bring final step under try section

* finalizing test for base deepspeed and deepspeed with ZeRO

* applying the comment (thanks Jeff); fixed formatting

* update Sparse Attention Tutorial

* fixed a few issues and applied comments for better organization and readability

* updated the sparse attention tutorial, making the "how to use" section incremental; applied more comments

Co-authored-by: arashashari <arashashari@ArashMSLaptop.redmond.corp.microsoft.com>

* fixing corner cases (#371)

* fix adam performance (#372)

* fixing corner cases

* revert to the previous perf for adam

* adam high performance

* run precommit

* ZeRO-Offload passing model tests (#374)

* Add ZeRO-Offload model tests
Restrict optimizer update+copy to DeepSpeedCPUAdam

* Format fixes

* Increase bucket size scaler

* fix cpu adam compilation for AVX2 (#378)

* fixing the compilation error for AVX2 architecture

* running precommit

* adding cpufeature to requirements

* Update install.sh

* Update install.sh

* include cpu-adam in the features

* update features

* update features

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* Move code quality tests to Azure-hosted agents. (#368)

* Bump DSE

* adding sparse attention to feature index page (#377)

* support avx2 by default (#383)

* add DS_BUILD_AVX512 flag and update the feature part accordingly

* run precommit

Co-authored-by: Jie <37380896+jren73@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Arash Ashari <arashari@microsoft.com>
Co-authored-by: RezaYazdaniAminabadi <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: arashashari <arashashari@ArashMSLaptop.redmond.corp.microsoft.com>
jeffra added a commit that referenced this pull request Sep 10, 2020
* ZeRO-Offload (squash) (#381)

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Jie <37380896+jren73@users.noreply.github.com>
Co-authored-by: Arash Ashari <arashari@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: arashashari <arashashari@ArashMSLaptop.redmond.corp.microsoft.com>
Co-authored-by: RezaYazdaniAminabadi <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
@jeffra jeffra deleted the staging-zero-dual-v4 branch November 19, 2020 23:27
stephen-youn added a commit that referenced this pull request Jun 14, 2023
* Add residual_add triton op

* add support of gptj style models to triton residual_add kernel

* fix the residual_add tests

* Add support of end-to-end run for residual_add triton kernels

* Fix the MLP output tensor's shape

* Fix the output tensor of residual_add_func python call

* triton matmul kernels with python wrapper class added with pytests

* clean-up and make it read autotune table when importing

* fixed import problems with the naming

* enable update_autotune_table for every forward in matmul

* an int4-into-int8 weight packing function added
test parameters with alignment only (i.e., an integer multiple of block_size in the matmul kernel); this will be further investigated
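The packing idea referenced above fits two 4-bit values into each stored byte, halving weight memory; a matmul kernel then unpacks nibbles before dequantizing. A minimal sketch (function names are hypothetical, not the PR's API):

```python
def pack_int4_pairs(vals):
    """Pack pairs of unsigned 4-bit values into single bytes.

    Element 2i goes in the low nibble, element 2i+1 in the high
    nibble. Assumes len(vals) is even and each value fits in 4 bits.
    """
    assert len(vals) % 2 == 0
    out = []
    for lo, hi in zip(vals[0::2], vals[1::2]):
        assert 0 <= lo < 16 and 0 <= hi < 16
        out.append(lo | (hi << 4))
    return out

def unpack_int4_pairs(packed):
    """Inverse of pack_int4_pairs: split each byte back into nibbles."""
    out = []
    for b in packed:
        out.append(b & 0xF)
        out.append(b >> 4)
    return out
```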

* lint

* quantization added
int8-packed-int4-fp16 matmul-block-deq added
illegal cuda mem access bug in triton matmul kernel fixed (i.e. a mem boundary problem)

* add torch block quantization
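The block quantization used throughout these commits assigns each contiguous block of values its own scale, which bounds the per-element error by the block's scale. A plain-Python sketch of the symmetric scheme (names are illustrative, not the DeepSpeed or torch API):

```python
def block_quantize(x, block_size=4, bits=8):
    """Symmetric per-block quantization to signed integers.

    Each block gets scale = absmax / qmax; values are rounded to
    integers in [-qmax, qmax]. Returns (quantized values, scales).
    """
    qmax = 2 ** (bits - 1) - 1
    q, scales = [], []
    for i in range(0, len(x), block_size):
        block = x[i:i + block_size]
        scale = max(abs(v) for v in block) / qmax or 1.0  # avoid div-by-zero
        scales.append(scale)
        q.extend(round(v / scale) for v in block)
    return q, scales

def block_dequantize(q, scales, block_size=4):
    """Map quantized integers back to floats using each block's scale."""
    return [v * scales[i // block_size] for i, v in enumerate(q)]
```

An SNR check like the ones in these commits compares the dequantized output against the original tensor.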

* dual quantization matmul added

* cleanup, fix for lint

* documentation
lint fix

* README added

* typo

* updated the kernel to have fused bias addition and activation too

* Add residual_add triton op

* modified quantization to take additional bits, more than int8

* enable triton residual_add kernel in DS MLP

* Add flash attention kernel and glue code

* additional scale-norm added for weight

* a temporary example for quantization added

* comments

* use the exact same ds quantizer as reference

* added scale-norm (i.e. scale-of-scale) to both triton/torch version

* snr check with fused-deq-gemm for block_deq and dual_block_deq

* makes matmul kernels work for a6000 with smaller mem
w8a8/w4a8 with sym block quantization on activation and row- (or col-)wise quantization on weight works (snr test added)

* Add layer norm triton kernel

* Add gelu triton kernel

* Add softmax triton kernel

* Rename flash attn api

* add triton gemm kernels

* fix formatting of triton kernels

* Add matmul triton kernels

* Updated Triton Gelu to use non-approx computation

* Updated Triton Gemm for f16 bias-add parity

* Add DS triton encoder layer

* Updated Softmax to work around block size 1

* fix the issue caused by merge conflict

* Add triton layer norm unittests

* dual-qblock snr verified too

* Add triton gelu kernel unittests

* Add triton softmax kernel unittests

* fix flash kernels formatting (#382)

* Add triton dependency to unittests workflow (#381)

* w8a8 and w8a4 matmul with block quantization verified

* Allow Gemm & MatMul to take arbitrary dimensions

* Add triton matmul kernel unittests

* fix triton dependency in github CI workflows

* Fix matmul launching grid

* fix formatting

* Add triton gemm kernel unittests

* modified dual-qblock to support wider scale_bits with int64 acc and vec-ops, which caused perf degradation
workaround is to use "v2" kernel added with internal shift ops but not enabled yet

* fix residual in gemm_3d kernel

* Add flash attention triton kernels unit tests

* test_matmul and test_gemm pass (but with smaller coverage as mentioned in the code)
float32 can be supported later

* added 'triton_gemm_eval.py'
it is a temporary script to evaluate the accuracy of the triton matmul against the torch matmul

* typo

* typo

* root-caused the parity error with fused_gelu. it is not with gelu but with residual-addition.
disabled residual-addition and it still needs debugging

* location of residual addition in reference modified to be after the activation

* fixed index typo in the snr plot

* Fix triton attention kernel unit tests

* fix formatting

* added batch support in matmul
row/col-wise quantization matmul debugged

* fixed bugs in the unit tests after the batch support change and so on
test_int8_int8_fp_matmul_dual_block_deq still fails and needs further debugging though

* weight-only quantizatioin example and test are added to check_snr

* matmul_ext basic check added as unit test under tests/unit

* move triton ops under inference/triton

* restore triton_ops.py

* import path correction

* restore ds_mlp and ds_attention

* shaping bug with batching in matmul_ext fixed
changed the gelu computation to use libdevice.erf instead of approx with sigmoid
(otherwise, roberta unit test fails)
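The GELU change above swaps a sigmoid approximation for the exact erf-based definition (via `libdevice.erf` in the triton kernel), since the approximation error was enough to fail the roberta test. A sketch of the two formulas, with 1.702 being the commonly used sigmoid-approximation coefficient:

```python
import math

def gelu_exact(x):
    """Exact GELU: x * Phi(x), computed via erf.

    This is what libdevice.erf gives the kernel on GPU.
    """
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_sigmoid_approx(x):
    """The cheaper sigmoid approximation the commit moved away from."""
    return x / (1.0 + math.exp(-1.702 * x))
```

The two agree to within a few thousandths near the origin, but that gap can compound across layers.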

* triton ops added with an option in config to use it with op_binding and config option

* Triton transformer added: InferenceTransformerFactory, TritonTransformer, TritonSelfAttention, TritonMLP and so forth

* Triton wrapper classes added

* added simple triton eval scripts

* rename the new benchmark script for triton-bert

* added triton attention, triton layer-norm/softmax

* adds tests to measure attention perf in triton and others

* changed triton flash attn function name

* attention set to use triton non-flash by default

* enable triton for bert

* made update_autotable false by default because it degrades the perf

* temp commit with debugging/profiling codes

* temporary debugging/profiling code lines added, need to be cleaned up later

* clean-up

* unit tests for triton inference ops are now passing

* removed unnecessary triton kernels

* test_inference passes

* removed debugging/profiling codes

* triton==2.0.0.dev20221202

* clean-up for formatting check pass
added layer_norm test without residual-add

* set triton version requirement

* further clean-up

* removed redundant files

* readme for triton matmul

* clean-up and add more test for triton-matmul

* typo

* removed another obsolete triton kernels and tests

* removed unnecessary TransformerInferenceFactory class

* removed obsolete test

* formatting check, cleanup

* formatting fix: added copyright to the head

* formatting: missing license added

* add pytest skip condition to test_matmul_ext

* formatting fix

* formatting

* added --forked option to inference_ops unit pytests

* Revert "added --forked option to inference_ops unit pytests"

This reverts commit 743b86d354b041172b06e4a8505f43ddd4c2544a.

* changed the pytest mark for softmax to be inference_ops

* formatting fix

* cleanup comments

* add missing import

* keep only fp16 matmuls because it's out of this PR's scope
int8-based gemm kernels will be added later

* removed the previous matmul_ext test

* triton quantization kernel removed too

* clean up comments

* added comments for license

* triton matmul always read the autotune table when imported and write the final table when closing

* modified triton kernels to have a new transposed_model arg

* added license note to files

* set default mlp kernel to be cuda as it's better than triton kernel with bert

* adds changes missed from the prev commit

* added license notes
increased DEEPSPEED_TEST_TIMEOUT from 600 to 900 for triton compilation

* added unit test for triton attention

* moved tests in layer_norm.py to test_layer_norm.py

* removed commented code lines

* removed triton from the main requirement as commented in PR

* follow PascalCase convention in class naming as suggested from pr review

* changes to make deepspeed work without triton
specifically, resolves error with importing any triton ops
added code lines that check the availability of triton and skip the tests if it's not available

* added a feature to run triton autotune at initialization, i.e., at op-building phase

* fix for the lint/formatting
added " # noqa: F401"

* move triton-bert-benchmark.py to microsoft/DeepSpeedExamples

* modify the code as suggested from PR

* make DEEPSPEED_TEST_TIMEOUT in unit test back to 600s

* made an option to skip triton-autotune in config

* lint fix for formatting

* removed repeated has_triton when importing triton
also the change for pr comment

* removed duplicated triton_autotune arg passing

* upgrade to triton 2.0
pydantic.validator for use_triton

* move triton specific op mapping into model_implementation as commented from PR

* removed commented lines

* need to cite where the file came from, as commented from the PR review

* change for the recent merge with the master

* qkv-gemm change to make distilbert work after the merge with the master

* format fix

* fix triton attention for qkv passing for non-pre-norm
requirements all use triton2.0.0

* skip autotune in test_matmul and test_attention with triton

* formatting with pre-commit

* add config for v100 test in matmul_4d kernel (small shared mem requirement)

* inject triton kernels only in bert and report it through log_dist
set triton to be the latest from requirements

* reduced the config and added mem check for matmul_4d

* added README.md tutorial page for triton-deepspeed

* typo in README

* refine README

* refine readme

* refine readme

* refine readme

* "Fix apex install bugs #3741"

---------

Co-authored-by: Arash Bakhtiari <arash@bakhtiari.org>
Co-authored-by: Stephen Youn <styoun@microsoft.com>
Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Ethan Doe <yidoe@microsoft.com>
Co-authored-by: yidoe <68296935+yidoe@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
6 participants