@trajepl (Owner) commented Jun 23, 2022

No description provided.

mrwyattii and others added 20 commits June 6, 2022 16:19
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Add '-S' argument to pdsh command to return the largest error code from the ssh sessions
Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
…gpt-j) (#1992)

Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
* fix to catch assert error for inference test imports

* fix wrong syntax

* changed to sequential inf tests

* fix for lm_eval import

* added environment check fixture

* added expected torch and cuda version

* check various version depth for cuda/torch

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* Retain prefetched params until last use

* Unit tests fixes
* Split parameter offload from z3

* Format fixes

* Bug fixes

* Cleanup

* Remove dead code
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
@trajepl trajepl merged commit 25e04b3 into trajepl:master Jun 23, 2022