Hi,
Thanks for this great work!
Just one question: is it possible to train my model on a single GPU with this library and still get the reported benefits in memory consumption and training efficiency, or are these only achievable when using multiple GPUs?