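Extend the load_checkpoint API so that a checkpoint can be loaded without its optimizer states. This is useful for evaluation and fine-tuning. Even when the optimizer states are skipped, the FP32 master copy of the model parameters must still be loaded alongside the FP16 parameters. In mixed-precision training the optimizer step updates the FP32 copy and then writes it back into the FP16 weights, so a stale FP32 copy would overwrite the freshly loaded FP16 weights on the first step and make the model diverge immediately.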
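The following is a minimal, framework-free sketch of the intended loading behavior, not DeepSpeed's actual implementation. The MixedPrecisionState class, the dict-based checkpoint layout, and the save_checkpoint/load_checkpoint helpers are hypothetical. The load_optimizer_states flag name is illustrative. FP16 rounding is emulated with the struct module's half-precision "e" format. The FP32 master weights are always restored, while the optimizer state is restored only on request.

```python
import struct
from dataclasses import dataclass, field
from typing import Any, Dict, List


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE half precision using struct's 'e' format."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


@dataclass
class MixedPrecisionState:
    """Hypothetical stand-in for a training engine: FP16 working weights,
    an FP32 master copy, and optimizer state (e.g. momentum, step count)."""
    fp16_params: List[float]
    fp32_master: List[float]
    optimizer_state: Dict[str, Any] = field(default_factory=dict)

    def step(self, grads: List[float], lr: float = 0.1) -> None:
        # Plain SGD on the FP32 master copy; the FP16 weights are then
        # refreshed from it, as in typical mixed-precision training.
        self.fp32_master = [w - lr * g for w, g in zip(self.fp32_master, grads)]
        self.fp16_params = [to_fp16(w) for w in self.fp32_master]


def save_checkpoint(state: MixedPrecisionState) -> Dict[str, Any]:
    """Hypothetical checkpoint format: a plain dict of copies."""
    return {
        "fp16_params": list(state.fp16_params),
        "fp32_master": list(state.fp32_master),
        "optimizer_state": dict(state.optimizer_state),
    }


def load_checkpoint(state: MixedPrecisionState, ckpt: Dict[str, Any],
                    load_optimizer_states: bool = True) -> None:
    state.fp16_params = list(ckpt["fp16_params"])
    # Always restore the FP32 master weights, even when optimizer state is
    # skipped; otherwise the next step would copy stale FP32 values back into
    # the FP16 weights. Fall back to upcasting FP16 if no FP32 copy was saved.
    if "fp32_master" in ckpt:
        state.fp32_master = list(ckpt["fp32_master"])
    else:
        state.fp32_master = [float(w) for w in ckpt["fp16_params"]]
    if load_optimizer_states:
        state.optimizer_state = dict(ckpt.get("optimizer_state", {}))
    else:
        # Start with fresh optimizer state (e.g. for fine-tuning or evaluation).
        state.optimizer_state = {}
```

For example, loading a trained checkpoint into a freshly initialized state with load_optimizer_states=False gives weights that match the trained model and an empty optimizer state. A subsequent step then starts from the trained FP32 values rather than from the initialization. Restoring only fp16_params would leave fp32_master at its initial values, and the first step would overwrite the loaded weights.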