
Conversation

@shjwudp (Contributor) commented Jul 28, 2022

Fix a BF16_Optimizer compatibility issue with 0-dim tensors in optimizer state: tensor.narrow does not support 0-dim tensors.
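
For context, here is a minimal repro of the failure plus a sketch of the kind of guard the fix needs. The copy_state_fragment helper below is only an illustration of the idea, not the actual code inside BF16_Optimizer:

```python
import torch

# A 0-dim (scalar) tensor, like the RMS value some optimizers keep in their state.
scalar_state = torch.tensor(3.14)

# narrow() requires at least one dimension, so calling it on a 0-dim tensor
# raises a RuntimeError.
try:
    scalar_state.narrow(0, 0, 1)
except RuntimeError as err:
    print(err)

# Sketch of a guard: skip narrow() for 0-dim state and copy the scalar whole.
# (copy_state_fragment is a hypothetical helper, for illustration only.)
def copy_state_fragment(src, start, length):
    if src.dim() == 0:
        return src.clone()
    return src.narrow(0, start, length).clone()

print(copy_state_fragment(scalar_state, 0, 1))       # tensor(3.1400)
print(copy_state_fragment(torch.arange(4.0), 1, 2))  # tensor([1., 2.])
```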

@tjruwase (Contributor)
@shjwudp, thanks for this PR. Can you share a bit more context on when this fails?

@shjwudp (Contributor, Author) commented Jul 28, 2022

> @shjwudp, thanks for this PR. Can you share a bit more context on when this fails?

OK, this is really an edge case: in fairseq's Adafactor implementation, the RMS is calculated with the formula tensor.norm(2) / (tensor.numel() ** 0.5), which produces a 0-dim tensor.
https://github.com/facebookresearch/fairseq/blob/main/fairseq/optim/adafactor.py#L223
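
For reference, a small snippet (my own illustration, not from fairseq) showing that this formula yields a 0-dim tensor, which then ends up in the optimizer state:

```python
import torch

# RMS as computed in fairseq's Adafactor: tensor.norm(2) / (tensor.numel() ** 0.5)
p = torch.randn(8, 16)
rms = p.norm(2) / (p.numel() ** 0.5)

print(rms.dim(), rms.shape)  # 0 torch.Size([]) -- a 0-dim tensor
```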

@tjruwase (Contributor)
Interesting, thanks for sharing. So, to clarify, are you applying bf16_optimizer to adafactor for large model training? If so, we would be very interested in your experience. I suspect there might be issues with checkpoint load/stores, perhaps :).

@tjruwase tjruwase merged commit 57140e8 into deepspeedai:master Jul 28, 2022