Often user code defines its own batch size while the DeepSpeed config json also specifies a batch size. When gradient accumulation is enabled, a mismatch between the two can cause bugs where DeepSpeed computes a different number of gradient accumulation steps than what the user code is actually doing.
If the user is using the default collate_fn, DeepSpeed should be able to detect this mismatch and throw an exception. We can determine the batch size actually being used by inspecting the first dimension of the tensors passed into the forward pass, as sketched below.
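A minimal sketch of what such a check might look like. The helper name and the idea of calling it from the engine's forward path are assumptions for illustration, not existing DeepSpeed APIs; `train_micro_batch_size_per_gpu` is the config value the observed batch would be compared against.

```python
import torch

def check_micro_batch_size(forward_args, expected_micro_batch_size):
    """Hypothetical check: compare the leading dimension of the tensors
    seen in the forward pass against the micro batch size implied by the
    DeepSpeed config (train_micro_batch_size_per_gpu)."""
    for arg in forward_args:
        if torch.is_tensor(arg) and arg.dim() > 0:
            observed = arg.size(0)
            if observed != expected_micro_batch_size:
                raise RuntimeError(
                    f"Forward pass received batch size {observed}, but the DeepSpeed "
                    f"config expects train_micro_batch_size_per_gpu="
                    f"{expected_micro_batch_size}. Gradient accumulation steps may be "
                    "computed incorrectly."
                )
            # With the default collate_fn every tensor shares the same leading
            # batch dimension, so inspecting the first tensor is sufficient.
            break
```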
Lastly, we probably want to add an error-suppression flag to the DeepSpeed config so users can turn off this error if they know what they are doing and their batch alignment is intentionally non-standard.
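As a rough sketch, the config entry might look something like the following; the `disable_batch_size_check` key is a hypothetical name for the proposed flag and is not an existing DeepSpeed option:

```json
{
  "train_batch_size": 32,
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 8,
  "disable_batch_size_check": true
}
```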