Question about Gradient Accumulation fix #6

Hi,

First of all, thanks for your work on fixing gradient accumulation! I have a question about the implementation in unsloth-zoo. In your blog post (https://unsloth.ai/blog/gradient) you say that:

> This means naively averaging over each gradient accumulation step is wrong, but instead we must derive the denominator beforehand.
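Here is a minimal sketch of the arithmetic behind the quoted claim. It uses made-up per-token losses for two accumulation steps of unequal length; the values and variable names are hypothetical and not taken from unsloth-zoo. Averaging within each step gives every step equal weight regardless of how many tokens it has. Dividing each step's summed loss by a total token count computed beforehand reproduces the mean over the full batch:

```python
# Hypothetical per-token losses for two gradient accumulation steps
# with different numbers of (non-padded) tokens.
step_losses = [
    [1.0, 1.0, 1.0, 1.0],  # step 1: 4 tokens
    [4.0, 2.0],            # step 2: 2 tokens
]

# Naive: mean loss per step, then average those means across steps.
naive = sum(sum(s) / len(s) for s in step_losses) / len(step_losses)

# Denominator derived beforehand: total token count over all steps.
# Each step's summed loss is divided by it, and the results are added up.
total_tokens = sum(len(s) for s in step_losses)
fixed = sum(sum(s) / total_tokens for s in step_losses)

# Reference: mean over all tokens, as if everything were one large batch.
full_batch = sum(sum(s) for s in step_losses) / total_tokens

print(naive, fixed, full_batch)
```

With these numbers the naive value is 2.0, while both `fixed` and `full_batch` come out to about 1.667 (10/6). The discrepancy appears only because the two steps contain different numbers of tokens.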

However, when I look at the code, it seems you simply add up the losses, and the line that applies the denominator is commented out: https://github.com/unslothai/unsloth-zoo/blob/7b0048e53a6239bdad76cad66bf2490f6a2f8a9b/unsloth_zoo/training_utils.py#L268-L270

Shouldn't the loss be multiplied by the denominator here to match the "After - Unsloth fix" graph?

Referenced lines (`unsloth_zoo/training_utils.py`, L268-L270):

```python
loss = model(input_ids = input_ids, labels = labels, n_items = n_items).loss
# loss = loss * inverse_gradient_accumulation_steps
accumulated_loss += loss.detach()
```