Question about Gradient Accumulation fix #6

@robinhad

Hi,

First of all, thanks for your work on fixing gradient accumulation! I have a question about the implementation in unsloth-zoo. In the blog post https://unsloth.ai/blog/gradient you say:

This means naively averaging over each gradient accumulation step is wrong, but instead we must derive the denominator beforehand.

But looking at your implementation, I can see that you simply add up the losses, while the line that applies the denominator is commented out:

loss = model(input_ids = input_ids, labels = labels, n_items = n_items).loss
# loss = loss * inverse_gradient_accumulation_steps
accumulated_loss += loss.detach()

Shouldn't the loss be scaled by the denominator here to match the "After - Unsloth fix" graph?
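
To make the question concrete, here is a minimal sketch in plain Python (it is not the unsloth-zoo code). It compares naive per-step averaging against dividing every micro-batch by one precomputed denominator. The token losses are made up, all function names are hypothetical, and it assumes `n_items` is the total number of non-padded tokens across all accumulation steps, which is only a guess about how the library uses it. Under that assumption, summing the per-step losses would already reproduce the full-batch mean without an extra `1 / steps` factor.

```python
# Made-up per-token losses for two micro-batches of unequal length.
micro_batches = [
    [2.0, 2.0, 2.0, 2.0, 2.0, 2.0],  # 6 tokens
    [8.0, 8.0],                      # 2 tokens
]


def full_batch_mean(batches):
    """Reference value: mean loss over every token in the full batch."""
    tokens = [loss for batch in batches for loss in batch]
    return sum(tokens) / len(tokens)


def naive_accumulation(batches):
    """Average each micro-batch separately, then scale by 1 / steps."""
    steps = len(batches)
    accumulated = 0.0
    for batch in batches:
        accumulated += (sum(batch) / len(batch)) / steps
    return accumulated


def global_denominator_accumulation(batches):
    """Divide each micro-batch's summed loss by the total token count.

    Assumption: n_items counts the tokens of ALL accumulation steps,
    so the denominator is derived before the loop starts.
    """
    n_items = sum(len(batch) for batch in batches)
    accumulated = 0.0
    for batch in batches:
        accumulated += sum(batch) / n_items  # no extra 1 / steps factor
    return accumulated


print(full_batch_mean(micro_batches))
print(naive_accumulation(micro_batches))
print(global_denominator_accumulation(micro_batches))
```

With these numbers the naive variant overweights the short micro-batch, while the global-denominator variant matches the full-batch mean. If `n_items` is instead a per-step count, the sum alone would not be normalized, which is what the question above is asking about.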
