
[feat][OSS] Add clip_grad_value_ function. #308

@dhkim0225

Description


🚀 Feature

There are two popular gradient clipping methods: one that limits the maximum gradient value of each model parameter, and one that scales gradients based on the p-norm of a (sub-)set of model parameters.
clip_grad_norm (the second one) is useful when the overall norm of the gradients is large, but not when only a small subset of model parameters has abnormal gradient values: the norm will still be reasonably small relative to the total number of model parameters, so norm-based clipping leaves those outliers untouched.
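To make the difference concrete, here is a minimal sketch using plain PyTorch utilities (not fairscale-specific): with one abnormally large gradient component among many small ones, value clipping clamps just that component element-wise.

```python
import torch

# A single parameter tensor whose gradient has one outlier component.
p = torch.nn.Parameter(torch.zeros(3))
p.grad = torch.tensor([0.5, -4.0, 0.5])

# Value clipping: clamp every gradient element to [-1.0, 1.0].
# Only the outlier is changed; the other components are untouched.
torch.nn.utils.clip_grad_value_([p], clip_value=1.0)
print(p.grad)  # tensor([ 0.5000, -1.0000,  0.5000])
```

With clip_grad_norm_ instead, the same outlier would only be scaled down proportionally along with every other component, and with many parameters the total norm may not even exceed the threshold.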

Related PR (Pytorch Lightning): Lightning-AI/pytorch-lightning#5477

How about adding the following function to optim/oss.py?

def clip_grad_value(
    self,
    clip_value: Union[float, int],
    filter_params_fn: Optional[Callable[[Any], Any]] = None,
) -> None:

I want to call this function from PL's sharded_native_amp_plugin.py.
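A minimal sketch of what the body could look like, written as a free function here so it is testable outside OSS. The use of `self.param_groups` mirrors the standard torch.optim.Optimizer layout; the `filter_params_fn` hook and the method's behavior are assumptions based on the signature above, not the actual fairscale implementation.

```python
from typing import Any, Callable, Optional, Union

import torch


def clip_grad_value(
    self,
    clip_value: Union[float, int],
    filter_params_fn: Optional[Callable[[Any], Any]] = None,
) -> None:
    # Collect the parameters this optimizer instance owns
    # (standard torch.optim param_groups layout).
    params = [p for group in self.param_groups for p in group["params"]]
    # Optionally let the caller narrow down which parameters get clipped
    # (hypothetical hook, taken from the proposed signature).
    if filter_params_fn is not None:
        params = filter_params_fn(params)
    # Element-wise clamp of each gradient to [-clip_value, clip_value].
    # Unlike norm clipping, this needs no cross-rank norm reduction,
    # since each element is clipped independently.
    torch.nn.utils.clip_grad_value_(params, clip_value)
```

One appeal of this for sharded optimizers: because clipping is purely element-wise, each rank can clip its own shard's gradients locally, whereas a global-norm clip requires communicating partial norms across ranks first.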
