
[feat][OSS] Add clip_grad_value_ function. #308

@dhkim0225

Description


🚀 Feature

There are two popular gradient clipping methods: one that limits the maximum gradient value of each model parameter, and one that scales gradients based on the p-norm of a (sub-)set of model parameters.
clip_grad_norm (the second one) is useful when the overall norm of the gradients is large, but not when only a small subset of model parameters has abnormal gradient values: the norm will still be reasonably small relative to the total number of model parameters, so norm-based clipping leaves those outliers untouched.
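To make the difference concrete, here is a minimal sketch using plain PyTorch utilities (not fairscale-specific): with one abnormally large gradient component among many small ones, value clipping clamps just that component element-wise.

```python
import torch

# A single parameter tensor whose gradient has one outlier component.
p = torch.nn.Parameter(torch.zeros(3))
p.grad = torch.tensor([0.5, -4.0, 0.5])

# Value clipping: clamp every gradient element to [-1.0, 1.0].
# Only the outlier is changed; the other components are untouched.
torch.nn.utils.clip_grad_value_([p], clip_value=1.0)
print(p.grad)  # tensor([ 0.5000, -1.0000,  0.5000])
```

With clip_grad_norm_ instead, the same outlier would only be scaled down proportionally along with every other component, and with many parameters the total norm may not even exceed the threshold.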

Related PR (Pytorch Lightning): Lightning-AI/pytorch-lightning#5477

How about adding the following function to optim/oss.py?

def clip_grad_value(
    self,
    clip_value: Union[float, int],
    filter_params_fn: Optional[Callable[[Any], Any]] = None,
) -> None:

I want to call this function from PL's sharded_native_amp_plugin.py.
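A minimal sketch of what the body could look like, written as a free function here so it is testable outside OSS. The use of `self.param_groups` mirrors the standard torch.optim.Optimizer layout; the `filter_params_fn` hook and the method's behavior are assumptions based on the signature above, not the actual fairscale implementation.

```python
from typing import Any, Callable, Optional, Union

import torch


def clip_grad_value(
    self,
    clip_value: Union[float, int],
    filter_params_fn: Optional[Callable[[Any], Any]] = None,
) -> None:
    # Collect the parameters this optimizer instance owns
    # (standard torch.optim param_groups layout).
    params = [p for group in self.param_groups for p in group["params"]]
    # Optionally let the caller narrow down which parameters get clipped
    # (hypothetical hook, taken from the proposed signature).
    if filter_params_fn is not None:
        params = filter_params_fn(params)
    # Element-wise clamp of each gradient to [-clip_value, clip_value].
    # Unlike norm clipping, this needs no cross-rank norm reduction,
    # since each element is clipped independently.
    torch.nn.utils.clip_grad_value_(params, clip_value)
```

One appeal of this for sharded optimizers: because clipping is purely element-wise, each rank can clip its own shard's gradients locally, whereas a global-norm clip requires communicating partial norms across ranks first.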
