Add a vectorize flag to torch.autograd.functional.{jacobian, hessian} #50584
Closed
Labels
enhancement, module: autograd, module: vmap, triaged
🚀 Feature
Add a `vectorize` flag to `torch.autograd.functional.jacobian` and `torch.autograd.functional.hessian` (default: `False`). Under the hood, the `vectorize` flag uses `vmap` as the backend to compute the jacobian and hessian, respectively, providing speedups to users. For example, see the sketch below.
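A minimal sketch of the proposed usage (the function `f` and its input here are illustrative, not from the issue):

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    return x ** 2 + x.sum()

x = torch.randn(5)

# Proposed: compute all rows of the jacobian through a single
# vmap-batched autograd call instead of one call per row.
jac = jacobian(f, x, vectorize=True)  # shape: (5, 5)
```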
Motivation
Jacobian computation (and by extension, hessian computation) in PyTorch today involves invoking `torch.autograd.grad` once per row of the jacobian. At a high level, `torch.autograd.functional.jacobian` follows the procedure sketched below: assuming an N-by-N jacobian, we need to invoke the autograd engine N times. The overhead of this (due to tensor creation and per-operator overhead) can be, and in practice is, significant in a number of use cases such as Bayesian logistic regression.
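A simplified sketch of that row-by-row procedure (the actual implementation handles arbitrary input and output structures):

```python
import torch

def f(x):
    return x ** 2

x = torch.randn(3, requires_grad=True)
out = f(x)

rows = []
for i in range(out.numel()):
    # One-hot cotangent selecting the i-th output element.
    v = torch.zeros_like(out).flatten()
    v[i] = 1.0
    # One full autograd engine invocation per jacobian row.
    (row,) = torch.autograd.grad(
        out, x, grad_outputs=v.view_as(out), retain_graph=True
    )
    rows.append(row.flatten())

jac = torch.stack(rows)  # shape: (out.numel(), x.numel())
```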
Alternatives
Instead of updating `jacobian` and `hessian`, we could expose `vmap` directly and tell users to use (pseudocode) `vmap(vjp)` to compute jacobians efficiently. However, this would create a "trap" in our API where a user calling `autograd.functional.jacobian` cannot benefit from these performance improvements. A sketch of the `vmap(vjp)` composition follows.
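For illustration, here is a minimal sketch of the `vmap(vjp)` approach using the `torch.func` API, which postdates this issue (the original proposal targets the internal `vmap` prototype):

```python
import torch
from torch.func import vjp, vmap

def f(x):
    return x.sin()

x = torch.randn(3)
_, vjp_fn = vjp(f, x)

# Push all one-hot cotangents through the vjp in a single
# batched call rather than N separate autograd calls.
basis = torch.eye(x.numel())
(jac,) = vmap(vjp_fn)(basis)  # shape: (3, 3)
```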
Additional context
Not all of the batching rules needed for jacobian and hessian computation are implemented: #49562. A good number of these may require writing new CUDA kernels from scratch. We'd like to offer the API with `vectorize=False` as the default so that we can begin speeding up user code without having users wait until we have finished writing a substantial number of batching rules.

cc @ezyang @albanD @zou3519 @gqchen @pearu @nikitaved @soulitzer