Add a vectorize flag to torch.autograd.functional.{jacobian, hessian} #50584
Closed
Labels
enhancement, module: autograd, module: vmap, triaged
🚀 Feature
Add a `vectorize` flag to `torch.autograd.functional.jacobian` and `torch.autograd.functional.hessian` (default: `False`). Under the hood, the `vectorize` flag uses `vmap` as the backend to compute the jacobian and hessian, respectively, providing speedups to users. For example, see the sketch below.
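A minimal sketch of the proposed usage (the function `f` and its input here are illustrative, not from the issue):

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    return x ** 2 + x.sum()

x = torch.randn(5)

# Proposed: compute all rows of the jacobian through a single
# vmap-batched autograd call instead of one call per row.
jac = jacobian(f, x, vectorize=True)  # shape: (5, 5)
```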
Motivation
Jacobian computation (and by extension, hessian computation) in PyTorch today involves invoking `torch.autograd.grad` once per row of the jacobian. At a high level, `torch.autograd.functional.jacobian` follows the procedure sketched below: assuming an N-by-N jacobian, we need to invoke the autograd engine N times. The overhead of this (due to tensor creation and per-operator overhead) can be, and in practice is, significant in a number of use cases such as Bayesian logistic regression.
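A simplified sketch of that row-by-row procedure (the actual implementation handles arbitrary input and output structures):

```python
import torch

def f(x):
    return x ** 2

x = torch.randn(3, requires_grad=True)
out = f(x)

rows = []
for i in range(out.numel()):
    # One-hot cotangent selecting the i-th output element.
    v = torch.zeros_like(out).flatten()
    v[i] = 1.0
    # One full autograd engine invocation per jacobian row.
    (row,) = torch.autograd.grad(
        out, x, grad_outputs=v.view_as(out), retain_graph=True
    )
    rows.append(row.flatten())

jac = torch.stack(rows)  # shape: (out.numel(), x.numel())
```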
Alternatives
Instead of updating `jacobian` and `hessian`, we could expose `vmap` directly and tell users to use (pseudocode) `vmap(vjp)` to compute jacobians efficiently. However, this would create a "trap" in our API where a user calling `autograd.functional.jacobian` cannot benefit from these performance improvements. A sketch of the `vmap(vjp)` composition follows.
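For illustration, here is a minimal sketch of the `vmap(vjp)` approach using the `torch.func` API, which postdates this issue (the original proposal targets the internal `vmap` prototype):

```python
import torch
from torch.func import vjp, vmap

def f(x):
    return x.sin()

x = torch.randn(3)
_, vjp_fn = vjp(f, x)

# Push all one-hot cotangents through the vjp in a single
# batched call rather than N separate autograd calls.
basis = torch.eye(x.numel())
(jac,) = vmap(vjp_fn)(basis)  # shape: (3, 3)
```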
Additional context
Not all of the batching rules needed for jacobian and hessian computation are implemented: #49562. A good number of these may require writing new CUDA kernels from scratch. We'd like to offer the API with `vectorize=False` as the default so that we can begin speeding up user code without having users wait until we have finished writing a substantial number of batching rules.

cc @ezyang @albanD @zou3519 @gqchen @pearu @nikitaved @soulitzer