
Implement Reparameterized version of Gamma #26

Closed
fritzo wants to merge 2 commits into upstream from gamma-reparameterized

Conversation

@fritzo

@fritzo fritzo commented Nov 26, 2017

Fixes #25

DO NOT MERGE. This PR will be moved to the pytorch org after pytorch#3841 is merged.

This implements a reparameterized gradient for distributions.Gamma. The gradient is computed by directly approximating the reparameterized gradient function dx/dalpha, following Knowles (2015). The approximation is accurate to within 1% relative error for a wide range of alphas.

Derivation

Note that if x ~ Gamma(alpha, beta) then x / beta ~ Gamma(alpha, 1). Since division is already implemented in PyTorch, we can thus reduce our problem to computing a reparameterized gradient of a standard gamma x ~ Gamma(alpha) = Gamma(alpha, 1) wrt alpha.
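This scaling identity is easy to check numerically with scipy (the snippet below is illustrative, not part of the PR). Note scipy parameterizes the gamma by scale = 1/rate, so rate beta corresponds to scale=1/beta:

```python
# Check the scaling property: if x ~ Gamma(alpha, beta) with rate beta,
# then beta * x ~ Gamma(alpha, 1), i.e. the CDFs satisfy
#   cdf_{Gamma(alpha, beta)}(x) == cdf_{Gamma(alpha, 1)}(beta * x).
from scipy import stats

alpha, beta, x = 2.5, 3.0, 0.7
lhs = stats.gamma.cdf(x, alpha, scale=1.0 / beta)  # rate beta == scale 1/beta
rhs = stats.gamma.cdf(beta * x, alpha)             # standard gamma at beta * x
```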

This PR implements a function standard_gamma_grad(x, alpha) that directly approximates the reparameterized gradient defined (for any continuous univariate distribution) as

                d/dalpha cdf(x; alpha)     d/dalpha cdf(x; alpha)
dx / dalpha = - ---------------------- = - ----------------------
                  d/dx cdf(x; alpha)           pdf(x; alpha)

This definition is used in the unit tests in tests/test_distributions.py, which compute d/dalpha cdf(x;alpha) via finite difference of the scipy.stats.gamma.cdf() function.
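The finite-difference reference gradient used by those tests can be sketched as follows (the function name is mine, not the PR's):

```python
# Reference value of the reparameterized gradient for the standard gamma,
#   dx/dalpha = -(d/dalpha cdf(x; alpha)) / pdf(x; alpha),
# with the alpha-derivative of the CDF taken by central finite difference.
from scipy import stats

def dx_dalpha_reference(x, alpha, eps=1e-5):
    dcdf = (stats.gamma.cdf(x, alpha + eps)
            - stats.gamma.cdf(x, alpha - eps)) / (2 * eps)
    return -dcdf / stats.gamma.pdf(x, alpha)
```

By the implicit function theorem, differentiating the inverse CDF x(alpha) = ppf(u, alpha) at a fixed uniform draw u must give the same number, which makes a convenient cross-check.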

The approximation is split into three regions:

  • For x < 0.001 we differentiate the power-law approximation from Knowles (2015)
    cdf(x; alpha)  \approx  x**alpha / (alpha * Gamma(alpha))
    standard_gamma_grad(x, alpha) = -x/alpha * (log(x) - 1/alpha - digamma(alpha))
    
    Until digamma() is implemented in PyTorch, we use a finite difference of lgamma().
  • For alpha > 30 we use the approximation
    standard_gamma_grad(x, alpha) = sqrt(x/alpha)
    
  • For intermediate x,alpha we use a rational function approximation
    standard_gamma_grad(x, alpha) = exp(PQ(log(x / alpha), log(alpha)))
    
    where PQ(u,v) is a rational function of order 2 in u and 3 in v. This was fit by least squares, minimizing squared relative error, on ~20000 samples drawn from
    alpha ~ log_uniform(1e-5, 1e2)
    x ~ Gamma(alpha)
    

For complete derivation, see this Jupyter Notebook.
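For concreteness, the first two branches above can be written out in Python (this is my sketch of the description, not the PR's C implementation; the middle branch is omitted because the fitted PQ coefficients appear only in the PR itself):

```python
# Sketch of the piecewise approximation described above. Only the small-x
# and large-alpha branches are reproduced; the intermediate region uses a
# fitted rational function whose coefficients are not restated here.
import math
from scipy.special import digamma

def standard_gamma_grad_sketch(x, alpha):
    if x < 1e-3:
        # Differentiating the power-law approximation
        # cdf(x; alpha) ~= x**alpha / (alpha * Gamma(alpha)):
        return -x / alpha * (math.log(x) - 1.0 / alpha - digamma(alpha))
    if alpha > 30.0:
        # Large-alpha limit of the reparameterized gradient.
        return math.sqrt(x / alpha)
    raise NotImplementedError("middle region: fitted rational function PQ")
```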

@fritzo fritzo added the WIP label Nov 26, 2017
@fritzo
Author

fritzo commented Nov 26, 2017

@apaszke Any advice on autograd plumbing? (I'll send this PR to pytorch/pytorch after pytorch#3841 is merged)


@apaszke apaszke left a comment


This mostly looks good, but needs a few tweaks

Comment thread torch/distributions.py
return -((value - self.mean) ** 2) / (2 * var) - log_std - math.log(math.sqrt(2 * math.pi))


class _StandardGamma(Function):

This is an old-style autograd function. You should write it as shown in the docs.

Author

I wasn't able to store the gradient via ctx.save_for_backward(grad). Is there a new-style way to save an intermediate that is neither an input nor an output?


You can't do that because that can only be done with inputs/outputs. Just do ctx.grad = grad.
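That pattern can be sketched with a new-style (static-method) Function in the current PyTorch API. Everything below is a stand-in, not the PR's code: the real sampler returns both the sample and its precomputed gradient; here the "sample" is just alpha and the gradient is faked with the large-alpha limit sqrt(x / alpha):

```python
import torch
from torch.autograd import Function

class StandardGammaSketch(Function):
    """Illustrative stand-in for the PR's _StandardGamma. The precomputed
    gradient is neither an input nor an output, so it cannot go through
    ctx.save_for_backward; a plain ctx attribute works instead."""

    @staticmethod
    def forward(ctx, alpha):
        # Stand-in sample: the real code calls the C sampler, which also
        # returns dx/dalpha. Here x = alpha, so sqrt(x / alpha) == 1.
        x = alpha.detach().clone()
        ctx.grad = (x / alpha.detach()).sqrt()  # not an input/output
        return x

    @staticmethod
    def backward(ctx, grad_output):
        # Chain rule: upstream gradient times the stashed dx/dalpha.
        return grad_output * ctx.grad
```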

Author

Done.

Comment thread aten/src/TH/THRandom.c Outdated

// This is identical to THRandom_standard_gamma but also stores the
// reparameterized gradient wrt alpha in grad_alpha.
double THRandom_standard_gamma_with_grad(THGenerator *_generator, double alpha,

You need to compute the grad during forward, because you'd need to replay all the control flow here otherwise, right?

Author

Correct. This is a complicated gradient due to the rejection sampler. To reproduce the computation, we would need to store about 10x more state.

default: THPDefaultGenerator->cdata
kwarg_only: True
- THTensor* alpha
- cname: standard_gamma_alpha_with_grad

It's better to expose this as an internal method instead of a new overload (think torch._C._standard_gamma_alpha_with_grad).

Author

Could you point me to an example internal method whose plumbing I can copy? I'm a little lost here.

def standard_gamma(self, grad=None):
    if grad is None:
        return Variable(torch.standard_gamma(self.data))
    return Variable(torch.standard_gamma(self.data, grad), requires_grad=self.requires_grad)

You should just use the Function subclass you implemented here.

Author

Thanks for the help! Should I embed that subclass in torch.autograd.variable, or should torch.distributions be the canonical interface for standard_gamma() for Variables?

@fritzo
Author

fritzo commented Nov 26, 2017

@alicanb Would you be up for reviewing the statistical aspects of this PR?

@alicanb
Collaborator

alicanb commented Nov 26, 2017 via email

Comment thread aten/src/TH/THRandom.c Outdated
const double accept_grad = d_grad * e + d * e_grad;
const double dv = d * v;
const double dv_grad = d_grad * v + d * v_grad // Pathwise part.
+ (dv - alpha) * accept_grad; // Acceptance part.
Author

@fritzo fritzo Nov 26, 2017

This is the trickiest part. This roughly follows Naesseth et al. (2017) but computes an exact reparameterized gradient rather than approximating (i.e. if you drew identical samples from an inverse CDF sampler Knowles (2015) and this cheaper sampler, the gradients would be identical). Naesseth et al. seem to be missing the analytical baseline alpha in (dv - alpha) which is easy to compute. Note that accept = log(acceptance ratio), so we're using the log trick in multiplying by accept_grad (and avoiding an expensive exp()).

Comment thread aten/src/TH/generic/THTensorRandom.c Outdated
real*const alpha_data = THTensor_(data)(alpha);
real*const saved_u_data = THTensor_(data)(saved_u);
real*const saved_x_data = THTensor_(data)(saved_x);
for(int64_t i = 0, numel = THTensor_(nElement)(alpha); i < numel; ++i) {

Any reason why you don't use OpenMP here?

Author

It's difficult to parallelize because THGenerator *gen is stateful. However this _fwd() step is much cheaper than the _bwd() step.

Comment thread torch/distributions.py Outdated
@staticmethod
def backward(ctx, grad_output):
    return grad_output * Variable(ctx.saved_grad)
    alpha, = ctx.saved_variables

This backward isn't really differentiable twice. Can you mark it @once_differentiable and use ctx.saved_tensors? grad_output will become a tensor too.
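The requested pattern might look like the following minimal stand-in (not the PR's code): @once_differentiable tells autograd that this backward is not itself differentiable, and grad_output then arrives as a plain tensor.

```python
import torch
from torch.autograd.function import once_differentiable

class OnceDiffSketch(torch.autograd.Function):
    @staticmethod
    def forward(ctx, alpha):
        ctx.save_for_backward(alpha)  # inputs/outputs may be saved normally
        return alpha * 2.0            # stand-in computation

    @staticmethod
    @once_differentiable
    def backward(ctx, grad_output):
        # Marked once-differentiable: autograd raises an error if anyone
        # tries to take a second derivative through this backward.
        alpha, = ctx.saved_tensors
        return grad_output * torch.full_like(alpha, 2.0)
```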

- arg: THTensor* output
output: True
- THTensor* alpha
- THTensor* saved_u

It would be nice if we could skip these outputs if we knew an op won't be differentiated. But it's fine as is, and we can fix that later.

Author

Already done :-) standard_gamma() is a simplified version of standard_gamma_fwd() where the unused outputs are discarded.

@alicanb
Collaborator

alicanb commented Nov 28, 2017

Do you know the difference between torch.set_rng_state and torch.manual_seed? I tried torch.manual_seed while testing normal but couldn't get it working, so I used set_rng_state instead.

@apaszke

apaszke commented Nov 29, 2017

They are pretty much equivalent.

torch.manual_seed(2)
s = torch.get_rng_state()
print(torch.randn(1))
# Same
torch.manual_seed(2)
print(torch.randn(1)) 
# Same
torch.set_rng_state(s)
print(torch.randn(1)) 

The benefit of manual_seed is that it also seeds the GPU, while set_rng_state only sets it for the CPU generator.

Comment thread test/test_distributions.py Outdated
@@ -163,6 +166,48 @@ def test_gamma_sample(self):
scipy.stats.gamma(alpha, scale=1 / beta),
Collaborator

@alicanb alicanb Nov 29, 2017

1 / beta throws a TypeError on Python 3; changing 1 to 1.0 fixes it.

Comment thread test/test_distributions.py Outdated
alphas = Variable(torch.Tensor([alpha]), requires_grad=True)
betas = Variable(torch.Tensor([beta]), requires_grad=True)
self._check_sampler_sampler(Gamma(alphas, betas),
scipy.stats.gamma(alpha, scale=1 / beta),
Collaborator

Same here

@fritzo fritzo force-pushed the gamma-reparameterized branch 2 times, most recently from 57a96b1 to 69df177 Compare November 29, 2017 20:06
@fritzo
Author

fritzo commented Nov 29, 2017

Ok, I've changed algorithms to now directly approximate the reparameterized gradient. This achieves simpler code, cheaper computation, and more accurate gradients (they are no longer stochastic for alpha < 1).

@fritzo
Author

fritzo commented Nov 29, 2017

@tbrx @jwvdm I've added a Jupyter notebook and some explanation in the PR description. Let me know if I can answer any other questions. Thanks for offering to review!

@jwvdm
Member

jwvdm commented Nov 29, 2017

Thanks @fritzo – I'll plan on taking some time to review on Fri.

@fritzo fritzo force-pushed the gamma-reparameterized branch from f74bfac to 9c7694f Compare December 2, 2017 00:48
@fritzo fritzo changed the base branch from random-gamma to upstream December 2, 2017 00:52
@fritzo fritzo force-pushed the gamma-reparameterized branch from d6c1d30 to 6021328 Compare December 2, 2017 00:55
@fritzo
Author

fritzo commented Dec 2, 2017

Moving to pytorch#3978

@fritzo fritzo closed this Dec 2, 2017