
Implemented Relaxed Distributions #113

Closed

rachtsingh wants to merge 7 commits into probtorch:master from rachtsingh:concrete

Conversation


@rachtsingh rachtsingh commented Jan 29, 2018

Not sure if this is the right name for this distribution (Concrete / GumbelSoftmax are other ideas), but this is what TensorFlow calls it. This PR uses the transforms machinery :)

I had to edit TransformedDistribution's log_prob method to take event_shape into account - this is probably not the right way to do it, but it's a quick first try that makes it work.

  • Implement RelaxedOneHotCategorical
  • Implement RelaxedBernoulli

cc @fritzo
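
For context, a minimal usage sketch of the two distributions being added (argument names follow the torch.distributions interface this PR targets and may differ slightly from the final code):

import torch
from torch.distributions import RelaxedBernoulli, RelaxedOneHotCategorical

temperature = torch.tensor(0.67)  # lower temperature -> samples closer to discrete

rb = RelaxedBernoulli(temperature, probs=torch.tensor([0.1, 0.9]))
b_sample = rb.rsample()  # reparameterized sample in (0, 1)

roc = RelaxedOneHotCategorical(temperature, probs=torch.tensor([0.2, 0.3, 0.5]))
c_sample = roc.rsample()  # lies on the simplex; approaches one-hot as temperature -> 0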


@fritzo fritzo left a comment


Nice factorization into 2 classes!

@fritzo

Could you update the docs and example sample?

@fritzo fritzo Jan 30, 2018

ExpRelaxedCategorical seems ok too, since OneHot is kind of the only way to relax.

@fritzo

  1. Since Tensor and Variable are being merged, this might not be necessary.
  2. Prefer torch.is_tensor(), since torch.Tensor is an alias rather than a base class:
isinstance(torch.FloatTensor([0]), torch.Tensor)
# True
isinstance(torch.DoubleTensor([0]), torch.Tensor)
# False
torch.Tensor is torch.FloatTensor
# True

@rachtsingh (Author)

  1. I'm hoping it's not necessary soon, but it definitely is right now because log_softmax takes Variables and raises when called on a Tensor. Can we leave the shim until the merge?
  2. Definitely, didn't realize this was the case.
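
For reference, a minimal sketch of the preferred check (pre-0.4 semantics assumed, where torch.Tensor aliases torch.FloatTensor):

import torch

# torch.is_tensor() recognizes every tensor type, not just the torch.Tensor alias
torch.is_tensor(torch.FloatTensor([0]))   # True
torch.is_tensor(torch.DoubleTensor([0]))  # True
torch.is_tensor([0])                      # False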

@fritzo

Document temperature parameter
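
A sketch of what such documentation could look like (the wording and signature are illustrative only, not the PR's actual docstring):

from torch.distributions import TransformedDistribution

class RelaxedOneHotCategorical(TransformedDistribution):
    """
    Relaxed (Concrete / Gumbel-Softmax) distribution over one-hot vectors.

    Args:
        temperature (Tensor): relaxation temperature; as it approaches 0,
            samples approach one-hot vectors, and as it becomes very large,
            samples approach equal values at every index.
        probs (Tensor): event probabilities.
        logits (Tensor): unnormalized log probabilities.
    """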

@fritzo

Is __neg__ cheaper or more stable than .mul(-1)?

@rachtsingh (Author)

Can I call it via -(-(uniforms.log()).log())?

@fritzo

Yes, I just assume it's a tiny bit cheaper.
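
A small sketch of the sampling expression under discussion (the tensor name and shape are illustrative):

import torch

uniforms = torch.rand(3, 4).clamp(min=1e-10)  # avoid log(0)
gumbels = -(-(uniforms.log())).log()          # standard Gumbel noise, using unary negation
gumbels_alt = uniforms.log().mul(-1).log().mul(-1)  # equivalent form with .mul(-1)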

@fritzo

I would expect either the transform to be ExpTransform().inv or the base distribution to be LogRelaxedOneHotCategorical. Is the latter a reasonable renaming, analogous to LogNormal?

@rachtsingh (Author)

Right, I agree with LogRelaxedOneHotCategorical as well. However, TF uses the version I have because that's what the paper names it. My thinking is that it's probably more important just to document it and be consistent with the other interfaces, since no user will need to see it.

@fritzo

Sounds good, let's keep it standard.
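
To make the naming discussion concrete, a rough sketch of how the two classes compose (module paths and argument names are assumed to match this PR and may differ in detail):

import torch
from torch.distributions import TransformedDistribution
from torch.distributions.transforms import ExpTransform
from torch.distributions.relaxed_categorical import ExpRelaxedCategorical  # the base class from this PR

temperature = torch.tensor(0.5)
probs = torch.tensor([0.2, 0.3, 0.5])

base = ExpRelaxedCategorical(temperature, probs)         # samples are log-simplex points
relaxed = TransformedDistribution(base, ExpTransform())  # exp() maps them onto the simplex
sample = relaxed.rsample()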

@fritzo fritzo Jan 30, 2018

Some transforms should already know their event shape, in which case this .sum(-1) would sum too much. Also sometimes log_prob is just the number 0 here (e.g. for identity_transform). I think it would be safer to do one of:

  1. add an event_shape or extra_event_shape arg to pointwise transforms
  2. implement a TensorTransform(base_transform, event_dim=1) to wrap ExpTransform
  3. make this hack a tiny bit more robust:
log_prob = ...sum of log_abs_det_jacobian_terms...
base_log_prob = self.base_dist.log_prob(y)  # knows the correct dim
if not isinstance(log_prob, numbers.Number):
    while log_prob.dim() > base_log_prob.dim():
        log_prob = log_prob.sum(-1)
log_prob += base_log_prob

@rachtsingh (Author)

I like solution 1 the best. I think fixing the hack as you did doesn't work because log_prob isn't a number most of the time - for example it can be [sample_shape x batch_shape x ...]. Let me rethink this and get back to you.

@fritzo

👍 We should also make constraints more aware of event dim. @tbrx is already doing this for multivariate normal in #52 and we'll need it e.g. for AffineOperatorTransform which transforms from constraints.real_vector to constraints.real_vector.

@fritzo fritzo Jan 30, 2018

After thinking about this a bit, I think a 4th approach could be minimally invasive:

  • add a static event_dim attribute to each Transform (only about 4 lines of code diff)
  • update the log_prob accumulation loop in TransformedDistribution.log_prob()

I'll sketch this in a PR just to discuss. EDIT: here it is: #116
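
A rough sketch of that 4th approach (the helper and the event_dim attribute are assumptions for illustration, not the actual diff in #116):

from torch.distributions import TransformedDistribution

def _sum_rightmost(value, n):
    # sum out the n rightmost dimensions (no-op when n == 0)
    for _ in range(n):
        value = value.sum(-1)
    return value

class EventAwareTransformedDistribution(TransformedDistribution):
    # assumes each Transform carries a static event_dim attribute
    # (0 for pointwise transforms such as ExpTransform)
    def log_prob(self, value):
        event_dim = len(self.event_shape)
        log_prob = 0.0
        y = value
        for transform in reversed(self.transforms):
            x = transform.inv(y)
            jac = transform.log_abs_det_jacobian(x, y)
            # reduce each jacobian term down to this distribution's event shape
            log_prob = log_prob - _sum_rightmost(jac, event_dim - transform.event_dim)
            y = x
        log_prob = log_prob + _sum_rightmost(
            self.base_dist.log_prob(y),
            event_dim - len(self.base_dist.event_shape))
        return log_prob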

@rachtsingh rachtsingh commented Jan 31, 2018

Ok, made changes based on comments, and added the RelaxedBernoulli distribution. No longer blocked, so ready for another review if necessary @fritzo.

@rachtsingh rachtsingh changed the title from "Implemented RelaxedOneHotCategorical Distribution" to "Implemented Relaxed Distributions" on Feb 1, 2018

In some cases when there are two different versions of cudnn installed,
one under /usr/local/cuda and the other under a virtual env such as conda or
under the main system path /usr/include, the compiler would pick up the
cudnn.h from the virtual env/system path first. This is because cmake
generates C_INCLUDES and CXX_INCLUDES flags with the system include path
first. All this may lead to linking problems as described in issue pytorch#4869.

Fixes pytorch#4869
@rachtsingh (Author)

Added tests that:

  1. Rounding the RelaxedBernoulli distribution gives the corresponding Bernoulli distribution (a rough sketch of this check follows below)
  2. Taking the argmax of the RelaxedOneHotCategorical gives the corresponding Categorical distribution
  3. As the temperature becomes very large, the former consistently gives 0.5 and the latter gives approximately equal values at each index.
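
A rough sketch of the first check (the sample count and tolerance are illustrative, not the values in the actual test):

import torch
from torch.distributions import Bernoulli, RelaxedBernoulli

def check_rounded_relaxed_bernoulli(probs=torch.tensor([0.1, 0.5, 0.9]),
                                    temperature=torch.tensor(0.67),
                                    num_samples=20000):
    relaxed = RelaxedBernoulli(temperature, probs)
    rounded = relaxed.rsample((num_samples,)).round()
    # rounding a relaxed sample should behave like a draw from Bernoulli(probs)
    empirical = rounded.mean(0)
    assert (empirical - Bernoulli(probs).mean).abs().max() < 0.05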

zdevito and others added 4 commits February 5, 2018 10:43
* Remove addValues and use WithInsertPoint

* Use blocks to simplify differentiate

Using @ezyang's suggestion, this change uses a block rather than
staging annotations to represent the reverse pass. This allows us
to reuse the machinery to copy graphs/blocks to extract the
reverse pass concisely.

This also changes the input order of Gradient's df to:
   [output vjps][temporary vjps][captures]

In addition to being simpler to generate in this order, it will also
allow ExecutionPlan to append the captures onto the already-existing
input list of vjps that are given by the autograd, rather than having
to prepend them, which should be slightly cheaper.

* Enforce that input captures are before outputs

This changes the Gradient struct to enforce that input
captures appear before output captures in the capture list,
which makes it easier to use in ExecutionPlan.
* Don't allow scalars where vectors are required in mv, addmv, ger, addr. (pytorch#5003)

* Fix scalar_tensor_test for ger.

* Address review comments.

* Fix merge.
Once Variable and Tensor are merged the existing Variable test would
cause an infinite recursion. Instead, modify the Variables directly
inside a `no_grad()` block.
sspaddmm, mm for sparse tensors to come in another pr; they're a little more involved.

@fritzo fritzo left a comment


LGTM

Comment thread on test/test_distributions.py (outdated)
@fritzo

Nice test!

@fritzo

Just curious, have you considered implementing this as a TransformedDistribution on top of Gumbel? That probably wouldn't be as good as your implementation here, but I'm curious why TransformedDistribution wouldn't work, and how we could improve it to be suitable in this context. For example, could we define this as a TransformedDistribution(Gumbel(...), BoltzmannTransform()) and implement a suitable BoltzmannTransform.log_abs_det_jacobian()?

@rachtsingh (Author)

We can definitely implement this via a TransformedDistribution of Gumbel + BoltzmannTransform; I just didn't see the BoltzmannTransform. I'll make an issue about implementing the log_abs_det_jacobian and then revisit it when that's been solved?
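
A skeleton of what that could look like (every name here is hypothetical; log_abs_det_jacobian is exactly the piece the proposed issue would need to work out):

import torch
import torch.nn.functional as F
from torch.distributions import Gumbel, TransformedDistribution, constraints
from torch.distributions.transforms import Transform

class BoltzmannTransform(Transform):
    domain = constraints.real
    codomain = constraints.simplex
    bijective = False  # the tempered softmax collapses one degree of freedom

    def __init__(self, temperature):
        super(BoltzmannTransform, self).__init__()
        self.temperature = temperature

    def _call(self, x):
        return F.softmax(x / self.temperature, dim=-1)

    def log_abs_det_jacobian(self, x, y):
        raise NotImplementedError("the open question in this thread")

logits = torch.tensor([0.2, 0.3, 0.5]).log()
relaxed = TransformedDistribution(Gumbel(logits, 1.0), BoltzmannTransform(0.5))
sample = relaxed.rsample()  # sampling works; log_prob needs the jacobian above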

@fritzo

I think this will be tricky, but let's move discussion to that issue. Feel free to send this PR upstream before the refactoring.

li-roy and others added 2 commits February 5, 2018 12:28
* add reduce=True argument to MultiLabelMarginLoss

* Fix lint

* Addressed comments

* Remove unneeded syncthreads calls