Distributions
PyTorch Distributions
Most distributions in Pyro are thin wrappers around PyTorch distributions.
For details on the PyTorch distribution interface, see
torch.distributions.distribution.Distribution.
For differences between the Pyro and PyTorch interfaces, see
TorchDistributionMixin.
Bernoulli
- class Bernoulli(probs=None, logits=None, validate_args=None)
Wraps torch.distributions.bernoulli.Bernoulli with TorchDistributionMixin.
Creates a Bernoulli distribution parameterized by probs or logits (but not both).
Samples are binary (0 or 1). They take the value 1 with probability p and 0 with probability 1 - p.
Example:
>>> m = Bernoulli(torch.tensor([0.3]))
>>> m.sample()  # 30% chance 1; 70% chance 0
tensor([ 0.])
- Parameters
probs (Number, Tensor) – the probability of sampling 1
logits (Number, Tensor) – the log-odds of sampling 1
Beta
- class Beta(concentration1, concentration0, validate_args=None)[source]
Wraps torch.distributions.beta.Beta with TorchDistributionMixin.
Beta distribution parameterized by concentration1 and concentration0.
Example:
>>> m = Beta(torch.tensor([0.5]), torch.tensor([0.5]))
>>> m.sample()  # Beta distributed with concentration concentration1 and concentration0
tensor([ 0.1046])
Binomial
- class Binomial(total_count=1, probs=None, logits=None, validate_args=None)[source]
Wraps torch.distributions.binomial.Binomial with TorchDistributionMixin.
Creates a Binomial distribution parameterized by total_count and either probs or logits (but not both). total_count must be broadcastable with probs/logits.
Example:
>>> m = Binomial(100, torch.tensor([0 , .2, .8, 1]))
>>> m.sample()
tensor([ 0., 22., 71., 100.])
>>> m = Binomial(torch.tensor([[5.], [10.]]), torch.tensor([0.5, 0.8]))
>>> m.sample()
tensor([[ 4., 5.],
        [ 7., 6.]])
- Parameters
total_count (int or Tensor) – number of Bernoulli trials
probs (Tensor) – Event probabilities
logits (Tensor) – Event log-odds
Categorical
- class Categorical(probs=None, logits=None, validate_args=None)[source]
Wraps torch.distributions.categorical.Categorical with TorchDistributionMixin.
Creates a categorical distribution parameterized by either probs or logits (but not both).
Note
It is equivalent to the distribution that torch.multinomial() samples from.
Samples are integers from \(\{0, \ldots, K-1\}\) where K is probs.size(-1).
If probs is 1-dimensional with length-K, each element is the relative probability of sampling the class at that index.
If probs is N-dimensional, the first N-1 dimensions are treated as a batch of relative probability vectors.
Note
The probs argument must be non-negative, finite and have a non-zero sum, and it will be normalized to sum to 1 along the last dimension. probs will return this normalized value. The logits argument will be interpreted as unnormalized log probabilities and can therefore be any real number. It will likewise be normalized so that the resulting probabilities sum to 1 along the last dimension. logits will return this normalized value.
See also: torch.multinomial()
Example:
>>> m = Categorical(torch.tensor([ 0.25, 0.25, 0.25, 0.25 ]))
>>> m.sample()  # equal probability of 0, 1, 2, 3
tensor(3)
- Parameters
probs (Tensor) – event probabilities
logits (Tensor) – event log probabilities (unnormalized)
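The normalization rules in the note above can be illustrated without torch. The following is a minimal pure-Python sketch of the arithmetic only; the helper names normalize_probs and normalize_logits are hypothetical and are not part of the Pyro or PyTorch API.

```python
import math


def normalize_probs(probs):
    """Scale non-negative, finite weights so that they sum to 1."""
    total = sum(probs)
    if total <= 0 or any(p < 0 or not math.isfinite(p) for p in probs):
        raise ValueError("probs must be non-negative, finite, with a non-zero sum")
    return [p / total for p in probs]


def normalize_logits(logits):
    """Subtract the log-sum-exp so that exp(logits) sums to 1."""
    m = max(logits)  # shift by the max for numerical stability
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


# Relative weights 1 : 1 : 2, given once as probs and once as logits.
probs = normalize_probs([1.0, 1.0, 2.0])
logits = normalize_logits([0.0, 0.0, math.log(2.0)])
```

Both parameterizations describe the same distribution here: exponentiating the normalized logits recovers the normalized probs.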
Cauchy
- class Cauchy(loc, scale, validate_args=None)
Wraps torch.distributions.cauchy.Cauchy with TorchDistributionMixin.
Samples from a Cauchy (Lorentz) distribution. The distribution of the ratio of independent normally distributed random variables with means 0 follows a Cauchy distribution.
Example:
>>> m = Cauchy(torch.tensor([0.0]), torch.tensor([1.0]))
>>> m.sample()  # sample from a Cauchy distribution with loc=0 and scale=1
tensor([ 2.3214])
Chi2
- class Chi2(df, validate_args=None)
Wraps torch.distributions.chi2.Chi2 with TorchDistributionMixin.
Creates a Chi-squared distribution parameterized by shape parameter df. This is exactly equivalent to Gamma(alpha=0.5*df, beta=0.5).
Example:
>>> m = Chi2(torch.tensor([1.0]))
>>> m.sample()  # Chi2 distributed with shape df=1
tensor([ 0.1046])
- Parameters
df (float or Tensor) – shape parameter of the distribution
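The stated equivalence with Gamma(alpha=0.5*df, beta=0.5) can be checked numerically from the textbook densities. This is a pure-Python sketch under that assumption, not the library's implementation; gamma_log_pdf and chi2_log_pdf are hypothetical helper names.

```python
import math


def gamma_log_pdf(x, concentration, rate):
    """Log density of a Gamma distribution with shape `concentration` and `rate`."""
    return (concentration * math.log(rate)
            + (concentration - 1.0) * math.log(x)
            - rate * x
            - math.lgamma(concentration))


def chi2_log_pdf(x, df):
    """Log density of a Chi-squared distribution with `df` degrees of freedom."""
    half = 0.5 * df
    return ((half - 1.0) * math.log(x)
            - 0.5 * x
            - half * math.log(2.0)
            - math.lgamma(half))


# The two log densities agree at every evaluation point.
gaps = [abs(chi2_log_pdf(x, df) - gamma_log_pdf(x, 0.5 * df, 0.5))
        for df in (1.0, 2.5, 7.0) for x in (0.1, 1.0, 4.2)]
```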
ContinuousBernoulli
- class ContinuousBernoulli(probs=None, logits=None, lims=(0.499, 0.501), validate_args=None)
Wraps torch.distributions.continuous_bernoulli.ContinuousBernoulli with TorchDistributionMixin.
Creates a continuous Bernoulli distribution parameterized by probs or logits (but not both).
The distribution is supported in [0, 1] and parameterized by ‘probs’ (in (0,1)) or ‘logits’ (real-valued). Note that, unlike the Bernoulli, ‘probs’ does not correspond to a probability and ‘logits’ does not correspond to log-odds, but the same names are used due to the similarity with the Bernoulli. See [1] for more details.
Example:
>>> m = ContinuousBernoulli(torch.tensor([0.3]))
>>> m.sample()
tensor([ 0.2538])
- Parameters
probs (Number, Tensor) – (0,1) valued parameters
logits (Number, Tensor) – real valued parameters whose sigmoid matches ‘probs’
[1] The continuous Bernoulli: fixing a pervasive error in variational autoencoders, Loaiza-Ganem G and Cunningham JP, NeurIPS 2019. https://arxiv.org/abs/1907.06845
Dirichlet
- class Dirichlet(concentration, validate_args=None)[source]
Wraps torch.distributions.dirichlet.Dirichlet with TorchDistributionMixin.
Creates a Dirichlet distribution parameterized by the concentration parameter concentration.
Example:
>>> m = Dirichlet(torch.tensor([0.5, 0.5]))
>>> m.sample()  # Dirichlet distributed with concentration [0.5, 0.5]
tensor([ 0.1046, 0.8954])
- Parameters
concentration (Tensor) – concentration parameter of the distribution (often referred to as alpha)
Exponential
- class Exponential(rate, validate_args=None)
Wraps torch.distributions.exponential.Exponential with TorchDistributionMixin.
Creates an Exponential distribution parameterized by rate.
Example:
>>> m = Exponential(torch.tensor([1.0]))
>>> m.sample()  # Exponential distributed with rate=1
tensor([ 0.1046])
- Parameters
rate (float or Tensor) – rate = 1 / scale of the distribution
ExponentialFamily
- class ExponentialFamily(batch_shape: torch.Size = torch.Size([]), event_shape: torch.Size = torch.Size([]), validate_args: Optional[bool] = None)
Wraps torch.distributions.exp_family.ExponentialFamily with TorchDistributionMixin.
ExponentialFamily is the abstract base class for probability distributions belonging to an exponential family, whose probability mass/density function has the form defined below
\[p_{F}(x; \theta) = \exp(\langle t(x), \theta\rangle - F(\theta) + k(x))\]
where \(\theta\) denotes the natural parameters, \(t(x)\) denotes the sufficient statistic, \(F(\theta)\) is the log normalizer function for a given family and \(k(x)\) is the carrier measure.
Note
This class is an intermediary between the Distribution class and distributions which belong to an exponential family mainly to check the correctness of the .entropy() and analytic KL divergence methods. We use this class to compute the entropy and KL divergence using the AD framework and Bregman divergences (courtesy of: Frank Nielsen and Richard Nock, Entropies and Cross-entropies of Exponential Families).
FisherSnedecor
- class FisherSnedecor(df1, df2, validate_args=None)
Wraps torch.distributions.fishersnedecor.FisherSnedecor with TorchDistributionMixin.
Creates a Fisher-Snedecor distribution parameterized by df1 and df2.
Example:
>>> m = FisherSnedecor(torch.tensor([1.0]), torch.tensor([2.0]))
>>> m.sample()  # Fisher-Snedecor-distributed with df1=1 and df2=2
tensor([ 0.2453])
Gamma
- class Gamma(concentration, rate, validate_args=None)[source]
Wraps torch.distributions.gamma.Gamma with TorchDistributionMixin.
Creates a Gamma distribution parameterized by shape concentration and rate.
Example:
>>> m = Gamma(torch.tensor([1.0]), torch.tensor([1.0]))
>>> m.sample()  # Gamma distributed with concentration=1 and rate=1
tensor([ 0.1046])
Geometric
- class Geometric(probs=None, logits=None, validate_args=None)[source]
Wraps torch.distributions.geometric.Geometric with TorchDistributionMixin.
Creates a Geometric distribution parameterized by probs, where probs is the probability of success of Bernoulli trials. It represents the probability that in \(k + 1\) Bernoulli trials, the first \(k\) trials failed, before seeing a success.
Samples are non-negative integers \([0, \infty)\).
Example:
>>> m = Geometric(torch.tensor([0.3]))
>>> m.sample()  # underlying Bernoulli has 30% chance 1; 70% chance 0
tensor([ 2.])
- Parameters
probs (Number, Tensor) – the probability of sampling 1. Must be in range (0, 1]
logits (Number, Tensor) – the log-odds of sampling 1.
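The description above (the first \(k\) trials fail, then one success) corresponds to the mass function \((1 - p)^k p\) on the non-negative integers. A minimal pure-Python sketch of that formula, with the hypothetical helper name geometric_pmf, is:

```python
def geometric_pmf(k, probs):
    """P(first success occurs after exactly k failures), for k = 0, 1, 2, ..."""
    return (1.0 - probs) ** k * probs


p = 0.3
# Truncated sums; the tail beyond k = 200 is negligible for p = 0.3.
total_mass = sum(geometric_pmf(k, p) for k in range(200))
mean = sum(k * geometric_pmf(k, p) for k in range(200))
```

With this convention a sample of 0 is possible, and the mean is (1 - p) / p rather than 1 / p.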
Gumbel
- class Gumbel(loc, scale, validate_args=None)
Wraps torch.distributions.gumbel.Gumbel with TorchDistributionMixin.
Samples from a Gumbel Distribution.
Examples:
>>> m = Gumbel(torch.tensor([1.0]), torch.tensor([2.0]))
>>> m.sample()  # sample from Gumbel distribution with loc=1, scale=2
tensor([ 1.0124])
HalfCauchy
- class HalfCauchy(scale, validate_args=None)
Wraps torch.distributions.half_cauchy.HalfCauchy with TorchDistributionMixin.
Creates a half-Cauchy distribution parameterized by scale where:
X ~ Cauchy(0, scale)
Y = |X| ~ HalfCauchy(scale)
Example:
>>> m = HalfCauchy(torch.tensor([1.0]))
>>> m.sample()  # half-cauchy distributed with scale=1
tensor([ 2.3214])
- Parameters
scale (float or Tensor) – scale of the full Cauchy distribution
HalfNormal
- class HalfNormal(scale, validate_args=None)
Wraps torch.distributions.half_normal.HalfNormal with TorchDistributionMixin.
Creates a half-normal distribution parameterized by scale where:
X ~ Normal(0, scale)
Y = |X| ~ HalfNormal(scale)
Example:
>>> m = HalfNormal(torch.tensor([1.0]))
>>> m.sample()  # half-normal distributed with scale=1
tensor([ 0.1046])
- Parameters
scale (float or Tensor) – scale of the full Normal distribution
Independent
- class Independent(base_distribution, reinterpreted_batch_ndims, validate_args=None)[source]
Wraps torch.distributions.independent.Independent with TorchDistributionMixin.
Reinterprets some of the batch dims of a distribution as event dims.
This is mainly useful for changing the shape of the result of log_prob(). For example, to create a diagonal Normal distribution with the same shape as a Multivariate Normal distribution (so they are interchangeable), you can:
>>> from torch.distributions.multivariate_normal import MultivariateNormal
>>> from torch.distributions.normal import Normal
>>> loc = torch.zeros(3)
>>> scale = torch.ones(3)
>>> mvn = MultivariateNormal(loc, scale_tril=torch.diag(scale))
>>> [mvn.batch_shape, mvn.event_shape]
[torch.Size([]), torch.Size([3])]
>>> normal = Normal(loc, scale)
>>> [normal.batch_shape, normal.event_shape]
[torch.Size([3]), torch.Size([])]
>>> diagn = Independent(normal, 1)
>>> [diagn.batch_shape, diagn.event_shape]
[torch.Size([]), torch.Size([3])]
- Parameters
base_distribution (torch.distributions.distribution.Distribution) – a base distribution
reinterpreted_batch_ndims (int) – the number of batch dims to reinterpret as event dims
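Reinterpreting a batch dimension as an event dimension means that log_prob() returns one number per event, the sum of the per-coordinate log densities. The sketch below illustrates this for the diagonal Normal case in pure Python; it mirrors the math only, and the function names are hypothetical.

```python
import math


def normal_log_pdf(x, loc, scale):
    """Log density of a univariate Normal(loc, scale)."""
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


def independent_log_prob(xs, locs, scales):
    """Sum per-coordinate log densities, treating the vector as one event."""
    return sum(normal_log_pdf(x, m, s) for x, m, s in zip(xs, locs, scales))


def diagonal_mvn_log_pdf(xs, locs, scales):
    """Log density of a multivariate Normal with covariance diag(scales**2)."""
    k = len(xs)
    quad = sum(((x - m) / s) ** 2 for x, m, s in zip(xs, locs, scales))
    log_det_half = sum(math.log(s) for s in scales)
    return -0.5 * quad - log_det_half - 0.5 * k * math.log(2.0 * math.pi)


xs, locs, scales = [0.3, -1.2, 2.0], [0.0, 0.0, 1.0], [1.0, 2.0, 0.5]
joint = independent_log_prob(xs, locs, scales)
```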
Kumaraswamy
- class Kumaraswamy(concentration1, concentration0, validate_args=None)
Wraps torch.distributions.kumaraswamy.Kumaraswamy with TorchDistributionMixin.
Samples from a Kumaraswamy distribution.
Example:
>>> m = Kumaraswamy(torch.tensor([1.0]), torch.tensor([1.0]))
>>> m.sample()  # sample from a Kumaraswamy distribution with concentration alpha=1 and beta=1
tensor([ 0.1729])
LKJCholesky
- class LKJCholesky(dim, concentration=1.0, validate_args=None)
Wraps torch.distributions.lkj_cholesky.LKJCholesky with TorchDistributionMixin.
LKJ distribution for lower Cholesky factor of correlation matrices. The distribution is controlled by the concentration parameter \(\eta\) to make the probability of the correlation matrix \(M\) generated from a Cholesky factor proportional to \(\det(M)^{\eta - 1}\). Because of that, when concentration == 1, we have a uniform distribution over Cholesky factors of correlation matrices:
L ~ LKJCholesky(dim, concentration)
X = L @ L' ~ LKJCorr(dim, concentration)
Note that this distribution samples the Cholesky factor of correlation matrices and not the correlation matrices themselves and thereby differs slightly from the derivations in [1] for the LKJCorr distribution. For sampling, this uses the Onion method from [1] Section 3.
Example:
>>> l = LKJCholesky(3, 0.5)
>>> l.sample()  # l @ l.T is a sample of a correlation 3x3 matrix
tensor([[ 1.0000, 0.0000, 0.0000],
        [ 0.3516, 0.9361, 0.0000],
        [-0.1899, 0.4748, 0.8593]])
- Parameters
dim (int) – dimension of the matrices
concentration (float or Tensor) – concentration/shape parameter of the distribution (often referred to as eta)
References
[1] Generating random correlation matrices based on vines and extended onion method (2009), Daniel Lewandowski, Dorota Kurowicka, Harry Joe. Journal of Multivariate Analysis. 100. 10.1016/j.jmva.2009.04.008
Laplace
- class Laplace(loc, scale, validate_args=None)
Wraps torch.distributions.laplace.Laplace with TorchDistributionMixin.
Creates a Laplace distribution parameterized by loc and scale.
Example:
>>> m = Laplace(torch.tensor([0.0]), torch.tensor([1.0]))
>>> m.sample()  # Laplace distributed with loc=0, scale=1
tensor([ 0.1046])
LogNormal
- class LogNormal(loc, scale, validate_args=None)[source]
Wraps torch.distributions.log_normal.LogNormal with TorchDistributionMixin.
Creates a log-normal distribution parameterized by loc and scale where:
X ~ Normal(loc, scale)
Y = exp(X) ~ LogNormal(loc, scale)
Example:
>>> m = LogNormal(torch.tensor([0.0]), torch.tensor([1.0]))
>>> m.sample()  # log-normal distributed with loc=0 and scale=1 (mean and stddev of the underlying Normal)
tensor([ 0.1046])
LogisticNormal
- class LogisticNormal(loc, scale, validate_args=None)
Wraps torch.distributions.logistic_normal.LogisticNormal with TorchDistributionMixin.
Creates a logistic-normal distribution parameterized by loc and scale that define the base Normal distribution transformed with the StickBreakingTransform such that:
X ~ LogisticNormal(loc, scale)
Y = log(X / (1 - X.cumsum(-1)))[..., :-1] ~ Normal(loc, scale)
- Parameters
loc (float or Tensor) – mean of the base distribution
scale (float or Tensor) – standard deviation of the base distribution
Example:
>>> # logistic-normal distributed with mean=(0, 0, 0) and stddev=(1, 1, 1)
>>> # of the base Normal distribution
>>> m = LogisticNormal(torch.tensor([0.0] * 3), torch.tensor([1.0] * 3))
>>> m.sample()
tensor([ 0.7653, 0.0341, 0.0579, 0.1427])
LowRankMultivariateNormal
- class LowRankMultivariateNormal(loc, cov_factor, cov_diag, validate_args=None)[source]
Wraps torch.distributions.lowrank_multivariate_normal.LowRankMultivariateNormal with TorchDistributionMixin.
Creates a multivariate normal distribution with covariance matrix having a low-rank form parameterized by cov_factor and cov_diag:
covariance_matrix = cov_factor @ cov_factor.T + cov_diag
Example:
>>> m = LowRankMultivariateNormal(torch.zeros(2), torch.tensor([[1.], [0.]]), torch.ones(2))
>>> m.sample()  # normally distributed with mean=`[0,0]`, cov_factor=`[[1],[0]]`, cov_diag=`[1,1]`
tensor([-0.2102, -0.5429])
- Parameters
loc (Tensor) – mean of the distribution with shape batch_shape + event_shape
cov_factor (Tensor) – factor part of low-rank form of covariance matrix with shape batch_shape + event_shape + (rank,)
cov_diag (Tensor) – diagonal part of low-rank form of covariance matrix with shape batch_shape + event_shape
Note
The computation for determinant and inverse of covariance matrix is avoided when cov_factor.shape[1] << cov_factor.shape[0] thanks to Woodbury matrix identity and matrix determinant lemma. Thanks to these formulas, we just need to compute the determinant and inverse of the small size “capacitance” matrix:
capacitance = I + cov_factor.T @ inv(cov_diag) @ cov_factor
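For a rank-1 cov_factor the capacitance matrix is a single number, which makes the matrix determinant lemma easy to verify by hand. The following pure-Python sketch covers only that special case (a column vector w and a diagonal d) and compares it with a direct 2x2 determinant; it does not reproduce the library's code, and the function names are hypothetical.

```python
import math


def capacitance_rank1(w, d):
    """capacitance = 1 + w.T @ inv(diag(d)) @ w, a scalar when rank == 1."""
    return 1.0 + sum(wi * wi / di for wi, di in zip(w, d))


def logdet_lowrank_rank1(w, d):
    """log det(w w.T + diag(d)) via the matrix determinant lemma."""
    return sum(math.log(di) for di in d) + math.log(capacitance_rank1(w, d))


def det_2x2_direct(w, d):
    """Determinant of the explicit 2x2 covariance w w.T + diag(d)."""
    a = w[0] * w[0] + d[0]
    b = w[0] * w[1]
    c = w[1] * w[1] + d[1]
    return a * c - b * b


# Same parameters as the example above: cov_factor=[[1],[0]], cov_diag=[1,1].
w, d = [1.0, 0.0], [1.0, 1.0]
det_via_lemma = math.exp(logdet_lowrank_rank1(w, d))
```

Only the diagonal and the small capacitance term are ever inverted or log-transformed, which is where the savings come from when the rank is much smaller than the event size.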
MixtureSameFamily
- class MixtureSameFamily(mixture_distribution, component_distribution, validate_args=None)
Wraps torch.distributions.mixture_same_family.MixtureSameFamily with TorchDistributionMixin.
The MixtureSameFamily distribution implements a (batch of) mixture distribution where all components are from different parameterizations of the same distribution type. It is parameterized by a Categorical “selecting distribution” (over k components) and a component distribution, i.e., a Distribution with a rightmost batch shape (equal to [k]) which indexes each (batch of) component.
Examples:
>>> # Construct Gaussian Mixture Model in 1D consisting of 5 equally
>>> # weighted normal distributions
>>> mix = D.Categorical(torch.ones(5,))
>>> comp = D.Normal(torch.randn(5,), torch.rand(5,))
>>> gmm = MixtureSameFamily(mix, comp)
>>> # Construct Gaussian Mixture Model in 2D consisting of 5 equally
>>> # weighted bivariate normal distributions
>>> mix = D.Categorical(torch.ones(5,))
>>> comp = D.Independent(D.Normal(
...     torch.randn(5,2), torch.rand(5,2)), 1)
>>> gmm = MixtureSameFamily(mix, comp)
>>> # Construct a batch of 3 Gaussian Mixture Models in 2D each
>>> # consisting of 5 random weighted bivariate normal distributions
>>> mix = D.Categorical(torch.rand(3,5))
>>> comp = D.Independent(D.Normal(
...     torch.randn(3,5,2), torch.rand(3,5,2)), 1)
>>> gmm = MixtureSameFamily(mix, comp)
- Parameters
mixture_distribution – torch.distributions.Categorical-like instance. Manages the probability of selecting component. The number of categories must match the rightmost batch dimension of the component_distribution. Must have either scalar batch_shape or batch_shape matching component_distribution.batch_shape[:-1]
component_distribution – torch.distributions.Distribution-like instance. Right-most batch dimension indexes component.
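The density of such a mixture is the weighted sum of the component densities, usually evaluated in log space with a log-sum-exp over the component index. The sketch below shows that computation for a 1D Gaussian mixture in pure Python; it is an illustration of the formula under these assumptions, and mixture_log_prob is a hypothetical helper rather than a library function.

```python
import math


def normal_log_pdf(x, loc, scale):
    """Log density of a univariate Normal(loc, scale)."""
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


def mixture_log_prob(x, weights, locs, scales):
    """log sum_k w_k N(x; loc_k, scale_k), with weights normalized to sum to 1."""
    total = sum(weights)
    terms = [math.log(w / total) + normal_log_pdf(x, m, s)
             for w, m, s in zip(weights, locs, scales)]
    mx = max(terms)  # log-sum-exp with a max shift for stability
    return mx + math.log(sum(math.exp(t - mx) for t in terms))


# Two equally weighted components centred at -1 and +1.
lp = mixture_log_prob(0.0, [1.0, 1.0], [-1.0, 1.0], [1.0, 1.0])
```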
Multinomial
- class Multinomial(total_count=1, probs=None, logits=None, validate_args=None)[source]
Wraps torch.distributions.multinomial.Multinomial with TorchDistributionMixin.
Creates a Multinomial distribution parameterized by total_count and either probs or logits (but not both). The innermost dimension of probs indexes over categories. All other dimensions index over batches.
Note that total_count need not be specified if only log_prob() is called (see example below).
Note
The probs argument must be non-negative, finite and have a non-zero sum, and it will be normalized to sum to 1 along the last dimension. probs will return this normalized value. The logits argument will be interpreted as unnormalized log probabilities and can therefore be any real number. It will likewise be normalized so that the resulting probabilities sum to 1 along the last dimension. logits will return this normalized value.
sample() requires a single shared total_count for all parameters and samples.
log_prob() allows different total_count for each parameter and sample.
Example:
>>> m = Multinomial(100, torch.tensor([ 1., 1., 1., 1.]))
>>> x = m.sample()  # equal probability of 0, 1, 2, 3
>>> x
tensor([ 21., 24., 30., 25.])
>>> Multinomial(probs=torch.tensor([1., 1., 1., 1.])).log_prob(x)
tensor([-4.1338])
- Parameters
total_count (int) – number of trials
probs (Tensor) – event probabilities
logits (Tensor) – event log probabilities (unnormalized)
MultivariateNormal
- class MultivariateNormal(loc, covariance_matrix=None, precision_matrix=None, scale_tril=None, validate_args=None)[source]
Wraps torch.distributions.multivariate_normal.MultivariateNormal with TorchDistributionMixin.
Creates a multivariate normal (also called Gaussian) distribution parameterized by a mean vector and a covariance matrix.
The multivariate normal distribution can be parameterized either in terms of a positive definite covariance matrix \(\mathbf{\Sigma}\) or a positive definite precision matrix \(\mathbf{\Sigma}^{-1}\) or a lower-triangular matrix \(\mathbf{L}\) with positive-valued diagonal entries, such that \(\mathbf{\Sigma} = \mathbf{L}\mathbf{L}^\top\). This triangular matrix can be obtained via e.g. Cholesky decomposition of the covariance.
Example:
>>> m = MultivariateNormal(torch.zeros(2), torch.eye(2))
>>> m.sample()  # normally distributed with mean=`[0,0]` and covariance_matrix=`I`
tensor([-0.2102, -0.5429])
- Parameters
loc (Tensor) – mean of the distribution
covariance_matrix (Tensor) – positive-definite covariance matrix
precision_matrix (Tensor) – positive-definite precision matrix
scale_tril (Tensor) – lower-triangular factor of covariance, with positive-valued diagonal
Note
Only one of covariance_matrix or precision_matrix or scale_tril can be specified.
Using scale_tril will be more efficient: all computations internally are based on scale_tril. If covariance_matrix or precision_matrix is passed instead, it is only used to compute the corresponding lower triangular matrices using a Cholesky decomposition.
NegativeBinomial
- class NegativeBinomial(total_count, probs=None, logits=None, validate_args=None)
Wraps torch.distributions.negative_binomial.NegativeBinomial with TorchDistributionMixin.
Creates a Negative Binomial distribution, i.e. distribution of the number of successful independent and identical Bernoulli trials before total_count failures are achieved. The probability of success of each Bernoulli trial is probs.
- Parameters
total_count (float or Tensor) – non-negative number of negative Bernoulli trials to stop, although the distribution is still valid for real valued count
probs (Tensor) – Event probabilities of success in the half open interval [0, 1)
logits (Tensor) – Event log-odds for probabilities of success
Normal
- class Normal(loc, scale, validate_args=None)[source]
Wraps torch.distributions.normal.Normal with TorchDistributionMixin.
Creates a normal (also called Gaussian) distribution parameterized by loc and scale.
Example:
>>> m = Normal(torch.tensor([0.0]), torch.tensor([1.0]))
>>> m.sample()  # normally distributed with loc=0 and scale=1
tensor([ 0.1046])
OneHotCategorical
- class OneHotCategorical(probs=None, logits=None, validate_args=None)[source]
Wraps torch.distributions.one_hot_categorical.OneHotCategorical with TorchDistributionMixin.
Creates a one-hot categorical distribution parameterized by probs or logits.
Samples are one-hot coded vectors of size probs.size(-1).
Note
The probs argument must be non-negative, finite and have a non-zero sum, and it will be normalized to sum to 1 along the last dimension. probs will return this normalized value. The logits argument will be interpreted as unnormalized log probabilities and can therefore be any real number. It will likewise be normalized so that the resulting probabilities sum to 1 along the last dimension. logits will return this normalized value.
See also: torch.distributions.Categorical() for specifications of probs and logits.
Example:
>>> m = OneHotCategorical(torch.tensor([ 0.25, 0.25, 0.25, 0.25 ]))
>>> m.sample()  # equal probability of 0, 1, 2, 3
tensor([ 0., 0., 0., 1.])
- Parameters
probs (Tensor) – event probabilities
logits (Tensor) – event log probabilities (unnormalized)
OneHotCategoricalStraightThrough
- class OneHotCategoricalStraightThrough(probs=None, logits=None, validate_args=None)
Wraps torch.distributions.one_hot_categorical.OneHotCategoricalStraightThrough with TorchDistributionMixin.
Creates a reparameterizable OneHotCategorical distribution based on the straight-through gradient estimator from [1].
[1] Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation (Bengio et al, 2013)
Pareto
- class Pareto(scale, alpha, validate_args=None)
Wraps torch.distributions.pareto.Pareto with TorchDistributionMixin.
Samples from a Pareto Type 1 distribution.
Example:
>>> m = Pareto(torch.tensor([1.0]), torch.tensor([1.0]))
>>> m.sample()  # sample from a Pareto distribution with scale=1 and alpha=1
tensor([ 1.5623])
Poisson
- class Poisson(rate, *, is_sparse=False, validate_args=None)[source]
Wraps torch.distributions.poisson.Poisson with TorchDistributionMixin.
Creates a Poisson distribution parameterized by rate, the rate parameter.
Samples are nonnegative integers, with a pmf given by
\[\mathrm{rate}^k \frac{e^{-\mathrm{rate}}}{k!}\]
Example:
>>> m = Poisson(torch.tensor([4]))
>>> m.sample()
tensor([ 3.])
- Parameters
rate (Number, Tensor) – the rate parameter
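The pmf above is usually evaluated in log space to avoid overflow in the factorial. A minimal pure-Python sketch of that evaluation, using the hypothetical helper name poisson_log_pmf, is:

```python
import math


def poisson_log_pmf(k, rate):
    """log(rate**k * exp(-rate) / k!) for a non-negative integer k."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


rate = 4.0
# Truncated sums; the mass beyond k = 200 is negligible for rate = 4.
total_mass = sum(math.exp(poisson_log_pmf(k, rate)) for k in range(200))
mean = sum(k * math.exp(poisson_log_pmf(k, rate)) for k in range(200))
```

The truncated sums recover a total mass of 1 and a mean equal to rate, as expected for a Poisson distribution.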
RelaxedBernoulli
- class RelaxedBernoulli(temperature, probs=None, logits=None, validate_args=None)
Wraps torch.distributions.relaxed_bernoulli.RelaxedBernoulli with TorchDistributionMixin.
Creates a RelaxedBernoulli distribution, parametrized by temperature, and either probs or logits (but not both). This is a relaxed version of the Bernoulli distribution, so the values are in (0, 1), and has reparametrizable samples.
Example:
>>> m = RelaxedBernoulli(torch.tensor([2.2]),
...                      torch.tensor([0.1, 0.2, 0.3, 0.99]))
>>> m.sample()
tensor([ 0.2951, 0.3442, 0.8918, 0.9021])
- Parameters
temperature (Tensor) – relaxation temperature
probs (Number, Tensor) – the probability of sampling 1
logits (Number, Tensor) – the log-odds of sampling 1
RelaxedOneHotCategorical
- class RelaxedOneHotCategorical(temperature, probs=None, logits=None, validate_args=None)
Wraps torch.distributions.relaxed_categorical.RelaxedOneHotCategorical with TorchDistributionMixin.
Creates a RelaxedOneHotCategorical distribution parametrized by temperature, and either probs or logits. This is a relaxed version of the OneHotCategorical distribution, so its samples are on the simplex and are reparametrizable.
Example:
>>> m = RelaxedOneHotCategorical(torch.tensor([2.2]),
...                              torch.tensor([0.1, 0.2, 0.3, 0.4]))
>>> m.sample()
tensor([ 0.1294, 0.2324, 0.3859, 0.2523])
- Parameters
temperature (Tensor) – relaxation temperature
probs (Tensor) – event probabilities
logits (Tensor) – unnormalized log probability for each event
StudentT
- class StudentT(df, loc=0.0, scale=1.0, validate_args=None)
Wraps torch.distributions.studentT.StudentT with TorchDistributionMixin.
Creates a Student’s t-distribution parameterized by degree of freedom df, mean loc and scale scale.
Example:
>>> m = StudentT(torch.tensor([2.0]))
>>> m.sample()  # Student's t-distributed with degrees of freedom=2
tensor([ 0.1046])
TransformedDistribution
- class TransformedDistribution(base_distribution, transforms, validate_args=None)
Wraps torch.distributions.transformed_distribution.TransformedDistribution with TorchDistributionMixin.
Extension of the Distribution class, which applies a sequence of Transforms to a base distribution. Let f be the composition of transforms applied:
X ~ BaseDistribution
Y = f(X) ~ TransformedDistribution(BaseDistribution, f)
log p(Y) = log p(X) + log |det (dX/dY)|
Note that the .event_shape of a TransformedDistribution is the maximum shape of its base distribution and its transforms, since transforms can introduce correlations among events.
An example for the usage of TransformedDistribution would be:
# Building a Logistic Distribution
# X ~ Uniform(0, 1)
# f = a + b * logit(X)
# Y ~ f(X) ~ Logistic(a, b)
base_distribution = Uniform(0, 1)
transforms = [SigmoidTransform().inv, AffineTransform(loc=a, scale=b)]
logistic = TransformedDistribution(base_distribution, transforms)
For more examples, please look at the implementations of Gumbel, HalfCauchy, HalfNormal, LogNormal, Pareto, Weibull, RelaxedBernoulli and RelaxedOneHotCategorical.
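The change-of-variables rule log p(Y) = log p(X) + log |det (dX/dY)| can be checked by hand for the Logistic construction above. The following pure-Python sketch assumes b > 0 and uses hypothetical function names; it mirrors the math rather than the library's Transform classes.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, a, b):
    """f(x) = a + b * logit(x), mapping (0, 1) onto the real line."""
    return a + b * math.log(x / (1.0 - x))


def transformed_log_prob(y, a, b):
    """log p(Y) = log p(X) + log |dX/dY| with X ~ Uniform(0, 1)."""
    x = sigmoid((y - a) / b)          # inverse transform
    base_log_prob = 0.0               # Uniform(0, 1) has density 1 on (0, 1)
    log_abs_det = math.log(x * (1.0 - x) / b)
    return base_log_prob + log_abs_det


def logistic_log_pdf(y, a, b):
    """Closed-form log density of Logistic(a, b), for comparison."""
    z = (y - a) / b
    return -z - math.log(b) - 2.0 * math.log1p(math.exp(-z))


a, b = 1.5, 0.8
y = forward(0.25, a, b)
gap = abs(transformed_log_prob(y, a, b) - logistic_log_pdf(y, a, b))
```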
Uniform
- class Uniform(low, high, validate_args=None)[source]
Wraps torch.distributions.uniform.Uniform with TorchDistributionMixin.
Generates uniformly distributed random samples from the half-open interval [low, high).
Example:
>>> m = Uniform(torch.tensor([0.0]), torch.tensor([5.0]))
>>> m.sample()  # uniformly distributed in the range [0.0, 5.0)
tensor([ 2.3418])
VonMises
- class VonMises(loc, concentration, validate_args=None)
Wraps torch.distributions.von_mises.VonMises with TorchDistributionMixin.
A circular von Mises distribution.
This implementation uses polar coordinates. The loc and value args can be any real number (to facilitate unconstrained optimization), but are interpreted as angles modulo 2 pi.
Example:
>>> m = VonMises(torch.tensor([1.0]), torch.tensor([1.0]))
>>> m.sample()  # von Mises distributed with loc=1 and concentration=1
tensor([1.9777])
- Parameters
loc (torch.Tensor) – an angle in radians.
concentration (torch.Tensor) – concentration parameter
Weibull
- class Weibull(scale, concentration, validate_args=None)
Wraps torch.distributions.weibull.Weibull with TorchDistributionMixin.
Samples from a two-parameter Weibull distribution.
Example:
>>> m = Weibull(torch.tensor([1.0]), torch.tensor([1.0]))
>>> m.sample()  # sample from a Weibull distribution with scale=1, concentration=1
tensor([ 0.4784])
Wishart
- class Wishart(df: Union[torch.Tensor, numbers.Number], covariance_matrix: torch.Tensor = None, precision_matrix: torch.Tensor = None, scale_tril: torch.Tensor = None, validate_args=None)
Wraps torch.distributions.wishart.Wishart with TorchDistributionMixin.
Creates a Wishart distribution parameterized by a symmetric positive definite matrix \(\Sigma\), or its Cholesky decomposition \(\mathbf{\Sigma} = \mathbf{L}\mathbf{L}^\top\).
Example:
>>> m = Wishart(torch.Tensor([2]), covariance_matrix=torch.eye(2))
>>> m.sample()  # Wishart distributed with mean=`df * I` and
>>>             # variance(x_ij)=`df` for i != j and variance(x_ij)=`2 * df` for i == j
- Parameters
covariance_matrix (Tensor) – positive-definite covariance matrix
precision_matrix (Tensor) – positive-definite precision matrix
scale_tril (Tensor) – lower-triangular factor of covariance, with positive-valued diagonal
df (float or Tensor) – real-valued parameter larger than the (dimension of Square matrix) - 1
Note
Only one of covariance_matrix or precision_matrix or scale_tril can be specified. Using scale_tril will be more efficient: all computations internally are based on scale_tril. If covariance_matrix or precision_matrix is passed instead, it is only used to compute the corresponding lower triangular matrices using a Cholesky decomposition. ‘torch.distributions.LKJCholesky’ is a restricted Wishart distribution. [1]
References
[1] Wang, Z., Wu, Y. and Chu, H., 2018. On equivalence of the LKJ distribution and the restricted Wishart distribution.
[2] Sawyer, S., 2007. Wishart Distributions and Inverse-Wishart Sampling.
[3] Anderson, T. W., 2003. An Introduction to Multivariate Statistical Analysis (3rd ed.).
[4] Odell, P. L. & Feiveson, A. H., 1966. A Numerical Procedure to Generate a Sample Covariance Matrix. JASA, 61(313):199-203.
[5] Ku, Y.-C. & Bloomfield, P., 2010. Generating Random Wishart Matrices with Fractional Degrees of Freedom in OX.
Pyro Distributions
Abstract Distribution
- class Distribution(*args, **kwargs)[source]
Bases: object
Base class for parameterized probability distributions.
Distributions in Pyro are stochastic function objects with sample() and log_prob() methods. Distributions are stochastic functions with fixed parameters:
d = dist.Bernoulli(param)
x = d()              # Draws a random sample.
p = d.log_prob(x)    # Evaluates log probability of x.
Implementing New Distributions:
Derived classes must implement the methods: sample(), log_prob().
Examples:
Take a look at the examples to see how they interact with inference algorithms.
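The contract described above (a sample() method, a log_prob() method, and calling the object as an alias for sampling) can be sketched with a toy class. ToyBernoulli below is a hypothetical pure-Python stand-in written for illustration; it does not subclass the Pyro Distribution class and ignores tensors, batching and shapes.

```python
import math
import random


class ToyBernoulli:
    """Hypothetical stand-in showing the sample()/log_prob() contract."""

    def __init__(self, probs, rng=None):
        self.probs = probs
        self.rng = rng or random.Random()

    def sample(self):
        """Draw 1.0 with probability `probs`, otherwise 0.0."""
        return 1.0 if self.rng.random() < self.probs else 0.0

    def log_prob(self, x):
        """Log probability of the outcome x (either 0.0 or 1.0)."""
        return math.log(self.probs if x == 1.0 else 1.0 - self.probs)

    def __call__(self, *args, **kwargs):
        """Alias for sample(), matching the calling convention above."""
        return self.sample(*args, **kwargs)


d = ToyBernoulli(0.3, rng=random.Random(0))
x = d()              # draws a random sample
p = d.log_prob(x)    # evaluates the log probability of x
```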
- has_rsample = False
- has_enumerate_support = False
- __call__(*args, **kwargs)[source]
Samples a random value (just an alias for .sample(*args, **kwargs)).
For tensor distributions, the returned tensor should have the same .shape as the parameters.
- Returns
A random value.
- Return type
- abstract sample(*args, **kwargs)[source]
Samples a random value.
For tensor distributions, the returned tensor should have the same .shape as the parameters, unless otherwise noted.
- Parameters
sample_shape (torch.Size) – the size of the iid batch to be drawn from the distribution.
- Returns
A random value or batch of random values (if parameters are batched). The shape of the result should be self.shape().
- Return type
- abstract log_prob(x, *args, **kwargs)[source]
Evaluates log probability densities for each of a batch of samples.
- Parameters
x (torch.Tensor) – A single value or a batch of values batched along axis 0.
- Returns
log probability densities as a one-dimensional Tensor with same batch size as value and params. The shape of the result should be self.batch_size.
- Return type
- score_parts(x, *args, **kwargs)[source]
Computes ingredients for stochastic gradient estimators of ELBO.
The default implementation is correct both for non-reparameterized and for fully reparameterized distributions. Partially reparameterized distributions should override this method to compute correct .score_function and .entropy_term parts.
Setting .has_rsample on a distribution instance will determine whether inference engines like SVI use reparameterized samplers or the score function estimator.
- Parameters
x (torch.Tensor) – A single value or batch of values.
- Returns
A ScoreParts object containing parts of the ELBO estimator.
- Return type
ScoreParts
- enumerate_support(expand: bool = True) → torch.Tensor[source]
Returns a representation of the parametrized distribution’s support, along the first dimension. This is implemented only by discrete distributions.
Note that this returns support values of all the batched RVs in lock-step, rather than the full cartesian product.
- Parameters
expand (bool) – whether to expand the result to a tensor of shape (n,) + batch_shape + event_shape. If false, the return value has unexpanded shape (n,) + (1,)*len(batch_shape) + event_shape which can be broadcasted to the full shape.
- Returns
An iterator over the distribution’s discrete support.
- Return type
iterator
- conjugate_update(other)[source]
EXPERIMENTAL Creates an updated distribution fusing information from another compatible distribution. This is supported by only a few conjugate distributions.
This should satisfy the equation:
    fg, log_normalizer = f.conjugate_update(g)
    assert f.log_prob(x) + g.log_prob(x) == fg.log_prob(x) + log_normalizer
Note this is equivalent to funsor.ops.add on Funsor distributions, but we return a lazy sum (updated, log_normalizer) because PyTorch distributions must be normalized. Thus conjugate_update() should commute with dist_to_funsor() and tensor_to_funsor():
    dist_to_funsor(f) + dist_to_funsor(g) == dist_to_funsor(fg) + tensor_to_funsor(log_normalizer)
- Parameters
other – A distribution representing p(data|latent) but normalized over latent rather than data. Here latent is a candidate sample from self and data is a ground observation of unrelated type.
- Returns
a pair (updated, log_normalizer) where updated is an updated distribution of type type(self), and log_normalizer is a Tensor representing the normalization factor.
- has_rsample_(value)[source]
Force reparameterized or detached sampling on a single distribution instance. This sets the .has_rsample attribute in-place.
This is useful to instruct inference algorithms to avoid reparameterized gradients for variables that discontinuously determine downstream control flow.
- Parameters
value (bool) – Whether samples will be pathwise differentiable.
- Returns
self
- Return type
- property rv
EXPERIMENTAL Switch to the Random Variable DSL for applying transformations to random variables. Supports either chaining operations or arithmetic operator overloading.
Example usage:
    # This should be equivalent to an Exponential distribution.
    Uniform(0, 1).rv.log().neg().dist
    # These two distributions Y1, Y2 should be the same
    X = Uniform(0, 1).rv
    Y1 = X.mul(4).pow(0.5).sub(1).abs().neg().dist
    Y2 = (-abs((4*X)**(0.5) - 1)).dist
- Returns
A RandomVariable object (pyro.contrib.randomvariable.random_variable.RandomVariable) wrapping this distribution.
- Return type
TorchDistributionMixin
- class TorchDistributionMixin(*args, **kwargs)[source]
Bases: pyro.distributions.distribution.Distribution, Callable
Mixin to provide Pyro compatibility for PyTorch distributions.
You should instead use TorchDistribution for new distribution classes.
This is mainly useful for wrapping existing PyTorch distributions for use in Pyro. Derived classes must first inherit from torch.distributions.distribution.Distribution and then inherit from TorchDistributionMixin.
- __call__(sample_shape: torch.Size = torch.Size([])) → torch.Tensor[source]
Samples a random value.
This is reparameterized whenever possible, calling rsample() for reparameterized distributions and sample() for non-reparameterized distributions.
- Parameters
sample_shape (torch.Size) – the size of the iid batch to be drawn from the distribution.
- Returns
A random value or batch of random values (if parameters are batched). The shape of the result should be self.shape().
- Return type
- property batch_shape: torch.Size
The shape over which parameters are batched.
- Return type
torch.Size
- property event_shape: torch.Size
The shape of a single sample from the distribution (without batching).
- Return type
torch.Size
- shape(sample_shape=torch.Size([]))[source]
The tensor shape of samples from this distribution.
Samples are of shape:
d.shape(sample_shape) == sample_shape + d.batch_shape + d.event_shape
- Parameters
sample_shape (torch.Size) – the size of the iid batch to be drawn from the distribution.
- Returns
Tensor shape of samples.
- Return type
- classmethod infer_shapes(**arg_shapes)[source]
Infers batch_shape and event_shape given shapes of args to __init__().
Note
This assumes distribution shape depends only on the shapes of tensor inputs, not on the data contained in those inputs.
- Parameters
**arg_shapes – Keywords mapping name of input arg to torch.Size or tuple representing the sizes of each tensor input.
- Returns
A pair (batch_shape, event_shape) of the shapes of a distribution that would be created with input args of the given shapes.
- Return type
- expand(batch_shape, _instance=None) → pyro.distributions.torch_distribution.ExpandedDistribution[source]
Returns a new ExpandedDistribution instance with batch dimensions expanded to batch_shape.
- Parameters
batch_shape (tuple) – batch shape to expand to.
_instance – unused argument for compatibility with torch.distributions.Distribution.expand()
- Returns
an instance of ExpandedDistribution.
- Return type
ExpandedDistribution
- expand_by(sample_shape)[source]
Expands a distribution by adding sample_shape to the left side of its batch_shape.
To expand internal dims of self.batch_shape from 1 to something larger, use expand() instead.
- Parameters
sample_shape (torch.Size) – The size of the iid batch to be drawn from the distribution.
- Returns
An expanded version of this distribution.
- Return type
ExpandedDistribution
- to_event(reinterpreted_batch_ndims=None)[source]
Reinterprets the n rightmost dimensions of this distribution’s batch_shape as event dims, adding them to the left side of event_shape.
Example
    >>> [d1.batch_shape, d1.event_shape]
    [torch.Size([2, 3]), torch.Size([4, 5])]
    >>> d2 = d1.to_event(1)
    >>> [d2.batch_shape, d2.event_shape]
    [torch.Size([2]), torch.Size([3, 4, 5])]
    >>> d3 = d1.to_event(2)
    >>> [d3.batch_shape, d3.event_shape]
    [torch.Size([]), torch.Size([2, 3, 4, 5])]
- Parameters
reinterpreted_batch_ndims (int) – The number of batch dimensions to reinterpret as event dimensions. May be negative to remove dimensions from a pyro.distributions.torch.Independent. If None, convert all dimensions to event dimensions.
- Returns
A reshaped version of this distribution.
- Return type
- mask(mask)[source]
Masks a distribution by a boolean or boolean-valued tensor that is broadcastable to the distribution’s batch_shape.
- Parameters
mask (bool or torch.Tensor) – A boolean or boolean valued tensor.
- Returns
A masked copy of this distribution.
- Return type
TorchDistribution
- class TorchDistribution(batch_shape: torch.Size = torch.Size([]), event_shape: torch.Size = torch.Size([]), validate_args: Optional[bool] = None)[source]
Bases: pyro.distributions.distribution.Distribution, Callable
Base class for PyTorch-compatible distributions with Pyro support.
This should be the base class for almost all new Pyro distributions.
Note
Parameters and data should be of type Tensor and all methods return type Tensor unless otherwise noted.
Tensor Shapes:
TorchDistributions provide a method .shape() for the tensor shape of samples:
    x = d.sample(sample_shape)
    assert x.shape == d.shape(sample_shape)
Pyro follows the same distribution shape semantics as PyTorch. It distinguishes between three different roles for tensor shapes of samples:
sample shape corresponds to the shape of the iid samples drawn from the distribution. This is taken as an argument by the distribution’s sample method.
batch shape corresponds to non-identical (independent) parameterizations of the distribution, inferred from the distribution’s parameter shapes. This is fixed for a distribution instance.
event shape corresponds to the event dimensions of the distribution, which is fixed for a distribution class. These are collapsed when we try to score a sample from the distribution via d.log_prob(x).
These shapes are related by the equation:
assert d.shape(sample_shape) == sample_shape + d.batch_shape + d.event_shape
Distributions provide a vectorized log_prob() method that evaluates the log probability density of each event in a batch independently, returning a tensor of shape sample_shape + d.batch_shape:
    x = d.sample(sample_shape)
    assert x.shape == d.shape(sample_shape)
    log_p = d.log_prob(x)
    assert log_p.shape == sample_shape + d.batch_shape
Implementing New Distributions:
Derived classes must implement the methods sample() (or rsample() if .has_rsample == True) and log_prob(), and must implement the properties batch_shape and event_shape. Discrete classes may also implement the enumerate_support() method to improve gradient estimates and set .has_enumerate_support = True.
- expand(batch_shape, _instance=None) → pyro.distributions.torch_distribution.ExpandedDistribution
Returns a new ExpandedDistribution instance with batch dimensions expanded to batch_shape.
- Parameters
batch_shape (tuple) – batch shape to expand to.
_instance – unused argument for compatibility with torch.distributions.Distribution.expand()
- Returns
an instance of ExpandedDistribution.
- Return type
ExpandedDistribution
AffineBeta
- class AffineBeta(concentration1, concentration0, loc, scale, validate_args=None)[source]
Bases: pyro.distributions.distribution.Distribution, Callable
Beta distribution scaled by scale and shifted by loc:
    X ~ Beta(concentration1, concentration0)
    f(X) = loc + scale * X
    Y = f(X) ~ AffineBeta(concentration1, concentration0, loc, scale)
- Parameters
concentration1 (float or torch.Tensor) – 1st concentration parameter (alpha) for the Beta distribution.
concentration0 (float or torch.Tensor) – 2nd concentration parameter (beta) for the Beta distribution.
loc (float or torch.Tensor) – location parameter.
scale (float or torch.Tensor) – scale parameter.
- arg_constraints: Dict[str, torch.distributions.constraints.Constraint] = {'concentration0': GreaterThan(lower_bound=0.0), 'concentration1': GreaterThan(lower_bound=0.0), 'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}
- property concentration0
- property concentration1
- property high
- property loc
- property low
- property mean
- rsample(sample_shape=torch.Size([]))[source]
Generates a sample from Beta distribution and applies AffineTransform. Additionally clamps the output in order to avoid NaN and Inf values in the gradients.
- sample(sample_shape=torch.Size([]))[source]
Generates a sample from Beta distribution and applies AffineTransform. Additionally clamps the output in order to avoid NaN and Inf values in the gradients.
- property sample_size
- property scale
- property support
- property variance
AsymmetricLaplace
- class AsymmetricLaplace(loc, scale, asymmetry, *, validate_args=None)[source]
Bases: pyro.distributions.distribution.Distribution, Callable
Asymmetric version of the Laplace distribution.
To the left of loc this acts like an -Exponential(1/(asymmetry*scale)); to the right of loc this acts like an Exponential(asymmetry/scale). The density is continuous so the left and right densities at loc agree.
- Parameters
loc – Location parameter, i.e. the mode.
scale – Scale parameter = geometric mean of left and right scales.
asymmetry – Square root of the ratio of left to right scales.
- arg_constraints = {'asymmetry': GreaterThan(lower_bound=0.0), 'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}
- has_rsample = True
- property left_scale
- property mean
- property right_scale
- support = Real()
- property variance
AVFMultivariateNormal
- class AVFMultivariateNormal(loc, scale_tril, control_var)[source]
Bases: pyro.distributions.distribution.Distribution, Callable
Multivariate normal (Gaussian) distribution with transport equation inspired control variates (adaptive velocity fields).
A distribution over vectors in which all the elements have a joint Gaussian density.
- Parameters
loc (torch.Tensor) – D-dimensional mean vector.
scale_tril (torch.Tensor) – Cholesky of Covariance matrix; D x D matrix.
control_var (torch.Tensor) – 2 x L x D tensor that parameterizes the control variate; L is an arbitrary positive integer. This parameter needs to be learned (i.e. adapted) to achieve lower variance gradients. In a typical use case this parameter will be adapted concurrently with the loc and scale_tril that define the distribution.
Example usage:
    control_var = torch.tensor(0.1 * torch.ones(2, 1, D), requires_grad=True)
    opt_cv = torch.optim.Adam([control_var], lr=0.1, betas=(0.5, 0.999))
    for _ in range(1000):
        d = AVFMultivariateNormal(loc, scale_tril, control_var)
        z = d.rsample()
        cost = torch.pow(z, 2.0).sum()
        cost.backward()
        opt_cv.step()
        opt_cv.zero_grad()
- arg_constraints = {'control_var': Real(), 'loc': Real(), 'scale_tril': LowerTriangular()}
BetaBinomial
- class BetaBinomial(concentration1, concentration0, total_count=1, validate_args=None)[source]
Bases: pyro.distributions.distribution.Distribution, Callable
Compound distribution consisting of a beta-binomial pair. The probability of success (probs for the Binomial distribution) is unknown and randomly drawn from a Beta distribution prior to a certain number of Bernoulli trials given by total_count.
- Parameters
concentration1 (float or torch.Tensor) – 1st concentration parameter (alpha) for the Beta distribution.
concentration0 (float or torch.Tensor) – 2nd concentration parameter (beta) for the Beta distribution.
total_count (float or torch.Tensor) – Number of Bernoulli trials.
- approx_log_prob_tol = 0.0
- arg_constraints = {'concentration0': GreaterThan(lower_bound=0.0), 'concentration1': GreaterThan(lower_bound=0.0), 'total_count': IntegerGreaterThan(lower_bound=0)}
- property concentration0
- property concentration1
- has_enumerate_support = True
- property mean
- property support
- property variance
CoalescentTimes
- class CoalescentTimes(leaf_times, rate=1.0, *, validate_args=None)[source]
Bases: pyro.distributions.distribution.Distribution, Callable
Distribution over sorted coalescent times given irregularly sampled leaf_times and constant population size.
Sample values will be sorted sets of binary coalescent times. Each sample value will have cardinality value.size(-1) = leaf_times.size(-1) - 1, so that phylogenies are complete binary trees. This distribution can thus be batched over multiple samples of phylogenies given fixed (number of) leaf times, e.g. over phylogeny samples from BEAST or MrBayes.
References
- [1] J.F.C. Kingman (1982)
“On the Genealogy of Large Populations” Journal of Applied Probability
- [2] J.F.C. Kingman (1982)
“The Coalescent” Stochastic Processes and their Applications
- Parameters
leaf_times (torch.Tensor) – Vector of times of sampling events, i.e. leaf nodes in the phylogeny. These can be arbitrary real numbers with arbitrary order and duplicates.
rate (torch.Tensor) – Base coalescent rate (pairwise rate of coalescence) under a constant population size model. Defaults to 1.
- arg_constraints = {'leaf_times': Real(), 'rate': GreaterThan(lower_bound=0.0)}
- property support
CoalescentTimesWithRate
- class CoalescentTimesWithRate(leaf_times, rate_grid, *, validate_args=None)[source]
Bases: pyro.distributions.distribution.Distribution, Callable
Distribution over coalescent times given irregularly sampled leaf_times and piecewise constant coalescent rates defined on a regular time grid.
This assumes a piecewise constant base coalescent rate specified on time intervals (-inf,1], [1,2], …, [T-1,inf), where T = rate_grid.size(-1). Leaves may be sampled at arbitrary real times, but are commonly sampled in the interval [0, T].
Sample values will be sorted sets of binary coalescent times. Each sample value will have cardinality value.size(-1) = leaf_times.size(-1) - 1, so that phylogenies are complete binary trees. This distribution can thus be batched over multiple samples of phylogenies given fixed (number of) leaf times, e.g. over phylogeny samples from BEAST or MrBayes.
This distribution implements log_prob() but not .sample().
See also CoalescentRateLikelihood.
References
- [1] J.F.C. Kingman (1982)
“On the Genealogy of Large Populations” Journal of Applied Probability
- [2] J.F.C. Kingman (1982)
“The Coalescent” Stochastic Processes and their Applications
- [3] A. Popinga, T. Vaughan, T. Statler, A.J. Drummond (2014)
“Inferring epidemiological dynamics with Bayesian coalescent inference: The merits of deterministic and stochastic models” https://arxiv.org/pdf/1407.1792.pdf
- Parameters
leaf_times (torch.Tensor) – Tensor of times of sampling events, i.e. leaf nodes in the phylogeny. These can be arbitrary real numbers with arbitrary order and duplicates.
rate_grid (torch.Tensor) – Tensor of base coalescent rates (pairwise rate of coalescence). For example in a simple SIR model this might be beta S / I. The rightmost dimension is time, and this tensor represents a (batch of) rates that are piecewise constant in time.
- arg_constraints = {'leaf_times': Real(), 'rate_grid': GreaterThan(lower_bound=0.0)}
- property duration
- log_prob(value)[source]
Computes likelihood as in equations 7-8 of [3].
This has time complexity O(T + S N log(N)) where T is the number of time steps, N is the number of leaves, and S = sample_shape.numel() is the number of samples of value.
- Parameters
value (torch.Tensor) – A tensor of coalescent times. These denote sets of size leaf_times.size(-1) - 1 along the trailing dimension and should be sorted along that dimension.
- Returns
Likelihood p(coal_times | leaf_times, rate_grid)
- Return type
- property support
ConditionalDistribution
ConditionalTransformedDistribution
Delta
- class Delta(v, log_density=0.0, event_dim=0, validate_args=None)[source]
Bases: pyro.distributions.distribution.Distribution, Callable
Degenerate discrete distribution (a single point).
Discrete distribution that assigns probability one to the single element in its support. A Delta distribution parameterized by a random choice should not be used with MCMC-based inference, as doing so produces incorrect results.
- Parameters
v (torch.Tensor) – The single support element.
log_density (torch.Tensor) – An optional density for this Delta. This is useful to keep the class of Delta distributions closed under differentiable transformation.
event_dim (int) – Optional event dimension, defaults to zero.
- arg_constraints = {'log_density': Real(), 'v': Dependent()}
- has_rsample = True
- property mean
- property support
- property variance
DirichletMultinomial
- class DirichletMultinomial(concentration, total_count=1, is_sparse=False, validate_args=None)[source]
Bases: pyro.distributions.distribution.Distribution, Callable
Compound distribution consisting of a dirichlet-multinomial pair. The probability of classes (probs for the Multinomial distribution) is unknown and randomly drawn from a Dirichlet distribution prior to a certain number of Categorical trials given by total_count.
- Parameters
concentration (float or torch.Tensor) – concentration parameter (alpha) for the Dirichlet distribution.
total_count (int or torch.Tensor) – number of Categorical trials.
is_sparse (bool) – Whether to assume value is mostly zero when computing log_prob(), which can speed up computation when data is sparse.
- arg_constraints = {'concentration': IndependentConstraint(GreaterThan(lower_bound=0.0), 1), 'total_count': IntegerGreaterThan(lower_bound=0)}
- property concentration
- property mean
- property support
- property variance
DiscreteHMM
- class DiscreteHMM(initial_logits, transition_logits, observation_dist, validate_args=None, duration=None)[source]
Bases: pyro.distributions.distribution.Distribution, Callable
Hidden Markov Model with discrete latent state and arbitrary observation distribution.
This uses [1] to parallelize over time, achieving O(log(time)) parallel complexity for computing log_prob(), filter(), and sample().
The event_shape of this distribution includes time on the left:
    event_shape = (num_steps,) + observation_dist.event_shape
This distribution supports any combination of homogeneous/heterogeneous time dependency of transition_logits and observation_dist. However, because time is included in this distribution’s event_shape, the homogeneous+homogeneous case will have a broadcastable event_shape with num_steps = 1, allowing log_prob() to work with arbitrary length data:
    # homogeneous + homogeneous case:
    event_shape = (1,) + observation_dist.event_shape
References:
- [1] Simo Sarkka, Angel F. Garcia-Fernandez (2019)
“Temporal Parallelization of Bayesian Filters and Smoothers” https://arxiv.org/pdf/1905.13002.pdf
- Parameters
initial_logits (Tensor) – A logits tensor for an initial categorical distribution over latent states. Should have rightmost size state_dim and be broadcastable to batch_shape + (state_dim,).
transition_logits (Tensor) – A logits tensor for transition conditional distributions between latent states. Should have rightmost shape (state_dim, state_dim) (old, new), and be broadcastable to batch_shape + (num_steps, state_dim, state_dim).
observation_dist (Distribution) – A conditional distribution of observed data conditioned on latent state. The .batch_shape should have rightmost size state_dim and be broadcastable to batch_shape + (num_steps, state_dim). The .event_shape may be arbitrary.
duration (int) – Optional size of the time axis event_shape[0]. This is required when sampling from homogeneous HMMs whose parameters are not expanded along the time axis.
- arg_constraints = {'initial_logits': Real(), 'transition_logits': Real()}
- filter(value)[source]
Compute posterior over final state given a sequence of observations.
- Parameters
value (Tensor) – A sequence of observations.
- Returns
A posterior distribution over latent states at the final time step. result.logits can then be used as initial_logits in a sequential Pyro model for prediction.
- Return type
- property support
EmpiricalDistribution
- class Empirical(samples, log_weights, validate_args=None)[source]
Bases: pyro.distributions.distribution.Distribution, Callable
Empirical distribution associated with the sampled data. Note that the shape requirement for log_weights is that its shape must match the leftmost shape of samples. Samples are aggregated along the aggregation_dim, which is the rightmost dim of log_weights.
Example:
    >>> emp_dist = Empirical(torch.randn(2, 3, 10), torch.ones(2, 3))
    >>> emp_dist.batch_shape
    torch.Size([2])
    >>> emp_dist.event_shape
    torch.Size([10])
    >>> single_sample = emp_dist.sample()
    >>> single_sample.shape
    torch.Size([2, 10])
    >>> batch_sample = emp_dist.sample((100,))
    >>> batch_sample.shape
    torch.Size([100, 2, 10])
    >>> emp_dist.log_prob(single_sample).shape
    torch.Size([2])
    >>> # Vectorized samples cannot be scored by log_prob.
    >>> with pyro.validation_enabled():
    ...     emp_dist.log_prob(batch_sample).shape
    Traceback (most recent call last):
    ...
    ValueError: ``value.shape`` must be torch.Size([2, 10])
- Parameters
samples (torch.Tensor) – samples from the empirical distribution.
log_weights (torch.Tensor) – log weights (optional) corresponding to the samples.
- arg_constraints = {}
- enumerate_support(expand=True)[source]
See pyro.distributions.torch_distribution.TorchDistribution.enumerate_support()
- property event_shape
See pyro.distributions.torch_distribution.TorchDistribution.event_shape()
- has_enumerate_support = True
- log_prob(value)[source]
Returns the log of the probability mass function evaluated at value. Note that this currently only supports scoring values with empty sample_shape.
- Parameters
value (torch.Tensor) – scalar or tensor value to be scored.
- property log_weights
- property mean
See pyro.distributions.torch_distribution.TorchDistribution.mean()
- sample(sample_shape=torch.Size([]))[source]
See pyro.distributions.torch_distribution.TorchDistribution.sample()
- property sample_size
Number of samples that constitute the empirical distribution.
- Returns
number of samples collected.
- Return type
int
- support = Real()
- property variance
See pyro.distributions.torch_distribution.TorchDistribution.variance()
ExtendedBetaBinomial
- class ExtendedBetaBinomial(concentration1, concentration0, total_count=1, validate_args=None)[source]
Bases: pyro.distributions.distribution.Distribution, Callable
EXPERIMENTAL BetaBinomial distribution extended to have logical support the entire integers and to allow arbitrary integer total_count. Numerical support is still the integer interval [0, total_count].
- arg_constraints = {'concentration0': GreaterThan(lower_bound=0.0), 'concentration1': GreaterThan(lower_bound=0.0), 'total_count': Integer}
- support = Integer
ExtendedBinomial
- class ExtendedBinomial(total_count=1, probs=None, logits=None, validate_args=None)[source]
Bases: pyro.distributions.distribution.Distribution, Callable
EXPERIMENTAL Binomial distribution extended to have logical support the entire integers and to allow arbitrary integer total_count. Numerical support is still the integer interval [0, total_count].
- arg_constraints = {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0), 'total_count': Integer}
- support = Integer
FoldedDistribution
- class FoldedDistribution(base_dist, validate_args=None)[source]
Bases: pyro.distributions.distribution.Distribution, Callable
Equivalent to TransformedDistribution(base_dist, AbsTransform()), but additionally supports log_prob().
- Parameters
base_dist (Distribution) – The distribution to reflect.
- support = GreaterThan(lower_bound=0.0)
GammaGaussianHMM
- class GammaGaussianHMM(scale_dist, initial_dist, transition_matrix, transition_dist, observation_matrix, observation_dist, validate_args=None, duration=None)[source]
Bases: pyro.distributions.distribution.Distribution, Callable
Hidden Markov Model in which the joint distribution of initial state, hidden state, and observed state is a MultivariateStudentT distribution, along the lines of references [2] and [3]. This adapts [1] to parallelize over time to achieve O(log(time)) parallel complexity.
This GammaGaussianHMM class corresponds to the generative model:
    s = Gamma(df/2, df/2).sample()
    z = scale(initial_dist, s).sample()
    x = []
    for t in range(num_events):
        z = z @ transition_matrix + scale(transition_dist, s).sample()
        x.append(z @ observation_matrix + scale(observation_dist, s).sample())
where scale(mvn(loc, precision), s) := mvn(loc, s * precision).
The event_shape of this distribution includes time on the left:
    event_shape = (num_steps,) + observation_dist.event_shape
This distribution supports any combination of homogeneous/heterogeneous time dependency of transition_dist and observation_dist. However, because time is included in this distribution’s event_shape, the homogeneous+homogeneous case will have a broadcastable event_shape with num_steps = 1, allowing log_prob() to work with arbitrary length data:
    event_shape = (1, obs_dim)  # homogeneous + homogeneous case
References:
- [1] Simo Sarkka, Angel F. Garcia-Fernandez (2019)
“Temporal Parallelization of Bayesian Filters and Smoothers” https://arxiv.org/pdf/1905.13002.pdf
- [2] F. J. Giron and J. C. Rojano (1994)
“Bayesian Kalman filtering with elliptically contoured errors”
- [3] Filip Tronarp, Toni Karvonen, and Simo Sarkka (2019)
“Student’s t-filters for noise scale estimation” https://users.aalto.fi/~ssarkka/pub/SPL2019.pdf
- Variables
- Parameters
scale_dist (Gamma) – Prior of the mixing distribution.
initial_dist (MultivariateNormal) – A distribution with unit scale mixing over initial states. This should have batch_shape broadcastable to self.batch_shape. This should have event_shape (hidden_dim,).
transition_matrix (Tensor) – A linear transformation of hidden state. This should have shape broadcastable to self.batch_shape + (num_steps, hidden_dim, hidden_dim) where the rightmost dims are ordered (old, new).
transition_dist (MultivariateNormal) – A process noise distribution with unit scale mixing. This should have batch_shape broadcastable to self.batch_shape + (num_steps,). This should have event_shape (hidden_dim,).
observation_matrix (Tensor) – A linear transformation from hidden to observed state. This should have shape broadcastable to self.batch_shape + (num_steps, hidden_dim, obs_dim).
observation_dist (MultivariateNormal) – An observation noise distribution with unit scale mixing. This should have batch_shape broadcastable to self.batch_shape + (num_steps,). This should have event_shape (obs_dim,).
duration (int) – Optional size of the time axis event_shape[0]. This is required when sampling from homogeneous HMMs whose parameters are not expanded along the time axis.
- arg_constraints = {}
- filter(value)[source]
Compute posteriors over the multiplier and the final state given a sequence of observations. The posterior is a pair of Gamma and MultivariateNormal distributions (i.e. a GammaGaussian instance).
- Parameters
value (Tensor) – A sequence of observations.
- Returns
A pair of posterior distributions over the mixing and the latent state at the final time step.
- Return type
a tuple of pyro.distributions.Gamma and pyro.distributions.MultivariateNormal
- support = IndependentConstraint(Real(), 2)
GammaPoisson
- class GammaPoisson(concentration, rate, validate_args=None)[source]
Bases: pyro.distributions.distribution.Distribution, Callable
Compound distribution consisting of a gamma-poisson pair, also referred to as a gamma-poisson mixture. The rate parameter for the Poisson distribution is unknown and randomly drawn from a Gamma distribution.
Note
This can be treated as an alternate parametrization of the NegativeBinomial (total_count, probs) distribution, with concentration = total_count and rate = (1 - probs) / probs.
- Parameters
concentration (float or torch.Tensor) – shape parameter (alpha) of the Gamma distribution.
rate (float or torch.Tensor) – rate parameter (beta) for the Gamma distribution.
- arg_constraints = {'concentration': GreaterThan(lower_bound=0.0), 'rate': GreaterThan(lower_bound=0.0)}
- property concentration
- property mean
- property rate
- support = IntegerGreaterThan(lower_bound=0)
- property variance
GaussianHMM
- class GaussianHMM(initial_dist, transition_matrix, transition_dist, observation_matrix, observation_dist, validate_args=None, duration=None)[source]
Bases: pyro.distributions.distribution.Distribution, Callable
Hidden Markov Model with Gaussians for initial, transition, and observation distributions. This adapts [1] to parallelize over time to achieve O(log(time)) parallel complexity; however, it differs in that it tracks the log normalizer to ensure log_prob() is differentiable.
This corresponds to the generative model:
    z = initial_dist.sample()
    x = []
    for t in range(num_events):
        z = z @ transition_matrix + transition_dist.sample()
        x.append(z @ observation_matrix + observation_dist.sample())
The event_shape of this distribution includes time on the left:
    event_shape = (num_steps,) + observation_dist.event_shape
This distribution supports any combination of homogeneous/heterogeneous time dependency of transition_dist and observation_dist. However, because time is included in this distribution’s event_shape, the homogeneous+homogeneous case will have a broadcastable event_shape with num_steps = 1, allowing log_prob() to work with arbitrary length data:
    event_shape = (1, obs_dim)  # homogeneous + homogeneous case
References:
- [1] Simo Sarkka, Angel F. Garcia-Fernandez (2019)
“Temporal Parallelization of Bayesian Filters and Smoothers” https://arxiv.org/pdf/1905.13002.pdf
- Variables
- Parameters
initial_dist (MultivariateNormal) – A distribution over initial states. This should have batch_shape broadcastable to self.batch_shape. This should have event_shape (hidden_dim,).
transition_matrix (Tensor) – A linear transformation of hidden state. This should have shape broadcastable to self.batch_shape + (num_steps, hidden_dim, hidden_dim) where the rightmost dims are ordered (old, new).
transition_dist (MultivariateNormal) – A process noise distribution. This should have batch_shape broadcastable to self.batch_shape + (num_steps,). This should have event_shape (hidden_dim,).
observation_matrix (Tensor) – A linear transformation from hidden to observed state. This should have shape broadcastable to self.batch_shape + (num_steps, hidden_dim, obs_dim).
observation_dist (MultivariateNormal or Normal) – An observation noise distribution. This should have batch_shape broadcastable to self.batch_shape + (num_steps,). This should have event_shape (obs_dim,).
duration (int) – Optional size of the time axis event_shape[0]. This is required when sampling from homogeneous HMMs whose parameters are not expanded along the time axis.
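As a rough illustration of what log_prob() and filter() compute for the generative model above, the following plain-Python sketch runs a sequential Kalman filter for the scalar case (hidden_dim = obs_dim = 1). It is not Pyro's implementation, which parallelizes over time; the function names and argument names here are hypothetical.

```python
import math


def normal_logpdf(x, mean, var):
    """Log density of a univariate Normal with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def scalar_hmm_filter(xs, init_mean, init_var, trans, trans_var, obs, obs_var):
    """Sequential Kalman filter for a scalar linear-Gaussian HMM.

    Returns (log_likelihood, final_mean, final_var), where the last two
    describe the posterior over the hidden state at the final time step.
    """
    mean, var = init_mean, init_var
    log_lik = 0.0
    for x in xs:
        # Transition: z = z * trans + process noise.
        mean = mean * trans
        var = var * trans ** 2 + trans_var
        # Predictive distribution of x = z * obs + observation noise.
        pred_mean = mean * obs
        pred_var = var * obs ** 2 + obs_var
        log_lik += normal_logpdf(x, pred_mean, pred_var)
        # Condition the hidden state on the observation.
        gain = var * obs / pred_var
        mean = mean + gain * (x - pred_mean)
        var = (1.0 - gain * obs) * var
    return log_lik, mean, var


log_lik, final_mean, final_var = scalar_hmm_filter(
    xs=[0.3, -0.1, 0.4],
    init_mean=0.0, init_var=1.0,
    trans=0.9, trans_var=0.1,
    obs=1.0, obs_var=0.5,
)
```

The sequential loop costs O(time); the time-parallel formulation described above reaches the same quantities with O(log(time)) parallel depth.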
- arg_constraints = {}
- conjugate_update(other)[source]
EXPERIMENTAL Creates an updated GaussianHMM fusing information from another compatible distribution.
This should satisfy:
fg, log_normalizer = f.conjugate_update(g)
assert f.log_prob(x) + g.log_prob(x) == fg.log_prob(x) + log_normalizer
- Parameters
other (MultivariateNormal or Normal) – A distribution representing p(data|self.probs) but normalized over self.probs rather than data.
- Returns
a pair (updated, log_normalizer) where updated is an updated GaussianHMM, and log_normalizer is a Tensor representing the normalization factor.
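The identity that conjugate_update() should satisfy can be checked by hand in the simplest analogous setting: two univariate Normal factors over the same variable. The sketch below is only that scalar analogue, written in plain Python with hypothetical helper names; it does not call GaussianHMM.

```python
import math


def normal_logpdf(x, mean, var):
    """Log density of a univariate Normal with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def fuse_normals(mean_f, var_f, mean_g, var_g):
    """Fuse two Normal factors over the same variable.

    Returns ((mean, var), log_normalizer) such that
    log f(x) + log g(x) == log fg(x) + log_normalizer for every x.
    """
    precision = 1.0 / var_f + 1.0 / var_g
    var = 1.0 / precision
    mean = var * (mean_f / var_f + mean_g / var_g)
    # The leftover constant is a Normal density evaluated at the two means.
    log_normalizer = normal_logpdf(mean_f, mean_g, var_f + var_g)
    return (mean, var), log_normalizer


(mean_fg, var_fg), log_normalizer = fuse_normals(0.0, 1.0, 2.0, 0.5)
for x in (-1.0, 0.3, 2.5):
    lhs = normal_logpdf(x, 0.0, 1.0) + normal_logpdf(x, 2.0, 0.5)
    rhs = normal_logpdf(x, mean_fg, var_fg) + log_normalizer
    assert abs(lhs - rhs) < 1e-9
```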
- filter(value)[source]
Compute posterior over final state given a sequence of observations.
- Parameters
value (Tensor) – A sequence of observations.
- Returns
A posterior distribution over latent states at the final time step. result can then be used as initial_dist in a sequential Pyro model for prediction.
- Return type
- has_rsample = True
- prefix_condition(data)[source]
EXPERIMENTAL Given self has event_shape == (t+f, d) and data x of shape batch_shape + (t, d), compute a conditional distribution of event_shape (f, d). Typically t is the number of training time steps, f is the number of forecast time steps, and d is the data dimension.
- Parameters
data (Tensor) – data of dimension at least 2.
- rsample_posterior(value, sample_shape=torch.Size([]))[source]
EXPERIMENTAL Sample from the latent state conditioned on observation.
- support = IndependentConstraint(Real(), 2)
GaussianMRF
- class GaussianMRF(initial_dist, transition_dist, observation_dist, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Temporal Markov Random Field with Gaussian factors for initial, transition, and observation distributions. This adapts [1] to parallelize over time to achieve O(log(time)) parallel complexity; however, it differs in that it tracks the log normalizer to ensure log_prob() is differentiable.
The event_shape of this distribution includes time on the left:
event_shape = (num_steps,) + observation_dist.event_shape
This distribution supports any combination of homogeneous/heterogeneous time dependency of transition_dist and observation_dist. However, because time is included in this distribution’s event_shape, the homogeneous+homogeneous case will have a broadcastable event_shape with num_steps = 1, allowing log_prob() to work with arbitrary length data:
event_shape = (1, obs_dim)  # homogeneous + homogeneous case
References:
- [1] Simo Sarkka, Angel F. Garcia-Fernandez (2019)
“Temporal Parallelization of Bayesian Filters and Smoothers” https://arxiv.org/pdf/1905.13002.pdf
- Variables
- Parameters
initial_dist (MultivariateNormal) – A distribution over initial states. This should have batch_shape broadcastable to self.batch_shape. This should have event_shape (hidden_dim,).
transition_dist (MultivariateNormal) – A joint distribution factor over a pair of successive time steps. This should have batch_shape broadcastable to self.batch_shape + (num_steps,). This should have event_shape (hidden_dim + hidden_dim,) (old+new).
observation_dist (MultivariateNormal) – A joint distribution factor over a hidden and an observed state. This should have batch_shape broadcastable to self.batch_shape + (num_steps,). This should have event_shape (hidden_dim + obs_dim,).
- arg_constraints = {}
- property support
GaussianScaleMixture
- class GaussianScaleMixture(coord_scale, component_logits, component_scale)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Mixture of Normal distributions with zero mean and diagonal covariance matrices.
That is, this distribution is a mixture with K components, where each component distribution is a D-dimensional Normal distribution with zero mean and a D-dimensional diagonal covariance matrix. The K different covariance matrices are controlled by the parameters coord_scale and component_scale. That is, the covariance matrix of the k’th component is given by
Sigma_ii = (component_scale_k * coord_scale_i) ** 2 (i = 1, …, D)
where component_scale_k is a positive scale factor and coord_scale_i are positive scale parameters shared between all K components. The mixture weights are controlled by a K-dimensional vector of softmax logits, component_logits. This distribution implements pathwise derivatives for samples from the distribution. This distribution does not currently support batched parameters.
See reference [1] for details on the implementations of the pathwise derivative. Please consider citing this reference if you use the pathwise derivative in your research.
[1] Pathwise Derivatives for Multivariate Distributions, Martin Jankowiak & Theofanis Karaletsos. arXiv:1806.01856
Note that this distribution supports both even and odd dimensions, but the former should have slightly higher precision, since it doesn’t use any erfs in the backward call. Also note that this distribution does not support D = 1.
- Parameters
coord_scale (torch.tensor) – D-dimensional vector of scales
component_logits (torch.tensor) – K-dimensional vector of logits
component_scale (torch.tensor) – K-dimensional vector of scale multipliers
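To make the covariance formula Sigma_ii = (component_scale_k * coord_scale_i) ** 2 concrete, here is a small plain-Python sketch that tabulates the diagonal of every component and draws one sample by ancestral sampling. The helper names are hypothetical, and the sketch makes no attempt to reproduce the pathwise derivatives that the distribution implements.

```python
import math
import random


def component_covariance_diagonals(coord_scale, component_scale):
    """Return a K x D nested list holding Sigma_ii for every component k."""
    return [[(s_k * c_i) ** 2 for c_i in coord_scale] for s_k in component_scale]


def sample_scale_mixture(coord_scale, component_logits, component_scale, rng):
    """Draw one D-dimensional sample from the zero-mean scale mixture."""
    # Softmax weights, computed up to a constant (choices() renormalizes).
    top = max(component_logits)
    weights = [math.exp(logit - top) for logit in component_logits]
    k = rng.choices(range(len(weights)), weights=weights)[0]
    # Each coordinate is an independent zero-mean Normal with its own scale.
    return [rng.gauss(0.0, component_scale[k] * c_i) for c_i in coord_scale]


coord_scale = [1.0, 2.0]               # D = 2
component_logits = [0.0, 1.0, -1.0]    # K = 3
component_scale = [0.5, 1.0, 3.0]
diagonals = component_covariance_diagonals(coord_scale, component_scale)
sample = sample_scale_mixture(
    coord_scale, component_logits, component_scale, random.Random(0)
)
```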
- arg_constraints = {'component_logits': Real(), 'component_scale': GreaterThan(lower_bound=0.0), 'coord_scale': GreaterThan(lower_bound=0.0)}
- has_rsample = True
GroupedNormalNormal
- class GroupedNormalNormal(prior_loc, prior_scale, obs_scale, group_idx, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
This likelihood, which operates on groups of real-valued scalar observations, is obtained by integrating out a latent mean for each group. Both the prior on each latent mean as well as the observation likelihood for each data point are univariate Normal distributions. The prior means are controlled by prior_loc and prior_scale. The observation noise of the Normal likelihood is controlled by obs_scale, which is allowed to vary from observation to observation. The tensor of indices group_idx connects each observation to one of the groups specified by prior_loc and prior_scale.
See e.g. Eqn. (55) in ref. [1] for relevant expressions in a simpler case with scalar obs_scale.
Example:
>>> num_groups = 3
>>> num_data = 4
>>> prior_loc = torch.randn(num_groups)
>>> prior_scale = torch.rand(num_groups)
>>> obs_scale = torch.rand(num_data)
>>> group_idx = torch.tensor([1, 0, 2, 1]).long()
>>> values = torch.randn(num_data)
>>> gnn = GroupedNormalNormal(prior_loc, prior_scale, obs_scale, group_idx)
>>> assert gnn.log_prob(values).shape == ()
References: [1] “Conjugate Bayesian analysis of the Gaussian distribution,” Kevin P. Murphy.
- Parameters
prior_loc (torch.Tensor) – Tensor of shape (num_groups,) specifying the prior mean of the latent of each group.
prior_scale (torch.Tensor) – Tensor of shape (num_groups,) specifying the prior scale of the latent of each group.
obs_scale (torch.Tensor) – Tensor of shape (num_data,) specifying the scale of the observation noise of each observation.
group_idx (torch.LongTensor) – Tensor of indices of shape (num_data,) linking each observation to one of the num_groups groups that are specified in prior_loc and prior_scale.
- arg_constraints = {'obs_scale': GreaterThan(lower_bound=0.0), 'prior_loc': Real(), 'prior_scale': GreaterThan(lower_bound=0.0)}
- get_posterior(value)[source]
Get a pyro.distributions.Normal distribution that encodes the posterior distribution over the vector of latents specified by prior_loc and prior_scale conditioned on the observed data specified by value.
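The posterior over each group's latent mean follows from standard Normal-Normal conjugacy: precisions add, and the posterior mean is the precision-weighted average of the prior mean and the group's observations. The plain-Python sketch below spells this out; the function name is hypothetical and the code is an illustration of the math, not the body of get_posterior().

```python
import math


def grouped_normal_posterior(prior_loc, prior_scale, obs_scale, group_idx, values):
    """Return per-group posterior (loc, scale) lists for the latent means."""
    num_groups = len(prior_loc)
    # Start from the prior precision and the precision-weighted prior mean.
    precision = [1.0 / s ** 2 for s in prior_scale]
    weighted = [m / s ** 2 for m, s in zip(prior_loc, prior_scale)]
    # Each observation adds its own precision to the group it belongs to.
    for y, s, g in zip(values, obs_scale, group_idx):
        precision[g] += 1.0 / s ** 2
        weighted[g] += y / s ** 2
    loc = [weighted[g] / precision[g] for g in range(num_groups)]
    scale = [1.0 / math.sqrt(precision[g]) for g in range(num_groups)]
    return loc, scale


post_loc, post_scale = grouped_normal_posterior(
    prior_loc=[0.0, 1.0, -1.0],
    prior_scale=[1.0, 2.0, 0.5],
    obs_scale=[0.5, 1.0, 1.5, 0.7],
    group_idx=[1, 0, 2, 1],
    values=[0.8, -0.2, 0.1, 1.3],
)
```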
- support = Real()
ImproperUniform
- class ImproperUniform(support, batch_shape, event_shape)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Improper distribution with zero log_prob() and undefined sample().
This is useful for transforming a model from generative dag form to factor graph form for use in HMC. For example the following are equal in distribution:
# Version 1. a generative dag
x = pyro.sample("x", Normal(0, 1))
y = pyro.sample("y", Normal(x, 1))
z = pyro.sample("z", Normal(y, 1))

# Version 2. a factor graph
xyz = pyro.sample("xyz", ImproperUniform(constraints.real, (), (3,)))
x, y, z = xyz.unbind(-1)
pyro.sample("x", Normal(0, 1), obs=x)
pyro.sample("y", Normal(x, 1), obs=y)
pyro.sample("z", Normal(y, 1), obs=z)
Note this distribution errors when sample() is called. To create a similar distribution that instead samples from a specified distribution consider using .mask(False) as in:
xyz = dist.Normal(0, 1).expand([3]).to_event(1).mask(False)
- Parameters
support (Constraint) – The support of the distribution.
batch_shape (torch.Size) – The batch shape.
event_shape (torch.Size) – The event shape.
- arg_constraints = {}
- property support
IndependentHMM
- class IndependentHMM(base_dist)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Wrapper class to treat a batch of independent univariate HMMs as a single multivariate distribution. This converts distribution shapes as follows:
          | .batch_shape       | .event_shape
base_dist | shape + (obs_dim,) | (duration, 1)
result    | shape              | (duration, obs_dim)
- Parameters
base_dist (HiddenMarkovModel) – A base hidden Markov model instance.
- arg_constraints = {}
- property duration
- property has_rsample
- property support
InverseGamma
- class InverseGamma(concentration, rate, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Creates an inverse-gamma distribution parameterized by concentration and rate.
X ~ Gamma(concentration, rate)
Y = 1/X ~ InverseGamma(concentration, rate)
- Parameters
concentration (torch.Tensor) – the concentration parameter (i.e. alpha).
rate (torch.Tensor) – the rate parameter (i.e. beta).
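The relation Y = 1/X fixes the inverse-gamma density through a change of variables: p_Y(y) = p_X(1/y) / y^2. The plain-Python sketch below writes both log densities in closed form and checks that relation numerically; the function names are hypothetical.

```python
import math


def gamma_logpdf(x, concentration, rate):
    """Log density of Gamma(concentration, rate) at x > 0."""
    return (concentration * math.log(rate) - math.lgamma(concentration)
            + (concentration - 1.0) * math.log(x) - rate * x)


def inverse_gamma_logpdf(y, concentration, rate):
    """Log density of InverseGamma(concentration, rate) at y > 0."""
    return (concentration * math.log(rate) - math.lgamma(concentration)
            - (concentration + 1.0) * math.log(y) - rate / y)


# Change of variables: log p_Y(y) = log p_X(1 / y) - 2 * log(y).
for y in (0.2, 1.0, 3.5):
    via_gamma = gamma_logpdf(1.0 / y, 3.0, 2.0) - 2.0 * math.log(y)
    assert abs(inverse_gamma_logpdf(y, 3.0, 2.0) - via_gamma) < 1e-12
```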
- arg_constraints: Dict[str, torch.distributions.constraints.Constraint] = {'concentration': GreaterThan(lower_bound=0.0), 'rate': GreaterThan(lower_bound=0.0)}
- property concentration
- has_rsample = True
- property rate
- support = GreaterThan(lower_bound=0.0)
LinearHMM
- class LinearHMM(initial_dist, transition_matrix, transition_dist, observation_matrix, observation_dist, validate_args=None, duration=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Hidden Markov Model with linear dynamics and observations and arbitrary noise for initial, transition, and observation distributions. Each of those distributions can be e.g. MultivariateNormal or Independent of Normal, StudentT, or Stable. Additionally the observation distribution may be constrained, e.g. LogNormal.
This corresponds to the generative model:
z = initial_distribution.sample()
x = []
for t in range(num_events):
    z = z @ transition_matrix + transition_dist.sample()
    y = z @ observation_matrix + obs_base_dist.sample()
    x.append(obs_transform(y))
where observation_dist is split into obs_base_dist and an optional obs_transform (defaulting to the identity).
This implements a reparameterized rsample() method but does not implement a log_prob() method. Derived classes may implement log_prob().
Inference without log_prob() can be performed using either reparameterization with LinearHMMReparam or likelihood-free algorithms such as EnergyDistance. Note that while stable processes generally require a common shared stability parameter \(\alpha\), this distribution and the above inference algorithms allow heterogeneous stability parameters.
The event_shape of this distribution includes time on the left:
event_shape = (num_steps,) + observation_dist.event_shape
This distribution supports any combination of homogeneous/heterogeneous time dependency of transition_dist and observation_dist. However at least one of the distributions or matrices must be expanded to contain the time dimension.
- Variables
- Parameters
initial_dist – A distribution over initial states. This should have batch_shape broadcastable to self.batch_shape. This should have event_shape (hidden_dim,).
transition_matrix (Tensor) – A linear transformation of hidden state. This should have shape broadcastable to self.batch_shape + (num_steps, hidden_dim, hidden_dim) where the rightmost dims are ordered (old, new).
transition_dist – A distribution over process noise. This should have batch_shape broadcastable to self.batch_shape + (num_steps,). This should have event_shape (hidden_dim,).
observation_matrix (Tensor) – A linear transformation from hidden to observed state. This should have shape broadcastable to self.batch_shape + (num_steps, hidden_dim, obs_dim).
observation_dist – An observation noise distribution. This should have batch_shape broadcastable to self.batch_shape + (num_steps,). This should have event_shape (obs_dim,).
duration (int) – Optional size of the time axis event_shape[0]. This is required when sampling from homogeneous HMMs whose parameters are not expanded along the time axis.
- arg_constraints = {}
- has_rsample = True
- property support
LKJ
- class LKJ(dim, concentration=1.0, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
LKJ distribution for correlation matrices. The distribution is controlled by the concentration parameter \(\eta\) to make the probability of the correlation matrix \(M\) proportional to \(\det(M)^{\eta - 1}\). Because of that, when concentration == 1, we have a uniform distribution over correlation matrices.
When concentration > 1, the distribution favors samples with large determinant. This is useful when we know a priori that the underlying variables are not correlated. When concentration < 1, the distribution favors samples with small determinant. This is useful when we know a priori that some underlying variables are correlated.
- Parameters
dim (int) – dimension of the matrices
concentration (ndarray) – concentration/shape parameter of the distribution (often referred to as eta)
References
[1] Generating random correlation matrices based on vines and extended onion method, Daniel Lewandowski, Dorota Kurowicka, Harry Joe
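For a 2 x 2 correlation matrix with off-diagonal entry r, the determinant is 1 - r^2, so the effect of concentration is easy to see by hand. The sketch below evaluates the unnormalized log density (concentration - 1) * log(det M) in plain Python; the function name is hypothetical and the normalizing constant is deliberately omitted.

```python
import math


def lkj_unnormalized_logpdf_2x2(r, concentration):
    """Unnormalized LKJ log density of the matrix [[1, r], [r, 1]], with |r| < 1."""
    determinant = 1.0 - r ** 2
    return (concentration - 1.0) * math.log(determinant)


# concentration == 1: every correlation is equally likely.
flat = [lkj_unnormalized_logpdf_2x2(r, 1.0) for r in (0.0, 0.5, 0.9)]
# concentration > 1: weak correlation (large determinant) is favored.
weak = lkj_unnormalized_logpdf_2x2(0.0, 2.0) > lkj_unnormalized_logpdf_2x2(0.9, 2.0)
# concentration < 1: strong correlation (small determinant) is favored.
strong = lkj_unnormalized_logpdf_2x2(0.9, 0.5) > lkj_unnormalized_logpdf_2x2(0.0, 0.5)
```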
- arg_constraints: Dict[str, torch.distributions.constraints.Constraint] = {'concentration': GreaterThan(lower_bound=0.0)}
- property mean
- support = CorrMatrix()
LKJCorrCholesky
- class LKJCorrCholesky(d, eta, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution,Callable
LogNormalNegativeBinomial
- class LogNormalNegativeBinomial(total_count, logits, multiplicative_noise_scale, *, num_quad_points=8, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
A three-parameter generalization of the Negative Binomial distribution [1]. It can be understood as a continuous mixture of Negative Binomial distributions in which we inject Normally-distributed noise into the logits of the Negative Binomial distribution:
\[\begin{split}\begin{eqnarray} &\rm{LNNB}(y | \rm{total\_count}=\nu, \rm{logits}=\ell, \rm{multiplicative\_noise\_scale}=\sigma) = \\ &\int d\epsilon \mathcal{N}(\epsilon | 0, \sigma) \rm{NB}(y | \rm{total\_count}=\nu, \rm{logits}=\ell + \epsilon) \end{eqnarray}\end{split}\]
where \(y \ge 0\) is a non-negative integer. Thus while a Negative Binomial distribution can be formulated as a Poisson distribution with a Gamma-distributed rate, this distribution adds an additional level of variability by also modulating the rate by Log Normally-distributed multiplicative noise.
This distribution has a mean given by
\[\mathbb{E}[y] = \nu e^{\ell + \tfrac{1}{2}\sigma^2} = e^{\ell + \log \nu + \tfrac{1}{2}\sigma^2}\]
and a variance given by
\[\rm{Var}[y] = \mathbb{E}[y] + \left( e^{\sigma^2} (1 + 1/\nu) - 1 \right) \left( \mathbb{E}[y] \right)^2\]
Thus while a given mean and variance together uniquely characterize a Negative Binomial distribution, there is a one-dimensional family of Log Normal Negative Binomial distributions with a given mean and variance.
Note that in some applications it may be useful to parameterize the logits as
\[\ell = \ell^\prime - \log \nu - \tfrac{1}{2}\sigma^2\]
so that the mean is given by \(\mathbb{E}[y] = e^{\ell^\prime}\) and does not depend on \(\nu\) and \(\sigma\), which serve to determine the higher moments.
References:
[1] “Lognormal and Gamma Mixed Negative Binomial Regression,” Mingyuan Zhou, Lingbo Li, David Dunson, and Lawrence Carin.
- Parameters
total_count (float or torch.Tensor) – non-negative number of negative Bernoulli trials. The variance decreases as total_count increases.
logits (torch.Tensor) – Event log-odds for probabilities of success for underlying Negative Binomial distribution.
multiplicative_noise_scale (torch.Tensor) – Controls the level of the injected Normal logit noise.
num_quad_points (int) – Number of quadrature points used to compute the (approximate) log_prob. Defaults to 8.
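The closed-form mean, the variance, and the mean-preserving logit parameterization described above can be evaluated directly. This plain-Python sketch uses hypothetical function names and scalar inputs; it mirrors the formulas only and does not perform the quadrature used for log_prob.

```python
import math


def lnnb_mean(total_count, logits, noise_scale):
    """E[y] = exp(logits + log(total_count) + noise_scale^2 / 2)."""
    return math.exp(logits + math.log(total_count) + 0.5 * noise_scale ** 2)


def lnnb_variance(total_count, logits, noise_scale):
    """Var[y] = E[y] + (exp(sigma^2) * (1 + 1/nu) - 1) * E[y]^2."""
    mean = lnnb_mean(total_count, logits, noise_scale)
    factor = math.exp(noise_scale ** 2) * (1.0 + 1.0 / total_count) - 1.0
    return mean + factor * mean ** 2


def logits_for_mean(target_mean, total_count, noise_scale):
    """Logits l = l' - log(nu) - sigma^2 / 2, with l' = log(target_mean)."""
    return math.log(target_mean) - math.log(total_count) - 0.5 * noise_scale ** 2


# Hold the mean at 7 while varying the noise scale: only the variance moves.
for sigma in (0.1, 0.5, 1.0):
    logits = logits_for_mean(7.0, total_count=10.0, noise_scale=sigma)
    mean = lnnb_mean(10.0, logits, sigma)
    variance = lnnb_variance(10.0, logits, sigma)
    assert abs(mean - 7.0) < 1e-9
    assert variance > mean
```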
- arg_constraints = {'logits': Real(), 'multiplicative_noise_scale': GreaterThan(lower_bound=0.0), 'total_count': GreaterThanEq(lower_bound=0)}
- property mean
- support = IntegerGreaterThan(lower_bound=0)
- property variance
Logistic
- class Logistic(loc, scale, *, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Logistic distribution.
This is a smooth distribution with symmetric asymptotically exponential tails and a concave log density. For standard loc=0, scale=1, the density is given by
\[p(x) = \frac {e^{-x}} {(1 + e^{-x})^2}\]
Like the Laplace density, this density has the heaviest possible tails (asymptotically) while still being log-concave. Unlike the Laplace distribution, this distribution is infinitely differentiable everywhere, and is thus suitable for constructing Laplace approximations.
- Parameters
loc – Location parameter.
scale – Scale parameter.
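The standard density above extends to general loc and scale through z = (x - loc) / scale and a division by scale. A minimal plain-Python sketch of the log density, with a hypothetical function name, is:

```python
import math


def logistic_logpdf(x, loc=0.0, scale=1.0):
    """Log density of the Logistic(loc, scale) distribution at x."""
    z = (x - loc) / scale
    # The density is symmetric in z, so evaluate at |z| to avoid overflow:
    # log p = -|z| - 2 * log(1 + exp(-|z|)) - log(scale).
    return -abs(z) - 2.0 * math.log1p(math.exp(-abs(z))) - math.log(scale)


peak = math.exp(logistic_logpdf(0.0))  # density at the mode of the standard case
```

Evaluating at |z| relies on the symmetry of the density and keeps the exponential bounded for large negative inputs.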
- arg_constraints = {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}
- has_rsample = True
- property mean
- support = Real()
- property variance
MaskedDistribution
- class MaskedDistribution(base_dist, mask)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Masks a distribution by a boolean tensor that is broadcastable to the distribution’s batch_shape.
In the special case mask is False, computation of log_prob(), score_parts(), and kl_divergence() is skipped, and constant zero values are returned instead.
- Parameters
mask (torch.Tensor or bool) – A boolean or boolean-valued tensor.
- arg_constraints = {}
- property has_enumerate_support
- property has_rsample
- property mean
- property support
- property variance
MaskedMixture
- class MaskedMixture(mask, component0, component1, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
A masked deterministic mixture of two distributions.
This is useful when the mask is sampled from another distribution, possibly correlated across the batch. Often the mask can be marginalized out via enumeration.
Example:
change_point = pyro.sample("change_point",
                           dist.Categorical(torch.ones(len(data) + 1)),
                           infer={'enumerate': 'parallel'})
mask = torch.arange(len(data), dtype=torch.long) >= change_point
with pyro.plate("data", len(data)):
    pyro.sample("obs", MaskedMixture(mask, dist1, dist2), obs=data)
- Parameters
mask (torch.Tensor) – A boolean tensor toggling between component0 and component1.
component0 (pyro.distributions.TorchDistribution) – a distribution for batch elements mask == False.
component1 (pyro.distributions.TorchDistribution) – a distribution for batch elements mask == True.
- arg_constraints = {}
- property has_rsample
- property mean
- property support
- property variance
MixtureOfDiagNormals
- class MixtureOfDiagNormals(locs, coord_scale, component_logits)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Mixture of Normal distributions with arbitrary means and arbitrary diagonal covariance matrices.
That is, this distribution is a mixture with K components, where each component distribution is a D-dimensional Normal distribution with a D-dimensional mean parameter and a D-dimensional diagonal covariance matrix. The K different component means are gathered into the K x D dimensional parameter locs and the K different scale parameters are gathered into the K x D dimensional parameter coord_scale. The mixture weights are controlled by a K-dimensional vector of softmax logits, component_logits. This distribution implements pathwise derivatives for samples from the distribution.
See reference [1] for details on the implementations of the pathwise derivative. Please consider citing this reference if you use the pathwise derivative in your research. Note that this distribution does not support dimension D = 1.
[1] Pathwise Derivatives for Multivariate Distributions, Martin Jankowiak & Theofanis Karaletsos. arXiv:1806.01856
- Parameters
locs (torch.Tensor) – K x D mean matrix
coord_scale (torch.Tensor) – K x D scale matrix
component_logits (torch.Tensor) – K-dimensional vector of softmax logits
- arg_constraints = {'component_logits': Real(), 'coord_scale': GreaterThan(lower_bound=0.0), 'locs': Real()}
- has_rsample = True
- support = IndependentConstraint(Real(), 1)
MultivariateStudentT
- class MultivariateStudentT(df, loc, scale_tril, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Creates a multivariate Student’s t-distribution parameterized by degrees of freedom df, mean loc and scale scale_tril.
- Parameters
- arg_constraints = {'df': GreaterThan(lower_bound=0.0), 'loc': IndependentConstraint(Real(), 1), 'scale_tril': LowerCholesky()}
- property covariance_matrix
- has_rsample = True
- property mean
- property precision_matrix
- property scale_tril
- support = IndependentConstraint(Real(), 1)
- property variance
NanMaskedNormal
- class NanMaskedNormal(loc, scale, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Wrapper around Normal to allow partially observed data as specified by NAN elements in the argument to log_prob(); the log_prob of these elements will be zero. This is useful for likelihoods with missing data.
Example:
from math import nan
data = torch.tensor([0.5, 0.1, nan, 0.9])
with pyro.plate("data", len(data)):
    pyro.sample("obs", NanMaskedNormal(0, 1), obs=data)
- log_prob(value: torch.Tensor) torch.Tensor[source]
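The masking rule described for NanMaskedNormal (NaN entries contribute a log probability of zero) can be illustrated without tensors. The following plain-Python sketch uses a hypothetical function name and is only meant to show the rule, not the library's tensor implementation.

```python
import math


def nan_masked_normal_logpdf(values, loc=0.0, scale=1.0):
    """Elementwise Normal log density, with 0.0 wherever the value is NaN."""
    out = []
    for v in values:
        if math.isnan(v):
            # A missing observation contributes nothing to the likelihood.
            out.append(0.0)
        else:
            z = (v - loc) / scale
            out.append(-0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi))
    return out


data = [0.5, 0.1, math.nan, 0.9]
log_probs = nan_masked_normal_logpdf(data)
total = sum(log_probs)  # stays finite despite the missing entry
```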
NanMaskedMultivariateNormal
- class NanMaskedMultivariateNormal(loc, covariance_matrix=None, precision_matrix=None, scale_tril=None, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Wrapper around MultivariateNormal to allow partially observed data as specified by NAN elements in the argument to log_prob(). The log_prob of these events will marginalize over the NAN elements. This is useful for likelihoods with missing data.
Example:
from math import nan
data = torch.tensor([
    [0.1, 0.2, 3.4],
    [0.5, 0.1, nan],
    [0.6, nan, nan],
    [nan, 0.5, nan],
    [nan, nan, nan],
])
with pyro.plate("data", len(data)):
    pyro.sample(
        "obs",
        NanMaskedMultivariateNormal(torch.zeros(3), torch.eye(3)),
        obs=data,
    )
- log_prob(value: torch.Tensor) torch.Tensor[source]
OMTMultivariateNormal
- class OMTMultivariateNormal(loc, scale_tril)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Multivariate normal (Gaussian) distribution with OMT gradients w.r.t. both parameters. Note the gradient computation w.r.t. the Cholesky factor has cost O(D^3), although the resulting gradient variance is generally expected to be lower.
A distribution over vectors in which all the elements have a joint Gaussian density.
- Parameters
loc (torch.Tensor) – Mean.
scale_tril (torch.Tensor) – Cholesky factor of the covariance matrix.
- arg_constraints = {'loc': Real(), 'scale_tril': LowerTriangular()}
OneOneMatching
- class OneOneMatching(logits, *, bp_iters=None, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Random perfect matching from N sources to N destinations where each source matches exactly one destination and each destination matches exactly one source.
Samples are represented as long tensors of shape (N,) taking values in {0,...,N-1} and satisfying the above one-one constraint. The log probability of a sample v is the sum of edge logits, up to the log partition function log Z:
\[\log p(v) = \sum_s \text{logits}[s, v[s]] - \log Z\]
Exact computations are expensive. To enable tractable approximations, set a number of belief propagation iterations via the bp_iters argument. The log_partition_function() and log_prob() methods use a Bethe approximation [1,2,3,4].
References:
- [1] Michael Chertkov, Lukas Kroc, Massimo Vergassola (2008)
“Belief propagation and beyond for particle tracking” https://arxiv.org/pdf/0806.1199.pdf
- [2] Bert Huang, Tony Jebara (2009)
“Approximating the Permanent with Belief Propagation” https://arxiv.org/pdf/0908.1769.pdf
- [3] Pascal O. Vontobel (2012)
“The Bethe Permanent of a Non-Negative Matrix” https://arxiv.org/pdf/1107.4196.pdf
- [4] M Chertkov, AB Yedidia (2013)
“Approximating the permanent with fractional belief propagation” http://www.jmlr.org/papers/volume14/chertkov13a/chertkov13a.pdf
- Parameters
logits (Tensor) – An (N, N)-shaped tensor of edge logits.
bp_iters (int) – Optional number of belief propagation iterations. If unspecified or None, expensive exact algorithms will be used.
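For very small N the log probability formula can be evaluated exactly by enumerating all N! perfect matchings, which also shows why exact computation becomes expensive. The sketch below does this in plain Python with hypothetical function names; it is a brute-force illustration, not the exact or Bethe algorithm used by the library.

```python
import itertools
import math


def one_one_log_partition(logits):
    """log Z: log-sum-exp of matching scores over all permutations."""
    n = len(logits)
    scores = [
        sum(logits[s][v[s]] for s in range(n))
        for v in itertools.permutations(range(n))
    ]
    top = max(scores)
    return top + math.log(sum(math.exp(score - top) for score in scores))


def one_one_log_prob(logits, v):
    """log p(v) = sum_s logits[s][v[s]] - log Z for a permutation v."""
    score = sum(logits[s][v[s]] for s in range(len(logits)))
    return score - one_one_log_partition(logits)


logits = [[0.0, 1.0, -1.0],
          [0.5, 0.0, 2.0],
          [1.5, -0.5, 0.0]]
# Probabilities of all 3! = 6 matchings sum to one.
total = sum(
    math.exp(one_one_log_prob(logits, v))
    for v in itertools.permutations(range(3))
)
```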
- arg_constraints = {'logits': Real()}
- has_enumerate_support = True
- property log_partition_function
- property support
OneTwoMatching
- class OneTwoMatching(logits, *, bp_iters=None, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Random matching from 2*N sources to N destinations where each source matches exactly one destination and each destination matches exactly two sources.
Samples are represented as long tensors of shape (2*N,) taking values in {0,...,N-1} and satisfying the above one-two constraint. The log probability of a sample v is the sum of edge logits, up to the log partition function log Z:
\[\log p(v) = \sum_s \text{logits}[s, v[s]] - \log Z\]
Exact computations are expensive. To enable tractable approximations, set a number of belief propagation iterations via the bp_iters argument. The log_partition_function() and log_prob() methods use a Bethe approximation [1,2,3,4].
References:
- [1] Michael Chertkov, Lukas Kroc, Massimo Vergassola (2008)
“Belief propagation and beyond for particle tracking” https://arxiv.org/pdf/0806.1199.pdf
- [2] Bert Huang, Tony Jebara (2009)
“Approximating the Permanent with Belief Propagation” https://arxiv.org/pdf/0908.1769.pdf
- [3] Pascal O. Vontobel (2012)
“The Bethe Permanent of a Non-Negative Matrix” https://arxiv.org/pdf/1107.4196.pdf
- [4] M Chertkov, AB Yedidia (2013)
“Approximating the permanent with fractional belief propagation” http://www.jmlr.org/papers/volume14/chertkov13a/chertkov13a.pdf
- Parameters
logits (Tensor) – A (2 * N, N)-shaped tensor of edge logits.
bp_iters (int) – Optional number of belief propagation iterations. If unspecified or None, expensive exact algorithms will be used.
- arg_constraints = {'logits': Real()}
- has_enumerate_support = True
- property log_partition_function
- property support
OrderedLogistic
- class OrderedLogistic(predictor, cutpoints, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Alternative parametrization of the distribution over a categorical variable.
Instead of the typical parametrization of a categorical variable in terms of the probability mass of the individual categories p, this provides an alternative that is useful in specifying ordered categorical models. This accepts a vector of cutpoints which are an ordered vector of real numbers denoting baseline cumulative log-odds of the individual categories, and a model vector predictor which modifies the baselines for each sample individually.
These cumulative log-odds are then transformed into a discrete cumulative probability distribution, that is finally differenced to return the probability mass matrix p that specifies the categorical distribution.
- Parameters
predictor (Tensor) – A tensor of predictor variables of arbitrary shape. The output shape of non-batched samples from this distribution will be the same shape as predictor.
cutpoints (Tensor) – A tensor of cutpoints that are used to determine the cumulative probability of each entry in predictor belonging to a given category. The first cutpoints.ndim-1 dimensions must be broadcastable to predictor, and the -1 dimension is monotonically increasing.
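The transformation from cutpoints to category probabilities can be sketched for a single scalar predictor. The code below assumes the common convention that the cumulative log-odds for cutpoint c is c - predictor; that sign convention and the function name are assumptions of this sketch, not statements about the library's internals.

```python
import math


def ordered_logistic_probs(predictor, cutpoints):
    """Category probabilities for one scalar predictor.

    cutpoints must be increasing; K cutpoints yield K + 1 categories.
    """
    # Cumulative probabilities P(category <= k), via the logistic sigmoid.
    cumulative = [1.0 / (1.0 + math.exp(-(c - predictor))) for c in cutpoints]
    # Difference the cumulative distribution, padded with 0 and 1.
    bounds = [0.0] + cumulative + [1.0]
    return [hi - lo for lo, hi in zip(bounds[:-1], bounds[1:])]


probs = ordered_logistic_probs(predictor=0.3, cutpoints=[-1.0, 0.0, 1.5])
```

Because the cutpoints are increasing, the cumulative probabilities are increasing too, so every difference is positive and the result is a valid probability vector.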
- arg_constraints = {'cutpoints': OrderedVector(), 'predictor': Real()}
ProjectedNormal
- class ProjectedNormal(concentration, *, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Projected isotropic normal distribution of arbitrary dimension.
This distribution over directional data is qualitatively similar to the von Mises and von Mises-Fisher distributions, but permits tractable variational inference via reparametrized gradients.
To use this distribution with autoguides, use poutine.reparam with a ProjectedNormalReparam reparametrizer in the model, e.g.:
@poutine.reparam(config={"direction": ProjectedNormalReparam()})
def model():
    direction = pyro.sample("direction", ProjectedNormal(torch.zeros(3)))
    ...
or simply wrap in MinimalReparam or AutoReparam, e.g.:
@MinimalReparam()
def model():
    ...
Note
This implements log_prob() only for dimensions {2,3}.
- [1] D. Hernandez-Stumpfhauser, F.J. Breidt, M.J. van der Woerd (2017)
“The General Projected Normal Distribution of Arbitrary Dimension: Modeling and Bayesian Inference” https://projecteuclid.org/euclid.ba/1453211962
- Parameters
concentration (torch.Tensor) – A combined location-and-concentration vector. The direction of this vector is the location, and its magnitude is the concentration.
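A projected isotropic normal sample can be sketched as follows, assuming the usual construction: add unit-variance Gaussian noise to the concentration vector and project the result onto the unit sphere. The function name is hypothetical and the sketch uses plain Python lists rather than tensors.

```python
import math
import random


def sample_projected_normal(concentration, rng):
    """Draw one unit vector: normalize(concentration + standard normal noise)."""
    noisy = [c + rng.gauss(0.0, 1.0) for c in concentration]
    norm = math.sqrt(sum(x * x for x in noisy))
    return [x / norm for x in noisy]


rng = random.Random(0)
# A long concentration vector gives samples tightly clustered around its direction.
tight = sample_projected_normal([100.0, 0.0, 0.0], rng)
# A short concentration vector gives samples spread widely over the sphere.
diffuse = sample_projected_normal([0.1, 0.0, 0.0], rng)
```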
- arg_constraints = {'concentration': IndependentConstraint(Real(), 1)}
- has_rsample = True
- property mean
Note this is the mean in the sense of a centroid in the submanifold that minimizes expected squared geodesic distance.
- property mode
- support = Sphere
RelaxedBernoulliStraightThrough
- class RelaxedBernoulliStraightThrough(temperature, probs=None, logits=None, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
An implementation of RelaxedBernoulli with a straight-through gradient estimator.
This distribution has the following properties:
- The samples returned by the rsample() method are discrete/quantized.
- The log_prob() method returns the log probability of the relaxed/unquantized sample using the GumbelSoftmax distribution.
- In the backward pass the gradient of the sample with respect to the parameters of the distribution uses the relaxed/unquantized sample.
References:
- [1] The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables,
Chris J. Maddison, Andriy Mnih, Yee Whye Teh
- [2] Categorical Reparameterization with Gumbel-Softmax,
Eric Jang, Shixiang Gu, Ben Poole
RelaxedOneHotCategoricalStraightThrough
- class RelaxedOneHotCategoricalStraightThrough(temperature, probs=None, logits=None, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
An implementation of RelaxedOneHotCategorical with a straight-through gradient estimator.
This distribution has the following properties:
- The samples returned by the rsample() method are discrete/quantized.
- The log_prob() method returns the log probability of the relaxed/unquantized sample using the GumbelSoftmax distribution.
- In the backward pass the gradient of the sample with respect to the parameters of the distribution uses the relaxed/unquantized sample.
References:
- [1] The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables,
Chris J. Maddison, Andriy Mnih, Yee Whye Teh
- [2] Categorical Reparameterization with Gumbel-Softmax,
Eric Jang, Shixiang Gu, Ben Poole
Rejector
- class Rejector(propose, log_prob_accept, log_scale, *, batch_shape=None, event_shape=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Rejection sampled distribution given an acceptance rate function.
- Parameters
propose (Distribution) – A proposal distribution that samples batched proposals via propose(). rsample() supports a sample_shape arg only if propose() supports a sample_shape arg.
log_prob_accept (callable) – A callable that inputs a batch of proposals and returns a batch of log acceptance probabilities.
log_scale – Total log probability of acceptance.
- arg_constraints = {}
- has_rsample = True
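The roles of the three arguments can be illustrated with a plain-Python rejection sampler. This is only a sketch of the general technique, not Pyro's implementation: the function rejection_sample and the toy target (a half-normal sampled from an Exponential(1) proposal) are assumptions made for illustration.

```python
import math
import random


def rejection_sample(propose, log_prob_accept, rng, max_tries=10_000):
    """Draw one sample by proposing until a proposal is accepted."""
    for _ in range(max_tries):
        x = propose(rng)
        # Accept with probability exp(log_prob_accept(x)), which must be <= 1.
        if rng.random() < math.exp(log_prob_accept(x)):
            return x
    raise RuntimeError("no proposal was accepted")


# Toy example (hypothetical): half-normal target, Exponential(1) proposal.
def propose(rng):
    return rng.expovariate(1.0)


def log_prob_accept(x):
    # target(x) / (M * proposal(x)) simplifies to exp(-(x - 1)^2 / 2).
    return -0.5 * (x - 1.0) ** 2


# Total log probability of acceptance, integrated over the proposal:
# log( exp(-1/2) * sqrt(pi / 2) ).
log_scale = -0.5 + 0.5 * math.log(math.pi / 2.0)
```

In this toy setup roughly three quarters of the proposals are accepted, which is the quantity that log_scale records on the log scale.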
SineBivariateVonMises
- class SineBivariateVonMises(phi_loc, psi_loc, phi_concentration, psi_concentration, correlation=None, weighted_correlation=None, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Unimodal distribution of two dependent angles on the 2-torus (S^1 ⨂ S^1) given by
\[C^{-1}\exp(\kappa_1\cos(x_1-\mu_1) + \kappa_2\cos(x_2 -\mu_2) + \rho\sin(x_1 - \mu_1)\sin(x_2 - \mu_2))\]
and
\[C = (2\pi)^2 \sum_{i=0}^{\infty} {2i \choose i} \left(\frac{\rho^2}{4\kappa_1\kappa_2}\right)^i I_i(\kappa_1)I_i(\kappa_2),\]
where \(I_i(\cdot)\) is the modified Bessel function of the first kind, the \(\mu\)'s are the locations of the distribution, the \(\kappa\)'s are the concentrations, and \(\rho\) gives the correlation between the angles \(x_1\) and \(x_2\).
This distribution is a submodel of the Bivariate von Mises distribution, called the Sine Distribution [2] in directional statistics.
This distribution is helpful for modeling coupled angles such as torsion angles in peptide chains. To infer parameters, use NUTS or HMC with priors that avoid parameterizations where the distribution becomes bimodal; see note below.
Note
Sample efficiency drops as
\[\frac{\rho^2}{\kappa_1\kappa_2} \rightarrow 1\]
because the distribution becomes increasingly bimodal. To avoid inefficient sampling use the weighted_correlation parameter with a skew away from one (e.g., TransformedDistribution(Beta(5,5), AffineTransform(loc=-1, scale=2))). The weighted_correlation should be in [-1,1].
Note
The correlation and weighted_correlation params are mutually exclusive.
Note
In the context of SVI, this distribution can be used as a likelihood but not for latent variables.
Note
Normalization remains accurate up to concentrations of 10,000.
References:
- [1] Probabilistic model for two dependent circular variables,
Singh, H., Hnizdo, V., and Demchuk, E. (2002)
- [2] Protein Bioinformatics and Mixtures of Bivariate von Mises Distributions for Angular Data,
Mardia, K. V., Taylor, T. C., and Subramaniam, G. (2007)
- Parameters
phi_loc (torch.Tensor) – location of first angle
psi_loc (torch.Tensor) – location of second angle
phi_concentration (torch.Tensor) – concentration of first angle
psi_concentration (torch.Tensor) – concentration of second angle
correlation (torch.Tensor) – correlation between the two angles
weighted_correlation (torch.Tensor) – set correlation to weighted_corr * sqrt(phi_conc*psi_conc) to avoid bimodality (see note). The weighted_correlation should be in [-1,1].
- arg_constraints = {'correlation': Real(), 'phi_concentration': GreaterThan(lower_bound=0.0), 'phi_loc': Real(), 'psi_concentration': GreaterThan(lower_bound=0.0), 'psi_loc': Real()}
- max_sample_iter = 1000
- property mean
- property norm_const
- sample(sample_shape=torch.Size([]))[source]
References:
- [1] A New Unified Approach for the Simulation of a Wide Class of Directional Distributions,
John T. Kent, Asaad M. Ganeiber & Kanti V. Mardia (2018)
- support = IndependentConstraint(Real(), 1)
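The relationship between weighted_correlation and correlation stated in the parameter list can be checked with a few lines of plain Python. This is a sketch under the formula given above; the helper names correlation_from_weight and bimodality_ratio are hypothetical and not part of Pyro.

```python
import math


def correlation_from_weight(weighted_correlation, phi_concentration, psi_concentration):
    """correlation = weighted_correlation * sqrt(phi_concentration * psi_concentration)."""
    if not -1.0 <= weighted_correlation <= 1.0:
        raise ValueError("weighted_correlation should be in [-1, 1]")
    return weighted_correlation * math.sqrt(phi_concentration * psi_concentration)


def bimodality_ratio(correlation, phi_concentration, psi_concentration):
    """rho^2 / (kappa_1 * kappa_2); sampling efficiency drops as this approaches 1."""
    return correlation ** 2 / (phi_concentration * psi_concentration)
```

Under this parameterization the ratio equals weighted_correlation squared, so a prior that keeps the weight away from -1 and 1 also keeps the ratio away from 1.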
SineSkewed
- class SineSkewed(base_dist: pyro.distributions.torch_distribution.TorchDistribution, skewness, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Sine Skewing [1] is a procedure for producing a distribution that breaks pointwise symmetry on a torus distribution. The new distribution is called the Sine Skewed X distribution, where X is the name of the (symmetric) base distribution.
Torus distributions are distributions with support on products of circles (i.e., ⨂^d S^1 where S^1=[-pi,pi) ). So, a 0-torus is a point, the 1-torus is a circle, and the 2-torus is commonly associated with the donut shape.
The Sine Skewed X distribution is parameterized by a weight parameter for each dimension of the event of X. For example with a von Mises distribution over a circle (1-torus), the Sine Skewed von Mises Distribution has one skew parameter. The skewness parameters can be inferred using HMC or NUTS. For example, the following will produce a uniform prior over skewness for the 2-torus:
    def model(obs):
        # Sine priors
        phi_loc = pyro.sample('phi_loc', VonMises(pi, 2.))
        psi_loc = pyro.sample('psi_loc', VonMises(-pi / 2, 2.))
        phi_conc = pyro.sample('phi_conc', Beta(halpha_phi, beta_prec_phi - halpha_phi))
        psi_conc = pyro.sample('psi_conc', Beta(halpha_psi, beta_prec_psi - halpha_psi))
        corr_scale = pyro.sample('corr_scale', Beta(2., 5.))

        # SS prior
        skew_phi = pyro.sample('skew_phi', Uniform(-1., 1.))
        psi_bound = 1 - skew_phi.abs()
        skew_psi = pyro.sample('skew_psi', Uniform(-1., 1.))
        skewness = torch.stack((skew_phi, psi_bound * skew_psi), dim=-1)
        assert skewness.shape == (num_mix_comp, 2)

        with pyro.plate('obs_plate'):
            sine = SineBivariateVonMises(phi_loc=phi_loc, psi_loc=psi_loc,
                                         phi_concentration=1000 * phi_conc,
                                         psi_concentration=1000 * psi_conc,
                                         weighted_correlation=corr_scale)
            return pyro.sample('phi_psi', SineSkewed(sine, skewness), obs=obs)
To ensure the skewing does not alter the normalization constant of the (Sine Bivariate von Mises) base distribution, the skewness parameters are constrained: the sum of the absolute values of the skewness must be less than or equal to one. So for the above snippet it must hold that:
skew_phi.abs()+skew_psi.abs() <= 1
We handle this in the prior by computing psi_bound and using it to scale skew_psi. We do not use psi_bound as:
skew_psi = pyro.sample('skew_psi', Uniform(-psi_bound, psi_bound))
as it would make the support for the Uniform distribution dynamic.
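The scaling trick can be checked without Pyro. The sketch below mirrors the prior in plain Python with the standard random module; sample_skewness is a hypothetical helper, and the draws stand in for the pyro.sample sites. Both uniforms keep a fixed support of [-1, 1], and only the second value is rescaled afterwards.

```python
import random


def sample_skewness(rng):
    """Draw a 2-torus skewness vector whose absolute values sum to at most one."""
    skew_phi = rng.uniform(-1.0, 1.0)
    psi_bound = 1.0 - abs(skew_phi)
    # The uniform keeps a fixed support; the bound is applied by rescaling.
    skew_psi = rng.uniform(-1.0, 1.0)
    return (skew_phi, psi_bound * skew_psi)
```

Since |skew_phi| + (1 - |skew_phi|) * |skew_psi| can never exceed one, every draw satisfies the constraint.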
In the context of SVI, this distribution can freely be used as a likelihood, but using it for latent variables will lead to slow inference for tori of dimension 2 and higher. This is because the base_dist cannot be reparameterized.
Note
An event in the base distribution must be on a d-torus, so the event_shape must be (d,).
Note
For the skewness parameter, it must hold that the sum of the absolute value of its weights for an event must be less than or equal to one. See eq. 2.1 in [1].
References:
- [1] Sine-skewed toroidal distributions and their application in protein bioinformatics,
Ameijeiras-Alonso, J., Ley, C. (2019)
- Parameters
base_dist (torch.distributions.Distribution) – base density on a d-dimensional torus. Supported base distributions include: 1D VonMises, SineBivariateVonMises, 1D ProjectedNormal, and Uniform(-pi, pi).
skewness (torch.tensor) – skewness of the distribution.
- arg_constraints = {'skewness': IndependentConstraint(Interval(lower_bound=-1.0, upper_bound=1.0), 1)}
- support = IndependentConstraint(Real(), 1)
SkewLogistic
- class SkewLogistic(loc, scale, asymmetry=1.0, *, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Skewed generalization of the Logistic distribution (Type I in [1]).
This is a smooth distribution with asymptotically exponential tails and a concave log density. For standard loc=0, scale=1, asymmetry=α the density is given by
\[p(x;\alpha) = \frac {\alpha e^{-x}} {(1 + e^{-x})^{\alpha+1}}\]
Like the AsymmetricLaplace density, this density has the heaviest possible tails (asymptotically) while still being log-concave. Unlike the AsymmetricLaplace distribution, this distribution is infinitely differentiable everywhere, and is thus suitable for constructing Laplace approximations.
References
- [1] Generalized logistic distribution
https://en.wikipedia.org/wiki/Generalized_logistic_distribution
- Parameters
loc – Location parameter.
scale – Scale parameter.
asymmetry – Asymmetry parameter (positive). The distribution skews right when asymmetry > 1 and left when asymmetry < 1. Defaults to asymmetry = 1 corresponding to the standard Logistic distribution.
- arg_constraints = {'asymmetry': GreaterThan(lower_bound=0.0), 'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}
- has_rsample = True
- support = Real()
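The density formula for SkewLogistic can be evaluated directly. The following plain-Python sketch is not Pyro's implementation; it assumes the usual location-scale convention z = (x - loc) / scale, and the names skew_logistic_log_prob and trapezoid are hypothetical. The integrator is included only to confirm numerically that the density is normalized.

```python
import math


def skew_logistic_log_prob(x, loc=0.0, scale=1.0, asymmetry=1.0):
    """log p(x) for p(z; a) = a * exp(-z) / (1 + exp(-z))**(a + 1), z = (x - loc) / scale."""
    z = (x - loc) / scale
    # softplus(-z) = log(1 + exp(-z)), written so that exp() never overflows.
    softplus_neg_z = max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
    return math.log(asymmetry) - z - (asymmetry + 1.0) * softplus_neg_z - math.log(scale)


def trapezoid(f, lo, hi, steps):
    """Composite trapezoid rule for a scalar function on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h
```

With asymmetry = 1 this reduces to the standard logistic density, whose value at zero is 1/4.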
SoftAsymmetricLaplace
- class SoftAsymmetricLaplace(loc, scale, asymmetry=1.0, softness=1.0, *, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Soft asymmetric version of the Laplace distribution.
This has a smooth (infinitely differentiable) density with two asymmetric asymptotically exponential tails, one on the left and one on the right. In the limit of softness → 0, this converges in distribution to the AsymmetricLaplace distribution.
This is equivalent to the sum of three random variables z - u + v where:
    z ~ Normal(loc, scale * softness)
    u ~ Exponential(1 / (scale * asymmetry))
    v ~ Exponential(asymmetry / scale)
This is also equivalent to the sum of two random variables z + a where:
    z ~ Normal(loc, scale * softness)
    a ~ AsymmetricLaplace(0, scale, asymmetry)
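The three-variable decomposition gives a direct way to draw samples. The sketch below uses only the standard random module and is not Pyro's sampler; sample_soft_asymmetric_laplace is a hypothetical name, and the Exponential arguments are read as rates, as in torch.distributions.Exponential.

```python
import random


def sample_soft_asymmetric_laplace(rng, loc, scale, asymmetry=1.0, softness=1.0):
    """Draw z - u + v with z Normal, u and v Exponential (rate parameterization)."""
    z = rng.gauss(loc, scale * softness)
    u = rng.expovariate(1.0 / (scale * asymmetry))  # mean scale * asymmetry (left tail)
    v = rng.expovariate(asymmetry / scale)          # mean scale / asymmetry (right tail)
    return z - u + v
```

Under this reading the expected value is loc - scale * asymmetry + scale / asymmetry, which a Monte Carlo average reproduces.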
- Parameters
loc – Location parameter, i.e. the mode.
scale – Scale parameter = geometric mean of left and right scales.
asymmetry – Square of ratio of left to right scales. Defaults to 1.
softness – Scale parameter of the Gaussian smoother. Defaults to 1.
- arg_constraints = {'asymmetry': GreaterThan(lower_bound=0.0), 'loc': Real(), 'scale': GreaterThan(lower_bound=0.0), 'softness': GreaterThan(lower_bound=0.0)}
- has_rsample = True
- property left_scale
- property mean
- property right_scale
- property soft_scale
- support = Real()
- property variance
SoftLaplace
- class SoftLaplace(loc, scale, *, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Smooth distribution with Laplace-like tail behavior.
This distribution corresponds to the log-concave density:
    z = (value - loc) / scale
    log_prob = log(2 / pi) - log(scale) - logaddexp(z, -z)
Like the Laplace density, this density has the heaviest possible tails (asymptotically) while still being log-concave. Unlike the Laplace distribution, this distribution is infinitely differentiable everywhere, and is thus suitable for constructing Laplace approximations.
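The log-density above is short enough to reproduce in plain Python. This sketch is not Pyro's code; soft_laplace_log_prob and integrate are hypothetical names, and logaddexp(z, -z) is expanded by hand because the standard math module has no such function.

```python
import math


def soft_laplace_log_prob(value, loc=0.0, scale=1.0):
    """log(2 / pi) - log(scale) - logaddexp(z, -z) with z = (value - loc) / scale."""
    z = (value - loc) / scale
    # logaddexp(z, -z) = |z| + log(1 + exp(-2|z|)), which avoids overflow.
    logaddexp = abs(z) + math.log1p(math.exp(-2.0 * abs(z)))
    return math.log(2.0 / math.pi) - math.log(scale) - logaddexp


def integrate(f, lo, hi, steps):
    """Composite trapezoid rule, used here only to sanity-check the density."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h
```

The density equals sech(z) / (pi * scale), so it integrates to one and, for scale = 1, has variance (pi / 2)^2.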
- Parameters
loc – Location parameter.
scale – Scale parameter.
- arg_constraints = {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}
- has_rsample = True
- property mean
- support = Real()
- property variance
SpanningTree
- class SpanningTree(edge_logits, sampler_options=None, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Distribution over spanning trees on a fixed number V of vertices.
A tree is represented as torch.LongTensor edges of shape (V-1,2) satisfying the following properties:
- The edges constitute a tree, i.e. are connected and cycle free.
- Each edge (v1,v2) = edges[e] is sorted, i.e. v1 < v2.
- The entire tensor is sorted in colexicographic order.
Use validate_edges() to verify edges are correctly formed.
The edge_logits tensor has one entry for each of the V*(V-1)//2 edges in the complete graph on V vertices, where edges are each sorted and the edge order is colexicographic:
    (0,1), (0,2), (1,2), (0,3), (1,3), (2,3), (0,4), (1,4), (2,4), ...
This ordering corresponds to the size-independent pairing function:
    k = v1 + v2 * (v2 - 1) // 2
where k is the rank of the edge (v1,v2) in the complete graph. To convert a matrix of edge logits to the linear representation used here:
    assert my_matrix.shape == (V, V)
    i, j = make_complete_graph(V)
    edge_logits = my_matrix[i, j]
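The pairing function can be checked against the colexicographic listing with a few lines of plain Python. The helpers edge_rank and complete_graph_edges are hypothetical names for this sketch and are not Pyro's make_complete_graph.

```python
def edge_rank(v1, v2):
    """Rank of the sorted edge (v1, v2), v1 < v2, in colexicographic order."""
    if not 0 <= v1 < v2:
        raise ValueError("expected a sorted edge with 0 <= v1 < v2")
    return v1 + v2 * (v2 - 1) // 2


def complete_graph_edges(num_vertices):
    """All sorted edges of the complete graph, ordered by v2 first and then v1."""
    return [(v1, v2) for v2 in range(num_vertices) for v1 in range(v2)]
```

Because the rank of an edge depends only on its endpoints, adding more vertices appends new edges without renumbering the existing ones, which is what "size-independent" refers to.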
- Parameters
edge_logits (torch.Tensor) – A tensor of length V*(V-1)//2 containing logits (aka negative energies) of all edges in the complete graph on V vertices. See above comment for edge ordering.
sampler_options (dict) – An optional dict of sampler options including: mcmc_steps defaulting to a single MCMC step (which is pretty good); initial_edges defaulting to a cheap approximate sample; backend one of “python” or “cpp”, defaulting to “python”.
- arg_constraints = {'edge_logits': Real()}
- property edge_mean
Computes marginal probabilities of each edge being active.
Note
This is similar to other distributions’ .mean() method, but with a different shape because this distribution’s values are not encoded as binary matrices.
- Returns
A symmetric square (V,V)-shaped matrix with values in [0,1] denoting the marginal probability of each edge being in a sampled value.
- Return type
Tensor
- enumerate_support(expand=True)[source]
This is implemented for trees with up to 6 vertices (and 5 edges).
- has_enumerate_support = True
- property log_partition_function
- property mode
The maximum weight spanning tree.
- Return type
Tensor
- sample(sample_shape=torch.Size([]))[source]
This sampler is implemented using MCMC run for a small number of steps after being initialized by a cheap approximate sampler. This sampler is approximate and cubic time. This is faster than the classic Aldous-Broder sampler [1,2], especially for graphs with large mixing time. Recent research [3,4] proposes samplers that run in sub-matrix-multiply time but are more complex to implement.
References
- [1] Generating random spanning trees
Andrei Broder (1989)
- [2] The Random Walk Construction of Uniform Spanning Trees and Uniform Labelled Trees,
David J. Aldous (1990)
- [3] Sampling Random Spanning Trees Faster than Matrix Multiplication,
David Durfee, Rasmus Kyng, John Peebles, Anup B. Rao, Sushant Sachdeva (2017) https://arxiv.org/abs/1611.07451
- [4] An almost-linear time algorithm for uniform random spanning tree generation,
Aaron Schild (2017) https://arxiv.org/abs/1711.06455
- support = IntegerGreaterThan(lower_bound=0)
- validate_edges(edges)[source]
Validates a batch of edges tensors, as returned by sample() or enumerate_support() or as input to log_prob().
- Parameters
edges (torch.LongTensor) – A batch of edges.
- Raises
ValueError
- Returns
None
Stable
- class Stable(stability, skew, scale=1.0, loc=0.0, coords='S0', validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Levy \(\alpha\)-stable distribution. See [1] for a review.
This uses Nolan’s parametrization [2] of the loc parameter, which is required for continuity and differentiability. This corresponds to the notation \(S^0_\alpha(\beta,\sigma,\mu_0)\) of [1], where \(\alpha\) = stability, \(\beta\) = skew, \(\sigma\) = scale, and \(\mu_0\) = loc. To instead use the S parameterization as in scipy, pass coords="S", but BEWARE this is discontinuous at stability=1 and has poor geometry for inference.
This implements a reparametrized sampler rsample(), and a relatively expensive log_prob() calculation by numerical integration which makes inference slow (compared to other distributions), but with better convergence properties especially for \(\alpha\)-stable distributions that are skewed (see the skew parameter below). Faster inference can be performed using either likelihood-free algorithms such as EnergyDistance, or reparameterization via the reparam() handler with one of the reparameterizers LatentStableReparam, SymmetricStableReparam, or StableReparam, e.g.:
    with poutine.reparam(config={"x": StableReparam()}):
        pyro.sample("x", Stable(stability, skew, scale, loc))
or simply wrap in MinimalReparam or AutoReparam, e.g.:
    @MinimalReparam()
    def model():
        ...
- [1] S. Borak, W. Hardle, R. Weron (2005).
Stable distributions. https://edoc.hu-berlin.de/bitstream/handle/18452/4526/8.pdf
- [2] J.P. Nolan (1997).
Numerical calculation of stable densities and distribution functions.
- [3] Rafal Weron (1996).
On the Chambers-Mallows-Stuck Method for Simulating Skewed Stable Random Variables.
- [4] J.P. Nolan (2017).
Stable Distributions: Models for Heavy Tailed Data. https://edspace.american.edu/jpnolan/wp-content/uploads/sites/1720/2020/09/Chap1.pdf
- Parameters
stability (Tensor) – Levy stability parameter \(\alpha\in(0,2]\) .
skew (Tensor) – Skewness \(\beta\in[-1,1]\) .
scale (Tensor) – Scale \(\sigma > 0\) . Defaults to 1.
loc (Tensor) – Location \(\mu_0\) when using Nolan’s S0 parametrization [2], or \(\mu\) when using the S parameterization. Defaults to 0.
coords (str) – Either “S0” (default) to use Nolan’s continuous S0 parametrization, or “S” to use the discontinuous parameterization.
- arg_constraints = {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0), 'skew': Interval(lower_bound=-1, upper_bound=1), 'stability': Interval(lower_bound=0, upper_bound=2)}
- has_rsample = True
- log_prob(value)[source]
Implemented by numerical integration that is based on the algorithm proposed by Chambers, Mallows and Stuck (CMS) for simulating the Levy \(\alpha\)-stable distribution. The CMS algorithm involves a nonlinear transformation of two independent random variables into one stable random variable. The first random variable is uniformly distributed while the second is exponentially distributed. The numerical integration is performed over the first uniformly distributed random variable.
- property mean
- support = Real()
- property variance
StableWithLogProb
- class StableWithLogProb(stability, skew, scale=1.0, loc=0.0, coords='S0', validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Same as Stable but will not undergo reparameterization by MinimalReparam and will fail reparametrization by LatentStableReparam, SymmetricStableReparam, or StableReparam.
TruncatedPolyaGamma
- class TruncatedPolyaGamma(prototype, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
This is a PolyaGamma(1, 0) distribution truncated to have finite support in the interval (0, 2.5). See [1] for details. As a consequence of the truncation the log_prob method is only accurate to about six decimal places. In addition the provided sampler is a rough approximation that is only meant to be used in contexts where sample accuracy is not important (e.g. in initialization). Broadly, this implementation is only intended for usage in cases where good approximations of the log_prob are sufficient, as is the case e.g. in HMC.
- Parameters
prototype (tensor) – A prototype tensor of arbitrary shape used to determine the dtype and device returned by sample and log_prob.
References
- [1] ‘Bayesian inference for logistic models using Polya-Gamma latent variables’
Nicholas G. Polson, James G. Scott, Jesse Windle.
- arg_constraints = {}
- has_rsample = False
- num_gamma_variates = 8
- num_log_prob_terms = 7
- support = Interval(lower_bound=0.0, upper_bound=2.5)
- truncation_point = 2.5
Unit
- class Unit(log_factor, *, has_rsample=None, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Trivial nonnormalized distribution representing the unit type.
The unit type has a single value with no data, i.e. value.numel() == 0.
This is used for pyro.factor() statements.
- arg_constraints = {'log_factor': Real()}
- support = Real()
VonMises3D
- class VonMises3D(concentration, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Spherical von Mises distribution.
This implementation combines the direction parameter and concentration parameter into a single combined parameter that contains both direction and magnitude. The value arg is represented in cartesian coordinates: it must be a normalized 3-vector that lies on the 2-sphere.
See VonMises for a 2D polar coordinate cousin of this distribution. See projected_normal for a qualitatively similar distribution but implementing more functionality.
Currently only log_prob() is implemented.
- Parameters
concentration (torch.Tensor) – A combined location-and-concentration vector. The direction of this vector is the location, and its magnitude is the concentration.
- arg_constraints = {'concentration': Real()}
- support = Sphere
ZeroInflatedDistribution
- class ZeroInflatedDistribution(base_dist, *, gate=None, gate_logits=None, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
Generic Zero Inflated distribution.
This can be used directly or can be used as a base class as e.g. for ZeroInflatedPoisson and ZeroInflatedNegativeBinomial.
- Parameters
base_dist (TorchDistribution) – the base distribution.
gate (torch.Tensor) – probability of extra zeros given via a Bernoulli distribution.
gate_logits (torch.Tensor) – logits of extra zeros given via a Bernoulli distribution.
- arg_constraints = {'gate': Interval(lower_bound=0.0, upper_bound=1.0), 'gate_logits': Real()}
- property gate
- property gate_logits
- property mean
- property support
- property variance
ZeroInflatedNegativeBinomial
- class ZeroInflatedNegativeBinomial(total_count, *, probs=None, logits=None, gate=None, gate_logits=None, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
A Zero Inflated Negative Binomial distribution.
- Parameters
total_count (float or torch.Tensor) – non-negative number of negative Bernoulli trials.
probs (torch.Tensor) – Event probabilities of success in the half open interval [0, 1).
logits (torch.Tensor) – Event log-odds for probabilities of success.
gate (torch.Tensor) – probability of extra zeros.
gate_logits (torch.Tensor) – logits of extra zeros.
- arg_constraints = {'gate': Interval(lower_bound=0.0, upper_bound=1.0), 'gate_logits': Real(), 'logits': Real(), 'probs': HalfOpenInterval(lower_bound=0.0, upper_bound=1.0), 'total_count': GreaterThanEq(lower_bound=0)}
- property logits
- property probs
- support = IntegerGreaterThan(lower_bound=0)
- property total_count
ZeroInflatedPoisson
- class ZeroInflatedPoisson(rate, *, gate=None, gate_logits=None, validate_args=None)[source]
Bases:
pyro.distributions.distribution.Distribution, Callable
A Zero Inflated Poisson distribution.
- Parameters
rate (torch.Tensor) – rate of poisson distribution.
gate (torch.Tensor) – probability of extra zeros.
gate_logits (torch.Tensor) – logits of extra zeros.
- arg_constraints = {'gate': Interval(lower_bound=0.0, upper_bound=1.0), 'gate_logits': Real(), 'rate': GreaterThan(lower_bound=0.0)}
- property rate
- support = IntegerGreaterThan(lower_bound=0)
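The role of gate in the zero-inflated distributions can be made concrete with the Poisson case. The sketch below is plain Python, not Pyro's implementation; it assumes the usual mixture reading in which a zero is emitted with probability gate and the base distribution is used otherwise. The name zip_log_prob is hypothetical, and the sketch requires gate < 1.

```python
import math


def zip_log_prob(k, rate, gate):
    """Log-probability of count k under a zero-inflated Poisson with 0 <= gate < 1."""
    poisson_log_prob = k * math.log(rate) - rate - math.lgamma(k + 1)
    if k == 0:
        # Zeros come either from the gate or from the base Poisson.
        return math.log(gate + (1.0 - gate) * math.exp(poisson_log_prob))
    # Nonzero counts can only come from the base Poisson.
    return math.log1p(-gate) + poisson_log_prob
```

Under this reading the mean is (1 - gate) * rate, and the probabilities over all counts still sum to one.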
Transforms
ConditionalTransform
CholeskyTransform
- class CholeskyTransform(cache_size=0)[source]
Bases:
torch.distributions.transforms.Transform
Transform via the mapping \(y = safe_cholesky(x)\), where x is a positive definite matrix.
- bijective = True
- codomain: torch.distributions.constraints.Constraint = LowerCholesky()
- domain: torch.distributions.constraints.Constraint = PositiveDefinite()
CorrMatrixCholeskyTransform
- class CorrMatrixCholeskyTransform(cache_size=0)[source]
Bases:
pyro.distributions.transforms.cholesky.CholeskyTransform
Transform via the mapping \(y = safe_cholesky(x)\), where x is a correlation matrix.
- bijective = True
- codomain: torch.distributions.constraints.Constraint = CorrCholesky()
- domain: torch.distributions.constraints.Constraint = CorrMatrix()
DiscreteCosineTransform
- class DiscreteCosineTransform(dim=-1, smooth=0.0, cache_size=0)[source]
Bases:
torch.distributions.transforms.Transform
Discrete Cosine Transform of type-II.
This uses dct() and idct() to compute orthonormal DCT and inverse DCT transforms. The jacobian is 1.
- Parameters
dim (int) – Dimension along which to transform. Must be negative. This is an absolute dim counting from the right.
smooth (float) – Smoothing parameter. When 0, this transforms white noise to white noise; when 1 this transforms Brownian noise to white noise; when -1 this transforms violet noise to white noise; etc. Any real number is allowed. https://en.wikipedia.org/wiki/Colors_of_noise.
- bijective = True
- property codomain
- property domain
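Why an orthonormal DCT has unit Jacobian can be seen from a direct implementation. The sketch below is a slow O(N^2) plain-Python version of the standard orthonormal DCT-II and its inverse; it is an assumption that Pyro's dct() and idct() follow the same normalization, and the function names here are hypothetical.

```python
import math


def dct_ii_orthonormal(x):
    """Orthonormal type-II DCT of a list of floats."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n)))
    return out


def idct_ii_orthonormal(y):
    """Inverse of dct_ii_orthonormal (the transpose of the same orthogonal matrix)."""
    n = len(y)
    out = []
    for t in range(n):
        out.append(sum(
            math.sqrt((1.0 if k == 0 else 2.0) / n) * y[k] * math.cos(math.pi * (t + 0.5) * k / n)
            for k in range(n)
        ))
    return out
```

The transform is multiplication by an orthogonal matrix, so it preserves Euclidean norms and its Jacobian determinant has absolute value one.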
ELUTransform
- class ELUTransform(cache_size=0)[source]
Bases:
torch.distributions.transforms.Transform
Bijective transform via the mapping \(y = \text{ELU}(x)\).
- bijective = True
- codomain: torch.distributions.constraints.Constraint = GreaterThan(lower_bound=0.0)
- domain: torch.distributions.constraints.Constraint = Real()
- sign = 1
HaarTransform
- class HaarTransform(dim=-1, flip=False, cache_size=0)[source]
Bases:
torch.distributions.transforms.Transform
Discrete Haar transform.
This uses haar_transform() and inverse_haar_transform() to compute (orthonormal) Haar and inverse Haar transforms. The jacobian is 1. For sequences with length T not a power of two, this implementation is equivalent to a block-structured Haar transform in which block sizes decrease by factors of one half from left to right.
- Parameters
- bijective = True
- property codomain
- property domain
LeakyReLUTransform
- class LeakyReLUTransform(cache_size=0)[source]
Bases:
torch.distributions.transforms.Transform
Bijective transform via the mapping \(y = \text{LeakyReLU}(x)\).
- bijective = True
- codomain: torch.distributions.constraints.Constraint = Real()
- domain: torch.distributions.constraints.Constraint = Real()
- sign = 1
LowerCholeskyAffine
- class LowerCholeskyAffine(loc, scale_tril, cache_size=0)[source]
Bases:
torch.distributions.transforms.Transform
A bijection of the form,
\(\mathbf{y} = \mathbf{L} \mathbf{x} + \mathbf{r}\)
where \(\mathbf{L}\) is a lower triangular matrix and \(\mathbf{r}\) is a vector.
- Parameters
loc (torch.tensor) – the fixed D-dimensional vector to shift the input by.
scale_tril (torch.tensor) – the D x D lower triangular matrix used in the transformation.
- bijective = True
- codomain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 1)
- domain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 1)
- log_abs_det_jacobian(x, y)[source]
Calculates the log of the absolute value of the Jacobian determinant, i.e. log(abs(det(dy/dx))), for each element of the batch.
- volume_preserving = False
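For a lower triangular scale matrix, the map, its inverse, and the log absolute Jacobian determinant are all cheap to compute. The following plain-Python sketch uses nested lists instead of tensors and is not Pyro's code; the function names are hypothetical.

```python
import math


def affine_forward(x, loc, scale_tril):
    """y = L x + r for a lower triangular L given as a list of rows."""
    return [loc[i] + sum(scale_tril[i][j] * x[j] for j in range(i + 1)) for i in range(len(x))]


def affine_inverse(y, loc, scale_tril):
    """Solve L x = y - r by forward substitution."""
    x = []
    for i in range(len(y)):
        partial = sum(scale_tril[i][j] * x[j] for j in range(i))
        x.append((y[i] - loc[i] - partial) / scale_tril[i][i])
    return x


def affine_log_abs_det_jacobian(scale_tril):
    """The Jacobian is L itself, so log|det| is the sum of log|diagonal| entries."""
    return sum(math.log(abs(scale_tril[i][i])) for i in range(len(scale_tril)))
```

The determinant of a triangular matrix is the product of its diagonal, which is why the transform is volume preserving only when that product has absolute value one.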
Normalize
- class Normalize(p=2, cache_size=0)[source]
Bases:
torch.distributions.transforms.Transform
Safely project a vector onto the sphere wrt the p norm. This avoids the singularity at zero by mapping to the vector [1, 0, 0, ..., 0].
- bijective = False
- codomain: torch.distributions.constraints.Constraint = Sphere
- domain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 1)
OrderedTransform
- class OrderedTransform(cache_size=0)[source]
Bases:
torch.distributions.transforms.Transform
Transforms a real vector into an ordered vector.
Specifically, enforces monotonically increasing order on the last dimension of a given tensor via the transformation \(y_0 = x_0\), \(y_i = x_0 + \sum_{1 \le j \le i} \exp(x_j)\)
- bijective = True
- codomain: torch.distributions.constraints.Constraint = OrderedVector()
- domain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 1)
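The ordering transform can be sketched in plain Python as a cumulative sum. This assumes the reading in which the first entry is kept and each later entry adds a positive increment exp(x_j) to the running total, which is what guarantees a strictly increasing output; the function names are hypothetical and this is not Pyro's implementation.

```python
import math


def ordered_forward(x):
    """Map a real vector to a strictly increasing one via cumulative exp increments."""
    y = [x[0]]
    for xi in x[1:]:
        y.append(y[-1] + math.exp(xi))
    return y


def ordered_inverse(y):
    """Recover the unconstrained vector from consecutive differences."""
    return [y[0]] + [math.log(b - a) for a, b in zip(y, y[1:])]
```

For example, the input [0.5, 0.0, 0.0] maps to [0.5, 1.5, 2.5], since each exp(0) adds one to the previous value.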
Permute
- class Permute(permutation, *, dim=-1, cache_size=1)[source]
Bases:
torch.distributions.transforms.Transform
A bijection that reorders the input dimensions, that is, multiplies the input by a permutation matrix. This is useful in between AffineAutoregressive transforms to increase the flexibility of the resulting distribution and stabilize learning. Whilst not being an autoregressive transform, the log absolute determinant of the Jacobian is easily calculable as 0. Note that reordering the input dimension between two layers of AffineAutoregressive is not equivalent to reordering the dimension inside the MADE networks that those IAFs use; using a Permute transform results in a distribution with more flexibility.
Example usage:
>>> import torch
>>> import pyro.distributions as dist
>>> from pyro.nn import AutoRegressiveNN
>>> from pyro.distributions.transforms import AffineAutoregressive, Permute
>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> iaf1 = AffineAutoregressive(AutoRegressiveNN(10, [40]))
>>> ff = Permute(torch.randperm(10, dtype=torch.long))
>>> iaf2 = AffineAutoregressive(AutoRegressiveNN(10, [40]))
>>> flow_dist = dist.TransformedDistribution(base_dist, [iaf1, ff, iaf2])
>>> flow_dist.sample()
- Parameters
permutation (torch.LongTensor) – a permutation ordering that is applied to the inputs.
dim (int) – the tensor dimension to permute. This value must be negative and defines the event dim as abs(dim).
- bijective = True
- property codomain
- property domain
- property inv_permutation
- log_abs_det_jacobian(x, y)[source]
Calculates the log absolute determinant of the Jacobian. Because this transform is not autoregressive, the Jacobian is not triangular and the log determinant is not the sum of the elementwise terms log(abs(dy_i/dx_i)). However, the Jacobian is a permutation matrix whose determinant is -1 or +1, so the log absolute determinant is always 0 and returning a vector of zeros works.
- volume_preserving = True
PositivePowerTransform
- class PositivePowerTransform(exponent, *, cache_size=0, validate_args=None)[source]
Bases:
torch.distributions.transforms.Transform
Transform via the mapping \(y=\operatorname{sign}(x)|x|^{\text{exponent}}\).
Whereas PowerTransform allows arbitrary exponent and restricts domain and codomain to positive values, this class restricts exponent > 0 and allows real domain and codomain.
Warning
The Jacobian is typically zero or infinite at the origin.
- bijective = True
- codomain: torch.distributions.constraints.Constraint = Real()
- domain: torch.distributions.constraints.Constraint = Real()
- sign = 1
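The sign-preserving power map and its inverse are easy to write for scalars. The sketch below is plain Python and not Pyro's code; the function names are hypothetical. The derivative helper shows why the warning about the origin applies: exponent * |x|**(exponent - 1) tends to zero for exponent > 1 and diverges for exponent < 1 as x approaches zero.

```python
import math


def positive_power(x, exponent):
    """y = sign(x) * |x|**exponent for exponent > 0."""
    return math.copysign(abs(x) ** exponent, x)


def positive_power_inverse(y, exponent):
    """x = sign(y) * |y|**(1 / exponent)."""
    return math.copysign(abs(y) ** (1.0 / exponent), y)


def positive_power_derivative(x, exponent):
    """dy/dx = exponent * |x|**(exponent - 1), defined here for x != 0."""
    return exponent * abs(x) ** (exponent - 1.0)
```

Because the map is odd and increasing, negative inputs stay negative, so the domain and codomain are both the whole real line.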
SimplexToOrderedTransform
- class SimplexToOrderedTransform(anchor_point=None)[source]
Bases:
torch.distributions.transforms.Transform
Transform a simplex into an ordered vector (via difference in Logistic CDF between cutpoints). Used in [1] to induce a prior on latent cutpoints via transforming ordered category probabilities.
- Parameters
anchor_point – Anchor point is a nuisance parameter to improve the identifiability of the transform. For simplicity, we assume it is a scalar value, but it is broadcastable to x.shape[:-1]. For more details please refer to Section 2.2 in [1].
References:
- [1] Ordinal Regression Case Study, section 2.2,
M. Betancourt, https://betanalpha.github.io/assets/case_studies/ordinal_regression.html
- codomain: torch.distributions.constraints.Constraint = OrderedVector()
- domain: torch.distributions.constraints.Constraint = Simplex()
SoftplusLowerCholeskyTransform
- class SoftplusLowerCholeskyTransform(cache_size=0)[source]
Bases:
torch.distributions.transforms.Transform
Transform from unconstrained matrices to lower-triangular matrices with nonnegative diagonal entries. This is useful for parameterizing positive definite matrices in terms of their Cholesky factorization.
- codomain: torch.distributions.constraints.Constraint = LowerCholesky()
- domain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 2)
SoftplusTransform
- class SoftplusTransform(cache_size=0)[source]
Bases:
torch.distributions.transforms.Transform
Transform via the mapping \(\text{Softplus}(x) = \log(1 + \exp(x))\).
- bijective = True
- codomain: torch.distributions.constraints.Constraint = GreaterThan(lower_bound=0.0)
- domain: torch.distributions.constraints.Constraint = Real()
- sign = 1
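Evaluating log(1 + exp(x)) literally overflows for large x, so implementations usually rearrange it. The sketch below shows one common rearrangement and the matching inverse in plain Python; it is not taken from Pyro's source, and the function names are hypothetical.

```python
import math


def softplus(x):
    """log(1 + exp(x)), rearranged so that exp() never receives a large argument."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softplus_inverse(y):
    """x = log(exp(y) - 1) for y > 0, written as y + log(1 - exp(-y))."""
    return y + math.log(-math.expm1(-y))
```

The output is always positive, matching the codomain GreaterThan(lower_bound=0.0), and for large inputs softplus(x) is indistinguishable from x.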
UnitLowerCholeskyTransform
- class UnitLowerCholeskyTransform(cache_size=0)[source]
Bases:
torch.distributions.transforms.Transform
Transform from unconstrained matrices to lower-triangular matrices with all ones diagonals.
- codomain: torch.distributions.constraints.Constraint = UnitLowerCholesky()
- domain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 2)
TransformModules
AffineAutoregressive
- class AffineAutoregressive(autoregressive_nn, log_scale_min_clip=- 5.0, log_scale_max_clip=3.0, sigmoid_bias=2.0, stable=False)[source]
Bases:
pyro.distributions.torch_transform.TransformModuleAn implementation of the bijective transform of Inverse Autoregressive Flow (IAF), using by default Eq (10) from Kingma Et Al., 2016,
\(\mathbf{y} = \mu_t + \sigma_t\odot\mathbf{x}\)
where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, \(\mu_t,\sigma_t\) are calculated from an autoregressive network on \(\mathbf{x}\), and \(\sigma_t>0\).
If the stable keyword argument is set to True then the transformation used is,
\(\mathbf{y} = \sigma_t\odot\mathbf{x} + (1-\sigma_t)\odot\mu_t\)
where \(\sigma_t\) is restricted to \((0,1)\). This variant of IAF is claimed by the authors to be more numerically stable than one using Eq (10), although in practice it leads to a restriction on the distributions that can be represented, presumably since the input is restricted to rescaling by a number on \((0,1)\).
Together with TransformedDistribution this provides a way to create richer variational approximations.
Example usage:
>>> import torch
>>> import pyro
>>> import pyro.distributions as dist
>>> from pyro.distributions.transforms import AffineAutoregressive
>>> from pyro.nn import AutoRegressiveNN
>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> transform = AffineAutoregressive(AutoRegressiveNN(10, [40]))
>>> pyro.module("my_transform", transform)
>>> flow_dist = dist.TransformedDistribution(base_dist, [transform])
>>> flow_dist.sample()
The inverse of the Bijector is required when, e.g., scoring the log density of a sample with TransformedDistribution. This implementation caches the inverse of the Bijector when its forward operation is called, e.g., when sampling from TransformedDistribution. However, if the cached value isn’t available, either because it was overwritten while sampling a new value or because an arbitrary value is being scored, the inverse is computed manually. This computation scales as O(D), where D is the input dimension, and so should be avoided for high-dimensional inputs. So in general, it is cheap to sample from IAF and to score a value that was sampled by IAF, but expensive to score an arbitrary value.
- Parameters
autoregressive_nn (callable) – an autoregressive neural network whose forward call returns a real-valued mean and logit-scale as a tuple
log_scale_min_clip (float) – The minimum value for clipping the log(scale) from the autoregressive NN
log_scale_max_clip (float) – The maximum value for clipping the log(scale) from the autoregressive NN
sigmoid_bias (float) – A term to add to the logit of the input when using the stable transform.
stable (bool) – When true, uses the alternative “stable” version of the transform (see above).
References:
[1] Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling. Improving Variational Inference with Inverse Autoregressive Flow. [arXiv:1606.04934]
[2] Danilo Jimenez Rezende, Shakir Mohamed. Variational Inference with Normalizing Flows. [arXiv:1505.05770]
[3] Mathieu Germain, Karol Gregor, Iain Murray, Hugo Larochelle. MADE: Masked Autoencoder for Distribution Estimation. [arXiv:1502.03509]
- autoregressive = True
- bijective = True
- codomain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 1)
- domain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 1)
- sign = 1
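To see why sampling is cheap while scoring an arbitrary value costs O(D) sequential steps, consider the following plain-Python sketch of the default affine update \(\mathbf{y} = \mu_t + \sigma_t\odot\mathbf{x}\). The function `toy_autoregressive_params` is a hypothetical stand-in for an autoregressive network, chosen only so that the parameters for dimension i depend on inputs before i. With a real masked network the forward direction needs a single network evaluation; the loop in `forward` exists only because this toy computes one dimension per call.

```python
import math


def toy_autoregressive_params(x_prefix):
    """Hypothetical autoregressive 'network': returns (mean, log_scale) for the
    next dimension as a function of the preceding inputs only."""
    s = sum(x_prefix)
    return 0.5 * s, 0.1 * math.tanh(s)


def forward(x):
    # Every parameter depends only on the already-known input x.
    y = []
    for i in range(len(x)):
        mean, log_scale = toy_autoregressive_params(x[:i])
        y.append(mean + math.exp(log_scale) * x[i])
    return y


def inverse(y):
    # x[i] can only be recovered once x[:i] is known, so the D dimensions
    # must be solved strictly one after another.
    x = []
    for i in range(len(y)):
        mean, log_scale = toy_autoregressive_params(x)
        x.append((y[i] - mean) / math.exp(log_scale))
    return x


def log_abs_det_jacobian(x):
    # The Jacobian is triangular, so its log-determinant is the sum of log-scales.
    return sum(toy_autoregressive_params(x[:i])[1] for i in range(len(x)))


x = [0.3, -1.2, 0.7, 2.0]
y = forward(x)
x_recovered = inverse(y)
```

Caching the input alongside the output during the forward pass avoids the sequential loop in `inverse` whenever the value being scored was produced by the transform itself.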
AffineCoupling
- class AffineCoupling(split_dim, hypernet, *, dim=-1, log_scale_min_clip=-5.0, log_scale_max_clip=3.0)[source]
Bases:
pyro.distributions.torch_transform.TransformModule
An implementation of the affine coupling layer of RealNVP (Dinh et al., 2017) that uses the bijective transform,
\(\mathbf{y}_{1:d} = \mathbf{x}_{1:d}\)
\(\mathbf{y}_{(d+1):D} = \mu + \sigma\odot\mathbf{x}_{(d+1):D}\)
where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, e.g. \(\mathbf{x}_{1:d}\) represents the first \(d\) elements of the inputs, and \(\mu,\sigma\) are shift and scale parameters calculated as the output of a function that takes only \(\mathbf{x}_{1:d}\) as input.
That is, the first \(d\) components remain unchanged, and the subsequent \(D-d\) are shifted and scaled by a function of the previous components.
Together with TransformedDistribution this provides a way to create richer variational approximations.
Example usage:
>>> import torch
>>> import pyro
>>> import pyro.distributions as dist
>>> from pyro.distributions.transforms import AffineCoupling
>>> from pyro.nn import DenseNN
>>> input_dim = 10
>>> split_dim = 6
>>> base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim))
>>> param_dims = [input_dim-split_dim, input_dim-split_dim]
>>> hypernet = DenseNN(split_dim, [10*input_dim], param_dims)
>>> transform = AffineCoupling(split_dim, hypernet)
>>> pyro.module("my_transform", transform)
>>> flow_dist = dist.TransformedDistribution(base_dist, [transform])
>>> flow_dist.sample()
The inverse of the Bijector is required when, e.g., scoring the log density of a sample with TransformedDistribution. This implementation caches the inverse of the Bijector when its forward operation is called, e.g., when sampling from TransformedDistribution. However, if the cached value isn’t available, either because it was overwritten while sampling a new value or because an arbitrary value is being scored, the inverse is computed manually.
This is an operation that scales as O(1), i.e. constant in the input dimension. So in general, it is cheap both to sample from AffineCoupling and to score an arbitrary value.
- Parameters
split_dim (int) – Zero-indexed dimension \(d\) upon which to perform input/output split for transformation.
hypernet (callable) – a neural network whose forward call returns a real-valued mean and logit-scale as a tuple. The input should have final dimension split_dim and the output final dimension input_dim-split_dim for each member of the tuple.
dim (int) – the tensor dimension on which to split. This value must be negative and defines the event dim as abs(dim).
log_scale_min_clip (float) – The minimum value for clipping the log(scale) from the NN
log_scale_max_clip (float) – The maximum value for clipping the log(scale) from the NN
References:
[1] Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using Real NVP. ICLR 2017.
- bijective = True
- property codomain
- property domain
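The coupling update and its constant-cost inverse can be sketched in plain Python. The factory `make_toy_hypernet` is a hypothetical stand-in for the hypernet and is not part of Pyro. The point of the sketch is that the inverse needs only one evaluation of the same function, because the first \(d\) components pass through unchanged.

```python
import math


def make_toy_hypernet(out_dim):
    """Hypothetical hypernet: maps the first d inputs to (mean, log_scale),
    each a list of length D - d."""
    def hypernet(x_first):
        s = sum(x_first)
        mean = [0.1 * (k + 1) * s for k in range(out_dim)]
        log_scale = [0.05 * math.tanh(s + k) for k in range(out_dim)]
        return mean, log_scale
    return hypernet


def coupling_forward(x, split_dim, hypernet):
    x1, x2 = x[:split_dim], x[split_dim:]
    mean, log_scale = hypernet(x1)
    y2 = [m + math.exp(s) * v for m, s, v in zip(mean, log_scale, x2)]
    return x1 + y2  # first d components are passed through unchanged


def coupling_inverse(y, split_dim, hypernet):
    y1, y2 = y[:split_dim], y[split_dim:]
    mean, log_scale = hypernet(y1)  # y1 equals x1, so the parameters match
    x2 = [(v - m) / math.exp(s) for m, s, v in zip(mean, log_scale, y2)]
    return y1 + x2


split_dim = 3
x = [0.5, -1.0, 2.0, 0.25, -0.75]
hypernet = make_toy_hypernet(len(x) - split_dim)
y = coupling_forward(x, split_dim, hypernet)
x_recovered = coupling_inverse(y, split_dim, hypernet)
```

Unlike the autoregressive case, no dimension of the inverse depends on another dimension that still has to be solved for, so there is no sequential loop over the input dimension.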
BatchNorm
- class BatchNorm(input_dim, momentum=0.1, epsilon=1e-05)[source]
Bases:
pyro.distributions.torch_transform.TransformModule
A type of batch normalization that can be used to stabilize training in normalizing flows. The inverse operation is defined as
\(x = (y - \hat{\mu}) \oslash \sqrt{\hat{\sigma^2}} \otimes \gamma + \beta\)
that is, the standard batch norm equation, where \(x\) is the input, \(y\) is the output, \(\gamma,\beta\) are learnable parameters, and \(\hat{\mu}\)/\(\hat{\sigma^2}\) are smoothed running averages of the sample mean and variance, respectively. The constraint \(\gamma>0\) is enforced to ease calculation of the log-det-Jacobian term.
This is an element-wise transform, and when applied to a vector, learns two parameters (\(\gamma,\beta\)) for each dimension of the input.
When the module is set to training mode, the moving averages of the sample mean and variance are updated every time the inverse operator is called, e.g., when a normalizing flow scores a minibatch with the log_prob method.
Also, when the module is set to training mode, the sample mean and variance on the current minibatch are used in place of the smoothed averages, \(\hat{\mu}\) and \(\hat{\sigma^2}\), for the inverse operator. For this reason it is not the case that \(x=g(g^{-1}(x))\) during training, i.e., that the inverse operation is the inverse of the forward one.
Example usage:
>>> import torch
>>> import pyro.distributions as dist
>>> from pyro.nn import AutoRegressiveNN
>>> from pyro.distributions.transforms import AffineAutoregressive, BatchNorm
>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> iafs = [AffineAutoregressive(AutoRegressiveNN(10, [40])) for _ in range(2)]
>>> bn = BatchNorm(10)
>>> flow_dist = dist.TransformedDistribution(base_dist, [iafs[0], bn, iafs[1]])
>>> flow_dist.sample()
- Parameters
References:
[1] Sergey Ioffe and Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In International Conference on Machine Learning, 2015. https://arxiv.org/abs/1502.03167
[2] Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density Estimation using Real NVP. In International Conference on Learning Representations, 2017. https://arxiv.org/abs/1605.08803
[3] George Papamakarios, Theo Pavlakou, and Iain Murray. Masked Autoregressive Flow for Density Estimation. In Neural Information Processing Systems, 2017. https://arxiv.org/abs/1705.07057
- bijective = True
- codomain: torch.distributions.constraints.Constraint = Real()
- property constrained_gamma
- domain: torch.distributions.constraints.Constraint = Real()
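The element-wise inverse equation above, and the reason the round trip fails in training mode, can be illustrated for a single dimension in plain Python. This is a sketch under stated assumptions: `epsilon` is added to the variance for numerical stability, the forward operation always uses the running averages, and the minibatch variance is the biased estimate. None of these details are confirmed by the text above.

```python
import math


def batchnorm_inverse(y, mean, var, gamma, beta, epsilon=1e-5):
    # x = (y - mean) / sqrt(var + epsilon) * gamma + beta
    return (y - mean) / math.sqrt(var + epsilon) * gamma + beta


def batchnorm_forward(x, mean, var, gamma, beta, epsilon=1e-5):
    # Algebraic inverse of batchnorm_inverse for fixed statistics.
    return (x - beta) / gamma * math.sqrt(var + epsilon) + mean


running_mean, running_var = 0.4, 2.0   # smoothed running averages
gamma, beta = 1.3, -0.2                # learnable parameters, gamma > 0

# Evaluation mode: both directions use the running averages and round-trip.
y = 1.7
x = batchnorm_inverse(y, running_mean, running_var, gamma, beta)
y_roundtrip = batchnorm_forward(x, running_mean, running_var, gamma, beta)

# Training mode (assumed): the inverse uses minibatch statistics instead.
batch = [1.7, -0.3, 0.9, 2.5]
batch_mean = sum(batch) / len(batch)
batch_var = sum((v - batch_mean) ** 2 for v in batch) / len(batch)
x_train = batchnorm_inverse(batch[0], batch_mean, batch_var, gamma, beta)
y_mismatch = batchnorm_forward(x_train, running_mean, running_var, gamma, beta)
```

Here `y_roundtrip` reproduces `y`, while `y_mismatch` differs from `batch[0]` because the two directions used different statistics.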
BlockAutoregressive
- class BlockAutoregressive(input_dim, hidden_factors=[8, 8], activation='tanh', residual=None)[source]
Bases:
pyro.distributions.torch_transform.TransformModule
An implementation of Block Neural Autoregressive Flow (block-NAF) (De Cao et al., 2019) bijective transform. Block-NAF uses a similar transformation to deep dense NAF, building the autoregressive NN into the structure of the transform, in a sense.
Together with TransformedDistribution this provides a way to create richer variational approximations.
Example usage:
>>> import torch
>>> import pyro
>>> import pyro.distributions as dist
>>> from pyro.distributions.transforms import BlockAutoregressive
>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> naf = BlockAutoregressive(input_dim=10)
>>> pyro.module("my_naf", naf)
>>> naf_dist = dist.TransformedDistribution(base_dist, [naf])
>>> naf_dist.sample()
The inverse operation is not implemented. This would require numerical inversion, e.g., using a root finding method - a possibility for a future implementation.
- Parameters
input_dim (int) – The dimensionality of the input and output variables.
hidden_factors (list) – Hidden layer i has hidden_factors[i] hidden units per input dimension. This corresponds to both \(a\) and \(b\) in De Cao et al. (2019). The elements of hidden_factors must be integers.
activation (string) – Activation function to use. One of ‘ELU’, ‘LeakyReLU’, ‘sigmoid’, or ‘tanh’.
residual (string) – Type of residual connections to use. Choices are “None”, “normal” for \(\mathbf{y}+f(\mathbf{y})\), and “gated” for \(\alpha\mathbf{y} + (1 - \alpha)f(\mathbf{y})\) for learnable parameter \(\alpha\).
References:
[1] Nicola De Cao, Ivan Titov, Wilker Aziz. Block Neural Autoregressive Flow. [arXiv:1904.04676]
- autoregressive = True
- bijective = True
- codomain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 1)
- domain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 1)
ConditionalAffineAutoregressive
- class ConditionalAffineAutoregressive(autoregressive_nn, **kwargs)[source]
Bases:
pyro.distributions.conditional.ConditionalTransformModule
An implementation of the bijective transform of Inverse Autoregressive Flow (IAF) that conditions on an additional context variable and uses, by default, Eq (10) from Kingma et al., 2016,
\(\mathbf{y} = \mu_t + \sigma_t\odot\mathbf{x}\)
where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, \(\mu_t,\sigma_t\) are calculated from an autoregressive network on \(\mathbf{x}\) and context \(\mathbf{z}\in\mathbb{R}^M\), and \(\sigma_t>0\).
If the stable keyword argument is set to True then the transformation used is,
\(\mathbf{y} = \sigma_t\odot\mathbf{x} + (1-\sigma_t)\odot\mu_t\)
where \(\sigma_t\) is restricted to \((0,1)\). This variant of IAF is claimed by the authors to be more numerically stable than one using Eq (10), although in practice it leads to a restriction on the distributions that can be represented, presumably since the input is restricted to rescaling by a number on \((0,1)\).
Together with ConditionalTransformedDistribution this provides a way to create richer variational approximations.
Example usage:
>>> import torch
>>> import pyro
>>> import pyro.distributions as dist
>>> from pyro.distributions.transforms import ConditionalAffineAutoregressive
>>> from pyro.nn import ConditionalAutoRegressiveNN
>>> input_dim = 10
>>> context_dim = 4
>>> batch_size = 3
>>> hidden_dims = [10*input_dim, 10*input_dim]
>>> base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim))
>>> hypernet = ConditionalAutoRegressiveNN(input_dim, context_dim, hidden_dims)
>>> transform = ConditionalAffineAutoregressive(hypernet)
>>> pyro.module("my_transform", transform)
>>> z = torch.rand(batch_size, context_dim)
>>> flow_dist = dist.ConditionalTransformedDistribution(base_dist,
...     [transform]).condition(z)
>>> flow_dist.sample(sample_shape=torch.Size([batch_size]))
The inverse of the Bijector is required when, e.g., scoring the log density of a sample with TransformedDistribution. This implementation caches the inverse of the Bijector when its forward operation is called, e.g., when sampling from TransformedDistribution. However, if the cached value isn’t available, either because it was overwritten while sampling a new value or because an arbitrary value is being scored, the inverse is computed manually. This computation scales as O(D), where D is the input dimension, and so should be avoided for high-dimensional inputs. So in general, it is cheap to sample from IAF and to score a value that was sampled by IAF, but expensive to score an arbitrary value.
- Parameters
autoregressive_nn (nn.Module) – an autoregressive neural network whose forward call returns a real-valued mean and logit-scale as a tuple
log_scale_min_clip (float) – The minimum value for clipping the log(scale) from the autoregressive NN
log_scale_max_clip (float) – The maximum value for clipping the log(scale) from the autoregressive NN
sigmoid_bias (float) – A term to add to the logit of the input when using the stable transform.
stable (bool) – When true, uses the alternative “stable” version of the transform (see above).
References:
[1] Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling. Improving Variational Inference with Inverse Autoregressive Flow. [arXiv:1606.04934]
[2] Danilo Jimenez Rezende, Shakir Mohamed. Variational Inference with Normalizing Flows. [arXiv:1505.05770]
[3] Mathieu Germain, Karol Gregor, Iain Murray, Hugo Larochelle. MADE: Masked Autoencoder for Distribution Estimation. [arXiv:1502.03509]
- bijective = True
- codomain = IndependentConstraint(Real(), 1)
- condition(context)[source]
Conditions on a context variable, returning a non-conditional transform of type
AffineAutoregressive.
- domain = IndependentConstraint(Real(), 1)
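The relationship between a conditional transform and the non-conditional transform returned by `condition(context)` can be sketched in plain Python. The classes below are hypothetical toys, not Pyro classes, and binding the context with `functools.partial` is only one plausible way to implement the pattern.

```python
import math
from functools import partial


def toy_conditional_params(x_prefix, context):
    """Hypothetical conditional autoregressive 'network': (mean, log_scale) for
    the next dimension, given the preceding inputs and a context vector."""
    s = sum(x_prefix) + sum(context)
    return 0.5 * s, 0.1 * math.tanh(s)


class ToyAffineAutoregressive:
    """Non-conditional transform: its parameter function takes only x[:i]."""

    def __init__(self, params_fn):
        self.params_fn = params_fn

    def __call__(self, x):
        y = []
        for i in range(len(x)):
            mean, log_scale = self.params_fn(x[:i])
            y.append(mean + math.exp(log_scale) * x[i])
        return y


class ToyConditionalAffineAutoregressive:
    """Conditional wrapper: condition() fixes the context and hands back an
    ordinary transform."""

    def __init__(self, conditional_params_fn):
        self.conditional_params_fn = conditional_params_fn

    def condition(self, context):
        return ToyAffineAutoregressive(
            partial(self.conditional_params_fn, context=context))


conditional = ToyConditionalAffineAutoregressive(toy_conditional_params)
transform_a = conditional.condition([0.0])
transform_b = conditional.condition([1.0])
x = [1.0, 2.0]
y_a, y_b = transform_a(x), transform_b(x)
```

The same input maps to different outputs under different contexts, while each conditioned object behaves like a plain transform that downstream code can use without knowing about the context.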
ConditionalAffineCoupling
- class ConditionalAffineCoupling(split_dim, hypernet, **kwargs)[source]
Bases:
pyro.distributions.conditional.ConditionalTransformModule
An implementation of the affine coupling layer of RealNVP (Dinh et al., 2017) that conditions on an additional context variable and uses the bijective transform,
\(\mathbf{y}_{1:d} = \mathbf{x}_{1:d}\)
\(\mathbf{y}_{(d+1):D} = \mu + \sigma\odot\mathbf{x}_{(d+1):D}\)
where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, e.g. \(\mathbf{x}_{1:d}\) represents the first \(d\) elements of the inputs, and \(\mu,\sigma\) are shift and scale parameters calculated as the output of a function of \(\mathbf{x}_{1:d}\) and a context variable \(\mathbf{z}\in\mathbb{R}^M\).
That is, the first \(d\) components remain unchanged, and the subsequent \(D-d\) are shifted and scaled by a function of the previous components and the context.
Together with ConditionalTransformedDistribution this provides a way to create richer variational approximations.
Example usage:
>>> import torch
>>> import pyro
>>> import pyro.distributions as dist
>>> from pyro.distributions.transforms import ConditionalAffineCoupling
>>> from pyro.nn import ConditionalDenseNN
>>> input_dim = 10
>>> split_dim = 6
>>> context_dim = 4
>>> batch_size = 3
>>> base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim))
>>> param_dims = [input_dim-split_dim, input_dim-split_dim]
>>> hypernet = ConditionalDenseNN(split_dim, context_dim, [10*input_dim],
...     param_dims)
>>> transform = ConditionalAffineCoupling(split_dim, hypernet)
>>> pyro.module("my_transform", transform)
>>> z = torch.rand(batch_size, context_dim)
>>> flow_dist = dist.ConditionalTransformedDistribution(base_dist,
...     [transform]).condition(z)
>>> flow_dist.sample(sample_shape=torch.Size([batch_size]))
The inverse of the Bijector is required when, e.g., scoring the log density of a sample with ConditionalTransformedDistribution. This implementation caches the inverse of the Bijector when its forward operation is called, e.g., when sampling from ConditionalTransformedDistribution. However, if the cached value isn’t available, either because it was overwritten while sampling a new value or because an arbitrary value is being scored, the inverse is computed manually.
This is an operation that scales as O(1), i.e. constant in the input dimension. So in general, it is cheap both to sample from ConditionalAffineCoupling and to score an arbitrary value.
- Parameters
split_dim (int) – Zero-indexed dimension \(d\) upon which to perform input/output split for transformation.
hypernet (callable) – A neural network whose forward call returns a real-valued mean and logit-scale as a tuple. The input should have final dimension split_dim and the output final dimension input_dim-split_dim for each member of the tuple. The network also inputs a context variable as a keyword argument in order to condition the output upon it.
log_scale_min_clip (float) – The minimum value for clipping the log(scale) from the NN
log_scale_max_clip (float) – The maximum value for clipping the log(scale) from the NN
References:
[1] Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using Real NVP. ICLR 2017.
- bijective = True
- codomain = IndependentConstraint(Real(), 1)
- condition(context)[source]
See
pyro.distributions.conditional.ConditionalTransformModule.condition()
- domain = IndependentConstraint(Real(), 1)
ConditionalGeneralizedChannelPermute
- class ConditionalGeneralizedChannelPermute(nn, channels=3, permutation=None)[source]
Bases:
pyro.distributions.conditional.ConditionalTransformModule
A bijection that generalizes a permutation on the channels of a batch of 2D images in \([\ldots,C,H,W]\) format, conditioning on an additional context variable. Specifically this transform performs the operation,
\(\mathbf{y} = \text{torch.nn.functional.conv2d}(\mathbf{x}, W)\)
where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, and \(W\sim C\times C\times 1\times 1\) is the filter matrix for a 1x1 convolution with \(C\) input and output channels.
Ignoring the final two dimensions, \(W\) is restricted to be the matrix product,
\(W = PLU\)
where \(P\sim C\times C\) is a permutation matrix on the channel dimensions, and \(LU\sim C\times C\) is an invertible product of a lower triangular and an upper triangular matrix that is the output of an NN with input \(z\in\mathbb{R}^{M}\) representing the context variable to condition on.
The input \(\mathbf{x}\) and output \(\mathbf{y}\) both have shape […,C,H,W], where C is the number of channels set at initialization.
This operation was introduced in [1] for Glow normalizing flow, and is also known as 1x1 invertible convolution. It appears in other notable work such as [2,3], and corresponds to the class tfp.bijectors.MatvecLU of TensorFlow Probability.
Example usage:
>>> import torch
>>> import pyro.distributions as dist
>>> from pyro.distributions.transforms import ConditionalGeneralizedChannelPermute
>>> from pyro.nn.dense_nn import DenseNN
>>> context_dim = 5
>>> batch_size = 3
>>> channels = 3
>>> base_dist = dist.Normal(torch.zeros(channels, 32, 32),
...     torch.ones(channels, 32, 32))
>>> hidden_dims = [context_dim*10, context_dim*10]
>>> nn = DenseNN(context_dim, hidden_dims, param_dims=[channels*channels])
>>> transform = ConditionalGeneralizedChannelPermute(nn, channels=channels)
>>> z = torch.rand(batch_size, context_dim)
>>> flow_dist = dist.ConditionalTransformedDistribution(base_dist,
...     [transform]).condition(z)
>>> flow_dist.sample(sample_shape=torch.Size([batch_size]))
- Parameters
nn – a function inputting the context variable and outputting real-valued parameters of dimension \(C^2\).
channels (int) – Number of channel dimensions in the input.
References:
[1] Diederik P. Kingma, Prafulla Dhariwal. Glow: Generative Flow with Invertible 1x1 Convolutions. [arXiv:1807.03039]
[2] Ryan Prenger, Rafael Valle, Bryan Catanzaro. WaveGlow: A Flow-based Generative Network for Speech Synthesis. [arXiv:1811.00002]
[3] Conor Durkan, Artur Bekasov, Iain Murray, George Papamakarios. Neural Spline Flows. [arXiv:1906.04032]
- bijective = True
- codomain = IndependentConstraint(Real(), 3)
- condition(context)[source]
See
pyro.distributions.conditional.ConditionalTransformModule.condition()
- domain = IndependentConstraint(Real(), 3)
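A 1x1 convolution applies the same \(C\times C\) matrix to the channel vector at every pixel, which the following plain-Python sketch makes explicit for a tiny image. It assumes, for illustration only, that \(L\) has a unit diagonal, so that \(\log|\det W|\) reduces to the sum of the log-magnitudes of the diagonal of \(U\). The fixed matrices below stand in for the output of the conditioning network.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def conv1x1(x, weight):
    """x is a [C][H][W] nested list; weight is C x C. The same matrix is
    applied to the channel vector at every pixel."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    return [[[sum(weight[o][c] * x[c][h][w] for c in range(channels))
              for w in range(width)] for h in range(height)]
            for o in range(channels)]


def log_abs_det_jacobian(upper, height, width):
    # With a unit-diagonal L and a permutation P, |det(PLU)| = prod |U_ii|;
    # each of the H*W pixels contributes the same factor.
    return height * width * sum(math.log(abs(upper[i][i]))
                                for i in range(len(upper)))


P = [[0.0, 1.0], [1.0, 0.0]]      # permutation of the two channels
L = [[1.0, 0.0], [0.5, 1.0]]      # unit lower triangular (assumed)
U = [[2.0, -1.0], [0.0, 0.25]]    # upper triangular, nonzero diagonal
W = matmul(P, matmul(L, U))

x = [[[1.0, 2.0]], [[3.0, 4.0]]]  # C=2, H=1, W=2
y = conv1x1(x, W)
log_det = log_abs_det_jacobian(U, height=1, width=2)
```

Parameterizing \(W\) through its factors keeps the log-determinant cheap to evaluate, since no general determinant of \(W\) has to be computed.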
ConditionalHouseholder
- class ConditionalHouseholder(input_dim, nn, count_transforms=1)[source]
Bases:
pyro.distributions.conditional.ConditionalTransformModule
Represents multiple applications of the Householder bijective transformation conditioning on an additional context. A single Householder transformation takes the form,
\(\mathbf{y} = (I - 2*\frac{\mathbf{u}\mathbf{u}^T}{||\mathbf{u}||^2})\mathbf{x}\)
where \(\mathbf{x}\) are the inputs with dimension \(D\), \(\mathbf{y}\) are the outputs, and \(\mathbf{u}\in\mathbb{R}^D\) is the output of a function, e.g. a NN, with input \(z\in\mathbb{R}^{M}\) representing the context variable to condition on.
The transformation represents the reflection of \(\mathbf{x}\) through the plane passing through the origin with normal \(\mathbf{u}\).
\(D\) applications of this transformation are able to transform i.i.d. standard Gaussian noise into a Gaussian variable with an arbitrary covariance matrix. With \(K<D\) transformations, one is able to approximate a full-rank Gaussian distribution using a linear transformation of rank \(K\).
Together with ConditionalTransformedDistribution this provides a way to create richer variational approximations.
Example usage:
>>> import torch
>>> import pyro.distributions as dist
>>> from pyro.distributions.transforms import ConditionalHouseholder
>>> from pyro.nn.dense_nn import DenseNN
>>> input_dim = 10
>>> context_dim = 5
>>> batch_size = 3
>>> base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim))
>>> param_dims = [input_dim]
>>> hypernet = DenseNN(context_dim, [50, 50], param_dims)
>>> transform = ConditionalHouseholder(input_dim, hypernet)
>>> z = torch.rand(batch_size, context_dim)
>>> flow_dist = dist.ConditionalTransformedDistribution(base_dist,
...     [transform]).condition(z)
>>> flow_dist.sample(sample_shape=torch.Size([batch_size]))
- Parameters
References:
[1] Jakub M. Tomczak, Max Welling. Improving Variational Auto-Encoders using Householder Flow. [arXiv:1611.09630]
- bijective = True
- codomain = IndependentConstraint(Real(), 1)
- condition(context)[source]
See
pyro.distributions.conditional.ConditionalTransformModule.condition()
- domain = IndependentConstraint(Real(), 1)
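A single Householder reflection, and a chain of several, can be written in a few lines of plain Python. In this sketch the vectors `u` are fixed by hand; in the conditional transform they would come from the network applied to the context. The function names are illustrative.

```python
import math


def householder(x, u):
    # y = (I - 2 u u^T / ||u||^2) x, computed without forming the matrix.
    uu = sum(ui * ui for ui in u)
    ux = sum(ui * xi for ui, xi in zip(u, x))
    return [xi - 2.0 * ui * ux / uu for xi, ui in zip(x, u)]


def householder_chain(x, us):
    # Apply one reflection per vector in us, in order.
    for u in us:
        x = householder(x, u)
    return x


def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


x = [3.0, 4.0]
y = householder(x, [1.0, 0.0])          # reflect through the plane x_0 = 0
x_again = householder(y, [1.0, 0.0])    # a reflection is its own inverse
z = householder_chain(x, [[1.0, 2.0], [-0.5, 0.3]])
```

Each reflection is orthogonal, so it preserves the Euclidean norm and has a Jacobian determinant of magnitude one. The log-determinant term of the flow is therefore zero regardless of how many reflections are chained.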
ConditionalMatrixExponential
- class ConditionalMatrixExponential(input_dim, nn, iterations=8, normalization='none', bound=None)[source]
Bases:
pyro.distributions.conditional.ConditionalTransformModule
A dense matrix exponential bijective transform (Hoogeboom et al., 2020) that conditions on an additional context variable with equation,
\(\mathbf{y} = \exp(M)\mathbf{x}\)
where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, \(\exp(\cdot)\) represents the matrix exponential, and \(M\in\mathbb{R}^{D\times D}\) is the output of a neural network conditioning on a context variable \(\mathbf{z}\) for input dimension \(D\). In general, \(M\) is not required to be invertible.
Due to the favourable mathematical properties of the matrix exponential, the transform has an exact inverse and a log-determinant-Jacobian whose time complexity scales as \(O(D)\). Both the forward and reverse operations are approximated with a truncated power series. For numerical stability, the norm of \(M\) can be restricted with the normalization keyword argument.
Example usage:
>>> import torch
>>> import pyro.distributions as dist
>>> from pyro.distributions.transforms import ConditionalMatrixExponential
>>> from pyro.nn.dense_nn import DenseNN
>>> input_dim = 10
>>> context_dim = 5
>>> batch_size = 3
>>> base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim))
>>> param_dims = [input_dim*input_dim]
>>> hypernet = DenseNN(context_dim, [50, 50], param_dims)
>>> transform = ConditionalMatrixExponential(input_dim, hypernet)
>>> z = torch.rand(batch_size, context_dim)
>>> flow_dist = dist.ConditionalTransformedDistribution(base_dist,
...     [transform]).condition(z)
>>> flow_dist.sample(sample_shape=torch.Size([batch_size]))
- Parameters
input_dim (int) – the dimension of the input (and output) variable.
iterations (int) – the number of terms to use in the truncated power series that approximates matrix exponentiation.
normalization (string) – One of [‘none’, ‘weight’, ‘spectral’] normalization that selects what type of normalization to apply to the weight matrix. weight corresponds to weight normalization (Salimans and Kingma, 2016) and spectral to spectral normalization (Miyato et al, 2018).
bound (float) – a bound on either the weight or spectral norm, when either of those two types of regularization are chosen by the normalization argument. A lower value for this results in fewer required terms of the truncated power series to closely approximate the exact value of the matrix exponential.
References:
[1] Emiel Hoogeboom, Victor Garcia Satorras, Jakub M. Tomczak, Max Welling. The Convolution Exponential and Generalized Sylvester Flows. [arXiv:2006.01910]
[2] Tim Salimans, Diederik P. Kingma. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks. [arXiv:1602.07868]
[3] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida. Spectral Normalization for Generative Adversarial Networks. ICLR 2018.
- bijective = True
- codomain = IndependentConstraint(Real(), 1)
- condition(context)[source]
See
pyro.distributions.conditional.ConditionalTransformModule.condition()
- domain = IndependentConstraint(Real(), 1)
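The truncated power series, the inverse via \(\exp(-M)\), and the trace formula for the log-determinant can be sketched in plain Python. This is an illustration under assumptions: here `iterations` counts the terms added after the identity term, which may differ from the library's convention, and the matrix `M` is fixed by hand rather than produced by a network.

```python
import math


def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def expm_apply(m, x, iterations=8):
    """Approximate exp(M) x by x + Mx + M^2 x / 2! + ... without ever
    forming exp(M) explicitly."""
    result = list(x)
    term = list(x)
    for k in range(1, iterations + 1):
        term = [t / k for t in matvec(m, term)]   # M^k x / k!
        result = [r + t for r, t in zip(result, term)]
    return result


def negate(m):
    return [[-entry for entry in row] for row in m]


def log_abs_det_jacobian(m):
    # det(exp(M)) = exp(trace(M)), so only the D diagonal entries are needed.
    return sum(m[i][i] for i in range(len(m)))


M = [[0.1, 0.2], [-0.3, 0.05]]
x = [1.0, -2.0]
y = expm_apply(M, x)
x_recovered = expm_apply(negate(M), y)   # exp(M)^{-1} = exp(-M)

diag_check = expm_apply([[0.5, 0.0], [0.0, -0.25]], [1.0, 1.0])
```

The smaller the norm of `M`, the faster the series converges, which is why bounding the norm reduces the number of terms needed for a given accuracy.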
ConditionalNeuralAutoregressive
- class ConditionalNeuralAutoregressive(autoregressive_nn, **kwargs)[source]
Bases:
pyro.distributions.conditional.ConditionalTransformModule
An implementation of the deep Neural Autoregressive Flow (NAF) bijective transform of the “IAF flavour”, conditioning on an additional context variable, that can be used for sampling and for scoring samples drawn from it (but not arbitrary ones).
Example usage:
>>> import torch
>>> import pyro
>>> import pyro.distributions as dist
>>> from pyro.distributions.transforms import ConditionalNeuralAutoregressive
>>> from pyro.nn import ConditionalAutoRegressiveNN
>>> input_dim = 10
>>> context_dim = 5
>>> batch_size = 3
>>> base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim))
>>> arn = ConditionalAutoRegressiveNN(input_dim, context_dim, [40],
...     param_dims=[16]*3)
>>> transform = ConditionalNeuralAutoregressive(arn, hidden_units=16)
>>> pyro.module("my_transform", transform)
>>> z = torch.rand(batch_size, context_dim)
>>> flow_dist = dist.ConditionalTransformedDistribution(base_dist,
...     [transform]).condition(z)
>>> flow_dist.sample(sample_shape=torch.Size([batch_size]))
The inverse operation is not implemented. This would require numerical inversion, e.g., using a root finding method - a possibility for a future implementation.
- Parameters
autoregressive_nn (nn.Module) – an autoregressive neural network whose forward call returns a tuple of three real-valued tensors, whose last dimension is the input dimension, and whose penultimate dimension is equal to hidden_units.
hidden_units (int) – the number of hidden units to use in the NAF transformation (see Eq (8) in reference)
activation (string) – Activation function to use. One of ‘ELU’, ‘LeakyReLU’, ‘sigmoid’, or ‘tanh’.
Reference:
[1] Chin-Wei Huang, David Krueger, Alexandre Lacoste, Aaron Courville. Neural Autoregressive Flows. [arXiv:1804.00779]
- bijective = True
- codomain = IndependentConstraint(Real(), 1)
- condition(context)[source]
Conditions on a context variable, returning a non-conditional transform of type
NeuralAutoregressive.
- domain = IndependentConstraint(Real(), 1)
ConditionalPlanar
- class ConditionalPlanar(nn)[source]
Bases:
pyro.distributions.conditional.ConditionalTransformModule
A conditional ‘planar’ bijective transform using the equation,
\(\mathbf{y} = \mathbf{x} + \mathbf{u}\tanh(\mathbf{w}^T\mathbf{x}+b)\)
where \(\mathbf{x}\) are the inputs with dimension \(D\), \(\mathbf{y}\) are the outputs, and the pseudo-parameters \(b\in\mathbb{R}\), \(\mathbf{u}\in\mathbb{R}^D\), and \(\mathbf{w}\in\mathbb{R}^D\) are the output of a function, e.g. a NN, with input \(z\in\mathbb{R}^{M}\) representing the context variable to condition on. For this to be an invertible transformation, the condition \(\mathbf{w}^T\mathbf{u}>-1\) is enforced.
Together with ConditionalTransformedDistribution this provides a way to create richer variational approximations.
Example usage:
>>> import torch
>>> import pyro.distributions as dist
>>> from pyro.distributions.transforms import ConditionalPlanar
>>> from pyro.nn.dense_nn import DenseNN
>>> input_dim = 10
>>> context_dim = 5
>>> batch_size = 3
>>> base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim))
>>> param_dims = [1, input_dim, input_dim]
>>> hypernet = DenseNN(context_dim, [50, 50], param_dims)
>>> transform = ConditionalPlanar(hypernet)
>>> z = torch.rand(batch_size, context_dim)
>>> flow_dist = dist.ConditionalTransformedDistribution(base_dist,
...     [transform]).condition(z)
>>> flow_dist.sample(sample_shape=torch.Size([batch_size]))
The inverse of this transform does not possess an analytical solution and is left unimplemented. However, the inverse is cached when the forward operation is called during sampling, and so samples drawn using the planar transform can be scored.
- Parameters
nn (callable) – a function inputting the context variable and outputting a triplet of real-valued parameters of dimensions \((1, D, D)\).
References:
[1] Danilo Jimenez Rezende, Shakir Mohamed. Variational Inference with Normalizing Flows. [arXiv:1505.05770]
- bijective = True
- codomain = IndependentConstraint(Real(), 1)
- condition(context)[source]
See
pyro.distributions.conditional.ConditionalTransformModule.condition()
- domain = IndependentConstraint(Real(), 1)
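The planar update and the constraint \(\mathbf{w}^T\mathbf{u}>-1\) can be sketched in plain Python. The sketch takes the argument of tanh to be \(\mathbf{w}^T\mathbf{x}+b\), with \(\mathbf{x}\) the transform input. The reparameterization in `constrain_u` follows the construction given in the cited paper by Rezende and Mohamed; whether the library enforces the constraint in exactly this way is an assumption. The parameter values are fixed by hand instead of coming from a network applied to the context.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def softplus(x):
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def constrain_u(u, w):
    # Move u along w so that w.u_hat = -1 + softplus(w.u), which is > -1.
    wu = dot(w, u)
    target = -1.0 + softplus(wu)
    scale = (target - wu) / dot(w, w)
    return [ui + scale * wi for ui, wi in zip(u, w)]


def planar_forward(x, u, w, b):
    activation = math.tanh(dot(w, x) + b)
    return [xi + ui * activation for xi, ui in zip(x, u)]


def planar_log_abs_det(x, u, w, b):
    # det(I + u psi^T) = 1 + u.psi, with psi = (1 - tanh^2(w.x + b)) w.
    activation = math.tanh(dot(w, x) + b)
    return math.log(abs(1.0 + (1.0 - activation * activation) * dot(u, w)))


w, b = [1.0, 1.0], 0.0
u_raw = [-3.0, -2.0]                  # w.u_raw = -5 violates the constraint
u_hat = constrain_u(u_raw, w)
x = [0.4, -0.4]
y = planar_forward(x, u_hat, w, b)
log_det = planar_log_abs_det(x, u_hat, w, b)
```

Since \(1-\tanh^2\) lies in \((0,1]\), the constraint keeps the Jacobian determinant strictly positive for every input, so the transform never folds the space onto itself.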
ConditionalRadial
- class ConditionalRadial(nn)[source]
Bases:
pyro.distributions.conditional.ConditionalTransformModule
A conditional ‘radial’ bijective transform using the equation,
\(\mathbf{y} = \mathbf{x} + \beta h(\alpha,r)(\mathbf{x} - \mathbf{x}_0)\)
where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, and \(\alpha\in\mathbb{R}^+\), \(\beta\in\mathbb{R}\), and \(\mathbf{x}_0\in\mathbb{R}^D\), are the output of a function, e.g. a NN, with input \(z\in\mathbb{R}^{M}\) representing the context variable to condition on. The input dimension is \(D\), \(r=||\mathbf{x}-\mathbf{x}_0||_2\), and \(h(\alpha,r)=1/(\alpha+r)\). For this to be an invertible transformation, the condition \(\beta>-\alpha\) is enforced.
Example usage:
>>> import torch
>>> import pyro.distributions as dist
>>> from pyro.distributions.transforms import ConditionalRadial
>>> from pyro.nn.dense_nn import DenseNN
>>> input_dim = 10
>>> context_dim = 5
>>> batch_size = 3
>>> base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim))
>>> param_dims = [input_dim, 1, 1]
>>> hypernet = DenseNN(context_dim, [50, 50], param_dims)
>>> transform = ConditionalRadial(hypernet)
>>> z = torch.rand(batch_size, context_dim)
>>> flow_dist = dist.ConditionalTransformedDistribution(base_dist,
...     [transform]).condition(z)
>>> flow_dist.sample(sample_shape=torch.Size([batch_size]))
The inverse of this transform does not possess an analytical solution and is left unimplemented. However, the inverse is cached when the forward operation is called during sampling, and so samples drawn using the radial transform can be scored.
- Parameters
input_dim (int) – the dimension of the input (and output) variable.
References:
[1] Danilo Jimenez Rezende, Shakir Mohamed. Variational Inference with Normalizing Flows. [arXiv:1505.05770]
- bijective = True
- codomain = IndependentConstraint(Real(), 1)
- condition(context)[source]
See
pyro.distributions.conditional.ConditionalTransformModule.condition()
- domain = IndependentConstraint(Real(), 1)
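The radial update and its log-determinant can be sketched in plain Python. The reparameterization \(\beta = -\alpha + \text{softplus}(\hat\beta)\) used in `constrain_beta` is one standard way to satisfy \(\beta>-\alpha\), taken from the cited paper; it is an assumption that the library enforces the constraint this way. The parameters are fixed by hand instead of coming from a network applied to the context.

```python
import math


def softplus(x):
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def constrain_beta(alpha, raw_beta):
    # Guarantees beta > -alpha for any real raw_beta.
    return -alpha + softplus(raw_beta)


def radial_forward(x, x0, alpha, beta):
    diff = [xi - ci for xi, ci in zip(x, x0)]
    r = math.sqrt(sum(d * d for d in diff))
    h = 1.0 / (alpha + r)
    return [xi + beta * h * d for xi, d in zip(x, diff)]


def radial_log_abs_det(x, x0, alpha, beta):
    # det = (1 + beta h)^(D-1) * (1 + beta h + beta h' r), with h' = -h^2.
    dim = len(x)
    r = math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, x0)))
    h = 1.0 / (alpha + r)
    one_plus_beta_h = 1.0 + beta * h
    return ((dim - 1) * math.log(one_plus_beta_h)
            + math.log(one_plus_beta_h - beta * h * h * r))


alpha = 1.5
beta = constrain_beta(alpha, -4.0)

# One-dimensional check of the log-determinant against a finite difference.
x, x0, eps = [0.7], [0.2], 1e-6
slope = (radial_forward([x[0] + eps], x0, alpha, beta)[0]
         - radial_forward([x[0] - eps], x0, alpha, beta)[0]) / (2 * eps)
log_det = radial_log_abs_det(x, x0, alpha, beta)

y = radial_forward([1.0, -2.0, 0.5], [0.0, 0.0, 0.0], alpha, beta)
```

The transform moves each point along the line joining it to the reference point \(\mathbf{x}_0\), contracting toward it when \(\beta<0\) and expanding away from it when \(\beta>0\).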
ConditionalSpline
- class ConditionalSpline(nn, input_dim, count_bins, bound=3.0, order='linear')[source]
Bases:
pyro.distributions.conditional.ConditionalTransformModule
An implementation of the element-wise rational spline bijections of linear and quadratic order (Durkan et al., 2019; Dolatabadi et al., 2020) conditioning on an additional context variable.
Rational splines are functions that are comprised of segments that are the ratio of two polynomials. For instance, for the \(d\)-th dimension and the \(k\)-th segment on the spline, the function will take the form,
\(y_d = \frac{\alpha^{(k)}(x_d)}{\beta^{(k)}(x_d)},\)
where \(\alpha^{(k)}\) and \(\beta^{(k)}\) are two polynomials of order \(d\) whose parameters are the output of a function, e.g. a NN, with input \(z\in\mathbb{R}^{M}\) representing the context variable to condition on. For \(d=1\), we say that the spline is linear, and for \(d=2\), quadratic. The spline is constructed on the specified bounding box, \([-K,K]\times[-K,K]\), with the identity function used elsewhere.
Rational splines combine functional flexibility with a numerically stable inverse that has the same computational and space complexity as the forward operation. This element-wise transform permits the accurate representation of complex univariate distributions.
Example usage:
>>> from pyro.nn.dense_nn import DenseNN
>>> input_dim = 10
>>> context_dim = 5
>>> batch_size = 3
>>> count_bins = 8
>>> base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim))
>>> param_dims = [input_dim * count_bins, input_dim * count_bins,
...     input_dim * (count_bins - 1), input_dim * count_bins]
>>> hypernet = DenseNN(context_dim, [50, 50], param_dims)
>>> transform = ConditionalSpline(hypernet, input_dim, count_bins)
>>> z = torch.rand(batch_size, context_dim)
>>> flow_dist = dist.ConditionalTransformedDistribution(base_dist,
...     [transform]).condition(z)
>>> flow_dist.sample(sample_shape=torch.Size([batch_size]))
- Parameters
input_dim (int) – Dimension of the input vector. This is required so we know how many parameters to store.
count_bins (int) – The number of segments comprising the spline.
bound (float) – The quantity \(K\) determining the bounding box, \([-K,K]\times[-K,K]\), of the spline.
order (string) – One of [‘linear’, ‘quadratic’] specifying the order of the spline.
References:
Conor Durkan, Artur Bekasov, Iain Murray, George Papamakarios. Neural Spline Flows. NeurIPS 2019.
Hadi M. Dolatabadi, Sarah Erfani, Christopher Leckie. Invertible Generative Modeling using Linear Rational Splines. AISTATS 2020.
- bijective = True
- codomain = Real()
- condition(context)[source]
See pyro.distributions.conditional.ConditionalTransformModule.condition()
- domain = Real()
ConditionalSplineAutoregressive
- class ConditionalSplineAutoregressive(input_dim, autoregressive_nn, **kwargs)[source]
Bases: pyro.distributions.conditional.ConditionalTransformModule
An implementation of the autoregressive layer with rational spline bijections of linear and quadratic order (Durkan et al., 2019; Dolatabadi et al., 2020) that conditions on an additional context variable. Rational splines are functions that are comprised of segments that are the ratio of two polynomials (see Spline).
The autoregressive layer uses the transformation,
\(y_d = g_{\theta_d}(x_d)\ \ \ d=1,2,\ldots,D\)
where \(\mathbf{x}=(x_1,x_2,\ldots,x_D)\) are the inputs, \(\mathbf{y}=(y_1,y_2,\ldots,y_D)\) are the outputs, \(g_{\theta_d}\) is an elementwise rational monotonic spline with parameters \(\theta_d\), and \(\theta=(\theta_1,\theta_2,\ldots,\theta_D)\) is the output of a conditional autoregressive NN inputting \(\mathbf{x}\) and conditioning on the context variable \(\mathbf{z}\).
Example usage:
>>> from pyro.nn import ConditionalAutoRegressiveNN
>>> input_dim = 10
>>> count_bins = 8
>>> context_dim = 5
>>> batch_size = 3
>>> base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim))
>>> hidden_dims = [input_dim * 10, input_dim * 10]
>>> param_dims = [count_bins, count_bins, count_bins - 1, count_bins]
>>> hypernet = ConditionalAutoRegressiveNN(input_dim, context_dim, hidden_dims,
...     param_dims=param_dims)
>>> transform = ConditionalSplineAutoregressive(input_dim, hypernet,
...     count_bins=count_bins)
>>> pyro.module("my_transform", transform)
>>> z = torch.rand(batch_size, context_dim)
>>> flow_dist = dist.ConditionalTransformedDistribution(base_dist,
...     [transform]).condition(z)
>>> flow_dist.sample(sample_shape=torch.Size([batch_size]))
- Parameters
input_dim (int) – Dimension of the input vector. Despite operating element-wise, this is required so we know how many parameters to store.
autoregressive_nn (callable) – an autoregressive neural network whose forward call returns a tuple of the spline parameters
count_bins (int) – The number of segments comprising the spline.
bound (float) – The quantity \(K\) determining the bounding box, \([-K,K]\times[-K,K]\), of the spline.
order (string) – One of [‘linear’, ‘quadratic’] specifying the order of the spline.
References:
Conor Durkan, Artur Bekasov, Iain Murray, George Papamakarios. Neural Spline Flows. NeurIPS 2019.
Hadi M. Dolatabadi, Sarah Erfani, Christopher Leckie. Invertible Generative Modeling using Linear Rational Splines. AISTATS 2020.
- bijective = True
- codomain = IndependentConstraint(Real(), 1)
- condition(context)[source]
Conditions on a context variable, returning a non-conditional transform of type SplineAutoregressive.
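The conditioning pattern described above, where condition(context) turns a conditional transform into an ordinary one, can be illustrated with a toy stand-in. This is only a sketch of the calling pattern: the class ConditionalScale and its hypernet argument are hypothetical names, and a plain scaling replaces the spline.

```python
class ConditionalScale:
    """Toy conditional transform whose scale is computed from the context."""

    def __init__(self, hypernet):
        # hypernet maps a context vector to the parameters of the transform
        self.hypernet = hypernet

    def condition(self, context):
        """Fix the context and return an ordinary (non-conditional) transform."""
        scale = self.hypernet(context)
        return lambda x: [scale * xi for xi in x]


cond = ConditionalScale(lambda z: 1.0 + sum(z))
transform = cond.condition([0.5, 0.5])  # context is now baked in
y = transform([1.0, 2.0])
```

After conditioning, the returned object no longer needs the context, which is why it can be used wherever a non-conditional transform is expected.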
- domain = IndependentConstraint(Real(), 1)
ConditionalTransformModule
- class ConditionalTransformModule(*args, **kwargs)[source]
Bases: pyro.distributions.conditional.ConditionalTransform, torch.nn.modules.module.Module
Conditional transforms with learnable parameters such as normalizing flows should inherit from this class rather than ConditionalTransform so they are also a subclass of Module and inherit all the useful methods of that class.
GeneralizedChannelPermute
- class GeneralizedChannelPermute(channels=3, permutation=None)[source]
Bases: pyro.distributions.transforms.generalized_channel_permute.ConditionedGeneralizedChannelPermute, pyro.distributions.torch_transform.TransformModule
A bijection that generalizes a permutation on the channels of a batch of 2D images in \([\ldots,C,H,W]\) format. Specifically this transform performs the operation,
\(\mathbf{y} = \text{torch.nn.functional.conv2d}(\mathbf{x}, W)\)
where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, and \(W\sim C\times C\times 1\times 1\) is the filter matrix for a 1x1 convolution with \(C\) input and output channels.
Ignoring the final two dimensions, \(W\) is restricted to be the matrix product,
\(W = PLU\)
where \(P\sim C\times C\) is a permutation matrix on the channel dimensions, \(L\sim C\times C\) is a lower triangular matrix with ones on the diagonal, and \(U\sim C\times C\) is an upper triangular matrix. \(W\) is initialized to a random orthogonal matrix. Then, \(P\) is fixed and the learnable parameters set to \(L,U\).
The input \(\mathbf{x}\) and output \(\mathbf{y}\) both have shape […,C,H,W], where C is the number of channels set at initialization.
This operation was introduced in [1] for the Glow normalizing flow, and is also known as 1x1 invertible convolution. It appears in other notable work such as [2,3], and corresponds to the class tfp.bijectors.MatvecLU of TensorFlow Probability.
Example usage:
>>> channels = 3
>>> base_dist = dist.Normal(torch.zeros(channels, 32, 32),
...     torch.ones(channels, 32, 32))
>>> inv_conv = GeneralizedChannelPermute(channels=channels)
>>> flow_dist = dist.TransformedDistribution(base_dist, [inv_conv])
>>> flow_dist.sample()
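One reason the factorization \(W = PLU\) is convenient is that \(|\det W|\) reduces to the product of the absolute diagonal entries of \(U\), since \(|\det P| = 1\) and \(\det L = 1\). The following plain-Python sketch checks this for \(C = 3\); the matrix values and helper names are illustrative assumptions, not the library's internals.

```python
import math


def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Arbitrary illustrative factors for C = 3 channels.
P = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]    # permutation
L = [[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [-0.3, 0.2, 1.0]]   # unit lower triangular
U = [[2.0, 0.4, -1.0], [0.0, -0.5, 0.7], [0.0, 0.0, 1.5]]  # upper triangular

W = matmul(P, matmul(L, U))

# log|det W| from the diagonal of U alone, and directly from W.
log_abs_det_from_u = sum(math.log(abs(U[i][i])) for i in range(3))
log_abs_det_direct = math.log(abs(det3(W)))


def mix_channels(weight, pixel):
    """Apply the 1x1 convolution to the channel vector of one pixel."""
    return [sum(wij * pj for wij, pj in zip(row, pixel)) for row in weight]


y = mix_channels(W, [1.0, 2.0, 3.0])
```

Because the same \(W\) is applied at every spatial location, the 1x1 convolution amounts to calling mix_channels once per pixel.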
- Parameters
channels (int) – Number of channel dimensions in the input.
References:
[1] Diederik P. Kingma, Prafulla Dhariwal. Glow: Generative Flow with Invertible 1x1 Convolutions. [arXiv:1807.03039]
[2] Ryan Prenger, Rafael Valle, Bryan Catanzaro. WaveGlow: A Flow-based Generative Network for Speech Synthesis. [arXiv:1811.00002]
[3] Conor Durkan, Artur Bekasov, Iain Murray, George Papamakarios. Neural Spline Flows. [arXiv:1906.04032]
- bijective = True
- codomain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 3)
- domain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 3)
Householder
- class Householder(input_dim, count_transforms=1)[source]
Bases: pyro.distributions.transforms.householder.ConditionedHouseholder, pyro.distributions.torch_transform.TransformModule
Represents multiple applications of the Householder bijective transformation. A single Householder transformation takes the form,
\(\mathbf{y} = (I - 2*\frac{\mathbf{u}\mathbf{u}^T}{||\mathbf{u}||^2})\mathbf{x}\)
where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, and the learnable parameters are \(\mathbf{u}\in\mathbb{R}^D\) for input dimension \(D\).
The transformation represents the reflection of \(\mathbf{x}\) through the plane passing through the origin with normal \(\mathbf{u}\).
\(D\) applications of this transformation are able to transform i.i.d. standard Gaussian noise into a Gaussian variable with an arbitrary covariance matrix. With \(K<D\) transformations, one is able to approximate a full-rank Gaussian distribution using a linear transformation of rank \(K\).
Together with TransformedDistribution this provides a way to create richer variational approximations.
Example usage:
>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> transform = Householder(10, count_transforms=5)
>>> pyro.module("my_transform", transform)
>>> flow_dist = dist.TransformedDistribution(base_dist, [transform])
>>> flow_dist.sample()
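The reflection formula above can be checked numerically. The sketch below applies a single Householder reflection in plain Python (the function names are illustrative) and shows two properties that follow from the formula: the norm of the input is preserved, and applying the same reflection twice returns the original vector.

```python
import math


def householder(x, u):
    """Reflect x through the hyperplane through the origin with normal u."""
    uu = sum(ui * ui for ui in u)
    ux = sum(ui * xi for ui, xi in zip(u, x))
    return [xi - 2.0 * ux / uu * ui for xi, ui in zip(x, u)]


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(vi * vi for vi in v))


x = [1.0, -2.0, 0.5]
u = [0.3, 0.4, -1.2]

y = householder(x, u)
x_back = householder(y, u)  # a reflection is its own inverse
```

Norm preservation is the reason the transform is volume preserving: the reflection is orthogonal, so the absolute value of its Jacobian determinant is 1.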
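- Parameters
input_dim (int) – the dimension of the input (and output) variable.
count_transforms (int) – the number of Householder transformations to apply.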
References:
[1] Jakub M. Tomczak, Max Welling. Improving Variational Auto-Encoders using Householder Flow. [arXiv:1611.09630]
- bijective = True
- codomain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 1)
- domain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 1)
- volume_preserving = True
MatrixExponential
- class MatrixExponential(input_dim, iterations=8, normalization='none', bound=None)[source]
Bases: pyro.distributions.transforms.matrix_exponential.ConditionedMatrixExponential, pyro.distributions.torch_transform.TransformModule
A dense matrix exponential bijective transform (Hoogeboom et al., 2020) with equation,
\(\mathbf{y} = \exp(M)\mathbf{x}\)
where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, \(\exp(\cdot)\) represents the matrix exponential, and the learnable parameters are \(M\in\mathbb{R}^{D\times D}\) for input dimension \(D\). In general, \(M\) is not required to be invertible.
Due to the favourable mathematical properties of the matrix exponential, the transform has an exact inverse and a log-determinant of the Jacobian that scales in time-complexity as \(O(D)\). Both the forward and reverse operations are approximated with a truncated power series. For numerical stability, the norm of \(M\) can be restricted with the normalization keyword argument.
Example usage:
>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> transform = MatrixExponential(10)
>>> pyro.module("my_transform", transform)
>>> flow_dist = dist.TransformedDistribution(base_dist, [transform])
>>> flow_dist.sample()
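The properties mentioned above follow from two standard identities: \(\exp(M)^{-1} = \exp(-M)\) and \(\log\det\exp(M) = \operatorname{tr}(M)\), the latter being a sum of \(D\) diagonal entries. A minimal plain-Python sketch of the truncated power series, assuming a hypothetical `terms` argument in place of the library's own series length, is:

```python
import math


def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def mat_exp(m, terms=12):
    """Approximate exp(M) by the series sum_{k=0}^{terms-1} M^k / k!."""
    n = len(m)
    result = identity(n)
    term = identity(n)  # holds M^k / k! for the current k
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, m)]
        result = [[r + t for r, t in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    return result


M = [[0.1, -0.3], [0.2, 0.05]]
neg_M = [[-v for v in row] for row in M]

forward = mat_exp(M)       # matrix applied in the forward direction
inverse = mat_exp(neg_M)   # exact inverse is exp(-M)
log_det = M[0][0] + M[1][1]  # log det exp(M) = trace(M)
```

A smaller norm of \(M\) makes the series converge with fewer terms, which is consistent with the role of the bound parameter described below.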
- Parameters
input_dim (int) – the dimension of the input (and output) variable.
iterations (int) – the number of terms to use in the truncated power series that approximates matrix exponentiation.
normalization (string) – One of [‘none’, ‘weight’, ‘spectral’] normalization that selects what type of normalization to apply to the weight matrix. weight corresponds to weight normalization (Salimans and Kingma, 2016) and spectral to spectral normalization (Miyato et al., 2018).
bound (float) – a bound on either the weight or spectral norm, when either of those two types of regularization is chosen by the normalization argument. A lower value for this results in fewer required terms of the truncated power series to closely approximate the exact value of the matrix exponential.
References:
[1] Emiel Hoogeboom, Victor Garcia Satorras, Jakub M. Tomczak, Max Welling. The Convolution Exponential and Generalized Sylvester Flows. [arXiv:2006.01910]
[2] Tim Salimans, Diederik P. Kingma. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks. [arXiv:1602.07868]
[3] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida. Spectral Normalization for Generative Adversarial Networks. ICLR 2018.
- bijective = True
- codomain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 1)
- domain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 1)
NeuralAutoregressive
- class NeuralAutoregressive(autoregressive_nn, hidden_units=16, activation='sigmoid')[source]
Bases: pyro.distributions.torch_transform.TransformModule
An implementation of the deep Neural Autoregressive Flow (NAF) bijective transform of the “IAF flavour” that can be used for sampling and scoring samples drawn from it (but not arbitrary ones).
Example usage:
>>> from pyro.nn import AutoRegressiveNN
>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> arn = AutoRegressiveNN(10, [40], param_dims=[16]*3)
>>> transform = NeuralAutoregressive(arn, hidden_units=16)
>>> pyro.module("my_transform", transform)
>>> flow_dist = dist.TransformedDistribution(base_dist, [transform])
>>> flow_dist.sample()
The inverse operation is not implemented. This would require numerical inversion, e.g., using a root finding method - a possibility for a future implementation.
- Parameters
autoregressive_nn (nn.Module) – an autoregressive neural network whose forward call returns a tuple of three real-valued tensors, whose last dimension is the input dimension, and whose penultimate dimension is equal to hidden_units.
hidden_units (int) – the number of hidden units to use in the NAF transformation (see Eq (8) in reference)
activation (string) – Activation function to use. One of ‘ELU’, ‘LeakyReLU’, ‘sigmoid’, or ‘tanh’.
Reference:
[1] Chin-Wei Huang, David Krueger, Alexandre Lacoste, Aaron Courville. Neural Autoregressive Flows. [arXiv:1804.00779]
- autoregressive = True
- bijective = True
- codomain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 1)
- domain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 1)
- eps = 1e-08
Planar
- class Planar(input_dim)[source]
Bases: pyro.distributions.transforms.planar.ConditionedPlanar, pyro.distributions.torch_transform.TransformModule
A ‘planar’ bijective transform with equation,
\(\mathbf{y} = \mathbf{x} + \mathbf{u}\tanh(\mathbf{w}^T\mathbf{x}+b)\)
where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, and the learnable parameters are \(b\in\mathbb{R}\), \(\mathbf{u}\in\mathbb{R}^D\), \(\mathbf{w}\in\mathbb{R}^D\) for input dimension \(D\). For this to be an invertible transformation, the condition \(\mathbf{w}^T\mathbf{u}>-1\) is enforced.
Together with TransformedDistribution this provides a way to create richer variational approximations.
Example usage:
>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> transform = Planar(10)
>>> pyro.module("my_transform", transform)
>>> flow_dist = dist.TransformedDistribution(base_dist, [transform])
>>> flow_dist.sample()
The inverse of this transform does not possess an analytical solution and is left unimplemented. However, the inverse is cached when the forward operation is called during sampling, and so samples drawn using the planar transform can be scored.
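The planar equation and its Jacobian determinant can be written out directly. By the matrix determinant lemma, \(\det(\partial\mathbf{y}/\partial\mathbf{x}) = 1 + (1-\tanh^2(\mathbf{w}^T\mathbf{x}+b))\,\mathbf{w}^T\mathbf{u}\), which stays positive whenever \(\mathbf{w}^T\mathbf{u}>-1\). The following is a plain-Python sketch with illustrative parameter values; it does not reproduce how the library enforces the constraint.

```python
import math


def planar_forward(x, u, w, b):
    """Return y = x + u * tanh(w.x + b) and log|det dy/dx|."""
    t = math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)
    y = [xi + ui * t for xi, ui in zip(x, u)]
    wu = sum(wi * ui for wi, ui in zip(w, u))
    # det(I + u psi^T) = 1 + psi.u with psi = (1 - tanh^2) * w
    log_det = math.log(abs(1.0 + (1.0 - t * t) * wu))
    return y, log_det


x = [0.3, -1.2, 0.5]
u = [0.5, -0.2, 0.1]
w = [0.3, 0.4, -0.6]   # w.u = 0.01 > -1, so the map is invertible
b = 0.1

y, log_det = planar_forward(x, u, w, b)
```

The log-determinant only needs two dot products, so it costs \(O(D)\) per sample.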
- Parameters
input_dim (int) – the dimension of the input (and output) variable.
References:
[1] Danilo Jimenez Rezende, Shakir Mohamed. Variational Inference with Normalizing Flows. [arXiv:1505.05770]
- bijective = True
- codomain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 1)
- domain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 1)
Polynomial
- class Polynomial(autoregressive_nn, input_dim, count_degree, count_sum)[source]
Bases: pyro.distributions.torch_transform.TransformModule
An autoregressive bijective transform as described in Jaini et al. (2019) applying the following equation element-wise,
\(y_n = c_n + \int^{x_n}_0\sum^K_{k=1}\left(\sum^R_{r=0}a^{(n)}_{r,k}u^r\right)^2du\)
where \(x_n\) is the \(n\)-th input, \(y_n\) is the \(n\)-th output, and \(c_n\in\mathbb{R}\), \(\left\{a^{(n)}_{r,k}\in\mathbb{R}\right\}\) are learnable parameters that are the output of an autoregressive NN inputting \(x_{\prec n}=\{x_1,x_2,\ldots,x_{n-1}\}\).
Together with TransformedDistribution this provides a way to create richer variational approximations.
Example usage:
>>> from pyro.nn import AutoRegressiveNN
>>> input_dim = 10
>>> count_degree = 4
>>> count_sum = 3
>>> base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim))
>>> param_dims = [(count_degree + 1)*count_sum]
>>> arn = AutoRegressiveNN(input_dim, [input_dim*10], param_dims)
>>> transform = Polynomial(arn, input_dim=input_dim, count_degree=count_degree,
...     count_sum=count_sum)
>>> pyro.module("my_transform", transform)
>>> flow_dist = dist.TransformedDistribution(base_dist, [transform])
>>> flow_dist.sample()
The inverse of this transform does not possess an analytical solution and is left unimplemented. However, the inverse is cached when the forward operation is called during sampling, and so samples drawn using a polynomial transform can be scored.
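For a single element, the integral in the polynomial transform has a closed form. The sketch below assumes the integrand is a sum of squared polynomials, as in the sum-of-squares flow of Jaini et al. (2019), which makes the map monotone increasing in \(x_n\). The function name and coefficient values are illustrative; each inner list holds the \(R+1\) coefficients of one of the \(K\) polynomials, matching the (count_degree + 1) * count_sum parameter count used in the example.

```python
def sos_polynomial(x, c, coeffs):
    """Evaluate c + integral_0^x sum_k (sum_r coeffs[k][r] * u**r)**2 du."""
    total = c
    for a in coeffs:  # one polynomial per k
        for i, ai in enumerate(a):
            for j, aj in enumerate(a):
                # integral of ai * aj * u**(i + j) from 0 to x
                total += ai * aj * x ** (i + j + 1) / (i + j + 1)
    return total


c = 0.1
coeffs = [[1.0, 0.5, -0.2],   # degree-2 polynomial, k = 1
          [0.3, -1.0, 0.4]]   # degree-2 polynomial, k = 2

y_left = sos_polynomial(-1.0, c, coeffs)
y_zero = sos_polynomial(0.0, c, coeffs)
y_right = sos_polynomial(1.0, c, coeffs)
```

Since the integrand is non-negative, the outputs are ordered in the same way as the inputs, and the derivative with respect to \(x_n\) is the integrand evaluated at \(x_n\).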
- Parameters
autoregressive_nn (nn.Module) – an autoregressive neural network whose forward call returns a tensor of real-valued numbers of size (batch_size, (count_degree+1)*count_sum, input_dim)
count_degree (int) – The degree of the polynomial to use for each element-wise transformation.
count_sum (int) – The number of polynomials to sum in each element-wise transformation.
References:
[1] Priyank Jaini, Kira A. Selby, Yaoliang Yu. Sum-of-squares polynomial flow. [arXiv:1905.02325]
- autoregressive = True
- bijective = True
- codomain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 1)
- domain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 1)
Radial
- class Radial(input_dim)[source]
Bases: pyro.distributions.transforms.radial.ConditionedRadial, pyro.distributions.torch_transform.TransformModule
A ‘radial’ bijective transform using the equation,
\(\mathbf{y} = \mathbf{x} + \beta h(\alpha,r)(\mathbf{x} - \mathbf{x}_0)\)
where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, and the learnable parameters are \(\alpha\in\mathbb{R}^+\), \(\beta\in\mathbb{R}\), \(\mathbf{x}_0\in\mathbb{R}^D\), for input dimension \(D\), \(r=||\mathbf{x}-\mathbf{x}_0||_2\), \(h(\alpha,r)=1/(\alpha+r)\). For this to be an invertible transformation, the condition \(\beta>-\alpha\) is enforced.
Example usage:
>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> transform = Radial(10)
>>> pyro.module("my_transform", transform)
>>> flow_dist = dist.TransformedDistribution(base_dist, [transform])
>>> flow_dist.sample()
The inverse of this transform does not possess an analytical solution and is left unimplemented. However, the inverse is cached when the forward operation is called during sampling, and so samples drawn using the radial transform can be scored.
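The radial equation and the log-determinant given by Rezende and Mohamed can be sketched in plain Python. With \(h'(\alpha,r) = -1/(\alpha+r)^2\), the determinant is \([1+\beta h]^{D-1}[1+\beta h+\beta h' r]\), and both factors are positive when \(\beta>-\alpha\). The function name and parameter values below are illustrative assumptions.

```python
import math


def radial_forward(x, x0, alpha, beta):
    """Return y = x + beta * h(alpha, r) * (x - x0) and log|det dy/dx|."""
    d = len(x)
    diff = [xi - ci for xi, ci in zip(x, x0)]
    r = math.sqrt(sum(v * v for v in diff))
    h = 1.0 / (alpha + r)
    h_prime = -h * h
    y = [xi + beta * h * di for xi, di in zip(x, diff)]
    log_det = ((d - 1) * math.log(1.0 + beta * h)
               + math.log(1.0 + beta * h + beta * h_prime * r))
    return y, log_det


x0 = [0.2, -0.1, 0.4]
alpha, beta = 1.5, -1.0   # beta > -alpha, so the map is invertible

y, log_det = radial_forward([1.0, 0.5, -0.3], x0, alpha, beta)
y_center, _ = radial_forward(x0, x0, alpha, beta)  # the centre is a fixed point
```

A negative \(\beta\) pulls points towards \(\mathbf{x}_0\), while a positive \(\beta\) pushes them away.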
- Parameters
input_dim (int) – the dimension of the input (and output) variable.
References:
[1] Danilo Jimenez Rezende, Shakir Mohamed. Variational Inference with Normalizing Flows. [arXiv:1505.05770]
- bijective = True
- codomain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 1)
- domain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 1)
Spline
- class Spline(input_dim, count_bins=8, bound=3.0, order='linear')[source]
Bases: pyro.distributions.transforms.spline.ConditionedSpline, pyro.distributions.torch_transform.TransformModule
An implementation of the element-wise rational spline bijections of linear and quadratic order (Durkan et al., 2019; Dolatabadi et al., 2020). Rational splines are functions that are comprised of segments that are the ratio of two polynomials. For instance, for the \(d\)-th dimension and the \(k\)-th segment on the spline, the function will take the form,
\(y_d = \frac{\alpha^{(k)}(x_d)}{\beta^{(k)}(x_d)},\)
where \(\alpha^{(k)}\) and \(\beta^{(k)}\) are two polynomials of order \(d\). For \(d=1\), we say that the spline is linear, and for \(d=2\), quadratic. The spline is constructed on the specified bounding box, \([-K,K]\times[-K,K]\), with the identity function used elsewhere.
Rational splines offer an excellent combination of functional flexibility whilst maintaining a numerically stable inverse that is of the same computational and space complexities as the forward operation. This element-wise transform permits the accurate representation of complex univariate distributions.
Example usage:
>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> transform = Spline(10, count_bins=4, bound=3.)
>>> pyro.module("my_transform", transform)
>>> flow_dist = dist.TransformedDistribution(base_dist, [transform])
>>> flow_dist.sample()
- Parameters
input_dim (int) – Dimension of the input vector. This is required so we know how many parameters to store.
count_bins (int) – The number of segments comprising the spline.
bound (float) – The quantity \(K\) determining the bounding box, \([-K,K]\times[-K,K]\), of the spline.
order (string) – One of [‘linear’, ‘quadratic’] specifying the order of the spline.
References:
Conor Durkan, Artur Bekasov, Iain Murray, George Papamakarios. Neural Spline Flows. NeurIPS 2019.
Hadi M. Dolatabadi, Sarah Erfani, Christopher Leckie. Invertible Generative Modeling using Linear Rational Splines. AISTATS 2020.
- bijective = True
- codomain: torch.distributions.constraints.Constraint = Real()
- domain: torch.distributions.constraints.Constraint = Real()
SplineAutoregressive
- class SplineAutoregressive(input_dim, autoregressive_nn, count_bins=8, bound=3.0, order='linear')[source]
Bases: pyro.distributions.torch_transform.TransformModule
An implementation of the autoregressive layer with rational spline bijections of linear and quadratic order (Durkan et al., 2019; Dolatabadi et al., 2020). Rational splines are functions that are comprised of segments that are the ratio of two polynomials (see Spline).
The autoregressive layer uses the transformation,
\(y_d = g_{\theta_d}(x_d)\ \ \ d=1,2,\ldots,D\)
where \(\mathbf{x}=(x_1,x_2,\ldots,x_D)\) are the inputs, \(\mathbf{y}=(y_1,y_2,\ldots,y_D)\) are the outputs, \(g_{\theta_d}\) is an elementwise rational monotonic spline with parameters \(\theta_d\), and \(\theta=(\theta_1,\theta_2,\ldots,\theta_D)\) is the output of an autoregressive NN inputting \(\mathbf{x}\).
Example usage:
>>> from pyro.nn import AutoRegressiveNN
>>> input_dim = 10
>>> count_bins = 8
>>> base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim))
>>> hidden_dims = [input_dim * 10, input_dim * 10]
>>> param_dims = [count_bins, count_bins, count_bins - 1, count_bins]
>>> hypernet = AutoRegressiveNN(input_dim, hidden_dims, param_dims=param_dims)
>>> transform = SplineAutoregressive(input_dim, hypernet, count_bins=count_bins)
>>> pyro.module("my_transform", transform)
>>> flow_dist = dist.TransformedDistribution(base_dist, [transform])
>>> flow_dist.sample()
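The autoregressive structure above, in which \(\theta_d\) depends only on the preceding inputs, can be illustrated without splines. The sketch below swaps the rational spline for a simple monotone affine map and uses a hypothetical toy_params function in place of the autoregressive NN; it shows that the log-determinant is a sum over dimensions and that inversion proceeds one dimension at a time.

```python
import math


def toy_params(prefix):
    """Stand-in for the autoregressive NN: depends only on earlier inputs."""
    s = sum(prefix)
    return 0.1 * s, math.tanh(s)  # (shift, log_scale)


def forward(x):
    """Apply y_d = shift_d + exp(log_scale_d) * x_d for each dimension."""
    y, log_det = [], 0.0
    for d, xd in enumerate(x):
        shift, log_scale = toy_params(x[:d])
        y.append(shift + math.exp(log_scale) * xd)
        log_det += log_scale  # Jacobian is triangular, so log-dets add up
    return y, log_det


def inverse(y):
    """Recover x sequentially, reusing the inputs recovered so far."""
    x = []
    for yd in y:
        shift, log_scale = toy_params(x)
        x.append((yd - shift) * math.exp(-log_scale))
    return x


x = [0.5, -1.0, 2.0]
y, log_det = forward(x)
x_back = inverse(y)
```

The forward pass can evaluate all dimensions from \(\mathbf{x}\) at once, whereas this inverse needs \(D\) sequential steps.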
- Parameters
input_dim (int) – Dimension of the input vector. Despite operating element-wise, this is required so we know how many parameters to store.
autoregressive_nn (callable) – an autoregressive neural network whose forward call returns a tuple of the spline parameters
count_bins (int) – The number of segments comprising the spline.
bound (float) – The quantity \(K\) determining the bounding box, \([-K,K]\times[-K,K]\), of the spline.
order (string) – One of [‘linear’, ‘quadratic’] specifying the order of the spline.
References:
Conor Durkan, Artur Bekasov, Iain Murray, George Papamakarios. Neural Spline Flows. NeurIPS 2019.
Hadi M. Dolatabadi, Sarah Erfani, Christopher Leckie. Invertible Generative Modeling using Linear Rational Splines. AISTATS 2020.
- autoregressive = True
- bijective = True
- codomain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 1)
- domain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 1)
SplineCoupling
- class SplineCoupling(input_dim, split_dim, hypernet, count_bins=8, bound=3.0, order='linear', identity=False)[source]
Bases: pyro.distributions.torch_transform.TransformModule
An implementation of the coupling layer with rational spline bijections of linear and quadratic order (Durkan et al., 2019; Dolatabadi et al., 2020). Rational splines are functions that are comprised of segments that are the ratio of two polynomials (see Spline).
The spline coupling layer uses the transformation,
\(\mathbf{y}_{1:d} = g_\theta(\mathbf{x}_{1:d})\) \(\mathbf{y}_{(d+1):D} = h_\phi(\mathbf{x}_{(d+1):D};\mathbf{x}_{1:d})\)
where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, e.g. \(\mathbf{x}_{1:d}\) represents the first \(d\) elements of the inputs, \(g_\theta\) is either the identity function or an elementwise rational monotonic spline with parameters \(\theta\), and \(h_\phi\) is a conditional elementwise spline, conditioning on the first \(d\) elements.
Example usage:
>>> from pyro.nn import DenseNN
>>> input_dim = 10
>>> split_dim = 6
>>> count_bins = 8
>>> base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim))
>>> param_dims = [(input_dim - split_dim) * count_bins,
...     (input_dim - split_dim) * count_bins,
...     (input_dim - split_dim) * (count_bins - 1),
...     (input_dim - split_dim) * count_bins]
>>> hypernet = DenseNN(split_dim, [10*input_dim], param_dims)
>>> transform = SplineCoupling(input_dim, split_dim, hypernet)
>>> pyro.module("my_transform", transform)
>>> flow_dist = dist.TransformedDistribution(base_dist, [transform])
>>> flow_dist.sample()
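The hypernet sizes in the example follow from the split: the network reads the first split_dim elements and emits spline parameters for the remaining input_dim - split_dim elements. A small sketch of that arithmetic, with hypothetical helper names and the four-output layout taken from the example (linear order), is:

```python
def spline_coupling_param_dims(input_dim, split_dim, count_bins):
    """Output sizes of the hypernet, mirroring the example's param_dims."""
    out = input_dim - split_dim  # number of elements transformed by h_phi
    return [out * count_bins,
            out * count_bins,
            out * (count_bins - 1),
            out * count_bins]


def split(x, split_dim):
    """Split x into the conditioning part and the part transformed by h_phi."""
    return x[:split_dim], x[split_dim:]


param_dims = spline_coupling_param_dims(input_dim=10, split_dim=6, count_bins=8)
x1, x2 = split(list(range(10)), 6)
```

With these values the hypernet maps 6 inputs to outputs of sizes 32, 32, 28 and 32.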
- Parameters
input_dim (int) – Dimension of the input vector. Despite operating element-wise, this is required so we know how many parameters to store.
split_dim (int) – Zero-indexed dimension \(d\) upon which to perform input/output split for transformation.
hypernet (callable) – a neural network whose forward call returns a tuple of spline parameters (see ConditionalSpline).
count_bins (int) – The number of segments comprising the spline.
bound (float) – The quantity \(K\) determining the bounding box, \([-K,K]\times[-K,K]\), of the spline.
order (string) – One of [‘linear’, ‘quadratic’] specifying the order of the spline.
References:
Conor Durkan, Artur Bekasov, Iain Murray, George Papamakarios. Neural Spline Flows. NeurIPS 2019.
Hadi M. Dolatabadi, Sarah Erfani, Christopher Leckie. Invertible Generative Modeling using Linear Rational Splines. AISTATS 2020.
- bijective = True
- codomain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 1)
- domain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 1)
Sylvester
- class Sylvester(input_dim, count_transforms=1)[source]
Bases: pyro.distributions.transforms.householder.Householder
An implementation of the Sylvester bijective transform of the Householder variety (van den Berg et al., 2018),
\(\mathbf{y} = \mathbf{x} + QR\tanh(SQ^T\mathbf{x}+\mathbf{b})\)
where \(\mathbf{x}\) are the inputs, \(\mathbf{y}\) are the outputs, \(R,S\sim D\times D\) are upper triangular matrices for input dimension \(D\), \(Q\sim D\times D\) is an orthogonal matrix, and \(\mathbf{b}\sim D\) is a learnable bias term.
The Sylvester transform is a generalization of Planar. In the Householder type of the Sylvester transform, the orthogonality of \(Q\) is enforced by representing it as the product of Householder transformations.
Together with TransformedDistribution it provides a way to create richer variational approximations.
Example usage:
>>> base_dist = dist.Normal(torch.zeros(10), torch.ones(10))
>>> transform = Sylvester(10, count_transforms=4)
>>> pyro.module("my_transform", transform)
>>> flow_dist = dist.TransformedDistribution(base_dist, [transform])
>>> flow_dist.sample()
tensor([-0.4071, -0.5030, 0.7924, -0.2366, -0.2387, -0.1417, 0.0868, 0.1389, -0.4629, 0.0986])
The inverse of this transform does not possess an analytical solution and is left unimplemented. However, the inverse is cached when the forward operation is called during sampling, and so samples drawn using the Sylvester transform can be scored.
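Because \(Q\) is orthogonal and \(R,S\) are upper triangular, Sylvester's determinant identity reduces the Jacobian determinant to a product over the diagonal: \(\prod_i \left(1 + \tanh'(\cdot)_i\,R_{ii}S_{ii}\right)\). The plain-Python sketch below evaluates the transform and this log-determinant for \(D=2\), building \(Q\) from a single Householder reflection; all names and values are illustrative assumptions.

```python
import math


def matvec(m, v):
    """Multiply a matrix (nested lists) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def householder_matrix(u):
    """Orthogonal matrix I - 2 u u^T / ||u||^2."""
    uu = sum(ui * ui for ui in u)
    n = len(u)
    return [[(1.0 if i == j else 0.0) - 2.0 * u[i] * u[j] / uu
             for j in range(n)] for i in range(n)]


def sylvester_forward(x, Q, R, S, b):
    """Return y = x + Q R tanh(S Q^T x + b) and log|det dy/dx|."""
    pre = [p + bi for p, bi in zip(matvec(S, matvec(transpose(Q), x)), b)]
    t = [math.tanh(p) for p in pre]
    y = [xi + di for xi, di in zip(x, matvec(Q, matvec(R, t)))]
    log_det = sum(math.log(abs(1.0 + (1.0 - ti * ti) * R[i][i] * S[i][i]))
                  for i, ti in enumerate(t))
    return y, log_det


Q = householder_matrix([1.0, 2.0])
R = [[0.5, 0.3], [0.0, -0.4]]
S = [[0.8, -0.2], [0.0, 0.6]]
b = [0.1, -0.3]
x = [0.4, -0.7]

y, log_det = sylvester_forward(x, Q, R, S, b)
```

Only the diagonals of \(R\) and \(S\) enter the log-determinant, so its cost is linear in \(D\) once the pre-activations are known.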
References:
[1] Rianne van den Berg, Leonard Hasenclever, Jakub M. Tomczak, Max Welling. Sylvester Normalizing Flows for Variational Inference. UAI 2018.
- bijective = True
- codomain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 1)
- domain: torch.distributions.constraints.Constraint = IndependentConstraint(Real(), 1)
TransformModule
- class TransformModule(*args, **kwargs)[source]
Bases: torch.distributions.transforms.Transform, torch.nn.modules.module.Module
Transforms with learnable parameters such as normalizing flows should inherit from this class rather than Transform so they are also a subclass of nn.Module and inherit all the useful methods of that class.
ComposeTransformModule
- class ComposeTransformModule(parts, cache_size=0)[source]
Bases: torch.distributions.transforms.ComposeTransform, torch.nn.modules.container.ModuleList
This allows us to use a list of TransformModule in the same way as ComposeTransform. This is needed so that transform parameters are automatically registered by Pyro’s param store when used in PyroModule instances.
Transform Factories
Each Transform and TransformModule includes a corresponding helper function in lower case that inputs, at minimum, the input dimensions of the transform, and possibly additional arguments to customize the transform in an intuitive way. The purpose of these helper functions is to hide from the user whether or not the transform requires the construction of a hypernet, and if so, the input and output dimensions of that hypernet.
iterated
- iterated(repeats, base_fn, *args, **kwargs)[source]
Helper function to compose a sequence of bijective transforms with potentially learnable parameters using ComposeTransformModule.
- Parameters
repeats – number of repeated transforms.
base_fn – function to construct the bijective transform.
args – arguments taken by base_fn.
kwargs – keyword arguments taken by base_fn.
- Returns
instance of TransformModule.
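The behaviour described for iterated can be mimicked with plain callables. In this sketch, which is an analogy rather than the library's code, iterated_sketch and make_affine are hypothetical names, ordinary functions stand in for TransformModule instances, and a Python list stands in for ComposeTransformModule. It assumes base_fn is called once per repeat, so that each repeated transform is a separate object.

```python
def iterated_sketch(repeats, base_fn, *args, **kwargs):
    """Build `repeats` transforms with base_fn and chain them in order."""
    parts = [base_fn(*args, **kwargs) for _ in range(repeats)]

    def composed(x):
        for part in parts:
            x = part(x)
        return x

    return composed, parts


def make_affine(scale, shift=0.0):
    """Toy factory returning the map x -> scale * x + shift."""
    return lambda x: scale * x + shift


flow, parts = iterated_sketch(3, make_affine, 2.0, shift=1.0)
result = flow(1.0)  # 1 -> 3 -> 7 -> 15
```

Positional and keyword arguments are forwarded unchanged to the factory on every call.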
affine_autoregressive
- affine_autoregressive(input_dim, hidden_dims=None, **kwargs)[source]
A helper function to create an AffineAutoregressive object that takes care of constructing an autoregressive network with the correct input/output dimensions.
- Parameters
input_dim (int) – Dimension of input variable
hidden_dims (list[int]) – The desired hidden dimensions of the autoregressive network. Defaults to using [3*input_dim + 1]
log_scale_min_clip (float) – The minimum value for clipping the log(scale) from the autoregressive NN
log_scale_max_clip (float) – The maximum value for clipping the log(scale) from the autoregressive NN
sigmoid_bias (float) – A term to add to the logit of the input when using the stable transform.
stable (bool) – When true, uses the alternative “stable” version of the transform (see above).
affine_coupling
- affine_coupling(input_dim, hidden_dims=None, split_dim=None, dim=- 1, **kwargs)[source]
A helper function to create an AffineCoupling object that takes care of constructing a dense network with the correct input/output dimensions.
- Parameters
input_dim (int) – Dimension(s) of input variable to permute. Note that when dim < -1 this must be a tuple corresponding to the event shape.
hidden_dims (list[int]) – The desired hidden dimensions of the dense network. Defaults to using [10*input_dim]
split_dim (int) – The dimension to split the input on for the coupling transform. Defaults to using input_dim // 2
dim (int) – the tensor dimension on which to split. This value must be negative and defines the event dim as abs(dim).
log_scale_min_clip (float) – The minimum value for clipping the log(scale) from the dense NN
log_scale_max_clip (float) – The maximum value for clipping the log(scale) from the dense NN
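The defaults listed above can be resolved as in the following sketch for the common case of an integer input_dim with dim=-1. The function name is hypothetical, and the assumption that the dense network outputs one mean and one log-scale per transformed element is ours rather than something stated in the parameter list.

```python
def coupling_defaults(input_dim, hidden_dims=None, split_dim=None):
    """Resolve the documented defaults for an integer input_dim."""
    if split_dim is None:
        split_dim = input_dim // 2
    if hidden_dims is None:
        hidden_dims = [10 * input_dim]
    # Assumed layout: a mean and a log(scale) for each transformed element.
    param_dims = [input_dim - split_dim, input_dim - split_dim]
    return hidden_dims, split_dim, param_dims


hidden_dims, split_dim, param_dims = coupling_defaults(10)
```

Passing explicit values for hidden_dims or split_dim simply bypasses the corresponding default.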
batchnorm
block_autoregressive
- block_autoregressive(input_dim, **kwargs)[source]
A helper function to create a BlockAutoregressive object for consistency with other helpers.
- Parameters
input_dim (int) – Dimension of input variable
hidden_factors (list) – Hidden layer i has hidden_factors[i] hidden units per input dimension. This corresponds to both \(a\) and \(b\) in De Cao et al. (2019). The elements of hidden_factors must be integers.
activation (string) – Activation function to use. One of ‘ELU’, ‘LeakyReLU’, ‘sigmoid’, or ‘tanh’.
residual (string) – Type of residual connections to use. Choices are “None”, “normal” for \(\mathbf{y}+f(\mathbf{y})\), and “gated” for \(\alpha\mathbf{y} + (1 - \alpha)f(\mathbf{y})\) for learnable parameter \(\alpha\).
conditional_affine_autoregressive
- conditional_affine_autoregressive(input_dim, context_dim, hidden_dims=None, **kwargs)[source]
A helper function to create a ConditionalAffineAutoregressive object that takes care of constructing a dense network with the correct input/output dimensions.
- Parameters
input_dim (int) – Dimension of input variable
context_dim (int) – Dimension of context variable
hidden_dims (list[int]) – The desired hidden dimensions of the dense network. Defaults to using [10*input_dim]
log_scale_min_clip (float) – The minimum value for clipping the log(scale) from the autoregressive NN
log_scale_max_clip (float) – The maximum value for clipping the log(scale) from the autoregressive NN
sigmoid_bias (float) – A term to add to the logit of the input when using the stable transform.
stable (bool) – When true, uses the alternative “stable” version of the transform (see above).
conditional_affine_coupling
- conditional_affine_coupling(input_dim, context_dim, hidden_dims=None, split_dim=None, dim=-1, **kwargs)[source]
A helper function to create a ConditionalAffineCoupling object that takes care of constructing a dense network with the correct input/output dimensions.
- Parameters
input_dim (int) – Dimension of input variable
context_dim (int) – Dimension of context variable
hidden_dims (list[int]) – The desired hidden dimensions of the dense network. Defaults to using [10*input_dim]
split_dim (int) – The dimension to split the input on for the coupling transform. Defaults to using input_dim // 2
dim (int) – the tensor dimension on which to split. This value must be negative and defines the event dim as abs(dim).
log_scale_min_clip (float) – The minimum value for clipping the log(scale) from the autoregressive NN
log_scale_max_clip (float) – The maximum value for clipping the log(scale) from the autoregressive NN
conditional_generalized_channel_permute
- conditional_generalized_channel_permute(context_dim, channels=3, hidden_dims=None)[source]
A helper function to create a ConditionalGeneralizedChannelPermute object for consistency with other helpers.
- Parameters
channels (int) – Number of channel dimensions in the input.
conditional_householder
- conditional_householder(input_dim, context_dim, hidden_dims=None, count_transforms=1)[source]
A helper function to create a ConditionalHouseholder object that takes care of constructing a dense network with the correct input/output dimensions.
conditional_matrix_exponential
- conditional_matrix_exponential(input_dim, context_dim, hidden_dims=None, iterations=8, normalization='none', bound=None)[source]
A helper function to create a ConditionalMatrixExponential object for consistency with other helpers.
- Parameters
input_dim (int) – Dimension of input variable
context_dim (int) – Dimension of context variable
hidden_dims (list[int]) – The desired hidden dimensions of the dense network. Defaults to using [input_dim * 10, input_dim * 10]
iterations (int) – the number of terms to use in the truncated power series that approximates matrix exponentiation.
normalization (string) – One of [‘none’, ‘weight’, ‘spectral’], selecting the type of normalization to apply to the weight matrix. ‘weight’ corresponds to weight normalization (Salimans and Kingma, 2016) and ‘spectral’ to spectral normalization (Miyato et al., 2018).
bound (float) – A bound on either the weight or spectral norm, used when one of those two types of normalization is chosen by the normalization argument. A lower value means fewer terms of the truncated power series are needed to closely approximate the exact matrix exponential.
conditional_neural_autoregressive
- conditional_neural_autoregressive(input_dim, context_dim, hidden_dims=None, activation='sigmoid', width=16)[source]
A helper function to create a ConditionalNeuralAutoregressive object that takes care of constructing an autoregressive network with the correct input/output dimensions.
- Parameters
input_dim (int) – Dimension of input variable
context_dim (int) – Dimension of context variable
hidden_dims (list[int]) – The desired hidden dimensions of the autoregressive network. Defaults to using [3*input_dim + 1]
activation (string) – Activation function to use. One of ‘ELU’, ‘LeakyReLU’, ‘sigmoid’, or ‘tanh’.
width (int) – The width of the “multilayer perceptron” in the transform (see paper). Defaults to 16
conditional_planar
- conditional_planar(input_dim, context_dim, hidden_dims=None)[source]
A helper function to create a ConditionalPlanar object that takes care of constructing a dense network with the correct input/output dimensions.
conditional_radial
- conditional_radial(input_dim, context_dim, hidden_dims=None)[source]
A helper function to create a ConditionalRadial object that takes care of constructing a dense network with the correct input/output dimensions.
conditional_spline
- conditional_spline(input_dim, context_dim, hidden_dims=None, count_bins=8, bound=3.0, order='linear')[source]
A helper function to create a ConditionalSpline object that takes care of constructing a dense network with the correct input/output dimensions.
- Parameters
input_dim (int) – Dimension of input variable
context_dim (int) – Dimension of context variable
hidden_dims (list[int]) – The desired hidden dimensions of the dense network. Defaults to using [input_dim * 10, input_dim * 10]
count_bins (int) – The number of segments comprising the spline.
bound (float) – The quantity \(K\) determining the bounding box, \([-K,K]\times[-K,K]\), of the spline.
order (string) – One of [‘linear’, ‘quadratic’] specifying the order of the spline.
conditional_spline_autoregressive
- conditional_spline_autoregressive(input_dim, context_dim, hidden_dims=None, count_bins=8, bound=3.0, order='linear')[source]
A helper function to create a ConditionalSplineAutoregressive object that takes care of constructing an autoregressive network with the correct input/output dimensions.
- Parameters
input_dim (int) – Dimension of input variable
context_dim (int) – Dimension of context variable
hidden_dims (list[int]) – The desired hidden dimensions of the autoregressive network. Defaults to using [input_dim * 10, input_dim * 10]
count_bins (int) – The number of segments comprising the spline.
bound (float) – The quantity \(K\) determining the bounding box, \([-K,K]\times[-K,K]\), of the spline.
order (string) – One of [‘linear’, ‘quadratic’] specifying the order of the spline.
elu
generalized_channel_permute
- generalized_channel_permute(**kwargs)[source]
A helper function to create a GeneralizedChannelPermute object for consistency with other helpers.
- Parameters
channels (int) – Number of channel dimensions in the input.
householder
- householder(input_dim, count_transforms=None)[source]
A helper function to create a Householder object for consistency with other helpers.
leaky_relu
- leaky_relu()[source]
A helper function to create a LeakyReLUTransform object for consistency with other helpers.
matrix_exponential
- matrix_exponential(input_dim, iterations=8, normalization='none', bound=None)[source]
A helper function to create a MatrixExponential object for consistency with other helpers.
- Parameters
input_dim (int) – Dimension of input variable
iterations (int) – the number of terms to use in the truncated power series that approximates matrix exponentiation.
normalization (string) – One of [‘none’, ‘weight’, ‘spectral’], selecting the type of normalization to apply to the weight matrix. ‘weight’ corresponds to weight normalization (Salimans and Kingma, 2016) and ‘spectral’ to spectral normalization (Miyato et al., 2018).
bound (float) – A bound on either the weight or spectral norm, used when one of those two types of normalization is chosen by the normalization argument. A lower value means fewer terms of the truncated power series are needed to closely approximate the exact matrix exponential.
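To illustrate why a smaller norm needs fewer series terms, here is a pure-Python sketch of a truncated power series for the matrix exponential, \(\exp(M) \approx \sum_{k} M^k / k!\). It is not the library's implementation: the function names are hypothetical, and whether the identity term counts toward iterations is an assumption made here for simplicity.

```python
import math


def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def truncated_expm(m, iterations=8):
    """Sum the first `iterations` terms of exp(M) = sum_k M^k / k!."""
    n = len(m)
    result = [[0.0] * n for _ in range(n)]
    # term holds M^k / k!, starting from the identity (k = 0).
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(iterations):
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
        term = [[v / (k + 1) for v in row] for row in matmul(term, m)]
    return result


def diag_error(scale, iterations=8):
    """Absolute error on the diagonal for M = scale * I, where exp(M) is known."""
    m = [[scale, 0.0], [0.0, scale]]
    approx = truncated_expm(m, iterations)
    return abs(approx[0][0] - math.exp(scale))


# Same number of terms, different matrix norms.
err_small = diag_error(0.5)   # small norm: the series has effectively converged
err_large = diag_error(3.0)   # large norm: eight terms leave a visible error
```

With the same eight terms, the small-norm matrix is approximated far more accurately than the large-norm one, which is the trade-off the bound argument controls.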
neural_autoregressive
- neural_autoregressive(input_dim, hidden_dims=None, activation='sigmoid', width=16)[source]
A helper function to create a NeuralAutoregressive object that takes care of constructing an autoregressive network with the correct input/output dimensions.
- Parameters
input_dim (int) – Dimension of input variable
hidden_dims (list[int]) – The desired hidden dimensions of the autoregressive network. Defaults to using [3*input_dim + 1]
activation (string) – Activation function to use. One of ‘ELU’, ‘LeakyReLU’, ‘sigmoid’, or ‘tanh’.
width (int) – The width of the “multilayer perceptron” in the transform (see paper). Defaults to 16
permute
- permute(input_dim, permutation=None, dim=-1)[source]
A helper function to create a Permute object for consistency with other helpers.
- Parameters
input_dim (int) – Dimension(s) of input variable to permute. Note that when dim < -1 this must be a tuple corresponding to the event shape.
permutation (torch.LongTensor) – Torch tensor of integer indices representing permutation. Defaults to a random permutation.
dim (int) – the tensor dimension to permute. This value must be negative and defines the event dim as abs(dim).
planar
polynomial
- polynomial(input_dim, hidden_dims=None)[source]
A helper function to create a Polynomial object that takes care of constructing an autoregressive network with the correct input/output dimensions.
- Parameters
input_dim (int) – Dimension of input variable
hidden_dims (list[int]) – The desired hidden dimensions of the autoregressive network. Defaults to using [input_dim * 10]
radial
spline
spline_autoregressive
- spline_autoregressive(input_dim, hidden_dims=None, count_bins=8, bound=3.0, order='linear')[source]
A helper function to create a SplineAutoregressive object that takes care of constructing an autoregressive network with the correct input/output dimensions.
- Parameters
input_dim (int) – Dimension of input variable
hidden_dims (list[int]) – The desired hidden dimensions of the autoregressive network. Defaults to using [3*input_dim + 1]
count_bins (int) – The number of segments comprising the spline.
bound (float) – The quantity \(K\) determining the bounding box, \([-K,K]\times[-K,K]\), of the spline.
order (string) – One of [‘linear’, ‘quadratic’] specifying the order of the spline.
spline_coupling
- spline_coupling(input_dim, split_dim=None, hidden_dims=None, count_bins=8, bound=3.0)[source]
A helper function to create a SplineCoupling object for consistency with other helpers.
- Parameters
input_dim (int) – Dimension of input variable
sylvester
- sylvester(input_dim, count_transforms=None)[source]
A helper function to create a Sylvester object for consistency with other helpers.
- Parameters
input_dim (int) – Dimension of input variable
count_transforms (int) – Number of Sylvester operations to apply. Defaults to input_dim // 2 + 1.
Constraints
Pyro’s constraints library extends torch.distributions.constraints.
Constraint
boolean
alias of torch.distributions.constraints.boolean
cat
alias of torch.distributions.constraints.cat
corr_cholesky
alias of torch.distributions.constraints.corr_cholesky
corr_cholesky_constraint
alias of torch.distributions.constraints.corr_cholesky_constraint
corr_matrix
dependent
alias of torch.distributions.constraints.dependent
dependent_property
alias of torch.distributions.constraints.dependent_property
greater_than
alias of torch.distributions.constraints.greater_than
greater_than_eq
alias of torch.distributions.constraints.greater_than_eq
half_open_interval
alias of torch.distributions.constraints.half_open_interval
independent
alias of torch.distributions.constraints.independent
integer
integer_interval
alias of torch.distributions.constraints.integer_interval
interval
alias of torch.distributions.constraints.interval
is_dependent
alias of torch.distributions.constraints.is_dependent
less_than
alias of torch.distributions.constraints.less_than
lower_cholesky
alias of torch.distributions.constraints.lower_cholesky
lower_triangular
alias of torch.distributions.constraints.lower_triangular
multinomial
alias of torch.distributions.constraints.multinomial
nonnegative
alias of torch.distributions.constraints.nonnegative
nonnegative_integer
alias of torch.distributions.constraints.nonnegative_integer
one_hot
alias of torch.distributions.constraints.one_hot
ordered_vector
positive
alias of torch.distributions.constraints.positive
positive_definite
alias of torch.distributions.constraints.positive_definite
positive_integer
alias of torch.distributions.constraints.positive_integer
positive_ordered_vector
positive_semidefinite
alias of torch.distributions.constraints.positive_semidefinite
real
alias of torch.distributions.constraints.real
real_vector
alias of torch.distributions.constraints.real_vector
simplex
alias of torch.distributions.constraints.simplex
softplus_lower_cholesky
softplus_positive
sphere
square
alias of torch.distributions.constraints.square
stack
alias of torch.distributions.constraints.stack
symmetric
alias of torch.distributions.constraints.symmetric
unit_interval
alias of torch.distributions.constraints.unit_interval