distributions

Distributions

Overview

Greybox supports 26 distributions for use with ALM. Each distribution is identified by a short code (e.g. "dnorm") passed as the distribution argument. Function names follow R's d/p/q/r prefix convention: d for the density, p for the CDF, q for the quantile function, and r for random generation.

Distribution Families

Continuous (location-scale)

These distributions model continuous data with an identity link (mu = X @ beta).

Code	Full Name	Extra Parameter	Use Case
`dnorm`	Normal	—	General continuous data, default
`dlaplace`	Laplace	—	Heavy tails, robust to outliers
`ds`	S	—	Very heavy tails (heavier than Laplace), robust to outliers
`dgnorm`	Generalized Normal	`shape` (default 2.0)	Flexible tail weight; shape=2 is Normal, shape=1 is Laplace
`dlogis`	Logistic	—	Symmetric, slightly heavier tails than Normal
`dt`	Student's t	`nu` (default 2)	Heavy tails, small samples; nu→∞ approaches Normal
`dalaplace`	Asymmetric Laplace	`alpha` (default 0.5, range 0–1)	Quantile regression; alpha=0.5 is symmetric Laplace
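To make the tail comparisons in the table concrete, here is a minimal sketch of several of these densities written out with Python's `math` module. It is not greybox code: the functions are hypothetical stand-ins, and the parameterisations (in particular the S density f(y) = exp(-sqrt(|y - mu|) / s) / (4 s^2)) are assumptions that may differ from the package's internals.

```python
import math

# Hand-written densities for illustration only (hypothetical helpers, not the
# greybox API). The S-distribution parameterisation is an assumption.


def dnorm(y, loc=0.0, scale=1.0):
    """Normal density."""
    z = (y - loc) / scale
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2 * math.pi))


def dlaplace(y, loc=0.0, scale=1.0):
    """Laplace density: exponential decay in |y - loc|."""
    return math.exp(-abs(y - loc) / scale) / (2 * scale)


def ds(y, loc=0.0, scale=1.0):
    """Assumed S density: decay in sqrt(|y - loc|), so tails are heavier than Laplace."""
    return math.exp(-math.sqrt(abs(y - loc)) / scale) / (4 * scale ** 2)


def dgnorm(y, loc=0.0, scale=1.0, shape=2.0):
    """Generalised Normal density; shape controls the tail weight."""
    norm_const = shape / (2 * scale * math.gamma(1 / shape))
    return norm_const * math.exp(-(abs(y - loc) / scale) ** shape)


def dlogis(y, loc=0.0, scale=1.0):
    """Logistic density, using |z| (the density is symmetric) to avoid overflow."""
    z = abs(y - loc) / scale
    return math.exp(-z) / (scale * (1 + math.exp(-z)) ** 2)


# Far from the centre the S density dominates Laplace, which dominates Normal.
far = 20.0
tail_values = {"S": ds(far), "Laplace": dlaplace(far), "Normal": dnorm(far)}
```

With `shape=2` and `scale=sqrt(2)` the `dgnorm` sketch reproduces the standard Normal density, and with `shape=1` it reproduces the Laplace density, matching the table's description of the shape parameter.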

Log-transformed continuous

These model positive continuous data. The location parameter operates on the log scale.

Code	Full Name	Extra Parameter	Use Case
`dlnorm`	Log-Normal	—	Positive, right-skewed (e.g. prices, durations)
`dllaplace`	Log-Laplace	—	Positive, heavy-tailed
`dls`	Log-S	—	Positive, very heavy-tailed
`dlgnorm`	Log-Generalized Normal	`shape` (default 2.0)	Positive data, flexible tails
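The log-transformed family rests on a change of variables: if log(y) follows a base distribution with location mu, then y has density f_base(log y) / y for y > 0, and quantiles map through exp. A minimal standard-library sketch follows; `log_transformed`, `dlnorm`, `dllaplace` and `qlnorm` here are hypothetical helpers, not greybox functions.

```python
import math
from statistics import NormalDist


def log_transformed(base_pdf):
    """Wrap a density for log(y) into a density for y > 0 (Jacobian 1/y)."""
    def pdf(y, loc=0.0, scale=1.0):
        if y <= 0:
            return 0.0
        return base_pdf(math.log(y), loc, scale) / y
    return pdf


def normal_pdf(x, loc=0.0, scale=1.0):
    return NormalDist(loc, scale).pdf(x)


def laplace_pdf(x, loc=0.0, scale=1.0):
    return math.exp(-abs(x - loc) / scale) / (2 * scale)


dlnorm = log_transformed(normal_pdf)      # Log-Normal
dllaplace = log_transformed(laplace_pdf)  # Log-Laplace


def qlnorm(p, loc=0.0, scale=1.0):
    """Quantiles pass through the monotone exp transform in the same order."""
    return math.exp(NormalDist(loc, scale).inv_cdf(p))
```

Because exp is monotone, the median of y is exp(loc), which is why the location parameter is read on the log scale.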

Bounded / special continuous

Code	Full Name	Extra Parameter	Use Case
`dfnorm`	Folded Normal	—	Absolute values, non-negative data
`drectnorm`	Rectified Normal	—	Non-negative data with a point mass at zero (negative values of a latent Normal are set to zero)
`dbcnorm`	Box-Cox Normal	`lambda_bc` (default 0.1, range 0–1)	Non-normal data, power transformation
`dlogitnorm`	Logit-Normal	—	Proportions in (0, 1)
`dbeta`	Beta	—	Proportions in (0, 1); both shape parameters (shape1 and shape2) are modelled
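Several of these distributions are simple transformations of a Normal variable X ~ N(mu, sigma^2): the folded Normal is the distribution of |X|, and the rectified Normal replaces negative values of X with zero, which puts probability P(X <= 0) on y = 0. Box-Cox instead transforms y itself. The sketch below illustrates all three with the standard library; the function names are hypothetical, and the Box-Cox form shown is the textbook one, assumed rather than confirmed to match the internals of `dbcnorm`.

```python
import math
from statistics import NormalDist


def dfnorm(y, mu=0.0, sigma=1.0):
    """Folded Normal: density of |X| for X ~ N(mu, sigma^2)."""
    if y < 0:
        return 0.0
    base = NormalDist(mu, sigma)
    # Mass at +y and -y both fold onto y.
    return base.pdf(y) + base.pdf(-y)


def rectnorm_zero_mass(mu=0.0, sigma=1.0):
    """Rectified Normal: probability of an exact zero, P(X <= 0)."""
    return NormalDist(mu, sigma).cdf(0.0)


def box_cox(y, lam):
    """Textbook Box-Cox transform for y > 0; lam == 0 is the log limit."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


def inv_box_cox(z, lam):
    """Inverse of box_cox."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1) ** (1 / lam)
```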

Non-negative continuous (log-link)

These use a log link, mu = exp(X @ beta), so the coefficients are initialized from lstsq(X, log(y)).

Code	Full Name	Extra Parameter	Use Case
`dinvgauss`	Inverse Gaussian	—	Positive, right-skewed (e.g. waiting times)
`dgamma`	Gamma	—	Positive, right-skewed (e.g. insurance claims)
`dexp`	Exponential	—	Time between events, memoryless
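The initialization described above can be illustrated for the simplest case of one regressor plus an intercept, where lstsq(X, log(y)) reduces to the closed-form simple-regression formulas. This is a hypothetical sketch of the idea, not the greybox implementation; a design matrix with several columns needs a full least-squares solver.

```python
import math


def init_log_link(x, y):
    """Least-squares fit of log(y) = b0 + b1 * x for a single predictor (y > 0)."""
    n = len(x)
    ly = [math.log(v) for v in y]
    mean_x = sum(x) / n
    mean_ly = sum(ly) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (li - mean_ly) for xi, li in zip(x, ly))
    b1 = sxy / sxx
    b0 = mean_ly - b1 * mean_x
    return b0, b1


def mean_log_link(b0, b1, x):
    """Log link: mu = exp(b0 + b1 * x), positive for any coefficient values."""
    return [math.exp(b0 + b1 * xi) for xi in x]
```

Because exp(.) is always positive, the fitted mean stays positive whatever the coefficients are, which suits these strictly positive distributions.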

Discrete (log-link)

These use a log link: mu = exp(X @ beta). All are count distributions except `dchisq`, which is continuous but shares the same link.

Code	Full Name	Extra Parameter	Use Case
`dpois`	Poisson	—	Count data where mean ≈ variance
`dnbinom`	Negative Binomial	`size` (default var(y))	Overdispersed count data (variance > mean)
`dbinom`	Binomial	—	Binary/count with known number of trials
`dgeom`	Geometric	—	Number of failures before the first success (R convention)
`dchisq`	Chi-squared	`nu` (default 1)	Sum of squared normal variables
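To see what "overdispersed" means in the table, the sketch below writes out the Poisson, Negative Binomial and Geometric probability mass functions with the standard library and computes their moments numerically. The Negative Binomial uses a mean/size form in which Var = mu + mu^2 / size; this parameterisation and the failures-before-first-success Geometric follow R's conventions and are assumptions about greybox, and all function names are hypothetical.

```python
import math


def dpois(k, mu):
    """Poisson pmf, computed on the log scale for stability."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def dnbinom(k, mu, size):
    """Negative Binomial pmf in mean/size form: Var = mu + mu**2 / size."""
    p = size / (size + mu)
    log_pmf = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
               + size * math.log(p) + k * math.log(1 - p))
    return math.exp(log_pmf)


def dgeom(k, prob):
    """Geometric pmf: number of failures before the first success."""
    return prob * (1 - prob) ** k


def moments(pmf, kmax=500):
    """Total probability, mean and variance over k = 0..kmax."""
    probs = [pmf(k) for k in range(kmax + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return sum(probs), mean, var
```

With the same mean of 4, the Poisson variance is 4 while a Negative Binomial with `size=2` has variance 4 + 16/2 = 12, which is the overdispersion the table refers to.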

Binary / occurrence

These model binary (0/1) outcomes.

Code	Full Name	Link	Use Case
`plogis`	Logistic CDF	logit	Binary classification (logistic regression)
`pnorm`	Normal CDF (probit)	probit	Binary classification (probit regression)
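Both occurrence models map a linear predictor eta = X @ beta to a probability P(y = 1) through a CDF: the logistic CDF for logit regression and the standard Normal CDF for probit regression. A minimal standard-library sketch of the two inverse links and the resulting Bernoulli log-likelihood follows; the helper names are hypothetical and the code is not taken from greybox.

```python
import math
from statistics import NormalDist


def plogis(eta):
    """Inverse logit link, branch chosen to avoid overflow in exp."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def pnorm(eta):
    """Inverse probit link: the standard Normal CDF."""
    return NormalDist().cdf(eta)


def bernoulli_loglik(y, eta, inv_link):
    """Log-likelihood of 0/1 outcomes y given linear predictors eta."""
    total = 0.0
    for yi, ei in zip(y, eta):
        p = inv_link(ei)
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total


# Same linear predictor, two links: both give 0.5 at eta = 0.
eta = [-1.0, 0.0, 2.0]
logit_probs = [plogis(e) for e in eta]
probit_probs = [pnorm(e) for e in eta]
```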

Extra Parameters

Some distributions require an additional parameter beyond the standard location and scale. If not provided by the user, ALM estimates it automatically.

Distribution	Parameter	R argument	Python argument	Default (if estimated)
`dalaplace`	Quantile level	`alpha`	`alpha`	0.5
`dgnorm`	Shape	`shape`	`shape`	2.0
`dlgnorm`	Shape	`shape`	`shape`	2.0
`dbcnorm`	Box-Cox lambda	`lambdaBC`	`lambda_bc`	0.1
`dt`	Degrees of freedom	`nu`	`nu`	2
`dchisq`	Degrees of freedom	`nu`	`nu`	1
`dnbinom`	Size	`size`	`size`	var(y)
`dfnorm`	—	—	—	Estimated as sd(y)
`drectnorm`	—	—	—	Estimated as sd(y)
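The `alpha` parameter of `dalaplace` acts as a quantile level: when the asymmetric Laplace density is written through the check (pinball) function, P(Y <= mu) = alpha, so mu is the alpha-quantile. The sketch below assumes this common parameterisation, f(y) = alpha (1 - alpha) / s * exp(-rho_alpha(y - mu) / s); greybox may scale it differently, and the function names are hypothetical.

```python
import math


def dalaplace(y, mu=0.0, scale=1.0, alpha=0.5):
    """Assumed asymmetric Laplace density built on the check function."""
    u = y - mu
    rho = u * (alpha - (1.0 if u < 0 else 0.0))  # pinball loss of the residual
    return alpha * (1 - alpha) / scale * math.exp(-rho / scale)


def palaplace(q, mu=0.0, scale=1.0, alpha=0.5):
    """CDF of the same density; equals alpha exactly at q = mu."""
    u = q - mu
    if u <= 0:
        return alpha * math.exp((1 - alpha) * u / scale)
    return 1 - (1 - alpha) * math.exp(-alpha * u / scale)
```

With alpha = 0.5 this density is the symmetric Laplace with scale 2s, which is why the table calls alpha = 0.5 the symmetric case.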

Distribution Functions (Python)

The greybox.distributions module provides d/p/q/r functions for each distribution:

d — density (PDF/PMF): dnorm(x, loc=0, scale=1)
p — cumulative distribution (CDF): pnorm(q, loc=0, scale=1)
q — quantile (inverse CDF): qnorm(p, loc=0, scale=1)
r — random generation: rnorm(n, loc=0, scale=1)

from greybox import distributions as dist

# Normal distribution
dist.dnorm(0, loc=0, scale=1)       # density at x=0
dist.pnorm(1.96, loc=0, scale=1)    # CDF at q=1.96
dist.qnorm(0.975, loc=0, scale=1)   # quantile at p=0.975
dist.rnorm(100, loc=0, scale=1)     # 100 random draws

# Laplace distribution
dist.dlaplace(0, loc=0, scale=1)
dist.plaplace(0, loc=0, scale=1)

# Generalized Normal
dist.dgnorm(0, loc=0, scale=1, shape=2)
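As a quick sanity check that does not depend on greybox, the same four operations for the Normal distribution can be reproduced with Python's standard-library `statistics.NormalDist` (Python 3.8+). This sketch only mirrors the d/p/q/r idea; it is not the greybox API, and the greybox functions may differ in vectorisation and argument handling.

```python
from statistics import NormalDist

# Standard-library cross-check of the d/p/q/r idea for the Normal
# distribution; this is not the greybox API.
norm = NormalDist(mu=0.0, sigma=1.0)

density = norm.pdf(0.0)              # "d": density at x = 0
prob = norm.cdf(1.96)                # "p": P(X <= 1.96)
quantile = norm.inv_cdf(0.975)       # "q": inverse of the CDF
draws = norm.samples(100, seed=42)   # "r": 100 reproducible random draws

# "p" and "q" undo each other: cdf(inv_cdf(p)) recovers p.
round_trip = [norm.cdf(norm.inv_cdf(p)) for p in (0.025, 0.5, 0.975)]
```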

Using Distributions with ALM

# R — Laplace regression
model <- alm(y ~ x1 + x2, data, distribution="dlaplace")

# R — Quantile regression (median)
model <- alm(y ~ x1 + x2, data, distribution="dalaplace", alpha=0.5)

# R — Poisson count model
model <- alm(count ~ x1 + x2, data, distribution="dpois")

# Python — Laplace regression
from greybox import ALM, formula
y, X = formula("y ~ x1 + x2", data)
model = ALM(distribution="dlaplace")
model.fit(X, y)

# Python — Quantile regression (90th percentile)
model = ALM(distribution="dalaplace", alpha=0.9)
model.fit(X, y)

# Python — Poisson count model
y, X = formula("count ~ x1 + x2", data)  # use the count variable as the response
model = ALM(distribution="dpois")
model.fit(X, y)
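For intuition about what fitting with a given distribution means, consider the simplest intercept-only case. Assuming estimation by maximum likelihood (the default loss of `alm` in R; not confirmed here for the Python port), the asymmetric Laplace likelihood is maximised when mu minimises the summed check loss, i.e. at an empirical alpha-quantile, and the scale estimate is then the mean check loss; alpha = 0.5 gives Laplace (median) regression. The sketch below is hypothetical and uses only plain Python.

```python
def check_loss(u, alpha):
    """Pinball loss rho_alpha(u) = u * (alpha - 1{u < 0})."""
    return u * (alpha - (1.0 if u < 0 else 0.0))


def fit_constant_alaplace(y, alpha=0.5):
    """Intercept-only asymmetric Laplace fit by maximum likelihood.

    The summed check loss is piecewise linear in mu, so a minimiser lies at
    one of the observations; searching them is enough for this sketch.
    """
    def total_loss(mu):
        return sum(check_loss(v - mu, alpha) for v in y)

    mu_hat = min(sorted(y), key=total_loss)
    scale_hat = total_loss(mu_hat) / len(y)  # MLE of the scale given mu_hat
    return mu_hat, scale_hat


sample = [1.0, 2.0, 3.0, 10.0, 100.0]
median_fit = fit_constant_alaplace(sample, alpha=0.5)  # Laplace: the median
upper_fit = fit_constant_alaplace(sample, alpha=0.9)   # 0.9 quantile level
```

The large value 100.0 does not move the median fit away from 3.0, which illustrates why the Laplace family is described above as robust to outliers.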

