

Ivan Svetunkov edited this page Feb 24, 2026 · 2 revisions

Distributions

Overview

Greybox supports 26 distributions for use with ALM (the Augmented Linear Model). Each distribution is identified by a short code (e.g. "dnorm") that is passed to the distribution argument. The d/p/q/r naming convention follows R: d for the density, p for the CDF, q for the quantile function and r for random generation.

Distribution Families

Continuous (location-scale)

These distributions model continuous data with an identity link: mu = X @ beta.

| Code | Full Name | Extra Parameter | Use Case |
|---|---|---|---|
| dnorm | Normal | | General continuous data; the default |
| dlaplace | Laplace | | Heavy tails, robust to outliers |
| ds | S | | Very heavy tails (heavier than Laplace); a Generalized Normal with shape=0.5 |
| dgnorm | Generalized Normal | shape (default 2.0) | Flexible tail weight; shape=2 is Normal, shape=1 is Laplace |
| dlogis | Logistic | | Symmetric, with heavier tails than Normal |
| dt | Student's t | nu (default 2) | Heavy tails, small samples; approaches Normal as nu → ∞ |
| dalaplace | Asymmetric Laplace | alpha (default 0.5, range 0–1) | Quantile regression; alpha=0.5 gives the symmetric Laplace |
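The special cases in the table can be checked numerically. A minimal sketch, assuming the common textbook parameterization of the Generalized Normal, f(x) = shape / (2·scale·Γ(1/shape)) · exp(-(|x - loc|/scale)^shape); greybox's own dgnorm may scale its arguments differently. Under this form, shape=2 matches a Normal with sd = scale/√2, shape=1 matches a Laplace with the same scale, and shape=0.5 gives 1/(4s²)·exp(-√|x - loc|/s) with s = √scale, the density usually quoted for the S distribution. The helper names are hypothetical and only the standard library is used.

```python
import math
from statistics import NormalDist


def gnorm_pdf(x, loc=0.0, scale=1.0, shape=2.0):
    """Generalized Normal density (textbook parameterization, assumed here)."""
    z = abs(x - loc) / scale
    norm_const = shape / (2.0 * scale * math.gamma(1.0 / shape))
    return norm_const * math.exp(-z ** shape)  # -(z ** shape), ** binds first


def laplace_pdf(x, loc=0.0, scale=1.0):
    return math.exp(-abs(x - loc) / scale) / (2.0 * scale)


def s_pdf(x, loc=0.0, s=1.0):
    # S density: 1/(4 s^2) * exp(-sqrt|x - loc| / s)
    return math.exp(-math.sqrt(abs(x - loc)) / s) / (4.0 * s * s)


for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    # shape=2 -> Normal with sd = scale / sqrt(2)
    assert math.isclose(gnorm_pdf(x, 0, math.sqrt(2), 2), NormalDist(0, 1).pdf(x))
    # shape=1 -> Laplace with the same scale
    assert math.isclose(gnorm_pdf(x, 0, 1.5, 1), laplace_pdf(x, 0, 1.5))
    # shape=0.5 -> S distribution with s = sqrt(scale)
    assert math.isclose(gnorm_pdf(x, 0, 4.0, 0.5), s_pdf(x, 0, 2.0))
```

Smaller shape values put more mass in the tails, which is why the S distribution sits beyond Laplace in tail weight.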

Log-transformed continuous

These model positive continuous data. The location parameter operates on the log scale.

| Code | Full Name | Extra Parameter | Use Case |
|---|---|---|---|
| dlnorm | Log-Normal | | Positive, right-skewed (e.g. prices, durations) |
| dllaplace | Log-Laplace | | Positive, heavy-tailed |
| dls | Log-S | | Positive, very heavy-tailed |
| dlgnorm | Log-Generalized Normal | shape (default 2.0) | Positive data, flexible tails |
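Because the location acts on the log scale, exp(location) is the median of a Log-Normal variable, while the mean is larger, exp(location + scale²/2). A minimal standard-library sketch of the Log-Normal d/p/q functions via the change of variables y = exp(x); the argument names meanlog and sdlog mirror R's dlnorm and are illustrative, not the greybox Python signature.

```python
import math
from statistics import NormalDist


def dlnorm(y, meanlog=0.0, sdlog=1.0):
    # change of variables: density of log(y), divided by y
    return NormalDist(meanlog, sdlog).pdf(math.log(y)) / y


def plnorm(q, meanlog=0.0, sdlog=1.0):
    return NormalDist(meanlog, sdlog).cdf(math.log(q))


def qlnorm(p, meanlog=0.0, sdlog=1.0):
    return math.exp(NormalDist(meanlog, sdlog).inv_cdf(p))


def lnorm_mean(meanlog=0.0, sdlog=1.0):
    return math.exp(meanlog + sdlog ** 2 / 2)


mu, sigma = 1.2, 0.7
print(qlnorm(0.5, mu, sigma))   # median, equal to exp(1.2)
print(lnorm_mean(mu, sigma))    # mean, larger than the median (right skew)
```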

Bounded / special continuous

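| Code | Full Name | Extra Parameter | Use Case |
|---|---|---|---|
| dfnorm | Folded Normal | | Absolute values; non-negative data |
| drectnorm | Rectified Normal | | Non-negative data with a point mass at zero (negative values of an underlying Normal are set to zero) |
| dbcnorm | Box-Cox Normal | lambda_bc (default 0.1, range 0–1) | Non-normal data that become approximately Normal after a Box-Cox power transformation |
| dlogitnorm | Logit-Normal | | Proportions in (0, 1) |
| dbeta | Beta | | Proportions in (0, 1); two-part model with separate linear predictors for shape1 and shape2 |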
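The Folded and Rectified Normal both start from a Normal variable X but treat negative values differently: the folded version takes |X|, while the rectified version sets negative values to zero, so P(Y = 0) = Φ(-mu/sigma). A minimal simulation sketch using only the standard library; rfnorm and rrectnorm are hypothetical helper names, not greybox's functions.

```python
import random
from statistics import NormalDist


def rfnorm(n, mu, sigma, rng):
    # Folded Normal: absolute value of each Normal draw
    return [abs(rng.gauss(mu, sigma)) for _ in range(n)]


def rrectnorm(n, mu, sigma, rng):
    # Rectified Normal: negative Normal draws are replaced by zero
    return [max(0.0, rng.gauss(mu, sigma)) for _ in range(n)]


rng = random.Random(42)
mu, sigma, n = 0.5, 1.0, 100_000
folded = rfnorm(n, mu, sigma, rng)
rectified = rrectnorm(n, mu, sigma, rng)

share_zero = sum(v == 0.0 for v in rectified) / n
print(share_zero, NormalDist(mu, sigma).cdf(0.0))  # both close to P(X < 0)
print(sum(v == 0.0 for v in folded) / n)           # folded draws essentially never hit 0
```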

Non-negative continuous (log-link)

These use a log link, mu = exp(X @ beta), so the coefficients are initialized by regressing log(y) on X: lstsq(X, log(y)).

| Code | Full Name | Extra Parameter | Use Case |
|---|---|---|---|
| dinvgauss | Inverse Gaussian | | Positive, right-skewed (e.g. waiting times) |
| dgamma | Gamma | | Positive, right-skewed (e.g. insurance claims) |
| dexp | Exponential | | Time between events, memoryless |
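The initialization step described above can be sketched for a single regressor, where the least-squares solve of log(y) on [1, x] reduces to the closed-form simple-regression formulas. A minimal sketch under that assumption, not greybox's code; init_loglink_coefs is a hypothetical name, and y must be strictly positive because log(0) is undefined.

```python
import math


def init_loglink_coefs(x, y):
    """Least-squares fit of log(y) on [1, x]: starting values for a log-link model."""
    logy = [math.log(v) for v in y]  # requires every y > 0
    n = len(x)
    x_bar = sum(x) / n
    l_bar = sum(logy) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxl = sum((xi - x_bar) * (li - l_bar) for xi, li in zip(x, logy))
    b1 = sxl / sxx
    b0 = l_bar - b1 * x_bar
    return b0, b1


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [math.exp(0.5 + 0.3 * xi) for xi in x]  # noise-free data, linear on the log scale
b0, b1 = init_loglink_coefs(x, y)
mu = [math.exp(b0 + b1 * xi) for xi in x]   # fitted means through the log link
```

These values are only starting points; the likelihood of the chosen distribution is then optimized from there.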

Discrete (log-link)

Count data distributions using a log link: mu = exp(X @ beta).

| Code | Full Name | Extra Parameter | Use Case |
|---|---|---|---|
| dpois | Poisson | | Count data where mean ≈ variance |
| dnbinom | Negative Binomial | size (default var(y)) | Overdispersed count data (variance > mean) |
| dbinom | Binomial | | Binary/count data with a known number of trials |
| dgeom | Geometric | | Number of trials until the first success |
| dchisq | Chi-squared | nu (default 1) | Non-negative continuous data (not counts); sum of squared standard Normal variables |
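The difference between dpois and dnbinom is in the variance. A minimal sketch, assuming the (mu, size) parameterization used by R's dnbinom, in which the variance is mu + mu²/size and the Poisson is recovered as size → ∞. The helpers below are hypothetical standalone implementations, not the greybox functions, and the moments are computed by summing the pmf over a truncated range.

```python
import math


def dpois(k, mu):
    # Poisson pmf evaluated on the log scale for numerical stability
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def dnbinom(k, mu, size):
    # Negative Binomial pmf, (mu, size) parameterization as in R's dnbinom
    log_p = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
             + size * math.log(size / (size + mu))
             + k * math.log(mu / (size + mu)))
    return math.exp(log_p)


def moments(pmf, kmax=500):
    # mean and variance from the pmf, truncated at kmax (tail is negligible here)
    probs = [pmf(k) for k in range(kmax)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


mu, size = 4.0, 2.0
print(moments(lambda k: dpois(k, mu)))          # variance equals the mean
print(moments(lambda k: dnbinom(k, mu, size)))  # variance = mu + mu**2 / size
```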

Binary / occurrence

These model binary (0/1) outcomes.

| Code | Full Name | Link | Use Case |
|---|---|---|---|
| plogis | Logistic CDF | logistic | Binary classification (logistic regression) |
| pnorm | Normal CDF (probit) | probit | Binary classification (probit regression) |
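For binary outcomes the CDF maps the linear predictor eta = X @ beta to P(y = 1). A minimal standard-library sketch; plogis and pnorm here are local helpers that share greybox's codes but are not its implementation. It also checks the classic observation that the two curves nearly coincide when the logistic argument is stretched by about 1.7, which is why logit coefficients tend to be roughly 1.6 to 1.8 times their probit counterparts.

```python
import math
from statistics import NormalDist


def plogis(z):
    # logistic CDF: inverse of the logit link
    return 1.0 / (1.0 + math.exp(-z))


def pnorm(z):
    # standard Normal CDF: inverse of the probit link
    return NormalDist().cdf(z)


# probability of y = 1 for a few values of the linear predictor eta
for eta in (-2.0, 0.0, 1.0):
    print(eta, plogis(eta), pnorm(eta))

# largest gap between the probit curve and a rescaled logistic curve on [-6, 6]
grid = [i / 100 for i in range(-600, 601)]
worst = max(abs(plogis(1.702 * z) - pnorm(z)) for z in grid)
print(worst)  # below 0.01
```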

Extra Parameters

Some distributions require an additional parameter beyond the standard location and scale. If not provided by the user, ALM estimates it automatically.

| Distribution | Parameter | R argument | Python argument | Default (if estimated) |
|---|---|---|---|---|
| dalaplace | Quantile level | alpha | alpha | 0.5 |
| dgnorm | Shape | shape | shape | 2.0 |
| dlgnorm | Shape | shape | shape | 2.0 |
| dbcnorm | Box-Cox lambda | lambdaBC | lambda_bc | 0.1 |
| dt | Degrees of freedom | nu | nu | 2 |
| dchisq | Degrees of freedom | nu | nu | 1 |
| dnbinom | Size | size | size | var(y) |
| dfnorm | | | | Estimated as sd(y) |
| drectnorm | | | | Estimated as sd(y) |
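Alpha in dalaplace is called the quantile level because maximizing the Asymmetric Laplace likelihood in the location is equivalent to minimizing the pinball (check) loss, and that loss is minimized at the alpha-quantile. A minimal standard-library sketch for a constant location; pinball_loss is a hypothetical helper, and the search runs over the observed values rather than using an optimizer.

```python
def pinball_loss(y, c, alpha):
    """Sum of pinball (check) losses of the constant c at quantile level alpha."""
    total = 0.0
    for v in y:
        e = v - c
        # under-prediction costs alpha per unit, over-prediction costs (1 - alpha)
        total += alpha * e if e >= 0 else (alpha - 1.0) * e
    return total


y = list(range(101))  # 0, 1, ..., 100
alpha = 0.9
best = min(y, key=lambda c: pinball_loss(y, c, alpha))
print(best)  # 90, the 0.9 quantile of the data
```

With alpha=0.5 the two sides are penalized equally and the minimizer is the median, matching the symmetric Laplace case.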

Distribution Functions (Python)

The greybox.distributions module provides d/p/q/r functions for each distribution:

  • d — density (PDF/PMF): dnorm(x, loc=0, scale=1)
  • p — cumulative distribution (CDF): pnorm(q, loc=0, scale=1)
  • q — quantile (inverse CDF): qnorm(p, loc=0, scale=1)
  • r — random generation: rnorm(n, loc=0, scale=1)

```python
from greybox import distributions as dist

# Normal distribution
dist.dnorm(0, loc=0, scale=1)       # density at x=0
dist.pnorm(1.96, loc=0, scale=1)    # CDF at q=1.96
dist.qnorm(0.975, loc=0, scale=1)   # quantile at p=0.975
dist.rnorm(100, loc=0, scale=1)     # 100 random draws

# Laplace distribution
dist.dlaplace(0, loc=0, scale=1)
dist.plaplace(0, loc=0, scale=1)

# Generalized Normal
dist.dgnorm(0, loc=0, scale=1, shape=2)
```
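To show how the four prefixes fit together, here is a minimal standalone sketch of d/p/q/r for the Laplace distribution using only the standard library. The names mirror dist.dlaplace and dist.plaplace, but these are illustrative re-implementations, not the greybox module's code; r is built from q by inverse-transform sampling.

```python
import math
import random


def dlaplace(x, loc=0.0, scale=1.0):
    # density
    return math.exp(-abs(x - loc) / scale) / (2.0 * scale)


def plaplace(q, loc=0.0, scale=1.0):
    # cumulative distribution function
    z = (q - loc) / scale
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)


def qlaplace(p, loc=0.0, scale=1.0):
    # quantile function (inverse CDF), valid for 0 < p < 1
    if p < 0.5:
        return loc + scale * math.log(2.0 * p)
    return loc - scale * math.log(2.0 * (1.0 - p))


def rlaplace(n, loc=0.0, scale=1.0, rng=None):
    # random generation: push uniform draws through the quantile function
    rng = rng or random.Random()
    out = []
    while len(out) < n:
        u = rng.random()  # uniform on [0, 1)
        if u > 0.0:       # skip u == 0, where the quantile is -infinity
            out.append(qlaplace(u, loc, scale))
    return out


print(dlaplace(0.0))                  # 0.5
print(qlaplace(plaplace(1.3)))        # recovers 1.3 up to rounding
draws = rlaplace(5, loc=2.0, scale=0.5, rng=random.Random(1))
```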

Using Distributions with ALM

```r
library(greybox)

# Laplace regression (`data` is a data frame with columns y, count, x1, x2)
model <- alm(y ~ x1 + x2, data, distribution="dlaplace")

# Quantile regression (median)
model <- alm(y ~ x1 + x2, data, distribution="dalaplace", alpha=0.5)

# Poisson count model
model <- alm(count ~ x1 + x2, data, distribution="dpois")
```

```python
from greybox import ALM, formula

# Laplace regression (`data` is assumed to hold columns y, count, x1, x2)
y, X = formula("y ~ x1 + x2", data)
model = ALM(distribution="dlaplace")
model.fit(X, y)

# Quantile regression (90th percentile)
model = ALM(distribution="dalaplace", alpha=0.9)
model.fit(X, y)

# Poisson count model: rebuild the response from the count column
y, X = formula("count ~ x1 + x2", data)
model = ALM(distribution="dpois")
model.fit(X, y)
```
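For the Laplace regression above, maximum likelihood amounts to minimizing the sum of absolute errors. In the intercept-only case the MLE of the location is the sample median and the MLE of the scale is the mean absolute deviation around it, which is why dlaplace is robust to outliers. A minimal standard-library sketch of that special case; it fits a constant only, laplace_negloglik is a hypothetical helper, and greybox is not used.

```python
import math
from statistics import mean, median


def laplace_negloglik(y, loc, scale):
    # minus the log-likelihood of i.i.d. Laplace(loc, scale) observations
    return len(y) * math.log(2.0 * scale) + sum(abs(v - loc) for v in y) / scale


y = [1.0, 2.0, 3.0, 4.0, 100.0]                  # one gross outlier
loc_hat = median(y)                              # Laplace MLE of the location
scale_hat = mean(abs(v - loc_hat) for v in y)    # Laplace MLE of the scale

# nearby locations and scales never beat the closed-form estimates
base = laplace_negloglik(y, loc_hat, scale_hat)
for d in (-0.5, -0.1, 0.1, 0.5):
    assert laplace_negloglik(y, loc_hat + d, scale_hat) > base
    assert laplace_negloglik(y, loc_hat, scale_hat * (1 + d / 2)) > base

print(loc_hat, scale_hat, mean(y))  # median 3.0 vs the Normal MLE (mean) 22.0
```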
