A Julia package for working with multivariate Bernoulli distributions. This package provides a complete implementation of multivariate Bernoulli distributions as a DiscreteMultivariateDistribution from Distributions.jl, with support for multiple internal representations and comprehensive statistical operations.
A multivariate Bernoulli distribution is a discrete probability distribution for a vector of binary random variables (X₁, X₂, ..., Xₘ), where each Xᵢ ∈ {0, 1}. Unlike independent Bernoulli variables, the components can be correlated.
For m variables, there are 2^m possible outcomes, each with its own probability. For example, with 2 variables, the possible outcomes are:
- (0,0) with probability P₀₀
- (0,1) with probability P₀₁
- (1,0) with probability P₁₀
- (1,1) with probability P₁₁
where P₀₀ + P₀₁ + P₁₀ + P₁₁ = 1.
using Pkg
Pkg.add("https://github.com/dufourc1/MVBernoulli.jl")using MVBernoulli
using Random
# Create a 2D multivariate Bernoulli distribution
# Probabilities: P(0,0)=0.1, P(0,1)=0.2, P(1,0)=0.3, P(1,1)=0.4
probabilities = [0.1, 0.2, 0.3, 0.4]
d = from_tabulation(probabilities)
# Generate samples
Random.seed!(42)
samples = rand(d, 5)
println("Samples: ", samples)
# Calculate marginal probabilities
marginals_prob = marginals(d)
println("P(X₁=1) = ", marginals_prob[1]) # 0.7
println("P(X₂=1) = ", marginals_prob[2]) # 0.6The package supports three equivalent representations of multivariate Bernoulli distributions:
- Tabulation: Direct specification of outcome probabilities
- Ordinary moments: Expected values of variable products
- Normalized moments: Centered moments (deviations from means)
using MVBernoulli
# Create from tabulation
d1 = from_tabulation([0.1, 0.2, 0.3, 0.4])
# Create from ordinary moments
d2 = from_ordinary_moments(d1.ordinary_moments)
# Both representations are equivalent
d1.tabulation.p ≈ d2.tabulation.p # trueCalculate various statistical properties:
using MVBernoulli, Distributions
d = from_tabulation([0.1, 0.2, 0.3, 0.4])
# Marginal probabilities
marg = marginals(d)
# Correlation matrix
corr_matrix = correlation_matrix(d)
# Covariance matrix
cov_matrix = covariance_matrix(d)
# Probability mass function
prob = exp(logpdf(d, [1, 0])) # P(X₁=1, X₂=0) = 0.2Fit distributions from observed data:
using MVBernoulli, Random
# Generate synthetic data
Random.seed!(123)
true_dist = from_tabulation([0.2, 0.3, 0.1, 0.4])
data = rand(true_dist, 1000) # 1000 samples
# Fit using MLE
estimated_dist = fit_mle(true_dist, data)
println("Original: ", true_dist.tabulation.p)
println("Estimated: ", estimated_dist.tabulation.p)Create a distribution where variables are independent:
using MVBernoulli
# For independent variables: P(X₁,X₂) = P(X₁) × P(X₂)
p1, p2 = 0.3, 0.7
independent_probs = [
(1-p1)*(1-p2), # P(0,0)
(1-p1)*p2, # P(0,1)
p1*(1-p2), # P(1,0)
p1*p2 # P(1,1)
]
d_indep = from_tabulation(independent_probs)
corr = correlation_matrix(d_indep)
println("Correlation for independent variables: ", corr[1,2]) # ≈ 0using MVBernoulli
# Variables that tend to have the same value
correlated_probs = [0.4, 0.1, 0.1, 0.4] # High prob for (0,0) and (1,1)
d_corr = from_tabulation(correlated_probs)
corr = correlation_matrix(d_corr)
println("Correlation: ", corr[1,2]) # Positive correlation
# Marginals are still balanced
marg = marginals(d_corr)
println("Marginals: ", marg) # Both ≈ 0.5The package supports conditional probabilities with missing values:
using MVBernoulli
d = from_tabulation([0.1, 0.2, 0.3, 0.4])
# P(X₁=1 | X₂=1) - probability that X₁=1 given X₂=1
cond_prob = conditional_proba(d, [1, missing], [missing, 1])
println("P(X₁=1 | X₂=1) = ", cond_prob)
# Marginal probability with missing values
marg_prob = pmf(d, [1, missing]) # P(X₁=1)
println("P(X₁=1) = ", marg_prob)using MVBernoulli
# 3D multivariate Bernoulli (8 possible outcomes)
probs_3d = [0.1, 0.05, 0.1, 0.15, 0.1, 0.15, 0.2, 0.15]
d_3d = from_tabulation(probs_3d)
println("Number of variables: ", length(d_3d)) # 3
println("Marginals: ", marginals(d_3d))
# Sample from 3D distribution
samples_3d = rand(d_3d, 3)
println("3D samples: ", samples_3d)from_tabulation(p::Vector): Create from outcome probabilitiesfrom_ordinary_moments(moments::Vector): Create from ordinary moments
marginals(d): Marginal probabilities P(Xᵢ=1)correlation_matrix(d): Correlation matrix between variablescovariance_matrix(d): Covariance matrix between variables
rand(d, n): Generate n random sampleslogpdf(d, x): Log probability densitypmf(d, x): Probability mass function (supports missing values)
fit_mle(d, data): Maximum likelihood estimation from data
conditional_proba(d, x, y): Conditional probability P(x|y)length(d): Number of variablesinsupport(d, x): Check if x is in the support of the distribution
The package handles the mathematical complexities of multivariate Bernoulli distributions, including:
- Moment conversions: Efficient transformation between tabulation and moment representations using Kronecker products
- Correlation analysis: Proper handling of degenerate cases (when marginal probabilities are 0 or 1)
- Missing value handling: Marginalization over unobserved variables
- Parameter estimation: Maximum likelihood estimation with proper normalization
- Distributions.jl: General probability distributions framework
- Kronecker.jl: Efficient Kronecker product operations