shinghinho/stats-monad
A probability monad with statistics

This package implements an unnormalized distribution monad P designed for discrete probabilistic computations. Inference is exact: the monad enumerates all possibilities, which makes it slow and unsuitable for large programs. However, the ability to enumerate the support of a program means we can provide primitives for computing statistical properties such as independence and moments of random variables.

Example: coin flip

The package exports a module Control.Monad.Statistics with a monad P :: * -> * where P a represents a distribution over a. For example, to simulate two coin flips we can write the following program:

coins :: P (Int, Int)
coins = do
  x <- coin
  y <- coin
  return (x, y)

where coin is a primitive representing a fair coin flip over 0 and 1. Evaluating coins in ghci yields the following distribution:

>>> coins
{(0,0)  0.25; (0,1)  0.25; (1,0)  0.25; (1,1)  0.25}

Recall that an a-valued discrete random variable over a distribution p :: P omega is a function of type omega -> a. This means we can define two random variables x, y :: (Int, Int) -> Int representing the outputs of coins as follows:

x, y :: (Int, Int) -> Int
x = fst
y = snd

With the independent function, we can compute that the two random variables are indeed probabilistically independent with respect to the distribution coins:

>>> independent coins x y
True

However, x and y are not independent when we condition on the maximum of x and y, which is reflected via the conditionallyIndependent function:

>>> z = \(x, y) -> max x y
>>> conditionallyIndependent coins x y z
False
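Although the package's internals are not shown here, a check like independent can be sketched by enumeration. The following is a minimal, self-contained illustration under an assumed weighted-list representation (Dist, probOf, and independent' are hypothetical names, not the package's API); it tests the factorization p(X = a, Y = b) = p(X = a) * p(Y = b) over every pair of values in the supports:

```haskell
import Data.List (nub)

-- A sketch, NOT the package's actual code: an unnormalized discrete
-- distribution is a list of outcomes paired with masses.
type Dist w = [(w, Rational)]

-- Probability of an event, normalizing by the total mass.
probOf :: Dist w -> (w -> Bool) -> Rational
probOf d ev = sum [m | (w, m) <- d, ev w] / sum (map snd d)

-- Two random variables are independent iff the joint probability
-- factorizes for every pair of values in their supports.
independent' :: (Eq a, Eq b) => Dist w -> (w -> a) -> (w -> b) -> Bool
independent' d x y =
  and [ probOf d (\w -> x w == a && y w == b)
          == probOf d (\w -> x w == a) * probOf d (\w -> y w == b)
      | a <- nub [x w | (w, _) <- d]
      , b <- nub [y w | (w, _) <- d] ]

-- Fair coins are independent; after zeroing the mass of (0,0)
-- (i.e. conditioning on max x y == 1) they no longer are.
coinsD, coinsMax1 :: Dist (Int, Int)
coinsD    = [((i, j), 1) | i <- [0, 1], j <- [0, 1]]
coinsMax1 = [((i, j), if max i j == 1 then 1 else 0) | i <- [0, 1], j <- [0, 1]]
```

Enumerating all value pairs is exactly what makes this kind of check possible for a finite-support monad, and exactly what makes it expensive for large programs.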

Further, statistics such as expected value and correlation of random variables can be calculated. Since x and y are independent with respect to coins, we know that their correlation is zero:

>>> correlation coins x y
0.0

However, conditioning the coin model on the maximum being 1 makes the two outputs negatively correlated. For example, consider the program

coins' :: P (Int, Int)
coins' = do
  (x, y) <- coins
  observe (max x y == 1)
  return (x, y)

We know x and y are negatively correlated with respect to coins'. This is reflected by the computation

>>> correlation coins' x y
-1.0
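For illustration, correlation can likewise be computed by exhaustive summation over the support. The sketch below assumes the same hypothetical weighted-list representation as above (expectedOf and correlationOf are illustrative names, not the package's API) and computes Pearson correlation as cov(X, Y) / (σ_X σ_Y):

```haskell
-- A sketch, NOT the package's actual code: unnormalized distribution
-- as a weighted list of outcomes.
type Dist w = [(w, Rational)]

-- Expectation of a Rational-valued random variable, normalizing by
-- the total mass.
expectedOf :: Dist w -> (w -> Rational) -> Rational
expectedOf d f = sum [m * f w | (w, m) <- d] / sum (map snd d)

-- Pearson correlation: covariance divided by the product of the
-- standard deviations.
correlationOf :: Dist w -> (w -> Rational) -> (w -> Rational) -> Double
correlationOf d x y = fromRational cov / (sd x * sd y)
  where
    cov  = expectedOf d (\w -> x w * y w) - expectedOf d x * expectedOf d y
    sd f = sqrt (fromRational (expectedOf d (\w -> f w * f w)
                               - expectedOf d f ^ (2 :: Int)))
```

For example, applied to a perfectly anti-correlated pair such as the uniform distribution over {(0, 1), (1, 0)}, correlationOf returns -1.0.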

Example: dice

Suppose we roll two dice x and y and condition on their sum being greater than 5. Then (1) what is the probability that at least one of them is even, and (2) what is the expected value of x + y? Instead of doing the calculations by hand, we write the program experiment as follows:

experiment :: P (Int, Int)
experiment = do
  x <- uniform [1..6]
  y <- uniform [1..6]
  observe (x + y > 5)
  return (x, y)

We then answer the two questions using prob and expected:

>>> prob experiment (\(x, y) -> even x || even y)
0.7692307692307689

>>> expected experiment (\(x, y) -> fromIntegral (x + y))
8.153846153846153

Alternatively, if we want the exact probability, we can request the result of prob at type Rational:

>>> prob experiment (\(x, y) -> even x || even y) :: Rational
10 % 13
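These numbers can be double-checked by direct enumeration with plain list comprehensions, independently of the package (the names here are illustrative):

```haskell
-- Enumerate the 36 equally likely rolls and keep those with
-- x + y > 5; 26 outcomes survive.
kept :: [(Int, Int)]
kept = [(x, y) | x <- [1 .. 6], y <- [1 .. 6], x + y > 5]

-- 20 of the 26 surviving outcomes have an even coordinate,
-- giving 20 / 26 = 10 / 13.
hits :: Int
hits = length [w | w@(x, y) <- kept, even x || even y]

-- The conditional mean of x + y is 212 / 26 = 106 / 13 ≈ 8.1538.
meanSum :: Double
meanSum = fromIntegral (sum [x + y | (x, y) <- kept]) / fromIntegral (length kept)
```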

Theory

Our implementation approximates the finite distribution monad on Set without the convexity restriction; that is, we include distributions with arbitrary normalizing constants. However, we identify distributions that differ only by a normalizing constant, so the resulting equivalence classes correspond to the probability measures together with the zero measure.
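To make this concrete, a minimal monad of this shape could be defined as follows. This is a sketch under the weighted-list assumption, not the package's actual source; P' and observe' are hypothetical names:

```haskell
-- An unnormalized discrete distribution: outcomes paired with
-- nonnegative masses. No normalization is performed anywhere.
newtype P' a = P' { runP' :: [(a, Rational)] }

instance Functor P' where
  fmap f (P' xs) = P' [(f a, m) | (a, m) <- xs]

instance Applicative P' where
  pure a = P' [(a, 1)]
  P' fs <*> P' xs = P' [(f a, mf * mx) | (f, mf) <- fs, (a, mx) <- xs]

instance Monad P' where
  -- bind multiplies masses along each path through the program,
  -- enumerating the full support.
  P' xs >>= k = P' [(b, ma * mb) | (a, ma) <- xs, (b, mb) <- runP' (k a)]

-- Conditioning multiplies the mass of worlds violating the condition
-- by zero; the total mass shrinks, which is harmless precisely because
-- distributions are identified up to a normalizing constant.
observe' :: Bool -> P' ()
observe' c = P' [((), if c then 1 else 0)]
```

Note that observe' can drive the total mass to zero, which is why the zero measure appears alongside the probability measures in the quotient described above.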
