Many different probabilistic programming languages exist that specialize to specific kinds of probabilistic programs, broadly falling into the categories of approximate and exact inference.
MultiPPL is a host compiler of two syntactically and semantically different probabilistic programming languages: an approximate language leveraging importance sampling, and an exact language using binary decision diagrams (BDDs) for knowledge compilation. Our work demonstrates sound interoperation of these two languages under a Matthews and Findler-style multi-language framework [1].
MultiPPL uses tree-sitter to parse syntax, with the full grammar defined in
tree-sitter-multippl/grammar.js. Here we describe each sub-language in our
framework, show how to interoperate between the Cont and Disc languages by
example, and provide a summary of the tree-sitter grammar. The
examples/ folder contains all of the programs documented here.
A MultiPPL program is introduced using exact { ... } or sample { ... } blocks:
choice('exact, 'sample') '{' <expr> '}'
Here, choice denotes an alternative and comes from the tree-sitter metalanguage,
while <expr> comes from the chosen expression language of Disc (exact) or Cont (sample). Single-quoted tokens denote literal terminal symbols.
MultiPPL supports procedures, which similarly require a sample or exact keyword to
describe where the function is allowed to run:
choice('sample', 'exact') 'fn' <id> '(' repeat(<id>) ')' '{' <expr> '}'
From tree-sitter we use repeat for zero-or-more repetitions, <id> is a placeholder for a variable (defined as x in each grammar), and <expr> once again corresponds to the chosen expression language.
Disc syntax draws heavily from the Dice programming language [2], and Disc’s inference strategy uses the same knowledge compilation engine [3] used by Dice. The largest difference between the two languages is that Disc is currently untyped (the type system will arrive in the next release). Disc allows for floating-point values obtained through interoperation, but returning a float results in undefined behavior. The syntactic differences between Disc and Dice include:
- observe is not an expression, but instead a statement.
- integers are not bit-encoded and do not need an explicit size.
A simple, illustrative Disc program flips two biased coins and observes the event that at least one of the two coins lands on heads:
exact {
let a = flip 1.0 / 3.0 in
let b = flip 1.0 / 4.0 in
observe a || b in
a
}
In this program, flip represents a coin flip whose probability of heads is
flip’s parameter. The first line of the program creates a Bernoulli
distribution that returns true (i.e., “heads”) with probability 1/3
and binds it to a; on the next line we similarly create a Bernoulli distribution that
is true with probability 1/4 and bind it to b. Next, observe encodes the
evidence that at least one of these variables must be true, and the program queries for
the posterior of a’s distribution.
We can analytically derive the solution (or construct a probability table) to show that the posterior of this model is 2/3. Running this program with the MultiPPL compiler, we see:
$ multippl --file examples/two-coins.yo --steps 1
0.6666666666666666
3ms
In contrast to the Dice compiler, MultiPPL will take this program and produce a
sampler which executes the program for as many samples as indicated by the
--steps flag. For this reason, the command above only returns the expectation of
the compiled distribution and does not return a representation of the underlying
probability table. The final line reports the wall-clock time of execution.
Because we are compiling a Disc program, the sampled distribution is exact and is invariant to the requested number of samples. If we increase the requested number of samples to 10, we will observe that this is the same as compiling the exact distribution 10 times and taking the average of these (identical) samples.
$ multippl --file examples/two-coins.yo --steps 10
0.6666666666666667
4ms
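The analytic answer can be double-checked with a small enumeration sketch outside MultiPPL (Python; the model structure is transcribed by hand):

```python
from itertools import product

# Enumerate the joint distribution of the two independent coins and
# condition on the evidence a || b.
p_a, p_b = 1.0 / 3.0, 1.0 / 4.0

def pr(a, b):
    # Probability of one joint assignment of the two flips.
    return (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)

evidence = sum(pr(a, b) for a, b in product([True, False], repeat=2) if a or b)
posterior_a = sum(pr(a, b) for a, b in product([True, False], repeat=2)
                  if (a or b) and a) / evidence
print(posterior_a)  # ≈ 0.6667, i.e. 2/3
```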
Disc supports products and projections, which we can use to query for b’s expectation as well:
exact {
let a = flip 1.0 / 3.0 in
let b = flip 1.0 / 4.0 in
let ab = (a, b) in
observe a || b in
(ab[0], ab[1])
}
Compiling this query yields a space-delimited list of results and informs us that b’s posterior mean is 0.5:
$ multippl --file examples/two-coins-prod.yo --steps 1
0.6666666666666666 0.5
5ms
The discrete distribution takes a list of floats, normalizes the list so that it forms a valid probability distribution, and returns an integer. Integers in Disc, however, are syntactic sugar for one-hot encodings of the represented int. For example, the following program:
exact {
discrete(1.5, 1.5, 3.0)
}
is a valid query:
$ multippl --file examples/discrete.yo --steps 1
0.25 0.25 0.5
6ms
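The normalization that discrete performs can be sketched as follows (Python, outside MultiPPL):

```python
def normalize(weights):
    # Rescale a list of non-negative floats so that they sum to one,
    # forming a valid probability distribution.
    total = sum(weights)
    return [w / total for w in weights]

print(normalize([1.5, 1.5, 3.0]))  # [0.25, 0.25, 0.5]
```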
If-then-else expressions in Disc denote probabilistic choice:
exact {
let p = flip 0.5 in
if p
then discrete(1.5, 1.5, 3.0)
else discrete(3.0, 1.5, 1.5)
}
A top-level summary of Disc’s grammar is as follows:
Variables x
Expressions
e := a // all ANF forms
| x '()' | x '(' repeat(a ',') a ')' // function application
| 'if' a 'then' e 'else' e // choice
| 'let' x '=' e 'in' e // variable binding
| 'flip' a // Bernoulli distributions
| 'discrete' '(' repeat(a ',') a ')' // Discrete distributions, desugared into a sequence of flips.
| 'observe' a 'in' e // conditioning on hard evidence in a sequence
| 'sample' '(' sample_e ')' // inlined interoperation with an expression e from Cont
| 'sample' '{' sample_e '}' // interoperation with a block expression e from Cont
ANF forms
a := x // variables
| v // values
| '!' a // negation
| '(' repeat(a ',') a ')' // products
| x '[' a ']' // projections out of products
| a binop a // binary operations
Binary operations
binop := '+' | '*' | '/' | '^' | '<' | '<=' | '==' | '>=' | '>' | '&&' | '||'
Values
v := true | false // booleans
| /-?\d+\.(?:\d*|)/ // statically known floating-point values, or floats obtained through interop
| /\d+/ // statically known integers, or integers obtained through interop
| '()' | '(' repeat(v ',') v ')' // products
The Cont language is a simple sampling language that uses importance sampling as its approximate inference strategy. It contains common distribution objects, both continuous and discrete, as well as soft evidence, sampling from distributions, while-loops, and conventional branching.
An example of using Cont to find the posterior of a Beta-Bernoulli process looks like the following:
sample {
p <- ~ beta(1.0, 1.0);
observe true from bern(p);
observe false from bern(p);
observe false from bern(p);
p
}
This program first samples from a Beta(1, 1) distribution with the ~ operator and binds the sample to p using the binding symbol <-. This sample is uniformly distributed between 0 and 1 and parameterizes the following Bernoulli distributions. In the next three lines the program incorporates three observations into its importance weight, which is used to score the final query’s posterior mean. Because of conjugacy, we know that the correct posterior is a Beta(1 + 1, 1 + 2) = Beta(2, 3) distribution with a mean of 2/5 = 0.4.
Running multippl, we see that 100 samples produces the following expectation of the posterior:
$ multippl --rng 1 --file examples/beta-bernoulli.yo --steps 100
0.3899433561293662
7ms
In this command, --rng 1 indicates a seed, --file points to the relative path of the program in the docker container, and --steps 100 defines the number of samples to produce. Increasing this number of samples, we see that our approximation converges closer to the correct value:
$ multippl --rng 1 --file examples/beta-bernoulli.yo --steps 10000
0.3989326008738859
535ms
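The importance-sampling strategy can be sketched by hand (Python, independent of MultiPPL’s actual implementation): propose p from the Beta(1, 1) prior, weight each proposal by the likelihood of the three observations, and form the self-normalized estimate.

```python
import random

def estimate(steps, seed):
    random.seed(seed)
    num = den = 0.0
    for _ in range(steps):
        p = random.betavariate(1.0, 1.0)   # propose p from the Beta(1, 1) prior
        w = p * (1 - p) * (1 - p)          # weight: likelihood of (true, false, false)
        num += w * p
        den += w
    return num / den                        # self-normalized importance estimate

print(estimate(10000, 1))  # converges toward the exact posterior mean 2/5
```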
Three observations is quite limited evidence for inference, requiring many samples to produce an adequate result. We may want to increase how much evidence we give our program with Cont’s while-loop:
sample {
p ~ beta(1.0, 1.0);
x <- 10;
while (x > 0) {
observe true from bern(p);
observe false from bern(p);
observe false from bern(p);
x <- x - 1;
()
};
p
}
In the first line of our program, we use the binding form ~, which is syntactic sugar for p <- ~ beta(1.0, 1.0).
Notably, all Cont statements terminate with semicolons, including while-loops – this differs from conventional imperative programs. All blocks also return expressions, so here we provide unit () to the body of this while-loop, which always discards its final value. The posterior of this program is Beta(1 + 10, 1 + 20) with a mean of 11/32 ≈ 0.344:
$ multippl --rng 1 --file examples/beta-bernoulli-loop.yo --steps 10000
0.34227573553622553
732ms
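The conjugate closed form is easy to check by hand (Python, outside MultiPPL), assuming the 10 loop iterations contribute 10 true and 20 false observations:

```python
# Beta-Bernoulli conjugate update: a Beta(a, b) prior with t true and f false
# observations yields a Beta(a + t, b + f) posterior.
a, b = 1.0, 1.0
t, f = 10, 20
posterior_mean = (a + t) / (a + t + b + f)
print(posterior_mean)  # 11/32 = 0.34375
```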
Cont supports branching and control flow through if statements. To define a multi-modal Gaussian distribution, we can sample from a Bernoulli distribution and use the result to select one of two modes:
sample {
m ~ bern(0.5);
if m {
~normal(1.0, 0.5)
} else {
~normal(-1.0, 0.5)
}
}
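The mixture this program denotes can be sketched directly (Python, outside MultiPPL):

```python
import random

def multimodal(rng):
    # Pick a mode with a fair coin, then sample from that mode's Gaussian.
    if rng.random() < 0.5:
        return rng.gauss(1.0, 0.5)
    return rng.gauss(-1.0, 0.5)

rng = random.Random(1)
samples = [multimodal(rng) for _ in range(10000)]
print(sum(samples) / len(samples))  # near 0: the modes at +1 and -1 cancel
```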
To perform parameter estimation for this model, we would want to write a function that performs the same scoring over both modes:
sample fn score (p, ev) {
m ~ bern(p);
if m {
observe ev from normal(1.0, 0.5); ()
} else {
observe ev from normal(-1.0, 0.5); ()
}
}
sample {
p ~ beta(1.0, 1.0);
score(p, 1.0);
score(p, 1.0);
score(p, 1.0);
p
}
The three observations above will begin to skew our posterior towards the Gaussian distribution with a mode of 1.0:
$ multippl --rng 1 --file examples/multimodal.yo --steps 1000
0.8051300094638457
56ms
Cont has limited support for lists and includes the head, tail, and push functions. We can represent the same program above with a list of our evidence and iterate through this list using a while loop:
sample fn score (p, ev) {
m ~ bern(p);
if m {
observe ev from normal(1.0, 0.5); ()
} else {
observe ev from normal(-1.0, 0.5); ()
}
}
sample {
p ~ beta(1.0, 1.0);
evidence <- [1.0, 1.0, 1.0];
i <- 3;
while (i > 0) {
score(p, evidence[i - 1]);
i <- i - 1;
()
};
p
}
And we can confirm that running this program with the same seed will yield the same result as before:
$ multippl --rng 1 \
--file examples/multimodal-iter.yo --steps 1000
0.8051300094638457
79ms
A simplified summary of Cont’s tree-sitter grammar is as follows:
Variables x
Expressions
e := a // all ANF forms
| 'while' a '{' e '}' // while loops
| x '()' | x '(' repeat(x ',') x ')' // function application
| 'if' '(' a ')' '{' e '}' 'else' '{' e '}' // control flow
| x '<-' e ';' e // variable binding
| e ';' e // sequencing
| '~' e // sampling an expression
| x '~' e ';' e // sugar for binding a sample: x <- (~ e); e
| 'observe' a 'from' a // conditioning on soft evidence
| 'exact' '(' exact_e ')' // inlined interoperation with an expression e from Disc
| 'exact' '{' exact_e '}' // interoperation with a block expression e from Disc
ANF forms
a := x // variables
| v // values
| '!' a // negation
| x '[' a ']' // projections
| a binop a // binary operations
| '(' repeat(a ',') a ')' // products
| '[' a ']' | '[' repeat(a ',') a ']' // vectors
| 'head' '(' a ')' | 'tail' '(' a ')' // vector operations
| 'push' '(' a ',' a ')' // vector operations
| 'bern' '(' a ')' // Bernoulli distributions
| 'poisson' '(' a ')' // Poisson distributions
| 'uniform' '(' a ',' a ')' // Uniform distributions
| 'normal' '(' a ',' a ')' // Normal distributions
| 'beta' '(' a ',' a ')' // Beta distributions
| 'discrete' '(' repeat(a ',') a ')' // Discrete distributions
Binary operations
binop := '+' | '*' | '/' | '^' | '<' | '<=' | '==' | '>=' | '>' | '&&' | '||'
Values
v := true | false // booleans
| /-?\d+\.(?:\d*|)/ // floating-point values
| /\d+/ // integers
| '[]' | '[' repeat(v ',') v ']' // vectors
| '()' | '(' repeat(v ',') v ')' // products
| 'bern' '(' v ')' // Bernoulli distributions
| 'poisson' '(' v ')' // Poisson distributions
| 'uniform' '(' v ',' v ')' // Uniform distributions
| 'normal' '(' v ',' v ')' // Normal distributions
| 'beta' '(' v ',' v ')' // Beta distributions
| 'discrete' '(' repeat(v ',') v ')' // Discrete distributions
MultiPPL provides a framework in which Cont and Disc can seamlessly interoperate through boundary operators, mutually defined in each language.
An example of this arises when we have components of a program that we would like to model exactly, but wish to use them in a larger program that needs more flexibility and can be reasoned about approximately. For instance, in the following program we model a packet traversing a ladder-like network topology of unbounded length. At each “rung” of the ladder, a node is selected by an unbiased coin for the packet to continue its traversal, and we can model each node’s failure rate exactly.
exact fn rung (s1) {
let route = flip 0.5 in
let s2 = if route then s1 else false in
let drop2 = flip 0.005 in
let go2 = s2 && !drop2 in
let s3 = if route then false else s1 in
let drop3 = flip 0.001 in
let go3 = s3 && !drop3 in
go2 || go3
}
The above function models a partial traversal through this ladder network and returns a Boolean representing whether or not the packet was able to navigate through this sub-network without being dropped. We want to query the probability that a packet successfully traverses all of these intermediate steps without being dropped, but the network has unbounded length, which cannot be modelled exactly.
sample {
ix ~ poisson(20.0);
ix <- ix + 1;
traversed <- true;
while ix > 0 {
traversed <- exact(rung(traversed));
ix <- ix - 1;
()
};
traversed
}
Using Cont, we can model the length of this network using a Poisson distribution (with an average topology size of 20 rungs). We can then iterate over each subnetwork and return a sample encapsulating the success of the packet’s traversal.
Evaluating this for 1000 samples, we can find the expectation of this model to be:
$ multippl --rng 1 --file examples/ladder.yo --steps 1000
0.942
604ms
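A back-of-the-envelope check of this estimate (Python, outside MultiPPL), assuming each rung independently succeeds with probability 0.5 · 0.995 + 0.5 · 0.999 = 0.997 and that the loop runs for Poisson(20) + 1 rungs:

```python
import math

# Per-rung success probability: route to the node with drop rate 0.005 or
# the node with drop rate 0.001, each with probability 0.5.
rung = 0.5 * (1 - 0.005) + 0.5 * (1 - 0.001)   # 0.997

# With N ~ Poisson(20) and N + 1 rungs, E[rung**(N + 1)] has a closed form
# via the Poisson probability generating function E[s**N] = exp(lam * (s - 1)).
lam = 20.0
expected = rung * math.exp(lam * (rung - 1))
print(expected)  # ~0.939, consistent with the sampled estimate 0.942
```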
Similarly, we can use the sample keyword inside of a Disc program to use a Cont
value in a Disc context. The MultiPPL compiler will also provide some syntactic
sugar when performing variable look-ups and will attempt to perform
interoperation wherever possible (as in the case of Cont’s traversed variable
binding in the above program).
For more examples of interoperation, we refer users to our submission and provided benchmarks.
When interoperating, it is worth noting that Disc programs which are repeatedly sampled adhere to sample consistency. As an example, consider the following program:
exact {
let X = flip 0.5 in
sample {
y <- exact(X);
z <- exact(X);
y || z
}
}
Here, we have a random variable X defined in Disc and referenced twice from Cont.
When we evaluate the boundary which binds X to y, we sample the Disc program. On
the next line, when we again reference X and bind it to z, the sample must be consistent with the previous one. Since y and z therefore always agree, y || z is simply X, with expectation 0.5. Running this program, we can confirm that this is the case:
$ multippl --file examples/sample-consistency.yo --steps 1000 --rng 1
0.503
80ms
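The effect of consistency can be illustrated with a hand-rolled sketch (Python, outside MultiPPL): if X were instead re-sampled at every boundary crossing, the query would converge to P(y ∨ z) = 0.75 rather than 0.5.

```python
import random

def estimate(consistent, steps=100000, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(steps):
        x = rng.random() < 0.5                         # the Disc-side flip
        y = x                                          # first boundary crossing
        z = x if consistent else rng.random() < 0.5    # second crossing
        hits += (y or z)
    return hits / steps

print(estimate(consistent=True))   # ~0.5, matching MultiPPL's 0.503
print(estimate(consistent=False))  # ~0.75: inconsistent resampling overcounts
```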
MultiPPL uses nix’s flakes for development. Using a flake-enabled nix binary,
the following commands are available:
- nix develop: enters a development shell.
- nix flake check: runs cargo nextest run and checks our nix derivations.
- nix build .#multippl .#multippl-benchmark .#multippl-docker: produces the multippl and multippl-benchmark executables alongside a docker image for artifact evaluation.
- nix run .#multippl-benchmark -- <ARGS>: runs the multippl-benchmark executable.
- nix run .#multippl -- <ARGS>: runs the multippl executable.
- [1] Jacob Matthews and Robert Bruce Findler. “Operational Semantics for Multi-Language Programs”. In: ACM SIGPLAN Notices 42.1 (2007), pp. 3–10. doi: 10.1145/1190215.1190220.
- [2] Steven Holtzen, Guy Van den Broeck, and Todd Millstein. “Scaling Exact Inference for Discrete Probabilistic Programs”. In: Proceedings of the ACM on Programming Languages 4 (OOPSLA Nov. 2020), 140:1–140:31. doi: 10/gh4jhb.
- [3] https://github.com/neuppl/rsdd