stites/multippl

MultiPPL: A Probabilistic Multi-language Framework

Many different probabilistic programming languages exist, each specializing in specific kinds of probabilistic programs and broadly falling into the categories of approximate and exact inference.

MultiPPL is a host compiler of two syntactically and semantically different probabilistic programming languages: an approximate language leveraging importance sampling, and an exact language using binary decision diagrams (BDDs) for knowledge compilation. Our work demonstrates sound interoperation of these two languages under a Matthews and Findler-style multi-language framework[1].

Syntax

MultiPPL uses tree-sitter to parse syntax, with the full grammar defined in tree-sitter-multippl/grammar.js. Here we describe each sub-language in our framework, show by example how to interoperate between the Cont and Disc languages, and summarize the tree-sitter grammar. The examples/ folder contains all programs documented here.

A MultiPPL program is introduced using exact { ... } or sample { ... } blocks:

choice('exact, 'sample') '{' <expr> '}'

Here, choice denotes an alternative and comes from the tree-sitter metalanguage, while <expr> comes from the chosen expression language of Disc (exact) or Cont (sample). Single-quoted characters denote requisite symbols.

MultiPPL supports procedures, which similarly require a sample or exact keyword to describe where the function is allowed to run:

choice('sample', 'exact') 'fn' <id> '(' repeat(<id>) ')' '{' <expr> '}'

From tree-sitter we use repeat for zero-or-more repetitions, <id> is a placeholder for a variable (defined as x in each grammar), and <expr> once again corresponds to the chosen expression language.

The Disc Language

Disc syntax draws heavily from the Dice programming language[2], and Disc’s inference strategy uses the same knowledge compilation engine[3] used by Dice. The largest difference between the two languages is that Disc is currently untyped (the type system will arrive in the next release). Disc allows floating-point values obtained by interoperation, but returning a float results in undefined behavior. The syntactic differences between Disc and Dice include:

  • observe e is not an expression, but instead a statement.
  • integers are not bit-encoded and do not need an explicit size.

Example: Two Coins

An illustrative, simple Disc program flips two biased coins and observes the event that at least one of the two coins lands on heads:

exact {
  let a = flip 1.0 / 3.0 in
  let b = flip 1.0 / 4.0 in
  observe a || b in
  a
}

In this program, flip represents a coin flip whose probability of heads is flip’s parameter. The first line of the program creates a Bernoulli distribution which returns true (i.e. “heads”) with probability 1/3 and binds this to a; on the next line we similarly create a Bernoulli distribution that is true with probability 1/4 and bind this to b. Next, observe encodes evidence that at least one of these variables must be true, and the program queries for the posterior of a’s distribution.

We can analytically derive the solution (or construct a probability table) to show that the posterior of this model is 2/3. Running this program with the MultiPPL compiler, we see:

$ multippl --file examples/two-coins.yo --steps 1
0.6666666666666666
3ms
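The analytic derivation can be checked by enumerating the probability table in ordinary Python (a sketch of the arithmetic, not MultiPPL code):

```python
# Sketch (not MultiPPL): enumerate the probability table of the two-coins
# program to confirm the posterior of `a` analytically.
from itertools import product

p_a, p_b = 1.0 / 3.0, 1.0 / 4.0

joint = 0.0     # P(a = true and a || b)
evidence = 0.0  # P(a || b)
for a, b in product([True, False], repeat=2):
    w = (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
    if a or b:
        evidence += w
        if a:
            joint += w

print(joint / evidence)  # 2/3, matching the compiler's output
```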

In contrast to the Dice compiler, MultiPPL will take this program and produce a sampler which executes the program for as many samples as indicated by the --steps flag. For this reason, the command above only returns the expectation of the compiled distribution and does not return a representation of the underlying probability table. The final line reports the wall-clock time of execution.

Because we are compiling a Disc program, the sampled distribution is exact and is invariant to the requested number of samples. If we increase the number of samples to 10, we will observe that this is the same as compiling the exact distribution 10 times and taking the average of these (identical) samples.

$ multippl --file examples/two-coins.yo --steps 10
0.6666666666666667
4ms

Products

Disc supports products and projections, and we can use this to query for b’s expectation as well:

exact {
  let a = flip 1.0 / 3.0 in
  let b = flip 1.0 / 4.0 in
  let ab = (a, b) in
  observe a || b in
  (ab[0], ab[1])
}

Compiling this query yields a space-delimited list of results and informs us that b’s posterior mean is 0.5:

$ multippl --file examples/two-coins-prod.yo --steps 1
0.6666666666666666 0.5
5ms
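The same enumeration as before extends to both components of the product query (again a Python sketch of the arithmetic, not MultiPPL code):

```python
# Sketch (not MultiPPL): enumerate the probability table, now accumulating
# posterior mass for both a and b under the evidence a || b.
from itertools import product

p_a, p_b = 1.0 / 3.0, 1.0 / 4.0
post_a = post_b = evidence = 0.0
for a, b in product([True, False], repeat=2):
    w = (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
    if a or b:
        evidence += w
        post_a += w * a
        post_b += w * b

print(post_a / evidence, post_b / evidence)  # 2/3 and 0.5
```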

The Discrete Distribution and Probabilistic Choice

The Discrete distribution takes a list of floats, normalizes the list so that it forms a valid probability distribution, and returns an integer. Integers in Disc, however, are syntactic sugar for one-hot encodings of the represented int. For example, the following program:

exact {
  discrete(1.5, 1.5, 3.0)
}

is a valid query:

$ multippl --file examples/discrete.yo --steps 1
0.25 0.25 0.5
6ms
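The reported marginals follow directly from normalizing the weights, which we can sketch in plain Python (not MultiPPL's implementation):

```python
# Sketch: `discrete` normalizes its weights into a probability distribution.
weights = [1.5, 1.5, 3.0]
total = sum(weights)
probs = [w / total for w in weights]
print(probs)  # [0.25, 0.25, 0.5]
```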

If-then-else expressions in Disc denote probabilistic choice.

exact {
  let p = flip 0.5 in
  if p
  then discrete(1.5, 1.5, 3.0)
  else discrete(3.0, 1.5, 1.5)
}
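Since the guard p is a fair flip, the marginal distribution of this program is the even mixture of the two branches. A sketch of that arithmetic in Python (our own check, not compiler output):

```python
# Sketch: probabilistic choice mixes the two discrete distributions,
# weighted by the guard's probability.
def normalize(ws):
    t = sum(ws)
    return [w / t for w in ws]

p = 0.5
then_branch = normalize([1.5, 1.5, 3.0])  # [0.25, 0.25, 0.5]
else_branch = normalize([3.0, 1.5, 1.5])  # [0.5, 0.25, 0.25]
marginal = [p * a + (1 - p) * b for a, b in zip(then_branch, else_branch)]
print(marginal)  # [0.375, 0.25, 0.375]
```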

Grammar

A top-level summary of Disc’s grammar is as follows:

Variables x

Expressions
e := a                                        // all ANF forms
  | x '()' | x '(' repeat(a ',') a ')'        // function application
  | 'if' a 'then' e 'else' e                  // choice
  | 'let' x '=' e 'in' e                      // variable binding
  | 'flip' a                                  // Bernoulli distributions
  | 'discrete' '(' repeat(a ',') a ')'        // Discrete distributions, desugared into a sequence of flips.
  | 'observe' a 'in' e                        // conditioning on hard evidence in a sequence
  | 'sample' '(' sample_e ')'                 // inlined interoperation with an expression e from Cont
  | 'sample' '{' sample_e '}'                 // interoperation with a block expression e from Cont


ANF forms
a := x                               // variables
  | v                                // values
  | '!' a                            // negation
  | '(' repeat(a ',') a ')'          // products
  |  x '[' a ']'                     // projections out of products
  | a binop a                        // binary operations

Binary operations
binop := '+' | '*' | '/' | '^' | '<' | '<=' | '==' | '>=' | '>' | '&&' | '||'

Values
v := true | false                    // booleans
  | /-?\d+\.(?:\d*|)/                // statically known floating-point values, or floats obtained through interop
  | /\d+/                            // statically known integers, or integers obtained through interop
  | '()' | '(' repeat(v ',') v ')'   // products

The Cont Language

The Cont language is a simple sampling language that uses importance sampling as its approximate inference strategy. It contains common distribution objects, both continuous and discrete, as well as the ability to incorporate soft evidence and sample from distributions, plus while-loops and conventional branching statements.

An Approximate Beta-Bernoulli

An example of using Cont to find the posterior of a Beta-Bernoulli process looks like the following:

sample {
  p <- ~ beta(1.0, 1.0);
  observe true  from bern(p);
  observe false from bern(p);
  observe false from bern(p);
  p
}

This program first samples from a Beta(1, 1) distribution with the ~ operator and binds the sample to p using the <- binding symbol. This sample has uniform probability between 0 and 1 and is used to parameterize the following Bernoulli distributions. In the following three lines, the program incorporates three observations into its importance weight, which is used to score the final query’s posterior mean. Because of conjugacy, we know that the correct posterior is a Beta distribution with $$α=2$$ and $$β=3$$, with an expectation of $$2/(2+3)=0.4$$.
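The importance-sampling scheme can be sketched in ordinary Python: propose p from the prior, weight each proposal by the likelihood of the three observations, and take the weighted mean. This is a standard prior-proposal importance sampler, not MultiPPL's actual implementation:

```python
# Sketch of importance sampling for the Beta-Bernoulli program: propose
# from the Beta(1, 1) prior (i.e. Uniform(0, 1)) and weight each sample
# by the Bernoulli likelihood of observing (true, false, false).
import random

random.seed(1)
num, den = 0.0, 0.0
for _ in range(100_000):
    p = random.random()        # Beta(1, 1) is Uniform(0, 1)
    w = p * (1 - p) * (1 - p)  # likelihood of true, false, false
    num += p * w
    den += w

print(num / den)  # approaches the analytic posterior mean of 0.4
```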


Running multippl, we see that 100 samples produces the following expectation of the posterior:

$ multippl --rng 1 --file examples/beta-bernoulli.yo --steps 100
0.3899433561293662
7ms

In this command, --rng 1 indicates a seed, --file points to the relative path of the program in the docker container, and --steps 100 defines the number of samples to produce. Increasing the number of samples, we see that our approximation converges toward the correct value:

$ multippl --rng 1 --file examples/beta-bernoulli.yo --steps 10000
0.3989326008738859
535ms

While-loops

Three data points for inference is quite limited, requiring many samples to produce an adequate result. We may want to increase how much evidence we give our program with Cont’s while-loop:

sample {
  p ~ beta(1.0, 1.0);
  x <- 10;
  while (x > 0) {
    observe true  from bern(p);
    observe false from bern(p);
    observe false from bern(p);
    x <- x - 1;
    ()
  };
  p
}

In the first line of our program, we use the binding ~, which is syntactic sugar for p <- ~ beta(1.0, 1.0). Notably, all Cont statements terminate with semicolons, including while-loops, which differs from conventional imperative languages. All blocks also return expressions, so here we provide unit () to the block in this while-loop, which always discards its final value. The posterior of this program is Beta(1+10, 1+20), with a mean of $$11/32=0.34375$$.

$ multippl --rng 1 --file examples/beta-bernoulli-loop.yo --steps 10000
0.34227573553622553
732ms
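Under conjugacy, each loop iteration simply adds one success and two failures to the Beta parameters; the quoted posterior mean is a quick arithmetic check (plain Python, not MultiPPL):

```python
# Sketch: conjugate Beta-Bernoulli updates for the while-loop program.
alpha, beta = 1.0, 1.0  # Beta(1, 1) prior
for _ in range(10):     # ten loop iterations
    alpha += 1          # one `observe true`
    beta += 2           # two `observe false`

print(alpha, beta, alpha / (alpha + beta))  # 11.0 21.0 0.34375
```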

Branching and Lists

Cont supports branching and control flow through if statements. To define a multi-modal Gaussian distribution, we can sample from a Bernoulli distribution and use the result to select one of two modes:

sample {
  m ~ bern(0.5);
  if m {
    ~normal(1.0, 0.5)
  } else {
    ~normal(-1.0, 0.5)
  }
}

To perform parameter estimation for this model, we can write a function that performs the same scoring over both modes:

sample fn score (p, ev) {
  m ~ bern(p);
  if m {
    observe ev from normal(1.0, 0.5); ()
  } else {
    observe ev from normal(-1.0, 0.5); ()
  }
}
sample {
  p ~ beta(1.0, 1.0);
  score(p, 1.0);
  score(p, 1.0);
  score(p, 1.0);
  p
}

The three observations above will begin to skew our posterior towards the Gaussian distribution with a mode of 1.0:

$ multippl --rng 1 --file examples/multimodal.yo --steps 1000
0.8051300094638457
56ms
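What score contributes to the importance weight can be sketched in Python: each call samples a mode from Bern(p) and multiplies the weight by the Gaussian density of the evidence under that mode. The helper names below are ours, and this is a hand-rolled importance sampler, not MultiPPL internals:

```python
# Sketch: importance sampling for the multimodal model. Each `score`
# call picks a mode from Bern(p) and weights by the Normal density of
# the evidence under that mode.
import math
import random

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

random.seed(1)
num, den = 0.0, 0.0
for _ in range(100_000):
    p = random.random()  # Beta(1, 1) prior
    w = 1.0
    for ev in (1.0, 1.0, 1.0):
        mode = 1.0 if random.random() < p else -1.0
        w *= normal_pdf(ev, mode, 0.5)
    num += p * w
    den += w

print(num / den)  # skews toward the mode at 1.0, roughly 0.8
```

Because the density of evidence 1.0 under the mode at -1.0 is vanishingly small, the posterior over p is approximately proportional to p³, whose mean is 0.8, consistent with the sampled estimate above.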

Cont has limited support for lists and includes the head, tail, and push functions. We can represent the same program above with a list of our evidence and iterate through this list using a while loop:

sample fn score (p, ev) {
  m ~ bern(p);
  if m {
    observe ev from normal(1.0, 0.5); ()
  } else {
    observe ev from normal(-1.0, 0.5); ()
  }
}
sample {
  p ~ beta(1.0, 1.0);
  evidence <- [1.0, 1.0, 1.0];
  i <- 3;
  while (i > 0) {
    score(p, evidence[i - 1]);
    i <- i - 1;
    ()
  };
  p
}

And we can confirm that running this program with the same seed will yield the same result as before:

$ multippl --rng 1 \
       --file examples/multimodal-iter.yo --steps 1000
0.8051300094638457
79ms

Grammar

A simplified summary of Cont’s tree-sitter grammar is as follows:

Variables x

Expressions
e := a                                        // all ANF forms
  | 'while' a '{' e '}'                       // while loops
  | x '()' | x '(' repeat(x ',') x ')'        // function application
  | 'if' '(' a ')' '{' e '}' 'else' '{' e '}' // control flow
  | x '<-' e ';' e                            // variable binding
  | e ';' e                                   // sequencing
  | '~' e                                     // sampling an expression
  | x '~' e ';' e                             // sugar for binding a sample: x <- (~ e); e
  | 'observe' a 'from' a                      // conditioning on soft evidence
  | 'exact' '(' exact_e ')'                   // inlined interoperation with an expression e from Disc
  | 'exact' '{' exact_e '}'                   // interoperation with a block expression e from Disc

ANF forms
a := x                                          // variables
  | v                                           // values
  | '!' a                                       // negation
  |  x '[' a ']'                                // projections
  | a binop a                                   // binary operations
  | '(' repeat(a ',') a ')'                     // products
  | '[' a ']' | '[' repeat(a ',') a ']'         // vectors
  | 'head' '(' a ')' | 'tail' '(' a ')'         // vector operations
  | 'push' '(' a ',' a ')'                      // vector operations
  | 'bern' '(' a ')'                            // Bernoulli distributions
  | 'poisson' '(' a ')'                         // Poisson distributions
  | 'uniform' '(' a ',' a ')'                   // Uniform distributions
  | 'normal' '(' a ',' a ')'                    // Normal distributions
  | 'beta' '(' a ',' a ')'                      // Beta distributions
  | 'discrete' '(' repeat(a ',') a ')'          // Discrete distributions

Binary operations
binop := '+' | '*' | '/' | '^' | '<' | '<=' | '==' | '>=' | '>' | '&&' | '||'

Values
v := true | false                               // booleans
  | /-?\d+\.(?:\d*|)/                           // floating-point values
  | /\d+/                                       // integers
  | '[]' | '[' repeat(v ',') v ']'              // vectors
  | '()' | '(' repeat(v ',') v ')'              // products
  | 'bern' '(' v ')'                            // Bernoulli distributions
  | 'poisson' '(' v ')'                         // Poisson distributions
  | 'uniform' '(' v ',' v ')'                   // Uniform distributions
  | 'normal' '(' v ',' v ')'                    // Normal distributions
  | 'beta' '(' v ',' v ')'                      // Beta distributions
  | 'discrete' '(' repeat(v ',') v ')'          // Discrete distributions

Interoperation

MultiPPL provides a framework in which Cont and Disc can seamlessly interoperate through boundary operators, mutually defined in each language.

An example of this arises when we have components of a program that we would like to model exactly, but want to use within a larger program that needs more flexibility and can be reasoned about approximately. For instance, in the following program, we model a packet traversing a ladder-like network topology of unbounded length. At each “rung” of the ladder, a node is selected uniformly at random for the packet to continue its traversal, and we can model each node’s failure rate exactly.

exact fn rung (s1) {
  let route = flip 0.5 in

  let s2 = if route then s1 else false in
  let drop2 = flip 0.005 in
  let go2 = s2 && !drop2 in

  let s3 = if route then false else s1 in
  let drop3 = flip 0.001 in
  let go3 = s3 && !drop3 in

  go2 || go3
}

The above function models a partial traversal through this ladder network, and returns a Boolean representing whether or not the packet was able to navigate through this sub-network without getting dropped. We want to query on the probability that a packet will successfully traverse all of these intermediate steps without getting dropped, but the network has unbounded length, which cannot be modelled exactly.

sample {
  ix ~ poisson(20.0);
  ix <- ix + 1;
  traversed <- true;
  while ix > 0 {
    traversed <- exact(rung(traversed));
    ix <- ix - 1;
    ()
  };
  traversed
}

Using Cont, we can model the length of this network using a Poisson distribution (with an average topology size of 20 rungs). We can then iterate over each subnetwork and return a sample encapsulating the success of the packet’s traversal.

Evaluating this for 1000 samples, we can find the expectation of this model to be:

$ multippl --rng 1 --file examples/ladder.yo --steps 1000
0.942
604ms
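For intuition, when the incoming signal is live, a single rung succeeds with probability 0.5·(1−0.005) + 0.5·(1−0.001) = 0.997, so the full traversal succeeds with probability E[0.997^(N+1)] for N ~ Poisson(20). A Python sketch of that expectation (our own derivation, not compiler output):

```python
# Sketch: analytic expectation for the ladder model. A single rung
# passes a live packet with probability 0.5*0.995 + 0.5*0.999 = 0.997,
# and the number of rungs is 1 + Poisson(20).
import math

rung = 0.5 * (1 - 0.005) + 0.5 * (1 - 0.001)  # 0.997
lam = 20.0
# E[rung^(N+1)] via the Poisson pgf: E[s^N] = exp(lam * (s - 1))
expectation = rung * math.exp(lam * (rung - 1))
print(expectation)  # ~0.939, consistent with the sampled 0.942
```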

Similarly, we can use the sample keyword inside of a Disc program to use a Cont value in a Disc context. The MultiPPL compiler will also provide some syntactic sugar when performing variable look-ups and will attempt to perform interoperation wherever possible (as in the case of Cont’s traversed variable binding in the above program).

For more examples of interoperation, we refer users to our submission and provided benchmarks.

Sample Consistency

When interoperating, it is worth noting that Disc programs which are repeatedly sampled adhere to sample consistency. As an example, consider the following program:

exact {
  let X = flip 0.5 in
  sample {
    y <- exact(X);
    z <- exact(X);
    y || z
  }
}

Here, we have a random variable X defined in Disc and referenced twice from Cont. When we evaluate the boundary that binds X to y, we sample the Disc program. On the next line, when we again reference X and bind it to z, the sample must be consistent with the previous one. Running this program, we can confirm that this is the case:

$ multippl --file examples/sample-consistency.yo --steps 1000 --rng 1
0.503
80ms
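A Python sketch shows why consistency matters: if both boundary crossings reuse the one sample of X, then E[y || z] = P(X) = 0.5, whereas resampling X independently on each crossing would give 0.75 (variable names are ours):

```python
# Sketch: sample consistency. Reusing one sample of X for both y and z
# gives E[y || z] = 0.5; resampling X each time would give 0.75.
import random

random.seed(1)
n = 100_000
consistent = inconsistent = 0
for _ in range(n):
    x = random.random() < 0.5  # one sample of X, shared by y and z
    consistent += (x or x)
    y = random.random() < 0.5  # fresh, *inconsistent* samples
    z = random.random() < 0.5
    inconsistent += (y or z)

print(consistent / n, inconsistent / n)  # roughly 0.5 vs 0.75
```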

Nix Development

MultiPPL uses Nix flakes for development. Using a flake-enabled nix binary, the following commands are available:

  • nix develop enters a development shell.
  • nix flake check runs cargo nextest run and checks our nix derivations.
  • nix build .#multippl .#multippl-benchmark .#multippl-docker produces the multippl and multippl-benchmark executables alongside a docker image for artifact evaluation.
  • nix run .#multippl-benchmark -- <ARGS> runs the multippl-benchmark executable.
  • nix run .#multippl -- <ARGS> runs the multippl executable.

References

  • [1] Jacob Matthews and Robert Bruce Findler. “Operational Semantics for Multi-Language Programs”. In: ACM SIGPLAN Notices 42.1 (2007), pp. 3–10. doi: 10.1145/1190215.1190220.
  • [2] Steven Holtzen, Guy Van den Broeck, and Todd Millstein. “Scaling Exact Inference for Discrete Probabilistic Programs”. In: Proceedings of the ACM on Programming Languages 4.OOPSLA (Nov. 2020), 140:1–140:31. doi: 10/gh4jhb.
  • [3] https://github.com/neuppl/rsdd
