[RFC] New computation model for dune #1280
This proposal describes a new computation model for Dune. The aim is to improve performance, sharing, and memory usage, as well as to increase the expressive power of Dune. For instance, with this new model it will be trivial to support including generated files in dune files.
From the point of view of Dune developers and, in the future, plugin authors, this new model is simple and general, and it makes it easy to understand where the build system adds overhead to user-specified computations in order to minimise recomputation. It will also provide good support for profiling and debugging computations. Additionally, it will naturally benefit from multiple cores.
Problem this proposal is solving
In a given build, there are two kinds of computations: the computations done by external commands and the computations done by the build system itself. The former are already well understood and correctly tracked by the build system, however the latter are currently not properly tracked by Dune. As a result, they are not shared between runs of Dune or between different workspaces. This is fine for small projects as these computations are generally fast, however it doesn't scale to large workspaces.
Build systems such as Jenga allow sharing the results of such internal computations within a single process in polling mode, but not across processes.
Overview of the new model
The new model is simply a system that memoizes pure functions. The outputs of functions may or may not be serializable. Outputs that are serializable may be shared between runs and workspaces and will provide cut-off points during recomputation. Inputs must always be serializable.
The API is as follows:

```ocaml
module Input : sig
  type 'a t
  ...
end

module Output : sig
  type 'a t
  ...
end

val memoize
  :  name:string
  -> 'a Input.t
  -> 'b Output.t
  -> ('a -> 'b Comp.t)
  -> ('a -> 'b Comp.t) Staged.t
```

where `Comp.t` is essentially the `Fiber.t` monad, except that it additionally tracks the effects of the computation, such as reading/writing files, evaluating globs or evaluating other memoized functions.
Because the function is expected to be pure, it will not be necessary to recompute it on a given input when none of the observations it made during its previous execution changed. Additionally, when memory usage becomes a problem, Dune will be able to forget intermediate results, at the expense of needing to recompute them when they are needed again.
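To make the idea concrete, here is a minimal, runnable sketch of such a memoization layer, under heavy simplification: `Comp.t` is reduced to a plain thunk that reports which files it read, and file contents live in a hash table standing in for the file system. The names `read_file`, `fs` and `parse` are illustrative, not Dune's actual API; the real proposal tracks effects inside a Fiber-based monad.

```ocaml
(* A computation is a thunk; running it returns a value plus the
   observations (here: file names read) it made along the way. *)
type 'a comp = unit -> 'a * string list

(* Fake file system we can mutate to simulate edits. *)
let fs : (string, string) Hashtbl.t = Hashtbl.create 16

(* Reading a file is a primitive effect: it records its own dependency. *)
let read_file name () = (Hashtbl.find fs name, [name])

(* Memoize [f]: cache the result together with the files it observed and
   their contents at the time; rerun only when an observation changed. *)
let memoize (f : 'k -> 'v comp) : 'k -> 'v comp =
  let cache : ('k, 'v * (string * string) list) Hashtbl.t =
    Hashtbl.create 16
  in
  fun key () ->
    match Hashtbl.find_opt cache key with
    | Some (v, obs)
      when List.for_all
             (fun (name, contents) ->
               Hashtbl.find_opt fs name = Some contents)
             obs ->
      (v, List.map fst obs)       (* all observations unchanged: reuse *)
    | _ ->
      let v, files = f key () in
      let obs = List.map (fun n -> (n, Hashtbl.find fs n)) files in
      Hashtbl.replace cache key (v, obs);
      (v, files)

let runs = ref 0

(* A toy "parser" whose only observation is the file it reads. *)
let parse =
  memoize (fun name () ->
      incr runs;
      let contents, files = read_file name () in
      (String.length contents, files))

let () =
  Hashtbl.replace fs "dune" "(library (name foo))";
  assert (fst (parse "dune" ()) = 20);
  assert (fst (parse "dune" ()) = 20);  (* cached: [f] not rerun *)
  assert (!runs = 1);
  Hashtbl.replace fs "dune" "(executable (name foo))";
  ignore (parse "dune" ());             (* observation changed: [f] reruns *)
  assert (!runs = 2)
```

Note how forgetting an intermediate result in this model is safe: dropping a cache entry only forces a recomputation on the next lookup, exactly the memory/time trade-off described above.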
Encoding the build system
In this model, the user simply needs to provide a function to build a file:

```ocaml
val build_file : Path.t -> unit Comp.t
```

In particular, there is nothing special about running external commands. To run an external command, we simply use a primitive:
```ocaml
val run : Path.t -> string list -> Effects.t -> unit Comp.t
```

However, it is still necessary to manually specify the effects of the command, given that the build system cannot infer them.
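The encoding can be sketched as follows, again under heavy simplification: `Comp.t` is a plain thunk, an external command is simulated by an OCaml closure, and its effects are declared as the files it reads and writes, mirroring the manual effect annotation required for `run`. Apart from `build_file` and `run`, which come from the RFC, all names here (`effects`, `rules`, the toy rule) are hypothetical.

```ocaml
type effects = { reads : string list; writes : string list }

(* Fake file system and rule database standing in for the real thing. *)
let fs : (string, string) Hashtbl.t = Hashtbl.create 16

(* Rules: target file -> (declared effects, simulated command). *)
let rules : (string, effects * (unit -> unit)) Hashtbl.t = Hashtbl.create 16

(* [run] "executes" an external command with manually declared effects:
   the build system cannot infer what a command touches, so the caller
   states it up front. *)
let run (eff : effects) (action : unit -> unit) : unit =
  ignore eff.writes;
  action ()

(* [build_file] is the single entry point: bring [path] up to date by
   first building every declared dependency, then running the command. *)
let rec build_file (path : string) : unit =
  match Hashtbl.find_opt rules path with
  | None -> ()                          (* source file: nothing to do *)
  | Some (eff, action) ->
    List.iter build_file eff.reads;     (* build dependencies first *)
    run eff action

let () =
  Hashtbl.replace fs "foo.ml" "let x = 1";
  Hashtbl.replace rules "foo.cmo"
    ( { reads = [ "foo.ml" ]; writes = [ "foo.cmo" ] },
      fun () ->
        let src = Hashtbl.find fs "foo.ml" in
        Hashtbl.replace fs "foo.cmo" ("compiled:" ^ src) );
  build_file "foo.cmo";
  assert (Hashtbl.find fs "foo.cmo" = "compiled:let x = 1")
```

In the full model, `build_file` itself would be memoized, so the recursive dependency walk is shared and cut off exactly like any other tracked computation.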
Migration plan
1. implement the memoizing and `Comp` API, except for write effects
2. replace `Vfile_kind` by memoized functions
3. replace all global hash tables by memoized functions
4. replace the rule generator callback by a memoized function
5. dismantle `Super_context.t`
6. replace the static file tree by `Comp.t` primitives
7. share values between builds in polling mode
8. make values persistent by storing them on disk or in a database
9. implement write effects in `Comp`
10. more refactoring to get the final `build_file` encoding
At step 7, polling builds will become very fast even on big workspaces. At step 8, the startup time of Dune will be much faster; combined with a shared artifact cache, startup will be fast even on fresh workspaces. Steps 9 and 10 simply make the programming model even nicer, in preparation for plugins.