[RFC] New computation model for dune #1280
This proposal describes a new computation model for Dune. The aim is to improve performance, sharing, and memory usage, as well as to increase the expressive power of Dune. For instance, with this new model it will be trivial to support including generated files in dune files.
From the point of view of Dune developers and, in the future, plugin authors, this new model is simple and general, and it makes it easy to understand where the build system adds overhead to user-specified computations in order to minimise recomputation. It will also provide good support for profiling and debugging computations. Additionally, it will naturally benefit from multiple cores.
Problem this proposal is solving
In a given build, there are two kinds of computations: the computations done by external commands and the computations done by the build system itself. The former are already well understood and correctly tracked by the build system, however the latter are currently not properly tracked by Dune. As a result, they are not shared between runs of Dune or between different workspaces. This is fine for small projects as these computations are generally fast, however it doesn't scale to large workspaces.
Build systems such as Jenga allow sharing the results of such internal computations within a single process in polling mode, but not across processes.
Overview of the new model
The new model is simply a system that memoizes pure functions. The outputs of functions may or may not be serializable. Outputs that are serializable may be shared between runs and workspaces and will provide cut-off points during recomputation. Inputs must always be serializable.
The API is as follows:

```ocaml
module Input : sig
  type 'a t
  ...
end

module Output : sig
  type 'a t
  ...
end

val memoize
  :  name:string
  -> 'a Input.t
  -> 'b Output.t
  -> ('a -> 'b Comp.t)
  -> ('a -> 'b Comp.t) Staged.t
```

where `Comp.t` is essentially the `Fiber.t` monad, except that it additionally tracks the effects of the computation, such as reading/writing files, evaluating globs or evaluating other memoized functions.
Because the function is expected to be pure, it will not be necessary to recompute it on a given input when none of the observations it made during its previous execution changed. Additionally, when memory usage becomes a problem, Dune will be able to forget intermediate results, at the expense of needing to recompute them when they are needed again.
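To make the idea concrete, here is a minimal, runnable sketch of such a memoization layer, under heavy simplification: `Comp.t` is reduced to a plain thunk that reports which files it read, and file contents live in a hash table standing in for the file system. The names `read_file`, `fs` and `parse` are illustrative, not Dune's actual API; the real proposal tracks effects inside a Fiber-based monad.

```ocaml
(* A computation is a thunk; running it returns a value plus the
   observations (here: file names read) it made along the way. *)
type 'a comp = unit -> 'a * string list

(* Fake file system we can mutate to simulate edits. *)
let fs : (string, string) Hashtbl.t = Hashtbl.create 16

(* Reading a file is a primitive effect: it records its own dependency. *)
let read_file name () = (Hashtbl.find fs name, [name])

(* Memoize [f]: cache the result together with the files it observed and
   their contents at the time; rerun only when an observation changed. *)
let memoize (f : 'k -> 'v comp) : 'k -> 'v comp =
  let cache : ('k, 'v * (string * string) list) Hashtbl.t =
    Hashtbl.create 16
  in
  fun key () ->
    match Hashtbl.find_opt cache key with
    | Some (v, obs)
      when List.for_all
             (fun (name, contents) ->
               Hashtbl.find_opt fs name = Some contents)
             obs ->
      (v, List.map fst obs)       (* all observations unchanged: reuse *)
    | _ ->
      let v, files = f key () in
      let obs = List.map (fun n -> (n, Hashtbl.find fs n)) files in
      Hashtbl.replace cache key (v, obs);
      (v, files)

let runs = ref 0

(* A toy "parser" whose only observation is the file it reads. *)
let parse =
  memoize (fun name () ->
      incr runs;
      let contents, files = read_file name () in
      (String.length contents, files))

let () =
  Hashtbl.replace fs "dune" "(library (name foo))";
  assert (fst (parse "dune" ()) = 20);
  assert (fst (parse "dune" ()) = 20);  (* cached: [f] not rerun *)
  assert (!runs = 1);
  Hashtbl.replace fs "dune" "(executable (name foo))";
  ignore (parse "dune" ());             (* observation changed: [f] reruns *)
  assert (!runs = 2)
```

Note how forgetting an intermediate result in this model is safe: dropping a cache entry only forces a recomputation on the next lookup, exactly the memory/time trade-off described above.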
Encoding the build system
In this model, the user simply needs to provide a function to build a file:

```ocaml
val build_file : Path.t -> unit Comp.t
```

In particular, there is nothing special about running external commands. To run an external command, we simply use a primitive:
```ocaml
val run : Path.t -> string list -> Effects.t -> unit Comp.t
```

However, it is still necessary to manually specify the effects of the command, given that the build system cannot infer them.
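The encoding can be sketched as follows, again under heavy simplification: `Comp.t` is a plain thunk, an external command is simulated by an OCaml closure, and its effects are declared as the files it reads and writes, mirroring the manual effect annotation required for `run`. Apart from `build_file` and `run`, which come from the RFC, all names here (`effects`, `rules`, the toy rule) are hypothetical.

```ocaml
type effects = { reads : string list; writes : string list }

(* Fake file system and rule database standing in for the real thing. *)
let fs : (string, string) Hashtbl.t = Hashtbl.create 16

(* Rules: target file -> (declared effects, simulated command). *)
let rules : (string, effects * (unit -> unit)) Hashtbl.t = Hashtbl.create 16

(* [run] "executes" an external command with manually declared effects:
   the build system cannot infer what a command touches, so the caller
   states it up front. *)
let run (eff : effects) (action : unit -> unit) : unit =
  ignore eff.writes;
  action ()

(* [build_file] is the single entry point: bring [path] up to date by
   first building every declared dependency, then running the command. *)
let rec build_file (path : string) : unit =
  match Hashtbl.find_opt rules path with
  | None -> ()                          (* source file: nothing to do *)
  | Some (eff, action) ->
    List.iter build_file eff.reads;     (* build dependencies first *)
    run eff action

let () =
  Hashtbl.replace fs "foo.ml" "let x = 1";
  Hashtbl.replace rules "foo.cmo"
    ( { reads = [ "foo.ml" ]; writes = [ "foo.cmo" ] },
      fun () ->
        let src = Hashtbl.find fs "foo.ml" in
        Hashtbl.replace fs "foo.cmo" ("compiled:" ^ src) );
  build_file "foo.cmo";
  assert (Hashtbl.find fs "foo.cmo" = "compiled:let x = 1")
```

In the full model, `build_file` itself would be memoized, so the recursive dependency walk is shared and cut off exactly like any other tracked computation.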
Migration plan
1. implement the memoizing and `Comp` API, except for write effects
2. replace `Vfile_kind` by memoized functions
3. replace all global hash tables by memoized functions
4. replace the rule generator callback by a memoized function
5. dismantle `Super_context.t`
6. replace the static file tree by `Comp.t` primitives
7. share values between builds in polling mode
8. make values persistent by storing them on disk or in a database
9. implement write effects in `Comp`
10. more refactoring to get the final `build_file` encoding
At step 7, polling builds will become very fast even on big workspaces. At step 8, the startup time of Dune will be much faster; combined with a shared artifact cache, startup will be fast even on fresh workspaces. Steps 9 and 10 simply make the programming model even nicer, in preparation for plugins.