This PR implements a variant of the "constructor unboxing" feature proposed by Jeremy Yallop in RFC #14, which extends the current [@@unboxed] annotation for single-constructor variant types (or single-field record types). The PR was implemented by @nchataing during an internship supervised by @gasche.
The following is now accepted:
type bignum =
| Short of int [@unboxed] (* represented directly by an integer *)
| Long of Gmp.t (* Block of tag 0 (first non-unboxed constructor) *)In a small benchmark, we tested that this representation provides the same performances as the current Zarith implementation when computations remain in the "short integer" range -- Zarith uses dirty unsafe tricks to avoid boxing short integers, which we can replace with natural OCaml code.
(Note: Zarith does not box the Long case either, checking for a Custom tag instead. We do not support this yet, but it would be a natural extension of this PR.)
The intent of the currently presented work is to do the simplest possible step in the direction of more unboxing. Most advanced aspects were left to future work. We believe that it would be reasonable to integrate the PR with its current feature scope: it provides values for users, and should not hamper further extensions.
See HEAD_SHAPE.spec.md.
The main components required for this feature are:
- We need to compute head shape information for unboxed constructors, which in turn requires computing head shape information for whole type.
- At type-declaration time, we need to check disjointness of head shapes and reject invalid declarations.
- We need to compile unboxed constructors in expressions.
- We need to compile unboxed constructors in patterns.
(3) is in fact very easy: unboxed constructors in expressions are implemented by doing nothing at all.
(1) and (2) proved fairly difficult to implement cleanly due in large part to accidental complexity.
Representing head shapes by their set of possible values is not practical, as (Imm, _) (the shape of all immediate values) would be too large a set. Instead we use a representation with an explicit "any" value in either domains (for immediates or for tags):
(* Over-approximation of the possible values of a type,
used to verify and compile unboxed head constructors.
We represent the "head" of the values: their value if
it is an immediate, or their tag if is a block.
A head "shape" is a set of possible heads.
*)
and head_shape =
{ head_imm : imm shape; (* set of immediates the head can be *)
head_blocks : tag shape; (* set of tags the head can have *)
}
(* A description of a set of ['a] elements. *)
and 'a shape =
| Shape_set of 'a list (* An element among a finite list. *)
| Shape_any (* Any element (Top/Unknown). *)
and imm = int
and tag = intThis definition is found in typing/types.mli: https://github.com/gasche/ocaml/blob/head_shape/typing/types.mli#L624-L642
Remark: we tried turning imm and tag into single-constructor variants for more safety, but this proved very cumbersome as many auxiliary functions happily process either sorts of tags to compute numeric aggregates.
Remark: in practice we don't know of any use type in OCaml that contains the (Block, _) shape without also allowing (Imm, _). (So we could do without (Block, _) and have two tops, "any immediate" or "any value whatsoever") No type describes "any block but not immediates". Allowing it in the compiler implementation makes the code more regular, and it could come handy in the future (if we wanted, say, to assign shapes to Cmm-level types) or for weird FFI projects.
Pattern-matching compilation is done in lambda/matching.ml.
Remark: The other pattern-matching-heavy module, typing/parmatch.ml, is in charge of checking for exhaustiveness and other type-checking-time warnings; it required basically no change, given that unboxing a constructor does not change the observable behavior of pattern-matching, unboxed constructors are handled just like usual constructors for exhaustiveness and all other static-analysis purpose.
The pattern-matching compiler works by decomposing pattern-matching clauses into a lower-level representation built from "switches", a lower-level control-flow construct that just dispatches on the value of immediates and tags of a head block. Those switches are then lowered further to Lambda constructs (Lifthenelse + tag accesses, or a native Lswitch instruction used for the bytecode compiler).
A switch is a list of switch clauses, which are of the form (cstr_tag, action), where cstr_tag is either an immediate value or a block tag, and actions are pieces of lambda-code. Our implementation adds a new sort of cstr_tag, morally Cstr_unboxed of head_shape (representation details in the next section); these "unboxed tags" are "expanded away" by replacing clauses witch unboxed tags by clauses for the underlying elements of the head shape.
For example, consider the following type declarations:
type internal = Foo | Bar of string [@unboxed] (* head shape: {(Imm, 0); (Block, 252)} *)
type t = First of int | Second of internal [@unboxed]An exhaustive switch on a value of type t may look like
| Cstr_block 0 -> act1
| Cstr_unboxed [{(Imm, 0); (Block, 252)}] -> act2)and we expand it to a switch without unboxed constructors, to be compiled by the existing machinery:
| Cstr_block 0 -> act1
| Cstr_constant 0 -> act2
| Cstr_block 252 -> act2This is the general idea, but in fact we need slightly more sophistication to deal with (Imm, _) or (Block, _) shapes. We change the definition of clauses to contain both:
- either a list of specific actions for some immediates, or an "any" action for all immediates, and
- either a list of specific actions for some block tags, or an "any" action for all blocks
The code that lowers switches into Lifthenelse or Lswitch has to be adapted to deal with this new structure. We tried to make it possible/easy to check that, in absence of unboxed constructors, the code generated is the same as it was previously. (The cost of this approach is that sometimes the code is more verbose / less regular than it could be if written from scratch, accepting some changes in compilation.)
Remark: To emit Lswitch instructions, the compiler needs to know of the range of possible immediate values and tags (when we don't have any_{const,nonconst} actions). More precisely, we want the maximal possible immediate constructor and block tag at this type (minimal values are always assumed to be 0).
This new representation of switches with "any immediate" or "any block tag" cases is in a new Cases submodule:
https://github.com/gasche/ocaml/blob/head_shape/lambda/matching.ml#L2738-L2790
The expansion of unboxed constructors in switches is performed in the split_cases function:
https://github.com/gasche/ocaml/blob/head_shape/lambda/matching.ml#L2792-L2827
Finally, the lowering of those switches to Lambda code is done in the combine_constructor function:
https://github.com/gasche/ocaml/blob/head_shape/lambda/matching.ml#L2914-L3080
We use the "on-demand" approach advertised by @lpw25 and @stedolan to compute type properties on-demand, at each type declaration with unboxed constructors, instead of trying to compute the property modularly for all type definitions (which may or may not occur in the type of an unboxed constructor later).
Here "demand" is defined as either:
- a type declaration using some unboxed constructors is processed by the type-checker (this "demands" computing the shape of each unboxed constructor)
- the pattern-matching compiler encounters an unboxed constructor
In particular, it can happen that an unboxed-constructor is compiled without the type-checker having processed its declaration first! This is the case where the type declaration comes from a .cmi – it has been checked by a previous run of the compiler, and is then imported as-is into the environment without being re-checked.
The head shape computed for an unboxed constructor is cached on first demand, and will not be recomputed several times. The computation requires a typing environment -- which must be an extension of the environment in which the constructor was defined -- to unfold the definition of typing constructors used in the unboxed constructor parameter type, and further constructions reachable from them.
The details of how we perform repeated expansion of type constructors while ensuring termination are tricky, they are discussed in this ML Workshop extended abstract and PR #10479.
The computation of the head shape is done in the functions of_type_expr and of_typedescr in a new Head_shape module of typing/typedecl_unboxed.ml.
In particular, the definition of head shape on "base types" relies on a match_primitive function recognized a set of builtin type paths (we borrowed the logic from Typeopt.classify).
This, surprisingly, turned out to be the most difficult part of the PR to get right.
As discussed above, we need to store two different kind of information for compilation:
- per-unboxed-constructor information: what is the shape of the unboxed constructor argument
- per-sum-type information: how many unboxed constructors are there, what is the maximal value of the immediates and block tags they contain
The existing codebase already computes per-constructor information (imm/tag values) and per-sum-type information (the number of constant and non-constant constructors) in typing/datarepr.ml : constructor_descrs. This function is called (through Datarepr.constructors_of_type) from the Env module, which is in the business of converting type declarations (as provided by the user) into type descriptions (with extra information for compilation) as stored in the typing environment.
Unfortunately, this scheme cannot be extended for our unboxing-related information. Computing the head shape of an unboxed constructor requires unfolding type constructors from the typing environment, so in particular the function doing the computation needs an Env.t argument and depends on the Env module. It cannot be put in the module Datarepr that itself is called from Env, as this would be create a circular dependency.
We tried many ways to avoid this cyclic dependency (for example we went half-way through large refactorings trying to compute type descriptions in Typedecl, and pass it to Env-extending functions, instead of computing them within Env).
In the end we settled on a very simple approach: the per-unboxed-constructor information has type head_shape option ref, and the per-sum-type information has type unboxed_type_data option ref, they are initialized to ref None by the Datarepr module, and then filled on-demand -- from a module that can depend on Env. Caching the on-demand computation is hidden under accessor-style functions (that fills the cache if the value was not computed yet):
- per-unboxed-constructor information are accessed using Typedecl_unboxed.Head_shape.get; it is also computed and cached for all unboxed constructors of a given type by [Typedecl_unboxed.Head_shape.check_typedecl].
- per-sum-type information is placed in an
unboxed_type_data option reffield of theTypes.constructor_descriptiontype -- we sheepishly extended the dubious existing approach of storing per-sum-type information in each of its constructors. It is computed by Typedecl_unboxed.unboxed_type_data_of_shape.
Note: the unboxed_type_data option ref field is not meant to be accessed directly, but through helper functions in Typedecl, that export its content as fields:
cstr_max_block_tag : Env.t -> constructor_description -> Types.tag option
cstr_max_imm_value : Env.t -> constructor_description -> Types.imm option
cstr_unboxed_numconsts : Env.t -> constructor_description -> int option
cstr_unboxed_numnonconsts : Env.t -> constructor_description -> int optionMorally those are fields of a constructor description; the main difference is that, unlike other fields, they can only be computed "later than Datarepr", when a typing environment is available. We first tried having each of them be a mutable field, but this made it difficult to give a type to a function computing all of them at once, as [unboxed_type_data_of_shape] does.
TODO: describe our testing approach and testsuite
We are submitting this PR in the interest of gathering feedback, and getting the review train rolling. (We know it takes a while from people reading about the work to actually reviewing it. Please start right now!) There are still a few things we want to changes before we consider it mergeable -- and surely other suggestions from the reviewers.
The termination-checking algorithm currently does not take cyclic types into account, and may non-terminate on those. This should be fixed easily by keeping track of the set of type nodes we have already encountered in head position during reduction.
Our cycle-detection algorithm will sometimes reject definitions that indeed are cyclic, but do admit a valid fixpoint:
(* this bad declaration should be rejected *)
type t = Loop of t [@unboxed] | Other
(* because `Other` and `Loop Other` would have the same representation *)
(* this weird declaration could be accepted
type t = Loop of t [@unboxed]
(* because it is just an empty type *)"Good cycle" types have no value of their own and they can go to hell as far as we are concerned. However, Stephen Dolan (@stedolan) pointed out that they can be introduced by revealing the definition of an abstract type: type t = Loop of abstract [@unboxed] (accepted, even though we could have abstract = t). This introduces a case where revealing type abstractions makes a definition rejected, breaking some reasonable properties of the type system.
It would be nice to put in the work to distinguish "good cycles" from "bad cycles" to avoid this small defect of the current strategy of rejecting all cycles.
It is unsound in presence of the flat-float-array optimization to have types that contain both floating-point values and non-floating-point values. Our current approach does not enforce this, so it will allow this unsound definition:
type t = Float of float [@unboxed] | OtherOne possibility to fix this:
- Extend the domain of shapes, in addition to "imm" and "block" we add a "separated" domain, which is a single bit (true/false), which indicates whether the set of values we approximate is separated.
(a set of OCaml values is "separated" if it contains either (1) only
floatvalues or (2) nofloatvalue.) - When taking the disjoint union of two shapes, compute the "separated" bit (take the conjunction of the two input bits, and also set to 'false' if the resulting domain contains both 'Obj.double_tag' and something else).
- Existential variables of GADT constructors do not get the current "top" shape (separated=true), but a "non-separated top" with separated=false.
- If a type declaration has a non-separated final shape, reject the declaration.
Question: would it be possible to just get rid of the current separability computation for type declarations, and use this shape analysis instead? (This would be a nice simplification to the type-checker codebase.) We could try this by implementing this logic, disabling the pre-existing 'separability' check, and running the separability testsuite.
Remark: without this extension, shapes have a natural denotational semantics, where each shape is interpreted by a set of OCaml values.
interp(s) = { v | head(v) ∈ s }
With this extension, shapes can be interpreted as sets of sets of OCaml values:
interp(s) = { S | ∀v∈S, head(v) ∈ s /\ s.separated <=> is_separated(S) }
We know about some improvements that we are interested in working on eventually, but we believe are out of scope for this PR.
If a type definition is rejected due to shape conflict, it would be nice to give a clear explanation to the user, with the location of the two constructors (possibly both in different variant declarations than the one being checked) that introduce the incompatibility. One way to do this is to track, for each part of the shape, the location of the constructor that introduced it in the shape.
Finer-grained shape computations for GADTs: to compute the shape of a type foo t when t is a GADT, instead of taking the disjoint union of the shape of all constructors, we can filter only the constructors that are "compatible" with foo (we cannot prove that the types are disjoint). This is simple enough and gives good, interesting results in practice, but it also requires refining the termination-checking strategy for GADTs (in addition to the head type constructor, we need to track the set of compatible constructors for the particular expansion that was done). For now we punt on this by not doing any special handling of GADTs, even though it is less precise.
If an abstract type is meant to be defined through the FFI, the OCaml-side could come with an annotation restricting its shape from the pessimal default "{ (Imm, _); (Block, _) }" to the actual shape of FFI-produced values for this type. This requires blindly trusting the programmer, but this blind trust is the foundation of our FFI model already.
Having shape annotations on abstract types means that shape-inclusion has to be checked when verifying that a type definition in a module matches a type declaration in its interface. (Currently we don't have any work there, because means of abstractions always either preserve the head shape or turn it into "top".)
Another information we could add to our "head shape" approximation is the size of the block. For example, in theory we could allow this:
type 'a pair = ('a * 'a)
type 'a triple = ('a * 'a * 'a)
type 'a t =
| Pair of 'a pair [@unboxed] (* tag 0, size 2 *)
| Triple of 'a triple [@unboxed] (* tag 0, size 3 *)The pattern-matching compilation strategy could be improved in some cases, at the cost of substantial changes to our current implementation. Consider type t = Int of int [@unboxed] | String of string [@unboxed] and u = T of t [@unboxed] | Float of float [@unboxed]. A matching of the form
match arg with
| Float _ -> 0
| T (Int _) -> 1
| T (String _) -> 2will generate two nested switches:
match-shape arg with
| (Block, Obj.double_tag) -> 0
| (Imm, _) | (Block, Obj.string_tag) ->
match-shape arg with
| (Imm, _) -> 1
| (Block, Obj.string_tag) -> 2
(This is the exact same generated-code shape that we would have without unboxing, except that one dereference is gone, but the first switch is now more expensive as the now-unboxed input values are not in a small consecutive interval of tags.)
It would of course be better to generate a single switch instead. We see two approaches to do this:
- We could "merge" some switches after the fact.
- We could remove unboxed constructors much earlier in the pattern-matching compiler, before matrix decomposition, so that morally
T (Int _)is handled as having a single head constructorT ∘ Int.
We are not considering either of those ambitious changes a priority, because:
- We expect the performance wins from unboxing (if any) to come from tighter memory representations; the pattern-matching branches are comparatively unexpansive
- In practice most examples of unboxed constructors we have in mind do not hit this bad situation.
In particular, if you unboxed a constructor with only immediate
values, then the generated code will split the query space by checking
Obj.is_int, which is the optimal strategy in any case (what the
native compiler will do for the merged switch as well). For example,
these produce exactly the same native code:
(* what we get today *)
match-shape arg with
| (Imm, _) -> -1
| (Block, 0) | (Block, 1) ->
match-shape arg with
| (Block, 0) -> 0
| (Block, 1) -> 1
(* what switch-merging would give *)
match-shape arg with
| (Imm, _) -> -1
| (Block, 0) -> 0
| (Block, 1) -> 1The two forms get compiled to the following
if Obj.is_int arg then -1
else match Obj.tag arg with
| 0 -> 0
| 1 -> 1