A self-describing representation for function closures by xavierleroy · Pull Request #9619 · ocaml/ocaml

xavierleroy · 2020-05-29T16:47:18Z

(This PR is a resurrect of #1156 following the suggestions made in comment #1156 (review).)

The context

Today, a closure for a function or a set of mutually-recursive functions is a heap block with the following structure:

closure ::= entrypoint (infix-header entrypoint)* value*

The values are the environment part of the closure: the values of the free variables. Each entrypoint is a single code pointer in bytecode, and a 2- or 3- word record in native code:

entrypoint ::= code-pointer                         (in bytecode)

entrypoint ::= code-pointer  arity=1                (in native code)
             | code-pointer  arity<>1  code-pointer

Note that the closure really consists in two consecutive parts:

the entrypoints part, comprising code pointers and tagged integers (infix headers look like tagged integers)
the environment part, comprising well-formed OCaml values.

The problem

Given a closure block at run-time, there is no way to determine which well-formed OCaml values it contains. More precisely, there is no way to know where the entrypoints end and where the environment starts, since the first value of the environment can be a tagged integer that looks like an infix header.

This is problematic for the GC in particular, which needs to follow OCaml values that are pointers into the OCaml heap, but must not follow code pointers. Today, we use a page table to distinguish the two kinds of pointers. But we want to get rid of the page table (the "no naked pointers" mode), for better performance and for Multicore OCaml compatibility.

The solution

The solution implemented in this PR adds to each "entrypoint" record the position of the environment, i.e. the field number for the first value of the value sequence that represents the environment.

In native code we just steal some bits from the arity field:

entrypoint ::= code-pointer closure-info                (with arity = 1)
             | code-pointer closure-info code-pointer

closure-info ::=  arity (8 bits) . start-of-environment (wordsize - 9 bits) . 1

(#9620 ensures that ocamlopt never generates functions with arities that would not fit in 8 bits.)

In bytecode, we need to add a closure-info word to each entrypoint, making closures a bit larger:

entrypoint ::= code-pointer closure-info                (with arity = 0)

(The '0' value for the arity field is arbitrary, as the abstract machine does not need to know the arity of a function from its closure.)

This way, given a closure value v, the field number where the environment starts is the "start-of-environment" bits of Field(v, 1).

Example: native-code, two mutually recursive functions of arities 2 and 1, two free variables:

code1  {arity=2,start-env=6}  code1'  infix-hdr  code2  {arity=1,start_env=2}  freevar1  freevar2

The pull request

It is best reviewed commit by commit:

9f59d7f changes the representation of closures in the bytecode system and 8e97e63 does the same for the native-code compiler.
cd2ed2b shows how the GC can use the new "start of environment" information to skip over the entrypoints parts of closures. Currently it's done only in no-naked-pointers mode, but would work in naked-pointers mode too.
3d1a411 and 95bd982 further simplify the GC in no-naked-pointers mode.
14c96d1 tweaks a silly test and 7fbb556 is the obligatory bootstrap when bytecode compilation changes.

FAQ

To speed up reviewing, here are some questions and answers.

Q: What are the differences with #1156 ?

A: #1156 used a single bit in the arity words to mark the last entrypoint in the sequence of entrypoints. Consequently, jumping to the environment part required scanning the entry points one after the other. This PR turns this into a constant-time operation. (As suggested by @chambart in #1156 (review) .)

Q: What are the differences with #8984 ?

A: #8984 explores another closure representation strategy, where each function has its own heap block, containing only one entry point. This is a much bigger change in the compilers and the run-time system, with potential for bigger simplifications (getting rid of infix pointers entirely), but the various representation choices still need to be explored and benchmarked, so it will not be ready immediately. The present PR is the quick, minimal change that suffices for Multicore OCaml.

Q: What is the impact on execution time (GC not included)?

A: In native code, negligible: one extra shift instruction when applying an unknown function to 2 or more arguments. In bytecode, closures become bigger and may take a little longer to create.

Q: What is the impact on GC time?

A: Currently, nothing changes in naked-pointers mode. More generally, skipping over the entrypoints part of closures rather than scanning them might give a small speedup for the major GC.

Q: Why not steal bits in infix headers to store the "start of environment" info?

A: There is room there, but for one function we don't have an infix header. (N-1 infix headers for a N-function closure.)

Q: Was it absolutely necessary to increase the size of bytecode closures?

A: I could not find a nice solution that has only one code pointer per entry point and no closure info field. Plus, having similar closure representations in bytecode and in native-code might simplify other things in the future.

Q: Why only 8 bits for the arity?

A: So that it leaves enough bits for the "start of environment" field to represent any valid position in any heap block, even in 32 bit machines. Plus, it's not a limitation in practice.

Q: But what happens if I have a curried function of 300 arguments?

A: Right now, an assertion failure at compile-time. With #9620, your function will be compiled to a 126-argument function returning a 126-argument function returning a 48-argument function.

Q: Why the 8 most significant bits for the arity and not bits 1 to 8?

A: The arity is a signed integer (negative arity = tupled function), so a single arithmetic shift right extracts the arity with the correct sign.

Q: How far are we from a "no-naked-pointer" mode that does not need the page table at all?

A: In native code I think we're almost there. In bytecode, some more work remains (code pointers are used as naked-pointer values in many places).

This commit introduces a quantity Lambda.max_arity that is the maximal number of parameters that a Lambda function can have. Uncurrying is throttled so that, for example, assuming the limit is 10, a 15-argument curried function fun x1 ... x15 -> e becomes a 10-argument function (x1...x10) that returns a 5-argument function (x11...x15). Concerning untupling, a function that takes a N-tuple of arguments, where N is above the limit, remains a function that takes a single argument that is a tuple. Currently, max_arity is set to 126 in native-code, to match the new representation of closures proposed in ocaml#9619. A signed 8-bit field is used to store the arity. 126 instead of 127 to account for the extra "environment" argument. In bytecode the limit is infinity (max_int) because there are no needs yet for a limited number of parameters.

kayceesrk · 2020-05-30T10:01:25Z

Q: How far are we from a "no-naked-pointer" mode that does not need the page table at all?
A: In native code I think we're almost there.

Could you mention where the page table is still needed in no-naked-pointer mode?

xavierleroy · 2020-05-30T17:02:24Z

Could you mention where the page table is still needed in no-naked-pointer mode?

The things you make me do... I checked the remaining calls to caml_page_table_lookup in the native-code runtime system compiled in no-naked-pointers mode. Here they are:

array.c
    function caml_make_vect: Is_in_value_area
    function caml_make_array: Is_in_value_area

compact.c
    function invert_pointer_at: Is_in_heap

extern.c
    function extern_rec: Is_in_value_area

finalise.c
    function generic_final_register: Is_in_heap_or_young
    + lots of assertions

globroots.c
    function classify_gc_root: Is_in_heap

hash.c
    function caml_hash: Is_in_value_area
    function hash_aux: Is_in_value_area

major_gc.c
   function mark_ephe_aux: Is_in_heap

minor_gc.c
    function caml_oldify_one: Is_in_value_area

obj.c
    function caml_obj_tag: Is_in_value_area
    function caml_lazy_follow_forward: Is_in_value_area
    function caml_obj_reachable_words: Is_in_heap_or_young

signals_nat.c
   function handle_signal: Is_in_code_area

caml/weak.h
   function caml_ephe_clean_partial: Is_in_heap_or_young

Is_in_value_area can be replaced with 1 in no-naked-pointers mode, if I'm not mistaken.

Is_in_code_area could be implemented differently, using the table of code fragments. It is not performance-critical.

The remaining occurrences of Is_in_heap and Is_in_heap_or_young need scrutiny.

mshinwell

Very good. I checked all of the arithmetic and couldn't find any mistakes. I left a couple of minor comments.

Regarding removal of the page table entirely, I don't think we can do this on single core OCaml, because of the presence of the compactor.

asmcomp/cmm_helpers.mli

asmcomp/cmm_helpers.ml

runtime/major_gc.c

mshinwell · 2020-06-02T09:42:24Z

I believe this is ready to merge once the CI passes. Thanks.

damiendoligez

I have reviewed this PR and everything looks good. Note that there is another place where we examine pointers and could avoid looking at the code pointers of closures: the compactor. But we still need the page table there anyway.

@xavierleroy you'll have to rebase to get CI to run.

xavierleroy · 2020-06-05T15:34:26Z

The rebase is a pain because of the bootstrap in the middle. I'm running a round of CI precheck right now, while I figure out what to do.

Add a "closure information" field after each code pointer in a closure. This field generalizes the "arity" field used by the native-code compiler, in that it has room both for an arity (always 0 in bytecode) and for the distance from the closure to the first environment variable in the closure block. This makes closures "self-described" and easy to scan for pointers: everything up to the first env var is out-of-heap code pointers or integers; everything after the first env var is a well-formed value. At this point a bootstrap is needed.

ocamlc's computations of offsets in closures changed to match changes in the abstract machine.

In code that builds closures, instead of the old arity field, produce a closure information field encoding arity + position of environment.

Expected outputs contain integer values for the "closure info" field of some closures. These values differ in 32 and 64 bits, since the arity is stored in top 8 bits. This test would need different expected outputs for 32- and 64-bit platforms. To keep things simple, this commit restricts the test to only run on 64-bit platforms. Since this changes the locations

Here we start reaping the benefits of the new closure representation. The fields of a closure block that contain the code pointers need not be scanned (in general) and must not be scanned (in no-naked-pointers mode). Here, conservatively, we skip them in no-naked-pointers mode only, but it would be sound to skip them unconditionally.

Atoms are zero-sized blocks allocated outside the heap. It simplifies the GC in no-naked-pointers mode if their headers have GC color Black, meaning "don't traverse". In ocamlopt, Black is already used as the color for constant blocks statically allocated outside the heap.

Now that atoms have black headers, all zero-sized blocks (atoms or ocamlopt-generated static data) have black headers and will not be traversed or changed by the major GC.

xavierleroy · 2020-06-05T16:19:17Z

CI precheck: OK. Rebase from hell: done. Waiting for Travis&Appveyor.

xavierleroy · 2020-06-05T16:22:19Z

Kicking Travis.

xavierleroy · 2020-06-05T16:26:00Z

Travis is not run, and Appveyor is pending as usual, so I'll just merge and take a well-deserved week-end.

This commit introduces a quantity Lambda.max_arity that is the maximal number of parameters that a Lambda function can have. Uncurrying is throttled so that, for example, assuming the limit is 10, a 15-argument curried function fun x1 ... x15 -> e becomes a 10-argument function (x1...x10) that returns a 5-argument function (x11...x15). Concerning untupling, a function that takes a N-tuple of arguments, where N is above the limit, remains a function that takes a single argument that is a tuple. Currently, max_arity is set to 126 in native-code, to match the new representation of closures proposed in ocaml#9619. A signed 8-bit field is used to store the arity. 126 instead of 127 to account for the extra "environment" argument. In bytecode the limit is infinity (max_int) because there are no needs yet for a limited number of parameters.

…9620) This commit introduces a quantity Lambda.max_arity that is the maximal number of parameters that a Lambda function can have. Uncurrying is throttled so that, for example, assuming the limit is 10, a 15-argument curried function fun x1 ... x15 -> e becomes a 10-argument function (x1...x10) that returns a 5-argument function (x11...x15). Concerning untupling, a function that takes a N-tuple of arguments, where N is above the limit, remains a function that takes a single argument that is a tuple. Currently, max_arity is set to 126 in native-code, to match the new representation of closures implemented by #9619. A signed 8-bit field is used to store the arity. 126 instead of 127 to account for the extra "environment" argument. In bytecode the limit is infinity (max_int) because there are no needs yet for a limit on the number of parameters.

jhjourdan · 2020-06-17T06:24:54Z

I was also caught up by the bug found in #9681, which led me to do a grep Closure_tag in the runtime sources: apparently caml_alloc_dummy_infix still allocates a block with this tag, but without taking care of the closinfo field. Is this intended?

It turns out that the closinfo field is initialized by caml_alloc with Val_unit, which (by chance?) corresponds to a "somewhat valid" value of the closinfo field (if my understanding is good). If this is intended, then a comment would at the very welcome!

xavierleroy · 2020-06-17T07:12:39Z

Re caml_alloc_dummy_infix: no, it's not intended, thanks for having spotted it. I also need to check caml_update_dummy carefully.

xavierleroy · 2020-06-21T14:07:46Z

Could you mention where the page table is still needed in no-naked-pointer mode?

Once #9698 is merged, only the compactor will need the page table, more precisely this line of the compactor:

ocaml/runtime/compact.c

Line 77 in 5f45428

if (Ecolor (q) == 0 && Is_in_heap (q)){

Kakadu · 2020-07-29T07:42:05Z

@xavierleroy Do we already have a documentation entry for new closure representation?

I'm not interested in the documentation per se, actually I want to try to implement memoization/hashconsing of partially applied high-order functions. It seems that your changes are directly related to un-(partial applying) because

Given a closure block at run-time, there is no way to determine which well-formed OCaml values it contains. More precisely, there is no way to know where the entrypoints end and where the environment starts, since the first value of the environment can be a tagged integer that looks like an infix header.

Last time when I tried to dig into it, I realized that the representation changes when -flambda is on, and I couldn't to understand it from documentation too.

xavierleroy · 2020-07-29T08:47:53Z

Do we already have a documentation entry for new closure representation?

For the moment, there's the description at the top of this PR #9619 and you can see examples of how to traverse closures here for example:

ocaml/runtime/hash.c

Lines 264 to 283 in 302d735

    
                 case Closure_tag: { 
        
                   mlsize_t startenv; 
        
                   len = Wosize_val(v); 
        
                   startenv = Start_env_closinfo(Closinfo_val(v)); 
        
                   CAMLassert (startenv <= len); 
        
                   /* Mix in the tag and size, but do not count this towards [num] */ 
        
                   h = caml_hash_mix_uint32(h, Whitehd_hd(Hd_val(v))); 
        
                   /* Mix the code pointers, closure info fields, and infix headers */ 
        
                   for (i = 0; i < startenv; i++) { 
        
                     h = caml_hash_mix_intnat(h, Field(v, i)); 
        
                     num--; 
        
                   } 
        
                   /* Copy environment fields into queue, 
        
                      not exceeding the total size [sz] */ 
        
                   for (/*nothing*/; i < len; i++) { 
        
                     if (wr >= sz) break; 
        
                     queue[wr++] = Field(v, i); 
        
                   } 
        
                   break; 
        
                 }

It seems that your changes are directly related to un-(partial applying)

Yes and no. Yes insofar as partial applications produce "interesting" closures, with non-trivial environments. No because the change of representation applies to all closures, not just those arising out of partial applications.

xavierleroy mentioned this pull request May 29, 2020

Limit the number of parameters for an uncurried or untupled function #9620

Merged

xavierleroy force-pushed the new-closure-repr branch from fafdc23 to 71e4f00 Compare May 31, 2020 13:37

mshinwell approved these changes Jun 1, 2020

View reviewed changes

asmcomp/cmm_helpers.mli Outdated Show resolved Hide resolved

asmcomp/cmm_helpers.ml Show resolved Hide resolved

runtime/major_gc.c Show resolved Hide resolved

damiendoligez approved these changes Jun 5, 2020

View reviewed changes

xavierleroy added 8 commits June 5, 2020 17:42

New representation of closures, intermediate bootstrap

a1f2166

ocamlc's computations of offsets in closures changed to match changes in the abstract machine.

New representation of closures, native-code compilation

7f5a137

In code that builds closures, instead of the old arity field, produce a closure information field encoding arity + position of environment.

No need to special-case zero-sized blocks in no-naked-pointers mode

85f5006

Now that atoms have black headers, all zero-sized blocks (atoms or ocamlopt-generated static data) have black headers and will not be traversed or changed by the major GC.

Changes entry for ocaml#9619

59da229

xavierleroy force-pushed the new-closure-repr branch from 042824d to 59da229 Compare June 5, 2020 16:18

xavierleroy closed this Jun 5, 2020

xavierleroy reopened this Jun 5, 2020

xavierleroy merged commit 9d4679f into ocaml:trunk Jun 5, 2020

xavierleroy deleted the new-closure-repr branch June 5, 2020 16:26

This was referenced Jun 7, 2020

Generic hashing for the new closure representation #9648

Closed

Marshaling for the new closure representation #9649

Merged

hhugo mentioned this pull request Jun 10, 2020

Compiler: prepare parse_bytecode for OCaml 4.12 ocsigen/js_of_ocaml#1024

Merged

sadiqj mentioned this pull request Jun 14, 2020

Add closure-info field when reifying bytecode #9681

Merged

xavierleroy mentioned this pull request Jun 16, 2020

Generic hashing for the new closure representation, take 2 #9689

Merged

xavierleroy mentioned this pull request Jun 17, 2020

Adapt caml_alloc_dummy_infix to new closure representation #9690

Merged

gasche mentioned this pull request Jun 17, 2020

use [raw_field] primitives in camlinternalMod #9691

Merged

dra27 mentioned this pull request Jul 14, 2020

CI test no-naked-pointers mode #9766

Merged

dra27 mentioned this pull request Aug 3, 2020

test-locations failure when AFL is enabled #9822

Closed

Conversation

xavierleroy commented May 29, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

The context

The problem

The solution

The pull request

FAQ

Uh oh!

kayceesrk commented May 30, 2020

Uh oh!

xavierleroy commented May 30, 2020

Uh oh!

mshinwell left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mshinwell commented Jun 2, 2020

Uh oh!

damiendoligez left a comment

Choose a reason for hiding this comment

Uh oh!

xavierleroy commented Jun 5, 2020

Uh oh!

xavierleroy commented Jun 5, 2020

Uh oh!

xavierleroy commented Jun 5, 2020

Uh oh!

xavierleroy commented Jun 5, 2020

Uh oh!

jhjourdan commented Jun 17, 2020

Uh oh!

xavierleroy commented Jun 17, 2020

Uh oh!

xavierleroy commented Jun 21, 2020

Uh oh!

Kakadu commented Jul 29, 2020

Uh oh!

xavierleroy commented Jul 29, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

xavierleroy commented May 29, 2020 •

edited

Loading