Don't use the page table when scanning closures by mshinwell · Pull Request #1156 · ocaml/ocaml

mshinwell · 2017-04-28T14:16:01Z

This is related to #203 , which relates to an overall goal of removing the page table completely when in "no naked pointers" mode (and, in due course, for multicore). When that goal has been reached I am expecting approximately 5-10% performance improvement over the existing page table mode.

At present it is not possible to scan closures without the page table (or alternatively the code fragments table) because there is an ambiguity between environment entries and Infix_tag separators. This means that the positions of code pointers cannot be determined. Here is an example of the ambiguity. The following closure relates to only one function:

Closure_tag | Code ptr 1 | Arity = 2 | Code ptr 2 | FV1 | FV2 | FV3

and we suppose the value of the first free variable, when encoded as an OCaml integer, happens to equal Infix_tag. This looks just like the following closure which relates to two closed functions:

In #203 an attempt was made to add GC headers before the start of functions' instructions to solve this problem. That proved to be complicated on one architecture and on x86-64 appears to have a surprising penalty -- on the cpdf libraries for example code size was increased by 5%. This is not helped that it is difficult to efficiently pack headers before the start of functions that must be 16 byte aligned, since no assembler directive exists to "align to the next 8-byte boundary that is not a 16-byte boundary".

This patch solves the problem another way, which I think is overall better (benchmarks in progress); it simply uses the bottom bit of the arity field (after having de-tagged it) to indicate whether the corresponding function is the last one in the closure block. This means that the GC can skip right over all of the code pointers and arities, landing at the start of any environment which may be present. I claim the performance impact should be minimal since the only place that appears to examine arities is the caml_apply family of functions, which now only gain two simple arithmetic instructions.

I mentioned above about the code fragments table, which might provide another solution to this problem, namely by checking every non-immediate member of a closure against it. However I think in general such tables are fragile and best avoided; indeed, there appear to be at least two different idioms for use of the code fragments table in the compiler code already. byterun/extern.c only checks the table when locating a code pointer, whereas Is_in_code_area also checks the page table. Furthermore, checking of the code fragments table is likely to be slow in the presence of a large number of dynamically-linked objects.

chambart

I think this is a good change and seems to be ok.

But I would push it a bit further. Having different native and bytecode value representation makes the GC more complex. I don't see a good reason not to addopt this change in bytecode too. Of course this would require some changes to the VM, but I don't think that it would break anything.

Also, rather than using a single bit to mark if the next value is the beginning of the environment part of the closure, you could use the most significant bits (or least ones) to store how far the environment is from the current position. This would simplify distance_to_environment and probably make it fast enough to remove the #ifdef in caml_oldify_one. Restricting functions to at most 32000 arguments and closures to 65000 functions on 32 bits systems does not feel like too bothersome.

chambart · 2017-05-02T14:30:58Z

byterun/minor_gc.c

+mlsize_t caml_distance_to_closure_environment(value closure)
+{
+#ifndef NATIVE_CODE_AND_NO_NAKED_POINTERS
+  return 0;


This function seems to be correct without NO_NAKED_POINTERS.
The ifndef should only be on NATIVE_CODE even if it is unused

chambart · 2017-05-02T14:38:43Z

byterun/minor_gc.c

+    CAMLassert ((pos - start) < Wosize_val(start));
+
+    arity = Long_val(pos[1]);
+    finished = (arity & (uintnat) 1) != 0;


This should be 'documented' as an arity manipulating macro (or static inline function), probably in mlvalues.h.
#define Last_function_of_closures(x) ...

chambart · 2017-05-02T14:39:28Z

byterun/minor_gc.c

+
+    arity >>= 1;
+
+    if (arity <= 1) {


Similarly this should probably be a Next_function_of_closure macro

chambart · 2017-05-02T14:42:49Z

asmcomp/cmmgen.ml

               [alloc_closure_header 5 Debuginfo.none;
                Cconst_symbol(name1 ^ "_" ^ string_of_int (num+1));
-                int_const (arity - num - 1);
+                int_const (((arity - num - 1) lsl 1) lor 1);


This should use the arity_word function.

chambart · 2017-05-02T14:43:01Z

asmcomp/cmmgen.ml

                [alloc_closure_header 4 Debuginfo.none;
                 Cconst_symbol(name1 ^ "_" ^ string_of_int (num+1));
-                 int_const 1; Cvar arg; Cvar clos],
+                 int_const ((1 lsl 1) lor 1); Cvar arg; Cvar clos],


Same as above

mshinwell · 2017-05-02T15:21:14Z

Thanks for the review. I will think about changing the arity field representation as you suggest.

I believe @stedolan is in favour of harmonising the closure representations too. I will discuss that with him later this week, there may be some code already in existence for this.

stedolan

I think this is a good idea. In particular, something like this is necessary to spot code pointers when marshalling with marshalling of closures enabled.

I'm not sure that the arity field is the right place to steal a bit from, since it doesn't exist on bytecode. Consider stealing a bit from the Infix_tag word instead, where there are lots of free bits (although, then you'll need to figure out how to distinguish closures with no infix objects from closures with just one).

stedolan · 2017-05-02T16:17:33Z

byterun/minor_gc.c

+        for (i = 0; i < sz; i++) {
+          value field = Field (v, i);
+          if (i >= first_part && Is_block (field) && Is_young (field)) {
+            /* This seems most unlikely to cause a stack overflow. */


Hughes-style difference lists will cause this to stack overflow:

type 'a dlist = 'a list -> 'a list let singleton x = fun tail -> x :: tail let concat a b = fun tail -> a (b tail)

A long list consists of deeply nested closures.

Ah, indeed. I'll fix that then.

mshinwell · 2017-05-02T16:25:34Z

I think we could just introduce the arity word for bytecode as well.

xavierleroy · 2017-05-02T16:28:25Z

Generally speaking: I am very much in favour of changing the in-memory representation of closures to simplify block scanning or other operations of the runtime system. However, I'd like to see some design discussions and some exploration of the design space before discussing pull requests that implement a particular design.

stedolan · 2017-05-02T16:30:07Z

I think we could just introduce the arity word for bytecode as well.

That's a better plan.

Incidentally, the link is wrong in the description - you meant #203, not #208.

xavierleroy · 2017-05-02T16:55:52Z

Continuing to think aloud: as an example of "(re-) exploring the design space", now might possibly be a good time to give up on infix pointers, Infix_tag, and the associated complexity, and go back to a pre-SML/NJ world where closures for mutually-recursive functions are implemented using cycles between multiple blocks. Just a thought, mind you. But one that we may want to entertain together.

mshinwell · 2017-05-03T11:49:21Z

Here are some options that come to mind:

Put GC headers before function entry points. I've experimented with this; I think the code size penalty (5% on cpdf x86-64 for example) seemed too large, and performance implications were a bit uncertain. It's also potentially problematic on architectures with complicated ABIs. I don't think this is worth pursuing, even though as mentioned above, it is in theory possible to reduce the code size penalty.
Do something like what is in this pull request, by adjusting the arity field so that the start of the environment can be identified. This is a minor change that seems neutral for performance and code size. (The only place decoding the arity fields is the caml_apply functions; the overhead of the arithmetic will be negligible and there isn't any noticeable overall code size increase since these functions are never inlined.)
Abandon the Infix_tag business altogether. I believe @stedolan is in favour of this. I'm a bit wary of going down this path, personally, as it doesn't seem that unlikely to be found useful for other things in the future. As an implementation remark, Spacetime's in-memory representation currently makes use of this tag.

Even if we decide on something like option 3, I'd like to argue for doing something like option 2 in the short term, simply because this issue has been dragging on for a long time and it would be nice to bring it to a resolution. We are sufficiently pressed for manpower that I don't reckon implementing option 3 at the present time can be justified.

There is a separate issue pertaining to the harmonisation of the closure representation between bytecode and native. My current thoughts on this are to simply adopt the native representation in bytecode.

stedolan · 2017-05-12T10:36:10Z

I am indeed in favour of getting rid of Infix_tag - surprisingly, most of the really odd segfaults during development of the multicore GC were not weird concurrency bugs but failure to specially handle Infix_tag. For instance, Wosize_val(v) tells you how long a block is (but for Infix_tag), and Color_val(v) will tell you whether it has been marked (but for Infix_tag). In the minor GC, oldify_one returns an object with a valid header (but for Infix_tag).

More speculatively, I'd like to eventually be able to mix bytecode and native code in the same process, especially to be able to load native libraries in a bytecode toplevel. Making the closure representations agree would be a good step in this direction.

damiendoligez

The patch looks OK in general, except for the stack overflow, but it looks like you're going to refactor anyway.

damiendoligez · 2017-10-11T09:15:18Z

asmcomp/cmmgen.ml

+              match rem with
+              | [] -> true
+              | _::_ -> false
+            in


You could write this as

let last_one = rem = [] in

damiendoligez · 2017-10-11T09:15:22Z

asmcomp/cmmgen.ml

+            | [] -> true
+            | _::_ -> false
+          in
+          let arity_word = (f2.arity lsl 1) lor (if last_one then 1 else 0) in


Same here, or make last_one an integer and avoid double-testing?

damiendoligez · 2017-10-11T09:25:21Z

byterun/minor_gc.c

+  CAMLassert(Tag_val(closure) == Closure_tag);
+  CAMLassert(Wosize_val(closure) >= 2);
+
+  start = (value*) closure;


Should be &Field(closure, 0)

damiendoligez · 2017-10-11T09:30:48Z

byterun/minor_gc.c

+  start = (value*) closure;
+  pos = start;
+
+  while (!finished) {


You could refactor this loop to avoid testing finished twice, either as a for loop, or as a while(1) loop with if (finished) break; in the right place. I'd find the latter more readable, but it's probably a matter of taste.

kayceesrk · 2018-09-14T10:58:19Z

I am in favour of this PR. This should be sufficient to unblock marshalling on multicore.

In the long term, it would be useful to remove Infix_tag as it has proved to be a constant source of bugs during the implementation of the multicore GC. But since that ford was crossed, I find less motivation to remove Infix_tag in the short term.

mshinwell · 2018-09-14T12:23:02Z

Right, so, it's been more than a year since this has been stalled pending any further design discussions. As far as I know no-one has had the time to try implementing closure environments in a different form during this period. My preferred way forward is to see option 2 above, which corresponds to this patch, approved in principle. I can then address @damiendoligez 's comments above. We can always consider changing the representation in more major ways at a later date.

kayceesrk · 2018-10-16T16:35:40Z

Any updates on this PR. Could we aim to get this in for 4.08?

XVilka · 2019-05-10T03:54:24Z

@kayceesrk as practice shown - no.

mshinwell · 2020-04-20T16:33:53Z

We're now exploring use of a different (and self-describing) closure representation that would supercede this work.

mshinwell added 2 commits April 28, 2017 14:44

Don't use the page table when scanning closures

8066486

Changes

7e361a8

mshinwell added the multicore-prerequisite label Apr 28, 2017

mshinwell requested review from a user and damiendoligez April 28, 2017 14:16

chambart reviewed May 2, 2017

View reviewed changes

stedolan reviewed May 2, 2017

View reviewed changes

stedolan mentioned this pull request May 2, 2017

Marshalling and unmarshalling ocaml-multicore/ocaml-multicore#75

Closed

gasche added the caml-weekly-news label May 12, 2017

mshinwell mentioned this pull request Jun 7, 2017

Improved register typing #1192

Closed

damiendoligez reviewed Oct 11, 2017

View reviewed changes

damiendoligez added this to the 4.07-or-later milestone Oct 11, 2017

damiendoligez added the high-priority label Jun 5, 2018

damiendoligez removed this from the consider-for-4.07 milestone Jun 5, 2018

samwgoldman mentioned this pull request Oct 9, 2018

Invert page table check in polymorphic hash with no-naked-pointers #2095

Closed

damiendoligez self-assigned this Sep 10, 2019

mshinwell mentioned this pull request Sep 26, 2019

Towards a new closure representation (native code) #8984

Closed

mshinwell closed this Apr 20, 2020

xavierleroy mentioned this pull request May 29, 2020

A self-describing representation for function closures #9619

Merged

stedolan added a commit to stedolan/ocaml that referenced this pull request Mar 21, 2023

Shrink the ref_table if it grows large (ocaml#1156)

150402e

EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024

Add OCaml.org Newsletter for March (ocaml#1156)

ee382f1

EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024

Add OCaml.org Newsletter for March (ocaml#1156)

9cdb15f

Conversation

mshinwell commented Apr 28, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chambart left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mshinwell commented May 2, 2017

Uh oh!

stedolan left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mshinwell commented May 2, 2017

Uh oh!

xavierleroy commented May 2, 2017

Uh oh!

stedolan commented May 2, 2017

Uh oh!

xavierleroy commented May 2, 2017

Uh oh!

mshinwell commented May 3, 2017

Uh oh!

stedolan commented May 12, 2017

Uh oh!

damiendoligez left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kayceesrk commented Sep 14, 2018

Uh oh!

mshinwell commented Sep 14, 2018

Uh oh!

kayceesrk commented Oct 16, 2018

Uh oh!

XVilka commented May 10, 2019

Uh oh!

mshinwell commented Apr 20, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

mshinwell commented Apr 28, 2017 •

edited

Loading

chambart left a comment •

edited

Loading