Skip to content

use [raw_field] primitives in camlinternalMod#9691

Merged
xavierleroy merged 6 commits intoocaml:trunkfrom
gasche:closure-repr-camlinternalMod
Jun 19, 2020
Merged

use [raw_field] primitives in camlinternalMod#9691
xavierleroy merged 6 commits intoocaml:trunkfrom
gasche:closure-repr-camlinternalMod

Conversation

@gasche
Copy link
Copy Markdown
Member

@gasche gasche commented Jun 17, 2020

Looking at #9690 (changes to caml_{alloc,update}_dummy for the new closure representation of #9619), I have the impression that camlinternalMod.update_mod needs to be updated as well. Here is a naive proposal.

(cc @xavierleroy @jhjourdan)

@gasche gasche force-pushed the closure-repr-camlinternalMod branch from 638bdb3 to 0905589 Compare June 17, 2020 12:23
assert (Obj.size o >= Obj.size n);
for i = 0 to Obj.size n - 1 do
Obj.set_raw_field o i (Obj.raw_field n i)
done
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to reason about correctness on fields that are not the code pointer. I have the impression that we are doing a raw write, when we probably should use a proper write barrier. If this is indeed the case, then the fix would be to loop over the precise closure representation, instead of handling all fields as raw data. (I don't think there is OCaml-side code doing this yet, and it might benefit from some auxiliary functions in Obj.)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a type Obj.Closure.info = { arity: int; start_env : int }, which is now used to use set_raw_field on closure metadata fields and set_field on the others.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that closures need as special case. However this code is not good for the reason you mention.

Here I think it's enough to copy field 0 using raw-field accesses and all other fields using normal accesses. In block o there is only one code pointer at offset 0, the rest is integers. As mentioned elsewhere, caml_modify (i.e. Obj.set_field) of a code pointer over an integer is safe.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In your new code you still need to justify why you're taking the startenv of the new block and not the one of the old block.

Copy link
Copy Markdown
Member Author

@gasche gasche Jun 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In block o there is only one code pointer at offset 0, the rest is integers.

My understanding is that overwrite_closure may be called with n an arbitrary closure and o being the template value of the init_mod case:

Obj.repr (fun _ -> raise (Undefined_recursive_module loc))

Here we have a non-integer closure field for loc.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In your new code you still need to justify why you're taking the startenv of the new block and not the one of the old block.

Can I start by filling the segment [o.start_env; n.start_env] with unit values?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well spotted. I stand corrected. But I think this is also a problem with your approach.

There are TWO kinds of assignments between fields that must be avoided:

  • assigning via caml_modify (i.e. Obj.set_field) when the old value is a code pointer
  • assigning directly (i.e. Obj.set_raw_field) when the old value is a pointer within the major heap, because then it may escape marking even though it is live elsewhere. (This is why caml_modify does a caml_darken on the old value.)

If you Obj.set_raw over the field of o that contains the loc free variable, you're running into the second forbidden case.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One solution is to first set fields 1...size - 1 of o to an integer value

  Obj.set_field o i (Obj.repr 0)

so that the loc value is properly seen by the GC.

Then you can do your copy with raw assignments up to startenv(n), and normal assignments above.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  let n_start_env = Obj.Closure.((info n).start_env) in
  let o_start_env = Obj.Closure.((info o).start_env) in
  (* if the environment of n starts before the one of o,
     clear the raw fields in between. *)
  for i = n_start_env to o_start_env - 1 do
    Obj.set_raw_field o i Nativeint.one
  done;
  (* if the environment of o starts before the one of n,
     clear the environment fields in between. *)
  for i = o_start_env to n_start_env - 1 do
    Obj.set_field o i (Obj.repr ())
  done;
  for i = 0 to n_start_env - 1 do
    (* code pointers, closure info fields, infix headers *)
    Obj.set_raw_field o i (Obj.raw_field n i)
  done;
  for i = n_start_env to Obj.size n - 1 do
    (* environment fields *)
    Obj.set_field o i (Obj.field n i)
  done;
  ()

what do you think?

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Jun 17, 2020

The Obj.Closure submodule in this PR could possibly be extended a bit, but carefully. At first I defined an abstract type code_ptr with code_ptr : Obj.t -> code_ptr and set_code_ptr : Obj.t -> code_ptr -> unit with the trace code in mind. Then I realized that the setter might distress the multicore and/or flambda people, so I got rid of it; for now it only exposes read-only operations.

Copy link
Copy Markdown
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overwrite_closure is getting pretty robust, that's nice. The last set of assignments could be avoided (see below), but that's a performance optimization, not a correctness issue.

I'm afraid there is still an issue with the construction of the initial, dummy closure.

@jhjourdan
Copy link
Copy Markdown
Contributor

@gasche: I see you mentioned me. However, I don't know anything about how camlinternalMod works, and I don't have a good understanding of #9619.... Hence I don't feel like I have expertise here.

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Jun 17, 2020

Hence I don't feel like I have expertise here.

Yet!

@gasche gasche force-pushed the closure-repr-camlinternalMod branch from 9b0bfae to 367e7e4 Compare June 17, 2020 14:20
Copy link
Copy Markdown
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, except a signed/unsigned error in the extraction of the "arity" field.

stdlib/obj.ml Outdated
Comment on lines +85 to +92
(* the nativeints below are unsigned, but we know they can
always fit an OCaml integer so we use [to_int]
instead of [unsigned_to_int]. *)
let arity =
if Sys.word_size = 64 then
to_int (shift_right_logical info 56)
else
to_int (shift_right_logical info 24)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arity is signed (negative arity = tupled function), so please use shift_right and update the comment.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This should now be fixed.

runtime/obj.c Outdated
}

if (tg == Closure_tag) {
/* Closinfo_val is the seconnd field, so we need size at least 2 */
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: "seconnd".

Cosmetic: switch (tg) { case Closure_tag: ... case String_tag: ... } could look good. Maybe.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did the switch (haha) in a separate commit, let me know if you like it.

@gasche gasche force-pushed the closure-repr-camlinternalMod branch from 367e7e4 to 65270a9 Compare June 17, 2020 20:48
@gasche
Copy link
Copy Markdown
Member Author

gasche commented Jun 17, 2020

I fixed the remaining comments, rebased the commit history, and added a Changes entry (common with #9690 and #9681).

Copy link
Copy Markdown
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new "switch" is good, but carried you too far! See explanation below.

Comment on lines +121 to +136
case Custom_tag: {
/* It is difficult to correctly use custom objects allocated
through [Obj.new_block], so we disallow it here. The first
field of a custom object must contain a valid pointer to
a block of custom operations. Without initialisation, hashing,
finalising or serialising this custom object will lead to
crashes. See #9513 for more details. */
caml_invalid_argument ("Obj.new_block");
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry to be difficult, but the Custom_tag case must be rejected BEFORE doing caml_alloc. The reason is as follows: if the block is big enough, it is allocated in the major heap, with tag Custom_tag but an invalid "operations" field. Then, we raise an exception, so the block is unreachable. So, at the next major GC sweep, it will be collected, and the sweeper will look into the "operations" field to see if there is a finalization function. This will segfault.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I noticed that the allocation would happen, but I figured that this would be fine given that this exception corresponds to a programming error, that is not supposed to be handled by the program. Your point about the segfault is very reasonable :-)

For the other failure cases in the function, there is no such issue: there are failure cases that can lead to ill-formed values on the heap, but they are not ill-formed enough that it would be a problem for the GC, which is the only case we have to consider for this allocation that is dead on arrival.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xavierleroy I moved back the Custom_tag check, but then I pushed a second commit that moves the allocation within the switch, so that it is always after the failure test. Let me know what you prefer.

@gasche gasche force-pushed the closure-repr-camlinternalMod branch from 7a2ca1d to df5658d Compare June 18, 2020 12:21
Copy link
Copy Markdown
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm approving because the code looks correct to me and to stop you from torturing it some more :-)

gasche added 3 commits June 18, 2020 15:23
This change is careful to avoid writing a value into what was
previously a raw field or conversely, clearing fields that change
category first.

We also clear the end of the block, to make it easier to reason about
lifetime of values that could have been referenced there. (We don't
expect this to make a different in practice.)
@gasche gasche force-pushed the closure-repr-camlinternalMod branch from df5658d to 4cde684 Compare June 18, 2020 13:23
@gasche gasche force-pushed the closure-repr-camlinternalMod branch from 4cde684 to 514d2ff Compare June 18, 2020 13:24
@xavierleroy
Copy link
Copy Markdown
Contributor

There was a suspicious Travis failure before, so let's wait until T&A succeed.

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Jun 18, 2020

The CI was failing with a weird failure (it claimed that it did not know about to_int but there was an open Nativeint above). The Obj->Nativeint dependency was missing in the .depend, so I refreshed it, but I am skeptical that this was the reason.

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Jun 18, 2020

Now Travis is green but AppVeyor fails (when trying to build FlexDLL, it seems) with a strange error that does look related to this PR (but may be an issue with the PR that introduced Obj.raw_field, or the AppVeyor setup):

File "reloc.ml", line 1:
Error: Error while linking ../boot\stdlib.cma(Stdlib__obj):
       The external function `caml_obj_raw_field' is not available
make[1]: *** [Makefile:140: flexlink.exe] Error 2

@xavierleroy
Copy link
Copy Markdown
Contributor

I'll run CI precheck on this PR. Concerning the Appveyor failure, I don't know the details of the "build FlexDLL while we're building OCaml" procedure, so I'm mentioning @dra27.

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Jun 18, 2020

(precheck seems happy, the only failure was also present in other builds.)

@xavierleroy
Copy link
Copy Markdown
Contributor

It looks like the "build FlexDLL while we're building OCaml" procedure assumes a properly-bootstrapped system and will fail if boot/ocamlc doesn't know about newly-added primitives. I'll discuss that on caml-devel.

@xavierleroy xavierleroy merged commit bdbf5c3 into ocaml:trunk Jun 19, 2020
@gasche
Copy link
Copy Markdown
Member Author

gasche commented Jun 23, 2020

Note: the flexdll build failure was fixed in #9700.

xavierleroy pushed a commit that referenced this pull request Oct 4, 2023
This module was introduced in #9691, for use in CamlinternalMod, but rendered obsolete by #10205.
sadiqj pushed a commit to sadiqj/ocaml that referenced this pull request Oct 8, 2023
This module was introduced in ocaml#9691, for use in CamlinternalMod, but rendered obsolete by ocaml#10205.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants