Skip to content

The end of the page table is near#9698

Merged
xavierleroy merged 6 commits intoocaml:trunkfrom
xavierleroy:less-page-table
Jun 23, 2020
Merged

The end of the page table is near#9698
xavierleroy merged 6 commits intoocaml:trunkfrom
xavierleroy:less-page-table

Conversation

@xavierleroy
Copy link
Copy Markdown
Contributor

(Let's see if a click bait title attracts more reviews...)

This pull request eliminates some more uses of the page table in the runtime system when built in no-naked-pointers mode, in preparation for Multicore OCaml, which has no page table.

In the end, only one use remains in the code of the runtime system (file compact.c), plus a number of uses in assertions in debug mode.

The 5 commits in this PR are well documented in their commit messages. Before reviewing, it can be helpful to re-read the description of the pointer classification macros in https://github.com/ocaml/ocaml/blob/trunk/runtime/caml/address_class.h.

The first commit is the continuation of #9680 by other means.

#9678 must be merged before this one, hence the "draft" status.

Copy link
Copy Markdown
Contributor

@jhjourdan jhjourdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks good to me. I have left a few nitpicks here and there.

Comment on lines +77 to +82
#ifdef NO_NAKED_POINTERS
#define Is_in_value_area(a) 1
#else
#define Is_in_value_area(a) \
(Classify_addr(a) & (In_heap | In_young | In_static_data))
#endif
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is, I think, less harmless than it seems. As is more clear in the following commits, it actually changes the behavior of hashing and comparison when it comes to out-of-heap values (such as black-colored blocks allocated by C libraries outside of the heap).

Even though I think this change in behavior is fine, it should be well advertised in (at least) a Changes entry.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is correct that hashing and comparison now work differently over naked pointers in classic mode and the corresponding out-of-heap pointers-with-a-black-header in no-naked-pointers heap. (That was already the case before this PR.) OK for documenting it in a Changes entry or elsewhere, but quite frankly the previous behavior wasn't terribly useful to begin with.

ephemeron_again:
if (child != caml_ephe_none
&& Is_block (child) && Is_in_heap_or_young (child)){
&& Is_block (child) && Is_in_value_area (child)){
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contrarily to what is explained in the commit comment, this is also used in the case the block is white. Fortunately, this is fine, since an out-of-heap pointer should point to a black block.

However:

  • Could you please update the commit comment (and remove the "to be merged", which does not seem to be relevant)
  • This makes the property that out-of-heap pointers should be black more important than I have read somewhere else recently (the argument was that the GC would mark it as black during its first pass if it is left white). Is this clear in the docs (I have not checked)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a more detailed comment.

As to out-of-heap pointers having black headers: yes, this turns out to be a crucial invariant, here and elsewhere, and not just an optimization to prevent useless marking. The "detect naked pointers" variant of OCaml 4.10 that was recently released enforces this invariant.

As to documentation: currently, the "interfacing with C" chapter of the reference manual says that all out-of-heap pointers must be encapsulated (boxing or integer-like tagging), so it is not the place to document the black-header trick.

Comment on lines 87 to 94
#define Is_in_code_area(pc) \
( ((char *)(pc) >= caml_code_area_start && \
(char *)(pc) <= caml_code_area_end) \
|| (Classify_addr(pc) & In_code_area) )

#endif

#define Is_in_static_data(a) (Classify_addr(a) & In_static_data)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps Is_in_static_data could also be made part of the #else.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is_in_static_data will be changed as soon as #9678 is merged. There is one test that uses this macro and needs to be turned off in no-naked-pointers mode, and this work is done as part of #9678.

@jhjourdan
Copy link
Copy Markdown
Contributor

In the end, only one use remains in the code of the runtime system (file compact.c),

Do you have a plan for this one. Is the plan to get rid of the compactor, since the best-fit mode seems to prevent fragmentation enough?

@xavierleroy
Copy link
Copy Markdown
Contributor Author

Is the plan to get rid of the compactor, since the best-fit mode seems to prevent fragmentation enough?

@damiendoligez is thinking of some changes to the compactor that would make the page table unnecessary, under the assumption that statically-allocated blocks have black headers.

It is also possible to rebuild the page table or an equivalent data structure that keeps track of major heap areas at the beginning of every compaction.

Since there are no naked pointers, all pointers are to well-formed
OCaml blocks.
Having Is_in_value_area always true in no-naked-pointers mode
achieves the same result as the `#ifndef NO_NAKED_POINTERS` removed here.
In weak.c there are two tests determining whether a value is a custom
block or not.  The original code, using Is_in_heap_or_young, would
conclude "not a custom block" for statically-allocated custom blocks
like int64 literals.

In weak.h there is one test Is_in_heap_or_young (child) that guards
1- a test for forward blocks
2- a test for white blocks.

Both guarded tests are false for a statically-allocated block:
- test 1 because lazy values are not statically allocated normally
- test 2 because statically-allocated blocks must have black headers.

Hence it makes no difference to test Is_in_value_area (child).
This is in preparation for the complete removal of the page table.

Without the page table, there is no way to distinguish pointers into
the heaps (minor or major) from pointers outside the heaps,
e.g. structured constants statically allocated by ocamlopt.

Hence, the difference between Is_in_heap_or_young and Is_in_value_area
(behavior on static data) disappears, and both should always return true.

All uses of Is_in_heap_or_young but one are in assertions, which
therefore become useless, but not wrong.

The one use inside the code is when registering a finalisation with
`Gc.finalise`.  Today, `Invalid_argument` is raised if the value to be
finalised is statically allocated.  With this commit and in
no-naked-pointers mode, `GC.finalise` succeeds, but the finalisation
function will never be called.
This is in preparation for the complete removal of the page table.

Without the page table, there is no way to distinguish pointers into
the major heap from pointers to structured constants statically
allocated by ocamlopt.  The only distinction we can make between pointers
is whether they point to the minor heap (Is_young) or not.

However, we cannot (yet) define Is_in_heap(v) as !Is_young(v), because
Is_in_heap with its current meaning is used
- in the compactor
- in many assertions.

Yet, there are a few places, mostly in the handling of ephemerons,
where Is_in_heap(v) can safely be replaced with !Is_young(v).
This is done in this commit.
This is another classification macro that requires the page table.
Its only uses in the whole OPAM universe is two of our own tests
(tests/asmcomp/is_static.ml and tests/lib-obj/reachable_words_np.ml)
which are now run only in naked-pointers mode.
@xavierleroy xavierleroy marked this pull request as ready for review June 23, 2020 08:39
@xavierleroy
Copy link
Copy Markdown
Contributor Author

Now that #9678 is merged, I could rebase on top of it, and add a commit that removes Is_in_static_data and its uses in the test suite in no-naked-pointers mode. This PR is ready for another review (if desired) then for merging.

@jhjourdan
Copy link
Copy Markdown
Contributor

This is still good for merging, as far as I can see.

@xavierleroy
Copy link
Copy Markdown
Contributor Author

Thanks, let's merge then!

@xavierleroy xavierleroy merged commit 83c4909 into ocaml:trunk Jun 23, 2020
@xavierleroy xavierleroy deleted the less-page-table branch June 23, 2020 14:36
Copy link
Copy Markdown
Member

@damiendoligez damiendoligez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you've merged while I was reviewing. Here's a small nitpick anyway.

h = caml_hash_mix_intnat(h, v);
num--;
}
#ifndef NO_NAKED_POINTERS
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is temporarily reintroducing a page-table check into the NNP runtime. You then remove it in commit 8c06218.
I suggest that you move that commit before this one, and change the commit message (of 8c06218) because it wrongly claims that there is only one use of Is_in_value_area in finalise.c.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I merged during your review, didn't know you were on it.

At some point Github was displayign the commits in the wrong order.

I think we're OK here: first 6e9d39b defines Is_in_value_area to 1 in no-naked-pointers mode, then 247c1da removes the #ifdef above.

it wrongly claims that there is only one use of Is_in_value_area in finalise.c

I'm not seeing this claim in the commit messages.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And now you can turn your attention to #9704 :-)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it wrongly claims that there is only one use of Is_in_value_area in finalise.c.

I see you meant Is_in_heap_or_young. And I meant only one use outside assertions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants