The end of the page table is near#9698
Conversation
jhjourdan
left a comment
There was a problem hiding this comment.
The PR looks good to me. I have left a few nitpicks here and there.
runtime/caml/address_class.h
Outdated
| #ifdef NO_NAKED_POINTERS | ||
| #define Is_in_value_area(a) 1 | ||
| #else | ||
| #define Is_in_value_area(a) \ | ||
| (Classify_addr(a) & (In_heap | In_young | In_static_data)) | ||
| #endif |
There was a problem hiding this comment.
This change is, I think, less harmless than it seems. As is more clear in the following commits, it actually changes the behavior of hashing and comparison when it comes to out-of-heap values (such as black-colored blocks allocated by C libraries outside of the heap).
Even though I think this change in behavior is fine, it should be well advertised in (at least) a Changes entry.
There was a problem hiding this comment.
It is correct that hashing and comparison now work differently over naked pointers in classic mode and the corresponding out-of-heap pointers-with-a-black-header in no-naked-pointers heap. (That was already the case before this PR.) OK for documenting it in a Changes entry or elsewhere, but quite frankly the previous behavior wasn't terribly useful to begin with.
| ephemeron_again: | ||
| if (child != caml_ephe_none | ||
| && Is_block (child) && Is_in_heap_or_young (child)){ | ||
| && Is_block (child) && Is_in_value_area (child)){ |
There was a problem hiding this comment.
Contrarily to what is explained in the commit comment, this is also used in the case the block is white. Fortunately, this is fine, since an out-of-heap pointer should point to a black block.
However:
- Could you please update the commit comment (and remove the "to be merged", which does not seem to be relevant)
- This makes the property that out-of-heap pointers should be black more important than I have read somewhere else recently (the argument was that the GC would mark it as black during its first pass if it is left white). Is this clear in the docs (I have not checked)?
There was a problem hiding this comment.
I pushed a more detailed comment.
As to out-of-heap pointers having black headers: yes, this turns out to be a crucial invariant, here and elsewhere, and not just an optimization to prevent useless marking. The "detect naked pointers" variant of OCaml 4.10 that was recently released enforces this invariant.
As to documentation: currently, the "interfacing with C" chapter of the reference manual says that all out-of-heap pointers must be encapsulated (boxing or integer-like tagging), so it is not the place to document the black-header trick.
runtime/caml/address_class.h
Outdated
| #define Is_in_code_area(pc) \ | ||
| ( ((char *)(pc) >= caml_code_area_start && \ | ||
| (char *)(pc) <= caml_code_area_end) \ | ||
| || (Classify_addr(pc) & In_code_area) ) | ||
|
|
||
| #endif | ||
|
|
||
| #define Is_in_static_data(a) (Classify_addr(a) & In_static_data) |
There was a problem hiding this comment.
Perhaps Is_in_static_data could also be made part of the #else.
Do you have a plan for this one. Is the plan to get rid of the compactor, since the best-fit mode seems to prevent fragmentation enough? |
@damiendoligez is thinking of some changes to the compactor that would make the page table unnecessary, under the assumption that statically-allocated blocks have black headers. It is also possible to rebuild the page table or an equivalent data structure that keeps track of major heap areas at the beginning of every compaction. |
dcd4c62 to
a0f3d95
Compare
Since there are no naked pointers, all pointers are to well-formed OCaml blocks.
Having Is_in_value_area always true in no-naked-pointers mode achieves the same result as the `#ifndef NO_NAKED_POINTERS` removed here.
In weak.c there are two tests determining whether a value is a custom block or not. The original code, using Is_in_heap_or_young, would conclude "not a custom block" for statically-allocated custom blocks like int64 literals. In weak.h there is one test Is_in_heap_or_young (child) that guards 1- a test for forward blocks 2- a test for white blocks. Both guarded tests are false for a statically-allocated block: - test 1 because lazy values are not statically allocated normally - test 2 because statically-allocated blocks must have black headers. Hence it makes no difference to test Is_in_value_area (child).
This is in preparation for the complete removal of the page table. Without the page table, there is no way to distinguish pointers into the heaps (minor or major) from pointers outside the heaps, e.g. structured constants statically allocated by ocamlopt. Hence, the difference between Is_in_heap_or_young and Is_in_value_area (behavior on static data) disappears, and both should always return true. All uses of Is_in_heap_or_young but one are in assertions, which therefore become useless, but not wrong. The one use inside the code is when registering a finalisation with `Gc.finalise`. Today, `Invalid_argument` is raised if the value to be finalised is statically allocated. With this commit and in no-naked-pointers mode, `GC.finalise` succeeds, but the finalisation function will never be called.
This is in preparation for the complete removal of the page table. Without the page table, there is no way to distinguish pointers into the major heap from pointers to structured constants statically allocated by ocamlopt. The only distinction we can make between pointers is whether they point to the minor heap (Is_young) or not. However, we cannot (yet) define Is_in_heap(v) as !Is_young(v), because Is_in_heap with its current meaning is used - in the compactor - in many assertions. Yet, there are a few places, mostly in the handling of ephemerons, where Is_in_heap(v) can safely be replaced with !Is_young(v). This is done in this commit.
This is another classification macro that requires the page table. Its only uses in the whole OPAM universe is two of our own tests (tests/asmcomp/is_static.ml and tests/lib-obj/reachable_words_np.ml) which are now run only in naked-pointers mode.
a0f3d95 to
40a55c7
Compare
|
Now that #9678 is merged, I could rebase on top of it, and add a commit that removes |
|
This is still good for merging, as far as I can see. |
|
Thanks, let's merge then! |
damiendoligez
left a comment
There was a problem hiding this comment.
I see you've merged while I was reviewing. Here's a small nitpick anyway.
| h = caml_hash_mix_intnat(h, v); | ||
| num--; | ||
| } | ||
| #ifndef NO_NAKED_POINTERS |
There was a problem hiding this comment.
This one is temporarily reintroducing a page-table check into the NNP runtime. You then remove it in commit 8c06218.
I suggest that you move that commit before this one, and change the commit message (of 8c06218) because it wrongly claims that there is only one use of Is_in_value_area in finalise.c.
There was a problem hiding this comment.
Sorry I merged during your review, didn't know you were on it.
At some point Github was displayign the commits in the wrong order.
I think we're OK here: first 6e9d39b defines Is_in_value_area to 1 in no-naked-pointers mode, then 247c1da removes the #ifdef above.
it wrongly claims that there is only one use of Is_in_value_area in finalise.c
I'm not seeing this claim in the commit messages.
There was a problem hiding this comment.
And now you can turn your attention to #9704 :-)
There was a problem hiding this comment.
it wrongly claims that there is only one use of Is_in_value_area in finalise.c.
I see you meant Is_in_heap_or_young. And I meant only one use outside assertions.
(Let's see if a click bait title attracts more reviews...)
This pull request eliminates some more uses of the page table in the runtime system when built in no-naked-pointers mode, in preparation for Multicore OCaml, which has no page table.
In the end, only one use remains in the code of the runtime system (file compact.c), plus a number of uses in assertions in debug mode.
The 5 commits in this PR are well documented in their commit messages. Before reviewing, it can be helpful to re-read the description of the pointer classification macros in https://github.com/ocaml/ocaml/blob/trunk/runtime/caml/address_class.h.
The first commit is the continuation of #9680 by other means.
#9678 must be merged before this one, hence the "draft" status.