Skip to content

Naked pointers and the bytecode interpreter, alternative approach#9687

Closed
xavierleroy wants to merge 5 commits intoocaml:trunkfrom
xavierleroy:bytecode-nnp-alt
Closed

Naked pointers and the bytecode interpreter, alternative approach#9687
xavierleroy wants to merge 5 commits intoocaml:trunkfrom
xavierleroy:bytecode-nnp-alt

Conversation

@xavierleroy
Copy link
Copy Markdown
Contributor

This is an alternative to #9680 (please see for context) where code pointers in the stack of the bytecode interpreter follow approach 1a (encapsulation by setting the low bit to 1) instead of 2b (special tests at scanning time).

Rather than storing a pointer to the previous frame in the Trap_link
field of the current frame, store the distance (pointer difference)
between the current frame and the previous frame, tagged as an OCaml
integer.

Using a tagged integer instead of a raw pointer means fever problems
later with strict no-naked-pointer support.

Using a distance rather than an absolute address simplifies
the code that resizes the stack.
… of aligned naked pointers

In "naked pointers" mode, these are no-operations.

In "no naked pointers" mode, the pointer (assumed 2-aligned) is
disguised as an OCaml integer by setting its low bit to 1.
Use the new Val_foreign_ptr and Ptr_foreign_val conversions, which are
no-ops in "naked pointers" mode, but tag the code pointers like OCaml
integers in "no naked pointers" mode.
Earlier, nonaked-pointer mode was effective only in native code.
…ge table

`!Is_in_value_area(pc)` is always false if we turn the page table off.
A better check would be `caml_find_code_fragment_by_pc(pc) != NULL`,
but I feel this is too costly even for the debug mode of the interpreter.
Copy link
Copy Markdown
Contributor

@jhjourdan jhjourdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand, this version may produce incorrect callstacks (see comments on the code) an should therefore be avoided.

/* Pointers outside the OCaml heap that are used as values.
(E.g. code pointers.) Must be 2-aligned. */

Caml_inline value Val_foreign_ptr(void * p)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for using a different implementation in naked pointer mode and without this mode?

I don't think there should be compatibility problems: there should not be any extern libraries which directly reads entries in the bytecode interpreter stack.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it would be useful for performance comparison purposes to preserve the current implementation (using unmodified naked pointers) in "naked pointers" mode. Eventually, we would keep only the "+1 / -1" implementation.

Comment on lines +287 to 297
/* Code pointers in the stack are tagged 1 in NO_NAKED_POINTER mode
and 0 otherwise */
#ifdef NO_NAKED_POINTERS
if (! Is_long(*sp)) continue;
#else
if (Is_long(*sp)) continue;
p = (code_t) *sp;
#endif
p = (code_t) Foreign_ptr_val(*sp);
if (Caml_state->backtrace_pos >= BACKTRACE_BUFFER_SIZE) break;
if (find_debug_info(p) != NULL)
Caml_state->backtrace_buffer[Caml_state->backtrace_pos++] = p;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In no naked pointers mode, how can we be sure that p is not an integer value which turns out to have the same value than some bytecode instruction?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent observation, thank you. This is a serious problem with this approach.

while (sp < Caml_state->stack_high) {
if (sp == tr) {
tr = tr + Long_val(Trap_link_offset(tr));
sp += 4;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behavior is different from the previous one. Could you document which stack cells you are ignoring here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got lost among the * so I tried to write equivalent code that would be clearer to me. (Basically I prefer in-out parameters to by-reference parameters.) Perhaps I failed. What did I change?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to say that I don't find any of the two versions very clear. This is probably because the problem to solve itself is not particularly easy.

Anyway, in the original version, we did not increment any pointer by 4, so I am surprised to see this in this version.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have commented. The idea was to skip the whole trap frame when we hit it. A trap frame is 4 words on the stack, and no code pointer can appear there.

#endif
p = (code_t) Foreign_ptr_val(v);
if (find_debug_info(p) != NULL) {
*sp_inout = sp; *trsp_inout = tr; return p;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same remark as above. p could actually be an integer stored in the stack.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted.

@jhjourdan
Copy link
Copy Markdown
Contributor

There is an assert failure in debug mode reported by CI, so there is another bug lurking somewhere.

@xavierleroy
Copy link
Copy Markdown
Contributor Author

There is an assert failure in debug mode reported by CI, so there is another bug lurking somewhere.

There remains debug assertions that are not compatible with no-naked-pointers. But, yes, I'll check that. Unless we decide to drop this approach immediately.

@jhjourdan
Copy link
Copy Markdown
Contributor

Well, except if you find a fix for the callstack problem, I'm afraid we will need to drop it, indeed.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jun 16, 2020

Note: in the taxonomy of #9680 we could combine 1a (as done here) with 2a (stack frame metadata), so that the stack remains a valid OCaml value (delimcc is happy) and we have precise stack types. This also avoids the performance worries with approach 2b. It sounds like more work, but it also goes in the same direction as the closure representation (better metadata to avoid dynamic checks).

@xavierleroy
Copy link
Copy Markdown
Contributor Author

The "metadata" approach is a lot of work, at least if we want to follow the approach of the native-code compiler (with the statically-generated frame descriptors). I don't think the bytecode interpreter deserves that much effort. Also, if you have metadata (2a) you don't need 1a (encoding of code pointers).

@gasche
Copy link
Copy Markdown
Member

gasche commented Jun 16, 2020

Also, if you have metadata (2a) you don't need 1a (encoding of code pointers).

The point of also doing (1a) is to make the byte stack a valid array of OCaml values, to help delimcc -- as you pointed out. This being said, maybe just sticking an Abstract_tag on the stack copy would be fine. (I see interesting uses to the feature of being able to copy stack fragments on the heap, for example for algebraic effects, but we also want them to work with the native runtime, so it needs a more general solution than a bytecode-specific approach anyway.)

@xavierleroy
Copy link
Copy Markdown
Contributor Author

This being said, maybe just sticking an Abstract_tag on the stack copy would be fine

Aaaaargh! The stack and its copy contain pointers into the heap that the GC must follow!!

One solution is to have two blocks for the copy of the stack, one tagged 0 and containing the values, the other tagged Abstract and containing the code pointers.

A variant is to have only one block, tagged 0, with code pointers being encapsulated as integers, and a separate array giving the positions of the code pointers in this block, so that they can be decapsulated correctly.

Yet another solution (which delimcc uses for natively-compiled OCaml code) is to copy to a malloc()-ed block (or just a bigarray) and register a special stack scanning action with the OCaml GC. (This can be done with hooks initially put there for the systhreads library.)

@xavierleroy
Copy link
Copy Markdown
Contributor Author

I'm closing this PR because this is not a viable alternative to #9680 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants