Naked pointers and the bytecode interpreter, alternative approach#9687
Naked pointers and the bytecode interpreter, alternative approach#9687xavierleroy wants to merge 5 commits intoocaml:trunkfrom
Conversation
Rather than storing a pointer to the previous frame in the Trap_link field of the current frame, store the distance (pointer difference) between the current frame and the previous frame, tagged as an OCaml integer. Using a tagged integer instead of a raw pointer means fever problems later with strict no-naked-pointer support. Using a distance rather than an absolute address simplifies the code that resizes the stack.
… of aligned naked pointers In "naked pointers" mode, these are no-operations. In "no naked pointers" mode, the pointer (assumed 2-aligned) is disguised as an OCaml integer by setting its low bit to 1.
Use the new Val_foreign_ptr and Ptr_foreign_val conversions, which are no-ops in "naked pointers" mode, but tag the code pointers like OCaml integers in "no naked pointers" mode.
Earlier, nonaked-pointer mode was effective only in native code.
…ge table `!Is_in_value_area(pc)` is always false if we turn the page table off. A better check would be `caml_find_code_fragment_by_pc(pc) != NULL`, but I feel this is too costly even for the debug mode of the interpreter.
jhjourdan
left a comment
There was a problem hiding this comment.
As far as I understand, this version may produce incorrect callstacks (see comments on the code) an should therefore be avoided.
| /* Pointers outside the OCaml heap that are used as values. | ||
| (E.g. code pointers.) Must be 2-aligned. */ | ||
|
|
||
| Caml_inline value Val_foreign_ptr(void * p) |
There was a problem hiding this comment.
Is there a reason for using a different implementation in naked pointer mode and without this mode?
I don't think there should be compatibility problems: there should not be any extern libraries which directly reads entries in the bytecode interpreter stack.
There was a problem hiding this comment.
I thought it would be useful for performance comparison purposes to preserve the current implementation (using unmodified naked pointers) in "naked pointers" mode. Eventually, we would keep only the "+1 / -1" implementation.
| /* Code pointers in the stack are tagged 1 in NO_NAKED_POINTER mode | ||
| and 0 otherwise */ | ||
| #ifdef NO_NAKED_POINTERS | ||
| if (! Is_long(*sp)) continue; | ||
| #else | ||
| if (Is_long(*sp)) continue; | ||
| p = (code_t) *sp; | ||
| #endif | ||
| p = (code_t) Foreign_ptr_val(*sp); | ||
| if (Caml_state->backtrace_pos >= BACKTRACE_BUFFER_SIZE) break; | ||
| if (find_debug_info(p) != NULL) | ||
| Caml_state->backtrace_buffer[Caml_state->backtrace_pos++] = p; |
There was a problem hiding this comment.
In no naked pointers mode, how can we be sure that p is not an integer value which turns out to have the same value than some bytecode instruction?
There was a problem hiding this comment.
Excellent observation, thank you. This is a serious problem with this approach.
| while (sp < Caml_state->stack_high) { | ||
| if (sp == tr) { | ||
| tr = tr + Long_val(Trap_link_offset(tr)); | ||
| sp += 4; |
There was a problem hiding this comment.
The behavior is different from the previous one. Could you document which stack cells you are ignoring here?
There was a problem hiding this comment.
I got lost among the * so I tried to write equivalent code that would be clearer to me. (Basically I prefer in-out parameters to by-reference parameters.) Perhaps I failed. What did I change?
There was a problem hiding this comment.
I have to say that I don't find any of the two versions very clear. This is probably because the problem to solve itself is not particularly easy.
Anyway, in the original version, we did not increment any pointer by 4, so I am surprised to see this in this version.
There was a problem hiding this comment.
I should have commented. The idea was to skip the whole trap frame when we hit it. A trap frame is 4 words on the stack, and no code pointer can appear there.
| #endif | ||
| p = (code_t) Foreign_ptr_val(v); | ||
| if (find_debug_info(p) != NULL) { | ||
| *sp_inout = sp; *trsp_inout = tr; return p; |
There was a problem hiding this comment.
Same remark as above. p could actually be an integer stored in the stack.
|
There is an assert failure in debug mode reported by CI, so there is another bug lurking somewhere. |
There remains debug assertions that are not compatible with no-naked-pointers. But, yes, I'll check that. Unless we decide to drop this approach immediately. |
|
Well, except if you find a fix for the callstack problem, I'm afraid we will need to drop it, indeed. |
|
Note: in the taxonomy of #9680 we could combine 1a (as done here) with 2a (stack frame metadata), so that the stack remains a valid OCaml value (delimcc is happy) and we have precise stack types. This also avoids the performance worries with approach 2b. It sounds like more work, but it also goes in the same direction as the closure representation (better metadata to avoid dynamic checks). |
|
The "metadata" approach is a lot of work, at least if we want to follow the approach of the native-code compiler (with the statically-generated frame descriptors). I don't think the bytecode interpreter deserves that much effort. Also, if you have metadata (2a) you don't need 1a (encoding of code pointers). |
The point of also doing (1a) is to make the byte stack a valid array of OCaml values, to help delimcc -- as you pointed out. This being said, maybe just sticking an Abstract_tag on the stack copy would be fine. (I see interesting uses to the feature of being able to copy stack fragments on the heap, for example for algebraic effects, but we also want them to work with the native runtime, so it needs a more general solution than a bytecode-specific approach anyway.) |
Aaaaargh! The stack and its copy contain pointers into the heap that the GC must follow!! One solution is to have two blocks for the copy of the stack, one tagged 0 and containing the values, the other tagged Abstract and containing the code pointers. A variant is to have only one block, tagged 0, with code pointers being encapsulated as integers, and a separate array giving the positions of the code pointers in this block, so that they can be decapsulated correctly. Yet another solution (which |
|
I'm closing this PR because this is not a viable alternative to #9680 . |
This is an alternative to #9680 (please see for context) where code pointers in the stack of the bytecode interpreter follow approach 1a (encapsulation by setting the low bit to 1) instead of 2b (special tests at scanning time).