Skip to content

Add a naked pointers dynamic checker#9947

Closed
abbysmal wants to merge 11 commits intoocaml:trunkfrom
abbysmal:engil/nnp_checker
Closed

Add a naked pointers dynamic checker#9947
abbysmal wants to merge 11 commits intoocaml:trunkfrom
abbysmal:engil/nnp_checker

Conversation

@abbysmal
Copy link
Copy Markdown
Contributor

This pull requests is a rebased version of the naked pointer checker, as discussed in #9534, for introduction in the runtime.
It remains mostly the same as the code that was developed in the RFC, with a few adjustments:

  • Introduction of a configure time option to enable it (--enable-naked-pointers-checker)
  • The major_gc trigger at the end of the program (to force a gc and run the checker even on programs that may terminate before having the opportunity to do so) was reworked as well.
  • checking_pointer_pc is now hidden behind an ifdef when the checker is not built.

While the checker operates mostly the same, heavy changes were brought to the GC, so I could observe different behaviors between this branch and the previous implementation.

One particular thing I observed (and I'm unsure about the correct resolution of this issue, if it qualifies as such), is that now, a few test programs now segfaults in the no-naked-pointers mode, and as such as well in the naked-pointer-checker mode, where they used not to.
Two simple examples are

I'm unsure what is the correct course of action with this observation, and would be happy if anyone had an opinion on the matter.
Maybe there should be a specific adjustment in no-naked-pointer mode I am missing, related to this recent change.
I can provided further data on these specific points if it helps.

I added a temporary Changes entry, I quoted the names and reviewers that contributed to the RFC at #9534, but since this PR definitely could use more proofreading (with the recent changes to the GC), I did not meant this was reviewed already. :)

Have a nice day,
Enguerrand.

@xavierleroy
Copy link
Copy Markdown
Contributor

Thanks for getting the ball rolling. Just to make sure we're on the same page:

One particular thing I observed (and I'm unsure about the correct resolution of this issue, if it qualifies as such), is that now, a few test programs now segfaults in the no-naked-pointers mode, and as such as well in the naked-pointer-checker mode, where they used not to.

I would expect the naked pointers detector to run in a naked-pointer-tolerant run-time system, so that naked pointers won't crash the runtime immediately and can be flagged and reported at the next major GC. And, yes, no-naked-pointers mode is getting less and less tolerant in the working sources.

@abbysmal
Copy link
Copy Markdown
Contributor Author

abbysmal commented Sep 29, 2020

I would expect the naked pointers detector to run in a naked-pointer-tolerant run-time system, so that naked pointers won't crash the runtime immediately and can be flagged and reported at the next major GC. And, yes, no-naked-pointers mode is getting less and less tolerant in the working sources.

The previous code was running inside the no-naked-pointer runtime, it would make sense however to have the checks performed outside indeed, I can change this if this is the right thing to do.


#ifdef NAKED_POINTERS_CHECKER
Caml_state->checking_pointer_pc = NULL;
#endif
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used under Windows?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it is not, I will guard it for _WIN32 as well, thank you.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for the definition in domain_state.tbl.

Changes Outdated
- #9534: Introduce a naked pointers checker mode to the runtime,
to print information on relevant values when they are
inspected by the garbage collector.
(KC Sivaramakrishnan, with contributions by Stephen Dolan and David Allsopp,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add yourself here @Engil. And to be consistent with the other entries in the file, perhaps do:

Enguerrand Decorne, KC Sivaramakrishnan, Xavier Leroy, Stephen Dolan, David Allsopp, Nicolás Ojeda Bär, review by ...

Xavier contributed the inline assembly and Nicolás kick-started the work on Windows support.

@abbysmal
Copy link
Copy Markdown
Contributor Author

While implementing the move of the checker out of the no-naked-pointer mode, I noticed that the example program provided on #9534 does segfault currently on trunk even with the "default" runtime (so with naked pointers on.).
This program used to work on 4.11, 4.10 and 4.09.

Program received signal SIGSEGV, Segmentation fault.
0x00005562cf2e74e9 in mark_stack_push (work=0x0, offset=0, block=140569830424568, stk=<optimized out>) at major_gc.c:244
warning: Source file is more recent than executable.
244	    if (Is_block(v) && !Is_black_val(v))
(rr) bt
#0  0x00005562cf2e74e9 in mark_stack_push (work=0x0, offset=0, block=140569830424568, stk=<optimized out>) at major_gc.c:244
#1  caml_darken (v=140569830424568, p=<optimized out>) at major_gc.c:301
#2  0x00005562cf2e5291 in caml_darken_all_roots_slice (work=work@entry=9223372036854775804) at roots_nat.c:375
#3  0x00005562cf2e6df0 in mark_slice (work=9223372036854775804, work@entry=9223372036854775807) at major_gc.c:609
#4  0x00005562cf2e7e40 in caml_finish_major_cycle () at major_gc.c:967
#5  0x00005562cf2f7240 in caml_gc_full_major (v=<optimized out>) at gc_ctrl.c:590
#6  0x00005562cf2c6d51 in camlTest__do_gc_216 () at roots_nat.c:159
#7  0x00005562cf2c6fc4 in camlTest__entry () at roots_nat.c:159
#8  0x00005562cf2c4869 in caml_program () at roots_nat.c:159
#9  0x00005562cf30160c in caml_start_program ()
#10 0x00005562cf301e44 in caml_startup_common (argv=0x7ffc345c8258, pooling=<optimized out>, pooling@entry=0) at startup_nat.c:164
#11 0x00005562cf301e8f in caml_startup_exn (argv=<optimized out>) at startup_nat.c:174
#12 caml_startup (argv=<optimized out>) at startup_nat.c:174
#13 0x00005562cf2c4662 in main (argc=<optimized out>, argv=<optimized out>) at main.c:41

Should I fill in a separate issue for this?

@xavierleroy
Copy link
Copy Markdown
Contributor

Should I fill in a separate issue for this?

Yes,please. It's entirely possible that the new two-color marking phase is missing a case or two for naked pointers. Should be easy to fix.

@kayceesrk
Copy link
Copy Markdown
Contributor

kayceesrk commented Oct 1, 2020

@Engil I've made #9951 to fix #9950. When that is approved, you'll need a safe_load loading the header to check if it is already marked here https://github.com/kayceesrk/ocaml/blob/fix_mark_stack_push_opt/runtime/major_gc.c#L246.

@kayceesrk
Copy link
Copy Markdown
Contributor

#9951 is merged. We do not load the header anymore and hence, safe_load is not necessary.

@xavierleroy
Copy link
Copy Markdown
Contributor

In order to make progress faster, I "forked" this PR at #9956 to get something that seems to work and has a test suite.

@abbysmal
Copy link
Copy Markdown
Contributor Author

abbysmal commented Oct 5, 2020

Sure, thank you very much for this. This PR can be closed I believe?

@xavierleroy
Copy link
Copy Markdown
Contributor

This was merged as part of #9956, commit af48d9f.

@xavierleroy xavierleroy closed this Oct 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants