Conversation
This was done by using Makefile.menhir as follows:
touch .depend.menhir
make -f Makefile.menhir import-menhirLib
Uses the new $symbolstartpos feature of Menhir to get locations identical to the OCamlYacc ones.
The compiler calls several parsers and fails if the parsed ASTs do not
coincide. You can set the environment variable OCAML_PARSERS to select
the parser(s) to use, as a space-separated list of parser names.
OCAML_PARSERS="yacc"
OCAML_PARSERS="menhir"
OCAML_PARSERS="yacc menhir"
45a6d02 to
8f34901
Compare
|
Hi @nojb, I haven't had any time to look at what you did yet, but I agree: diverging from the yacc presentation is ok as long as we start from a version that (1) is close and (2) agrees on ASTs and locations, and that we preserve property (2) as we evolve the parser code. The only problem with the "diverging" part is that it will be harder / more painful to rebase onto newer OCaml versions, so if we push in this direction it's better to do it all the way and get something upstream. I can't promise I will have a lot of time in the next few days/weeks to help you on that, but I will have enough time to review the parts you do, maybe help a bit along the way, and help a discussion on upstreaming when we feel ready. (I have to remember how to re-do the correctness tests and also the performance overhead tests -- I guess that |
3600a23 to
c92be1f
Compare
|
This work has now been integrated into ocaml#292. Thanks! |
The toplevel printer detects cycles by keeping a hashtable of values
that it has already traversed.
However, some OCaml runtime types (at least bigarrays) may be
partially uninitialized, and hashing them at arbitrary program points
may read uninitialized memory. In particular, the OCaml testsuite
fails when running with a memory-sanitizer enabled, as bigarray
printing results in reads to uninitialized memory:
```
==133712==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x4e6d11 in caml_ba_hash /var/home/edwin/git/ocaml/runtime/bigarray.c:486:45
#1 0x52474a in caml_hash /var/home/edwin/git/ocaml/runtime/hash.c:251:35
#2 0x599ebf in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1065:14
#3 0x5a909a in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9
#4 0x540ccb in main /var/home/edwin/git/ocaml/runtime/main.c:37:3
#5 0x7f0910abb087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
#6 0x7f0910abb14a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
#7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
Uninitialized value was created by a heap allocation
#0 0x47d306 in malloc (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x47d306) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
#1 0x4e7960 in caml_ba_alloc /var/home/edwin/git/ocaml/runtime/bigarray.c:246:12
#2 0x4e801f in caml_ba_create /var/home/edwin/git/ocaml/runtime/bigarray.c:673:10
#3 0x59b8fc in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14
#4 0x5a909a in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9
#5 0x540ccb in main /var/home/edwin/git/ocaml/runtime/main.c:37:3
#6 0x7f0910abb087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
#7 0x7f0910abb14a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
#8 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
SUMMARY: MemorySanitizer: use-of-uninitialized-value /var/home/edwin/git/ocaml/runtime/bigarray.c:486:45 in caml_ba_hash
```
The only use of hashing in genprintval is to avoid cycles, that is, it
is only useful for OCaml values that contain other OCaml values
(including possibly themselves). Bigarrays cannot introduce cycles,
and they are always printed as "<abstr>" anyway.
The present commit proposes to be more conservative in which values
are hashed by the cycle detector to avoid this issue: we skip hashing
any value with tag above No_scan_tag -- which may not contain any
OCaml values.
Suggested-by: Gabriel Scherer <gabriel.scherer@gmail.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
…l#13294) The toplevel printer detects cycles by keeping a hashtable of values that it has already traversed. However, some OCaml runtime types (at least bigarrays) may be partially uninitialized, and hashing them at arbitrary program points may read uninitialized memory. In particular, the OCaml testsuite fails when running with a memory-sanitizer enabled, as bigarray printing results in reads to uninitialized memory: ``` ==133712==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x4e6d11 in caml_ba_hash /var/home/edwin/git/ocaml/runtime/bigarray.c:486:45 #1 0x52474a in caml_hash /var/home/edwin/git/ocaml/runtime/hash.c:251:35 #2 0x599ebf in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1065:14 #3 0x5a909a in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9 #4 0x540ccb in main /var/home/edwin/git/ocaml/runtime/main.c:37:3 #5 0x7f0910abb087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) #6 0x7f0910abb14a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) #7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37) Uninitialized value was created by a heap allocation #0 0x47d306 in malloc (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x47d306) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37) #1 0x4e7960 in caml_ba_alloc /var/home/edwin/git/ocaml/runtime/bigarray.c:246:12 #2 0x4e801f in caml_ba_create /var/home/edwin/git/ocaml/runtime/bigarray.c:673:10 #3 0x59b8fc in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14 #4 0x5a909a in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9 #5 0x540ccb in main /var/home/edwin/git/ocaml/runtime/main.c:37:3 #6 0x7f0910abb087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) #7 0x7f0910abb14a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) #8 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37) SUMMARY: MemorySanitizer: use-of-uninitialized-value /var/home/edwin/git/ocaml/runtime/bigarray.c:486:45 in caml_ba_hash ``` The only use of hashing in genprintval is to avoid cycles, that is, it is only useful for OCaml values that contain other OCaml values (including possibly themselves). Bigarrays cannot introduce cycles, and they are always printed as "<abstr>" anyway. The present commit proposes to be more conservative in which values are hashed by the cycle detector to avoid this issue: we skip hashing any value with tag above No_scan_tag -- which may not contain any OCaml values. Suggested-by: Gabriel Scherer <gabriel.scherer@gmail.com> Signed-off-by: Edwin Török <edwin.torok@cloud.com> Co-authored-by: Edwin Török <edwin.torok@cloud.com>
Hi Gabriel,
I am giving your
menhirbranch a closer look. As a first step I rebased your branch ontrunk, the results are here. NB the last three "new" commits which adapt both parsers totrunkand promotes the updated menhir parser.make worldstill fails when comparing ASTs due to off-by-one locations in a couple of files, did not check if this is new after the rebase or not.cppstuff along the lines you suggested and see what we are left with. Personally I feel that if we produce identical ASTs over a sufficiently large sample of code (eg the full OPAM repository) I would be confident on the new parser even if the menhir/ocamlyacc grammar diverge in style.