Skip to content

Finding the lookahead token#5

Merged
gasche merged 3 commits intogasche:menhirfrom
let-def:menhir
Jul 24, 2018
Merged

Finding the lookahead token#5
gasche merged 3 commits intogasche:menhirfrom
let-def:menhir

Conversation

@let-def
Copy link
Copy Markdown
Collaborator

@let-def let-def commented Jul 23, 2018

A potential solution. I don't know if it is as good as we would like it to be, but the purpose of this PR is to discuss and find a satisfying solution :).

More generally I suspect the menhir parser has not been tested on failing files (only the successful paths have been covered, not the erroneous ones).

@gasche gasche merged commit 230720c into gasche:menhir Jul 24, 2018
@gasche
Copy link
Copy Markdown
Owner

gasche commented Jul 24, 2018

Very nice, thank you! For the last commit, we should in theory be able to do a bit better than that, by recording the token stream from one parser and keeping the other one in synch, but this seems difficult and not worth it. Your approach will help reveal toplevel-related issues with the Menhir parser, so it's good.

gasche pushed a commit that referenced this pull request Mar 28, 2022
gasche pushed a commit that referenced this pull request Jul 13, 2024
The toplevel printer detects cycles by keeping a hashtable of values
that it has already traversed.

However, some OCaml runtime types (at least bigarrays) may be
partially uninitialized, and hashing them at arbitrary program points
may read uninitialized memory. In particular, the OCaml testsuite
fails when running with a memory-sanitizer enabled, as bigarray
printing results in reads to uninitialized memory:

```
==133712==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x4e6d11 in caml_ba_hash /var/home/edwin/git/ocaml/runtime/bigarray.c:486:45
    #1 0x52474a in caml_hash /var/home/edwin/git/ocaml/runtime/hash.c:251:35
    #2 0x599ebf in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1065:14
    #3 0x5a909a in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9
    #4 0x540ccb in main /var/home/edwin/git/ocaml/runtime/main.c:37:3
    #5 0x7f0910abb087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
    #6 0x7f0910abb14a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
    #7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)

  Uninitialized value was created by a heap allocation
    #0 0x47d306 in malloc (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x47d306) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
    #1 0x4e7960 in caml_ba_alloc /var/home/edwin/git/ocaml/runtime/bigarray.c:246:12
    #2 0x4e801f in caml_ba_create /var/home/edwin/git/ocaml/runtime/bigarray.c:673:10
    #3 0x59b8fc in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14
    #4 0x5a909a in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9
    #5 0x540ccb in main /var/home/edwin/git/ocaml/runtime/main.c:37:3
    #6 0x7f0910abb087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
    #7 0x7f0910abb14a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
    #8 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)

SUMMARY: MemorySanitizer: use-of-uninitialized-value /var/home/edwin/git/ocaml/runtime/bigarray.c:486:45 in caml_ba_hash
```

The only use of hashing in genprintval is to avoid cycles, that is, it
is only useful for OCaml values that contain other OCaml values
(including possibly themselves). Bigarrays cannot introduce cycles,
and they are always printed as "<abstr>" anyway.

The present commit proposes to be more conservative in which values
are hashed by the cycle detector to avoid this issue: we skip hashing
any value with tag above No_scan_tag -- which may not contain any
OCaml values.

Suggested-by: Gabriel Scherer <gabriel.scherer@gmail.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
gasche added a commit that referenced this pull request Jul 30, 2024
…l#13294)

The toplevel printer detects cycles by keeping a hashtable of values
that it has already traversed.

However, some OCaml runtime types (at least bigarrays) may be
partially uninitialized, and hashing them at arbitrary program points
may read uninitialized memory. In particular, the OCaml testsuite
fails when running with a memory-sanitizer enabled, as bigarray
printing results in reads to uninitialized memory:

```
==133712==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x4e6d11 in caml_ba_hash /var/home/edwin/git/ocaml/runtime/bigarray.c:486:45
    #1 0x52474a in caml_hash /var/home/edwin/git/ocaml/runtime/hash.c:251:35
    #2 0x599ebf in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1065:14
    #3 0x5a909a in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9
    #4 0x540ccb in main /var/home/edwin/git/ocaml/runtime/main.c:37:3
    #5 0x7f0910abb087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
    #6 0x7f0910abb14a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
    #7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)

  Uninitialized value was created by a heap allocation
    #0 0x47d306 in malloc (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x47d306) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
    #1 0x4e7960 in caml_ba_alloc /var/home/edwin/git/ocaml/runtime/bigarray.c:246:12
    #2 0x4e801f in caml_ba_create /var/home/edwin/git/ocaml/runtime/bigarray.c:673:10
    #3 0x59b8fc in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14
    #4 0x5a909a in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9
    #5 0x540ccb in main /var/home/edwin/git/ocaml/runtime/main.c:37:3
    #6 0x7f0910abb087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
    #7 0x7f0910abb14a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
    #8 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)

SUMMARY: MemorySanitizer: use-of-uninitialized-value /var/home/edwin/git/ocaml/runtime/bigarray.c:486:45 in caml_ba_hash
```

The only use of hashing in genprintval is to avoid cycles, that is, it
is only useful for OCaml values that contain other OCaml values
(including possibly themselves). Bigarrays cannot introduce cycles,
and they are always printed as "<abstr>" anyway.

The present commit proposes to be more conservative in which values
are hashed by the cycle detector to avoid this issue: we skip hashing
any value with tag above No_scan_tag -- which may not contain any
OCaml values.

Suggested-by: Gabriel Scherer <gabriel.scherer@gmail.com>

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Co-authored-by: Edwin Török <edwin.torok@cloud.com>
gasche pushed a commit that referenced this pull request Jul 29, 2025
Move the orphaned ephemerons GC colour check inside the barrier.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants