Skip to content

runtime_events_consumer: fix uninitialized and out of bounds reads#13297

Merged
gasche merged 7 commits intoocaml:trunkfrom
edwintorok:private/edvint/msan-fix-only-0
Aug 1, 2024
Merged

runtime_events_consumer: fix uninitialized and out of bounds reads#13297
gasche merged 7 commits intoocaml:trunkfrom
edwintorok:private/edvint/msan-fix-only-0

Conversation

@edwintorok
Copy link
Copy Markdown
Contributor

@edwintorok edwintorok commented Jul 14, 2024

[As suggested in https://github.com//pull/13290#issuecomment-2226843600 I'm separating this fix from the MSAN CI changes]


runtime_events_consumer.c parses runtime events from a memory mapped file, created by another OCaml process.

There are 2 bugs here:

  • uninitialized value read in buf[2] for EV_RING_STOP events
  • SIGSEGV when parsing a runtime events ring that has been corrupted or truncated.

EV_RING_STOP is an event that doesn't have additional data, and has msg_length == 2. However runtime_events_consumer always read buf[2], which contained uninitialized data, which is potentially undefined behaviour in C.
Avoid this by introducing bounds check before accessing buf. Most messages have a minimum length of 2, except for EV_INTERNAL=0 used for padding, which can have a length of 1.
No messages can have a length of 0, because that'd result in an infinite loop.

While running a fuzzer with an MSAN (memory sanitizer) enabled build I've found some crashes due to missing bounds checks.
It'd be useful to guard against these, because:

  • the ring may have become corrupted due to bugs in the producer, and we should avoid a SIGSEGV in the consumer
  • the file storing the event ring may have become corrupted or truncated when transferred
  • it makes it difficult to use a fuzzer to find more uninitialized value reads, because the fuzzer keeps finding the crashes instead

Introduce additional bounds checks on all offsets and length in the metadata header. To prevent the header from being modified after the offsets have been validated, operate on a copy (the file header is not changed dynamically by the producer).
Also guard against multiplication overflow in the code, and in the bounds checks, e.g. restrict max_domains to Max_domains_max.

Some of these bugs were found by fuzzing with afl++ on an ASAN, and (separately) MSAN instrumented runtime, and some by reading the code after a deeper investigation was inspired by a review comment.

I've included a testcase that takes a known good event ring, and corrupts each offset in turn with various values. This reproduces most of the crashes without requiring the use of a fuzzer: they would crash on a regular ASAN sanitized build as performed by the CI currently.

I've run afl++ on the updated runtime, instrumented with ASAN and MSAN and it didn't find more crashes.
Notes on how to run the fuzzer can be found here

@edwintorok edwintorok force-pushed the private/edvint/msan-fix-only-0 branch from 2f95806 to 6a4efb3 Compare July 15, 2024 22:40
Copy link
Copy Markdown
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved on behalf of @dustanddreams. Thanks!

@gasche
Copy link
Copy Markdown
Member

gasche commented Jul 24, 2024

@edwintorok can you fix the conflict? Then this is good to merge.

edwintorok and others added 7 commits July 24, 2024 14:45
We'll want to validate the offsets in the metadata.
But if the metadata is writable by another process,
then it could get corrupted after we've already validated it.

Make a private copy of the metadata that will be useful for bounds
checking.

No functional change yet.

Co-authored-by: Miod Vallat <118974489+dustanddreams@users.noreply.github.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
`max_domains * sizeof(uint64_t)` can overflow, and then
the for loop below would write beyond the end of `current_positions`.

All other bounds checks can be done in `read_poll`, but this one must be
done here to avoid potential memory corruption.

Add a testcase which takes a valid ring, and corrupts some fields in it.
For now it only corrupt the max_domains fields, but this will be
extended in followup commits.

Bug found by ASAN, and also reproducible with the included testcase.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
msg_length zero could cause us to loop infinitely.

All messages have at least a header and a timestamp,
except EV_INTERNAL (0), which is a padding event.

Extend the testcase by corrupting message lengths.

This also avoids some memory sanitizer warnings.

Rejecting all msg_length < 2 would cause a failure in
test_dropped_events.ml, because padding events would be incorrectly
rejected.

Suggested-by: Miod Vallat <miod@tarides.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Protect against metadata header getting corrupted, and make a copy at
cursor open time.

Detect corrupted runtime events rings, check that:

* metadata, data and custom offsets are within file bounds
* record end is within file bounds
* masked offset is within bounds
* memcpy is within bounds
* ring_size_elements cannot be 0, because then the mask computation is
  not valid anymore

The bounds checks are performed as each entry is processed, to allow
processing the maximum number of uncorrupted entries before raising
exceptions.

Found by ASAN, and (mostly) reproducible with the included testcase.

Co-authored-by: Miod Vallat <118974489+dustanddreams@users.noreply.github.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Found by -fsanitize=memory -fsanitize-memory-track-origins:
```
> ==102752==WARNING: MemorySanitizer: use-of-uninitialized-value
>     #0 0x7f2ba7fb4ea4 in caml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:496:18
>     #1 0x7f2ba7fbc016 in caml_ml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:1207:9
>     ocaml#2 0x59ba5c in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14
>     ocaml#3 0x5a9220 in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9
>     ocaml#4 0x540d6b in main /var/home/edwin/git/ocaml/runtime/main.c:37:3
>     ocaml#5 0x7f2ba8120087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
>     ocaml#6 0x7f2ba812014a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
>     ocaml#7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 617637580ee48eff08a2bce790e1667ad09f3b69)
>
>   Uninitialized value was stored to memory at
>     #0 0x7f2ba7fb4e9d in caml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:497:69
>     #1 0x7f2ba7fbc016 in caml_ml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:1207:9
>     ocaml#2 0x59ba5c in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14
>     ocaml#3 0x5a9220 in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9
>     ocaml#4 0x540d6b in main /var/home/edwin/git/ocaml/runtime/main.c:37:3
>     ocaml#5 0x7f2ba8120087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
>     ocaml#6 0x7f2ba812014a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
>     ocaml#7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 617637580ee48eff08a2bce790e1667ad09f3b69)
>
>   Uninitialized value was created by an allocation of 'buf' in the stack frame
>     #0 0x7f2ba7fb3dbc in caml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:402:7
>
```

This is in fact an EV_LIFECYCLE with EV_RING_STOP, which has 0
additional data, and thus msg_length 2:
```
runtime/runtime_events.c:      EV_RUNTIME, (ev_message_type){.runtime=EV_LIFECYCLE}, EV_RING_STOP, 0,
```

Attempting to read from `buf[2]` would read uninitialized data.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
It is not enough to check for bytes = elements * 8,
because elements * 8 could overflow.

Bug found by AFL (resulting in a hang).

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
@edwintorok edwintorok force-pushed the private/edvint/msan-fix-only-0 branch from 6a4efb3 to 83d1fb2 Compare July 24, 2024 13:46
@edwintorok
Copy link
Copy Markdown
Contributor Author

edwintorok commented Jul 24, 2024

Done. BTW do you prefer rebase + force push for fixing up Changes entries, or merge from trunk + fixing the merge conflict?

@gasche gasche self-assigned this Jul 24, 2024
@MisterDA
Copy link
Copy Markdown
Contributor

harden against corrupted rings

no need to worry if you're a hobbit

@ghost
Copy link
Copy Markdown

ghost commented Jul 24, 2024

harden against corrupted rings

no need to worry if you're a hobbit

I'd nevertheless worry, some corrupted rings can turn bugs invisible and they become a nightmare to fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants