runtime_events_consumer: fix uninitialized and out of bounds reads#13297
Merged
gasche merged 7 commits intoocaml:trunkfrom Aug 1, 2024
Merged
runtime_events_consumer: fix uninitialized and out of bounds reads#13297gasche merged 7 commits intoocaml:trunkfrom
gasche merged 7 commits intoocaml:trunkfrom
Conversation
fb40f88 to
7dd1e74
Compare
ghost
reviewed
Jul 15, 2024
MisterDA
reviewed
Jul 15, 2024
ghost
approved these changes
Jul 15, 2024
2f95806 to
6a4efb3
Compare
gasche
approved these changes
Jul 24, 2024
Member
gasche
left a comment
There was a problem hiding this comment.
Approved on behalf of @dustanddreams. Thanks!
Member
|
@edwintorok can you fix the conflict? Then this is good to merge. |
We'll want to validate the offsets in the metadata. But if the metadata is writable by another process, then it could get corrupted after we've already validated it. Make a private copy of the metadata that will be useful for bounds checking. No functional change yet. Co-authored-by: Miod Vallat <118974489+dustanddreams@users.noreply.github.com> Signed-off-by: Edwin Török <edwin.torok@cloud.com>
`max_domains * sizeof(uint64_t)` can overflow, and then the for loop below would write beyond the end of `current_positions`. All other bounds checks can be done in `read_poll`, but this one must be done here to avoid potential memory corruption. Add a testcase which takes a valid ring, and corrupts some fields in it. For now it only corrupt the max_domains fields, but this will be extended in followup commits. Bug found by ASAN, and also reproducible with the included testcase. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
msg_length zero could cause us to loop infinitely. All messages have at least a header and a timestamp, except EV_INTERNAL (0), which is a padding event. Extend the testcase by corrupting message lengths. This also avoids some memory sanitizer warnings. Rejecting all msg_length < 2 would cause a failure in test_dropped_events.ml, because padding events would be incorrectly rejected. Suggested-by: Miod Vallat <miod@tarides.com> Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Protect against metadata header getting corrupted, and make a copy at cursor open time. Detect corrupted runtime events rings, check that: * metadata, data and custom offsets are within file bounds * record end is within file bounds * masked offset is within bounds * memcpy is within bounds * ring_size_elements cannot be 0, because then the mask computation is not valid anymore The bounds checks are performed as each entry is processed, to allow processing the maximum number of uncorrupted entries before raising exceptions. Found by ASAN, and (mostly) reproducible with the included testcase. Co-authored-by: Miod Vallat <118974489+dustanddreams@users.noreply.github.com> Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Found by -fsanitize=memory -fsanitize-memory-track-origins: ``` > ==102752==WARNING: MemorySanitizer: use-of-uninitialized-value > #0 0x7f2ba7fb4ea4 in caml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:496:18 > #1 0x7f2ba7fbc016 in caml_ml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:1207:9 > ocaml#2 0x59ba5c in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14 > ocaml#3 0x5a9220 in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9 > ocaml#4 0x540d6b in main /var/home/edwin/git/ocaml/runtime/main.c:37:3 > ocaml#5 0x7f2ba8120087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#6 0x7f2ba812014a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 617637580ee48eff08a2bce790e1667ad09f3b69) > > Uninitialized value was stored to memory at > #0 0x7f2ba7fb4e9d in caml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:497:69 > #1 0x7f2ba7fbc016 in caml_ml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:1207:9 > ocaml#2 0x59ba5c in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14 > ocaml#3 0x5a9220 in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9 > ocaml#4 0x540d6b in main /var/home/edwin/git/ocaml/runtime/main.c:37:3 > ocaml#5 0x7f2ba8120087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#6 0x7f2ba812014a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 617637580ee48eff08a2bce790e1667ad09f3b69) > > Uninitialized value was created by an allocation of 'buf' in the stack frame > #0 0x7f2ba7fb3dbc in caml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:402:7 > ``` This is in fact an EV_LIFECYCLE with EV_RING_STOP, which has 0 additional data, and thus msg_length 2: ``` runtime/runtime_events.c: EV_RUNTIME, (ev_message_type){.runtime=EV_LIFECYCLE}, EV_RING_STOP, 0, ``` Attempting to read from `buf[2]` would read uninitialized data. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
It is not enough to check for bytes = elements * 8, because elements * 8 could overflow. Bug found by AFL (resulting in a hang). Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
6a4efb3 to
83d1fb2
Compare
Contributor
Author
|
Done. BTW do you prefer rebase + force push for fixing up Changes entries, or merge from trunk + fixing the merge conflict? |
Contributor
no need to worry if you're a hobbit |
I'd nevertheless worry, some corrupted rings can turn bugs invisible and they become a nightmare to fix. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
[As suggested in https://github.com//pull/13290#issuecomment-2226843600 I'm separating this fix from the MSAN CI changes]
runtime_events_consumer.cparses runtime events from a memory mapped file, created by another OCaml process.There are 2 bugs here:
buf[2]forEV_RING_STOPeventsEV_RING_STOPis an event that doesn't have additional data, and hasmsg_length == 2. Howeverruntime_events_consumeralways readbuf[2], which contained uninitialized data, which is potentially undefined behaviour in C.Avoid this by introducing bounds check before accessing
buf. Most messages have a minimum length of 2, except forEV_INTERNAL=0used for padding, which can have a length of 1.No messages can have a length of 0, because that'd result in an infinite loop.
While running a fuzzer with an MSAN (memory sanitizer) enabled build I've found some crashes due to missing bounds checks.
It'd be useful to guard against these, because:
Introduce additional bounds checks on all offsets and length in the metadata header. To prevent the header from being modified after the offsets have been validated, operate on a copy (the file header is not changed dynamically by the producer).
Also guard against multiplication overflow in the code, and in the bounds checks, e.g. restrict
max_domainstoMax_domains_max.Some of these bugs were found by fuzzing with
afl++on an ASAN, and (separately) MSAN instrumented runtime, and some by reading the code after a deeper investigation was inspired by a review comment.I've included a testcase that takes a known good event ring, and corrupts each offset in turn with various values. This reproduces most of the crashes without requiring the use of a fuzzer: they would crash on a regular ASAN sanitized build as performed by the CI currently.
I've run
afl++on the updated runtime, instrumented with ASAN and MSAN and it didn't find more crashes.Notes on how to run the fuzzer can be found here