Skip to content
This repository was archived by the owner on Jun 21, 2024. It is now read-only.

AFL: segfault and lock resetting (fixes #497)#731

Merged
jmid merged 5 commits intoocaml-multicore:5.00from
jmid:afl-fix-fatal-unlock
Nov 9, 2021
Merged

AFL: segfault and lock resetting (fixes #497)#731
jmid merged 5 commits intoocaml-multicore:5.00from
jmid:afl-fix-fatal-unlock

Conversation

@jmid
Copy link
Copy Markdown
Collaborator

@jmid jmid commented Nov 3, 2021

This is a fix to get AFL-instrumentation running on multicore again.

First off, the segfault reported in #497 is caused by a NULL-pointer dereference during error reporting.
The error is triggered because afl-fuzz runs the tested program under a memory budget with setrlimit:
https://github.com/google/AFL/blob/61037103ae3722c8060ff7082994836a794f978e/afl-fuzz.c#L2034-L2040
The allocation of the minor heap space in caml_init_domains thus fails and calls caml_raise_out_of_memory.
It eventually ends up in caml_raise where it dereferences Caml_state which hasn't been initialized yet:

exception_pointer = (char*)Caml_state->c_stack;

The PR changes caml_init_domains to use caml_fatal_error consistently.

Secondly, the AFL-instrumentation uses fork as an optimization to push more tests through.
For a simple setup, afl-fuzz can fork and execute a program repeatedly for each mutated input.
This is the behaviour you get with AFL_NO_FORKSRV=1 afl-fuzz -i input -o output a.out (also mentioned in #497).
To save time on redundant linking and initialization AFL also offers a "fork server" mode. This mode instead pauses the instrumented program at the point just before "proper execution" begins, and then cheaply forks test programs whenever afl-fuzz tells it to:

while (1) {
int child_pid = fork();
if (child_pid < 0) caml_fatal_error("afl-fuzz: could not fork");
else if (child_pid == 0) {
/* Run the program */

This setup is described in more detail in https://lcamtuf.blogspot.com/2014/10/fuzzing-binaries-without-execve.html
Note: Because we are still during initialization this fork occurrence will only execute with 1 domain running.
The child however has problems with unlocking akin to #471. Since afl-fuzz silences the tested program
https://github.com/google/AFL/blob/61037103ae3722c8060ff7082994836a794f978e/afl-fuzz.c#L2067-L2068
an strace -f helped reveal the problem:

...
[pid 97880] close(198)                  = 0
[pid 97880] close(199)                  = 0
[pid 97880] write(2, "Fatal error during unlock", 25) = 25
[pid 97880] write(2, ": Operation not permitted\n", 26) = 26
[pid 97880] exit_group(2)               = ?
[pid 97880] +++ exited with 2 +++
[pid 97879] <... wait4 resumed>[{WIFEXITED(s) && WEXITSTATUS(s) == 2}], WSTOPPED, NULL) = 97880
[pid 97879] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=97880, si_uid=1000, si_status=2, si_utime=0, si_stime=0} ---
[pid 97879] write(199, "\0\2\0\0", 4)   = 4
...

Because the child quickly fails silently during afl-fuzzs perform_dry_run it reports it as
No instrumentation detected which is a bit confusing https://github.com/google/AFL/blob/61037103ae3722c8060ff7082994836a794f978e/afl-fuzz.c#L2629-L2632

To fix the unlock issue the PR uses the caml_atfork_hook from 1a2c612
With this fix the test case from #497 runs again using -m none (no memory budget).
I'll submit a separate PR regarding heap memory requirements.

Copy link
Copy Markdown
Collaborator

@ctk21 ctk21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So using caml_fatal_error in caml_init_domains is a good thing.

I'm a bit worried about allowing fork in caml_setup_afl without checking that there are not multiple domains running, but I'm not an AFL expert. From what I could tell, caml_setup_afl is called from 'instrumented modules during intitialization'. Is it not possible for the following to happen:

  • module A: uninstrumented and starts a domain
  • module B: instrumented but initialized after module A at which point we have multiple domains

If this could happen, it could be fixed by inserting a caml_domain_alone() guard and erroring out if multiple domains are running?

@kayceesrk
Copy link
Copy Markdown
Contributor

If this could happen, it could be fixed by inserting a caml_domain_alone() guard and erroring out if multiple domains are running?

This seems like a reasonable step to move forward.

@jmid
Copy link
Copy Markdown
Collaborator Author

jmid commented Nov 8, 2021

Following @ctk21's suggestion I've added a caml_domain_alone guard to ensure that the afl-instrumentation's "fork-server" does not invoke fork after other domains have already spawned.

@jmid
Copy link
Copy Markdown
Collaborator Author

jmid commented Nov 8, 2021

I noticed that ./configure -afl-instrument and make fails with:

../boot/ocamlrun ../ocamlopt -strict-sequence -absname -w +a-4-9-41-42-44-45-48-70 -g -warn-error +A -bin-annot -nostdlib -principal -safe-string -strict-formats  -function-sections  -afl-inst-ratio 0 -c camlinternalLazy.ml
../boot/ocamlrun ../ocamlopt -strict-sequence -absname -w +a-4-9-41-42-44-45-48-70 -g -warn-error +A -bin-annot -nostdlib -principal -safe-string -strict-formats  -function-sections   \
           -o stdlib__Lazy.cmx -c lazy.ml
>> Fatal error: Primitive CamlinternalLazy.force not found.
Fatal error: exception Misc.Fatal_error
make[4]: *** [Makefile:237: stdlib__Lazy.cmx] Error 2

This is caused by c0ef11f AFAICS which unified force and force_val in one generic force_gen operation (CC @gadmm). For now I've gone with just calling that, but one could also roll back the unified interface to minimize interface changes to trunk. I've furthermore updated the comment in camlinternalLazy.ml to match the updated comment in trunk OCaml, so that it is clearer that the operation is actually used.

Thematically this is part of getting "AFL instrumentation working again" - but this can also be submitted as a separate PR if you prefer.

@gadmm
Copy link
Copy Markdown
Contributor

gadmm commented Nov 8, 2021

This is caused by c0ef11f AFAICS which unified force and force_val in one generic force_gen operation (CC @gadmm). For now I've gone with just calling that, but one could also roll back the unified interface to minimize interface changes to trunk. I've furthermore updated the comment in camlinternalLazy.ml to match the updated comment in trunk OCaml, so that it is clearer that the operation is actually used.

Thanks, this seems correct to me, but I am not an expert of the affected code so it is best proof-read by someone else as well.

Copy link
Copy Markdown
Contributor

@kayceesrk kayceesrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good to me.

@jmid jmid merged commit b47bdeb into ocaml-multicore:5.00 Nov 9, 2021
sadiqj pushed a commit to sadiqj/ocaml that referenced this pull request Jan 10, 2022
…fix-fatal-unlock

AFL: segfault and lock resetting (fixes ocaml#497). Also fixes broken ./configure -afl-instrument && make
ctk21 pushed a commit to ctk21/ocaml that referenced this pull request Jan 11, 2022
…fix-fatal-unlock

AFL: segfault and lock resetting (fixes ocaml#497). Also fixes broken ./configure -afl-instrument && make
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants