Instrumentation for american fuzzy lop (afl-fuzz)#504
Instrumentation for american fuzzy lop (afl-fuzz)#504damiendoligez merged 6 commits intoocaml:trunkfrom
Conversation
|
To make this easier to review, could you just give a paragraph or two about what the new code does (the instrumentation, and the runtime portion)? I would suggest that the afl-specific code in Cmmgen is moved into a separate file called afl.ml, which then matches up nicely with the runtime file. Please format the code to 80 columns. This contribution would need a CLA. Have you signed one? |
|
Can you say a few words about the use of Random in the instrumentation code? Should one favors a reproducible behavior (fixed seed) or rather a more random one (auto init)? In both cases, it seems better to create a custom rng object instead of relying on the global one. |
I think it is better to use the global Random. Deterministic builds vs. truly randomised sampling is a global decision - there's no point in having separate parts of the compiler make their own decisions, since the build will only be reproducible bit-for-bit if every component is deterministic. So, I use Incidentally, this only matters in the rare case that you're using |
asmcomp/cmmgen.ml
Outdated
| (* these cases add afl logging, since they may be targets of conditional branches *) | ||
| | Cifthenelse (cond, t, f) -> Cifthenelse (instrument cond, with_afl_logging t, with_afl_logging f) | ||
| | Ccatch (nfail, ids, e1, e2) -> | ||
| Ccatch (nfail, ids, instrument e1, with_afl_logging e2) |
There was a problem hiding this comment.
This probably don't need logging as the relevant branching was normally already reported before going to the Cexit
There was a problem hiding this comment.
Hmm, I think you're right. That should just be instrument.
|
To configure an opam switch to always use some option you can add them to a file named The format is To be able to use that you should add a Also, we will probably need some special handling/annotation of different kind of uncaught exception to be |
|
Also can you give information about some values that can matter for the control flow to the tool ? |
But how do I specify that in an OPAM .comp file? I can easily specify Also, it's not generally useful to instrument only part of a project's code.
I suspect it would be easier to write a wrapper that silently ignores the |
What do you mean? |
asmcomp/cmmgen.ml
Outdated
| | Cconst_blockheader _ | Cvar _ as c -> c | ||
|
|
||
| let maybe_instrument c = | ||
| if !Clflags.afl_instrument then instrument c else c |
There was a problem hiding this comment.
This should be a real instrumentation point, an indirect call can take multiple branches depending on the function.
Avoiding instrumentation on for direct calls would be interesting to limit collisions, but probably a bit tricky
There was a problem hiding this comment.
Done. There's no benefit to avoiding the instrumentation on direct calls - the main thing that afl-fuzz is designed to do is filter out such noise.
There was a problem hiding this comment.
Not exactly, having irrelevant points in the trace can be hash conflict and make traces indistinguishable. But that's probably not important enough to care.
| } | ||
|
|
||
| void caml_maybe_setup_afl() | ||
| { |
There was a problem hiding this comment.
There needs to be a check here that the binary was instrumented for AFL. Otherwise executing
__AFL_SHM_ID=10a <any native-code OCaml program>would cause the program to crash.
Probably the best approach is for ocamlopt to link in a different C-level main function when instrumentation is enabled.
There was a problem hiding this comment.
Fixed! afl setup is now done from a primitive, a call to which is inserted in module initialisers that are instrumented. So, only afl-instrumented code checks __AFL_SHM_ID
|
I would be interested in moving forward on this PR. @alainfrisch and @chambart , you have looked at the codebase closely, do you think that it would be possible to merge it? @damiendoligez or @xavierleroy , do you have an opinion? (I understand that this is currently the only direct use of |
(I did not really look closely. It's just the use of Random that caught my eye.) |
|
I think it looks good. A few remarks:
|
|
@stedolan Have you signed a CLA? |
|
This is probably not the right place give "user feedbacks", but a few months ago i tried this tool on a bug i was struggling with. It was very successful and i was impressed by the quality of the results. It found the bug and a few others ones i wasn't looking for. Things were not totally perfect though. I tried it on a binary that had a quite high initialization cost, and the fuzzer was running at the snail speed of 10 execs/s. |
da0d4d2 to
40bf52f
Compare
|
Finally got around to updating this! Fixes since last version:
The instrumentation code uses |
gasche
left a comment
There was a problem hiding this comment.
Given that the patch is fairly non-invasive and provides useful functionality, I would be in favor of merging it in the trunk.
I think it would be good to have minimal sanity checking of the instrument code (the fact that it works when afl-fuzz is absent for example) somewhere in the testsuite.
asmcomp/afl_instrument.ml
Outdated
| | Cblockheader _ | Cvar _ as c -> c | ||
|
|
||
| let instrument_function c = | ||
| if !Clflags.afl_instrument then with_afl_logging c else c |
There was a problem hiding this comment.
There is something slightly weird with the proposed API, which is that instrument unconditionally adds instrumentation, but instrument_{function,initialiser} conditionally add instrumentation (and are called unconditionally from the driver). The fact that instrument_* are no-ops when instrumentation is not configured should be explicitly documented in afl_instrument.mli. I think I would have a mild preference for all these functions doing un-conditional instrumentation, and the Clflags test being done from the handler -- Cmmgen here.
There was a problem hiding this comment.
You're right, it's a bad API. Checking Clflags in Cmmgen is better.
| inf " safe strings ............. no" | ||
| fi | ||
| if test "$afl_instrument" = "true"; then | ||
| inf " afl-fuzz always enabled .. yes" |
There was a problem hiding this comment.
Do we really need the configure setting now that $(opam config var lib)/ocaml/ocaml_compiler_internal_params exists?
byterun/misc.c
Outdated
| { | ||
| fprintf (stderr, "%s", msg); | ||
| exit(2); | ||
| abort(); |
There was a problem hiding this comment.
I don't know anything about exit(n) vs. abort. Could you comment on the potential impact of this change on users? As far as I can tell, this is the only part of the patch that is invasive, in the sense that it modifies the behavior for non-afl users.
There was a problem hiding this comment.
This is a fairly large semantic change to caml_fatal_error, as abort will send SIGABRT and not cause atexit handlers to to be invoked.
See the 4.03.0 compiler description in the opam repository. I have mixed feelings about this feature (in particular about the fact that it is currently undocumented -- MPR#7311), but if it is there we may as well use it and avoid adding configure-time arguments that do not really require being configure-time arguments. |
|
I am in favour of this change being included. I think the matter of |
That seems like a bit of a hack. Is this really how behaviour of an OCaml installation should be configured?
I'm not sure what the correct behaviour of |
|
I agree. The GHC RTS has an analogous function called barf(), which does On Oct 19, 2016 13:59, "Stephen Dolan" notifications@github.com wrote:
|
I don't know, but I'm not sure that adding configure flags for "enable this command-line flag by default please" for all configure flags of interest is a better approach either. If we decide we want to have the configure flag, then I would suggest making it complete by having both a |
|
Yes, I've signed a CLA. The patch already has a configure option (although not the inverse), a runtime option, OCAMLPARAM support and compiler config file support. I don't think this is a great situation, needing to modify so many places to add an option. I've created #865 as an alternative way of specifying defaults at configure-time. |
|
I changed This patch still has a Is there anything else that needs to be done before this patch can be merged? |
|
I'm in favor of merging now that these issues have been well-discussed (I think the current consensus is reasonable, and improvable after merge), but this is a large enough change that I will wait for support from a maintainer before going with it. One thing that I would like to see (apologies for thinking of it only now) is some documentation for the feature. In particular, a small demo with a small buggy OCaml program and a step-by-step guide on how to afl-fuzz it would be very, very helpful for others willing to use the feature. Such documentation could go in the manual, as a new chapter of the "The OCaml tools" part, that is a new file in |
9f151c7 to
1f7acc2
Compare
|
@mshinwell do I correctly understand that you are in favor of merging the branch in its current state? |
|
I think it's ok to merge, but I'd like to see one change made first: can we pull out the byterun/{misc,printexc}.c changes and merge those separately? They look like fixes unrelated to AFL. |
|
Might be worth a note somewhere that this could be extended to the Windows ports at some point. This looks actively maintained: https://github.com/ivanfratric/winafl |
|
This breaks Inria's CI on macosx: afl.c:111:11: error: implicit declaration of function 'kill' is invalid in C99 |
|
Indeed, implicit declarations are no longer kosher in C99. (And are often wrong.) Just insert in the relevant file. |
|
CI should work again: byterun/afl.c missed the signal.h header. Incidentally, this file has no copyright header. |
|
Xavier Leroy (2016/12/06 10:34 -0800):
Indeed, implicit declarations are no longer kosher in C99. (And are often wrong.) Just insert
```
#include <sys/types.h>
#include <signal.h>
```
in the relevant file.
The first one was already there, I just added the second one.
|
64235a3 flambda-backend: Change Float.nan from sNaN to qNaN (ocaml#466) 14a8e27 flambda-backend: Track GC work for all managed bigarray allocations (upstream 11022) (ocaml#569) c3cda96 flambda-backend: Add two new methods to targetint for dwarf (ocaml#560) e6f1fed flambda-backend: Handle arithmetic overflow in select_addr (ocaml#570) dab7209 flambda-backend: Add Target_system to ocaml/utils (ocaml#542) 82d5044 flambda-backend: Enhance numbers.ml with more primitive types (ocaml#544) 216be99 flambda-backend: Fix flambda_o3 and flambda_oclassic attributes (ocaml#536) 4b56e07 flambda-backend: Test naked pointer root handling (ocaml#550) 40d69ce flambda-backend: Stop local function optimisation from moving code into function bodies; opaque_identity fixes for class compilation (ocaml#537) f08ae58 flambda-backend: Implemented inlining history and use it inside inlining reports (ocaml#365) ac496bf flambda-backend: Disable the local keyword in typing (ocaml#540) 7d46712 flambda-backend: Bugfix for Typedtree generation of arrow types (ocaml#539) 61a7b47 flambda-backend: Insert missing page table check in roots_nat.c (ocaml#541) 323bd36 flambda-backend: Compiler error when -disable-all-extensions and -extension are used (ocaml#534) d8956b0 flambda-backend: Persistent environment and reproducibility (ocaml#533) 4a0c89f flambda-backend: Revert "Revert bswap PRs (480 and 482)" (ocaml#506) 7803705 flambda-backend: Cause a C warning when CAMLreturn is missing in C stubs. (ocaml#376) 6199db5 flambda-backend: Improve unboxing during cmm for Flambda (ocaml#295) 96b9e1b flambda-backend: Print diagnostics at runtime for Invalid (ocaml#530) 42ab88e flambda-backend: Disable bytecode compilers in ocamltest (ocaml#504) 58c72d5 flambda-backend: Backport ocaml#10595 from upstream/trunk (ocaml#471) 1010539 flambda-backend: Use C++ name mangling convention (ocaml#483) 81881bb flambda-backend: Local allocation test no longer relies on lifting (ocaml#525) f5c4719 flambda-backend: Fix an assertion in Closure that breaks probes (ocaml#505) c2cf2b2 flambda-backend: Add some missing command line arguments to ocamlnat (ocaml#499) git-subtree-dir: ocaml git-subtree-split: 64235a3
Co-authored-by: tmattio <tmattio@users.noreply.github.com>
american fuzzy lop is a fuzzer that very effectively finds bugs by instrumenting programs, observing which execution paths test cases trigger, and permuting test cases to find more interesting execution paths. It's found bugs in many, many pieces of software.
This patch adds support to
ocamloptfor generatingafl-fuzzcompatible instrumentation, as well as minimal runtime support to communicate with the fuzzer. An earlier version of this patch has already found a couple of minor bugs in the OCaml compiler (see tag "afl" on mantis).The changes are:
-afl-instrumenttoocamlopt, which enables instrumentation (as well as the more advanced-afl-inst-ratiofor controlling generation)-afl-instrumentto enable instrumentation for all binaries built (useful for installing as an opam switch to fuzz large pieces of software)afl-fuzz, programs which die withcaml_fatal_erroror uncaught exceptions callabort()rather thanexit(2)The inner workings of afl's instrumentation are described here. In short,
afl-fuzzsets up a shared memory segment which is mapped by the instrumented binary (code living inasmrun/afl.c). The instrumented binary repeatedly forks and runs using input that afl-fuzz generates. Every time a conditional branch is taken, the fact is logged in the shared memory segment by a short piece of code inserted at the start of each branch target (code living incmmgen.ml).For an example of usage, compile this file with
-afl-instrument, and fuzz with afl:By inspecting instrumentation output, the fuzzer finds the crashing input in a couple of minutes.