afl-fuzz instrumentation fix for classes by stedolan · Pull Request #1345 · ocaml/ocaml

stedolan · 2017-09-15T15:54:47Z

MPR#7612 reports a bug in the afl-fuzz instrumentation produced by OCaml 4.05: the instrumentation output differs across multiple iterations of a deterministic program that uses objects.

The issue turns out to be the caching that objects use to lazily initialise the class data structures. At runtime, evaluating the expression object end checks a mutable field of its auto-generated class table to see whether it has already been initialised, and if not, initialises the class.

This means that instrumented programs can have different behaviour even on identical inputs, since a different code path is taken the second time an object creation expression is evaluated.

This patch fixes the issue in a very simple way: when compiling with afl-fuzz instrumentation, the class table cache is disabled and unconditionally regenerated every time an object is created.

This is a little tricky to test. I've written some tests and made them available in the ocaml-afl-persistent repository, which run various code snippets first once and then twice with afl instrumentation enabled, checking that the instrumentation output from the double run is twice that from the single run. (Since these tests depend on some tools shipped as part of afl-fuzz, I don't think it's appropriate to put them in the OCaml test suite)

gasche · 2017-09-15T21:22:23Z

It's a bit disappointing to make instrumentation robust by making the code (possibly sensibly) slower.

Would it be possible to fix the same issue by using an attribute to locally disable Afl_instrument, and sprinkling it in the right places in translclass? One advantage of this approach is that user-level code that embeds similar caching logic could use the attribute to regain stability, without changing the behavior (or performance) of the code.

(If I understand correctly, currently attributes are not preserved down the backend, and in particular they don't exist anymore at the level at which Afl_instrument operates. One long-term fix would be to propagate locations and attributes deeper down the backend. One shorter-term solution would be to keep the information as an extra argument of branching instructions (if-then-else, match/switch) only.)

dra27 · 2017-09-15T22:38:28Z

Regarding the tests - are they particularly time-consuming? The need to install 3rd party tools is fine, there are already tests which depend on C# or Fortran compilers, for example, which just display "skipped" if the required tool is missing.

stedolan · 2017-09-22T16:30:16Z

@gasche I agree that it would be better to selectively disable instrumentation around the caching logic, but this is more work than I have time for before 4.06. I'd prefer to fix the bug now, and allow more selective instrumentation control later.

An attribute would indeed be a nice way of expressing this, but as you note attributes are lost early. Location information is preserved by a primitive operation Ploc which is semantically the identity function. I suppose a primitive Puninstrumented would work as well, but I'm not convinced this is a nice way to express what's needed.

@dra27 no, the tests are very quick to run. I've added a horrible script to tests/afl-instrumentation

gasche

I agree in principle with the idea of having a first working solution for 4.06, to be refined later, given that the change only affect users of the afl-fuzz instrumentation.

I reviewed the code to check that, when afl-fuzz instrumentation is not set, it does not change the behavior. I believe the change would be safe to be merged, including in 4.06.0.

On the other hand, I don't understand the class-caching logic, and I could not check that the change is made at the place where it should be made (for example it's not clear to me that we need to set up the cache machinery each time if we are going not to use it, maybe the afl-fuzz codepath could be a larger revert of f209562 ).

(cc @garrigue )

stedolan · 2017-10-02T09:38:20Z

On the other hand, I don't understand the class-caching logic, and I could not check that the change is made at the place where it should be made

The class-caching logic is complicated. There appear to be many different cases depending on the degree to which the class structure depends on the dynamic environment. (The complexity here is what makes it difficult to selectively disable instrumentation around the caching logic).

The change this patch makes is to remove (when instrumentation is enabled) the only point where that code makes a branch based on mutable state (using Lifthenelse (lfield cached 0, ...)). While I don't claim to fully understand the class caching logic, I do believe that this change is sufficient to make all executions of the code have the same behaviour as the first one.

gasche · 2017-10-05T15:16:30Z

I'd like to merge this in 4.06 -- my reasoning is that the afl-fuzz mode is very experimental anyway, so there is not much risk. @stedolan, could you rebase? (Hopefully that could also un-stuck Travis, restarting manually does not help.)

Disable class initialisation cache when compiling with afl-fuzz instrumentation enabled. See MPR#7612

stedolan · 2017-10-05T16:19:13Z

Done. Travis seems to be going again, but there's a queue.

gasche · 2017-10-05T19:46:05Z

Merged, and cherry-picked in 4.06 ( 3f6eaad and d36cc37 ).

gasche · 2017-10-08T15:30:55Z

The test lacked a clean target (this is now fixed), but it also emits noise on stderr because which only has its stdout redirected (will fix). Grumble grumble.

(I haven't found a "grumbling face" emoji in the Unicode list; neither "persevering face", "face with steam from nose" nor "grimacing face" are really appropriate, but I sort of like the symbolism of "person fencing".)

shindere · 2018-02-13T14:47:48Z

The test this PR was supposed to fix is still broken on trunk.

See MPR#7725(https://caml.inria.fr/mantis/view.php?id=7725).

gasche · 2018-02-13T15:34:00Z

I discussed this with @stedolan in stedolan/ocaml-afl-persistent#2 (comment) . At the time, the current state was that the test does not work if you used ./configure -afl-instrument, but that it works otherwise (if you did not force pervasive instrumentation at configure time).

shindere · 2018-02-13T15:42:00Z

Gabriel Scherer (2018/02/13 15:34 +0000):

I discussed this with @stedolan in stedolan/ocaml-afl-persistent#2 (comment) . At the time, the current state was that the test does not work if you used `./configure -afl-instrument`, but that it works otherwise (if you did *not* force pervasive instrumentation at configure time).

Ok, god to know this is known, thanks. To be honest, the meaning of the configure option and of the different command-line options, and their relation to each other, is unclear to me. So from what you write, @gasche, am I correctly understanding that the effect of the configure option is to instrument the compiler itself? And that the ability to compile programs with instrumentation is always there and can not be disabled, is this correct?

gasche · 2018-02-13T15:44:30Z

I just checked again and indeed the tests work for trunk. @shindere, feel free to disable it when -afl-instrument was set as configure-time (this can be seen in ocamlc -configure for example).

Of course it would be nicer to change the test so that it always work, but this is difficult and it would require more work than I can spare right now. The test on non-afl-configured systems is checking what it was designed check (the stability impact of method caches).

gasche · 2018-02-13T15:50:03Z

So from what you write, @gasche, am I correctly understanding that the effect of the configure option is to instrument the compiler itself?

No, the configure-time setting has the effect that all compiler-compiled modules are instrumented (including the standard library).

And that the ability to compile programs with instrumentation is always there and can not be disabled, is this correct?

This is correct.

If you haven't set the configure-time option, in general you will run your instrumented modules on top of a non-instrumented standard library (and other users modules may or may not be instrumented).

shindere · 2018-02-13T15:52:39Z

OK, @gasche, many thanks for the clarifications, very helpful. I understand you can not yourself spend time fixing the test so that it works in both configurations, but, just out of curiosity: would you be able to describe the kind of work this would require?

gasche · 2018-02-13T16:04:48Z

I don't have the details in memory so I'm not sure I can explain the problem in details (I just tried to summarize the reason why instrumenting the stdlib makes this GPR insufficient for method cache stability, but I realized that the explication I was reconstructing was wrong.)

The problem with this specific test is that part of OCaml's object-oriented semantics is defined as a library (that uses unsafe code), in CamlInternalOO. The compiler generates code that call these functions. If those functions are instrumented by -afl-instrument (in particular when the standard library as a whole is instrumented), then anything non-deterministic in the object semantics (I suspect the generation of fresh object ids) may reflect itself as these functions behaving differently, and then afl-fuzz detects unstability and the test fails. If that was the reason (I'm not completely sure), hardcoding the fact that CamlInternalOO should not be instrumented in asmcomp/afl_instrument.ml (not too hard) could suffice to solve the problem for this test.)

The general problem (the reason why we have stability issues for some OCaml features) is that the way afl-instrumentation is implemented right now, at the cmm level, makes it very hard to say "please ignore the non-determinism in that part of the program, we know it is unimportant", which is what we need to do to get observable-determinism without sacrificing performance. I have spent some time trying to implement a way to get something like that, and this is what stedolan/ocaml-afl-persistent#2 is about (see there for details), but it's a difficult problem that I don't fully understand and requires more work. We could give this as a compilation-oriented implementation (not research) internship to a master student.

shindere · 2018-02-13T16:14:25Z

OK, thanks a lot for having taken the time to explain, @gasche. I understand the problem is hard. I'll disable the test if afl instrumentation has been enabled at configure time, as you suggest, this is easy to implement.

mshinwell added the bug label Sep 15, 2017

damiendoligez added this to the 4.06.0 milestone Sep 22, 2017

gasche closed this Sep 27, 2017

gasche reopened this Sep 27, 2017

gasche approved these changes Sep 30, 2017

View reviewed changes

stedolan added 2 commits October 5, 2017 16:54

afl-fuzz instrumentation fix for classes.

7ed63d4

Disable class initialisation cache when compiling with afl-fuzz instrumentation enabled. See MPR#7612

Add tests and Changes entry for afl-fuzz classes fix.

b5c9755

stedolan force-pushed the afl-objects branch from f9f8c7d to b5c9755 Compare October 5, 2017 16:00

gasche merged commit 35316cc into ocaml:trunk Oct 5, 2017

gasche mentioned this pull request Oct 7, 2017

fix 'make clean' broken by testsuite/tests/afl-instrumentation #1412

Merged

gasche removed the bug label Dec 7, 2017

gasche mentioned this pull request Dec 8, 2017

Soliciting feedback on a (currently failing) attempt to improve "stability" by tweaking the compiler stedolan/ocaml-afl-persistent#2

Closed

gasche mentioned this pull request May 3, 2018

AFL stability fixes for objects (MPR#7725) and lazy values. #1754

Merged

vicuna mentioned this pull request Mar 14, 2019

afl-fuzz reports 16.00% stability for a simple constant object expression #7612

Closed

vicuna mentioned this pull request Mar 14, 2019

AFL test broken #7725

Closed

EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024

add workflow to annotate broken links in markdown (ocaml#1345)

2ffeab5

Conversation

stedolan commented Sep 15, 2017

Uh oh!

gasche commented Sep 15, 2017

Uh oh!

dra27 commented Sep 15, 2017

Uh oh!

stedolan commented Sep 22, 2017

Uh oh!

gasche left a comment

Choose a reason for hiding this comment

Uh oh!

stedolan commented Oct 2, 2017

Uh oh!

gasche commented Oct 5, 2017

Uh oh!

stedolan commented Oct 5, 2017

Uh oh!

gasche commented Oct 5, 2017

Uh oh!

gasche commented Oct 8, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

shindere commented Feb 13, 2018

Uh oh!

gasche commented Feb 13, 2018

Uh oh!

shindere commented Feb 13, 2018 via email

Uh oh!

gasche commented Feb 13, 2018

Uh oh!

gasche commented Feb 13, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

shindere commented Feb 13, 2018 via email

Uh oh!

gasche commented Feb 13, 2018

Uh oh!

shindere commented Feb 13, 2018 via email

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

gasche commented Oct 8, 2017 •

edited

Loading

gasche commented Feb 13, 2018 •

edited

Loading