Skip to content

afl-fuzz instrumentation fix for classes#1345

Merged
gasche merged 2 commits intoocaml:trunkfrom
stedolan:afl-objects
Oct 5, 2017
Merged

afl-fuzz instrumentation fix for classes#1345
gasche merged 2 commits intoocaml:trunkfrom
stedolan:afl-objects

Conversation

@stedolan
Copy link
Copy Markdown
Contributor

MPR#7612 reports a bug in the afl-fuzz instrumentation produced by OCaml 4.05: the instrumentation output differs across multiple iterations of a deterministic program that uses objects.

The issue turns out to be the caching that objects use to lazily initialise the class data structures. At runtime, evaluating the expression object end checks a mutable field of its auto-generated class table to see whether it has already been initialised, and if not, initialises the class.

This means that instrumented programs can have different behaviour even on identical inputs, since a different code path is taken the second time an object creation expression is evaluated.

This patch fixes the issue in a very simple way: when compiling with afl-fuzz instrumentation, the class table cache is disabled and unconditionally regenerated every time an object is created.

This is a little tricky to test. I've written some tests and made them available in the ocaml-afl-persistent repository, which run various code snippets first once and then twice with afl instrumentation enabled, checking that the instrumentation output from the double run is twice that from the single run. (Since these tests depend on some tools shipped as part of afl-fuzz, I don't think it's appropriate to put them in the OCaml test suite)

@mshinwell mshinwell added the bug label Sep 15, 2017
@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 15, 2017

It's a bit disappointing to make instrumentation robust by making the code (possibly sensibly) slower.

Would it be possible to fix the same issue by using an attribute to locally disable Afl_instrument, and sprinkling it in the right places in translclass? One advantage of this approach is that user-level code that embeds similar caching logic could use the attribute to regain stability, without changing the behavior (or performance) of the code.

(If I understand correctly, currently attributes are not preserved down the backend, and in particular they don't exist anymore at the level at which Afl_instrument operates. One long-term fix would be to propagate locations and attributes deeper down the backend. One shorter-term solution would be to keep the information as an extra argument of branching instructions (if-then-else, match/switch) only.)

@dra27
Copy link
Copy Markdown
Member

dra27 commented Sep 15, 2017

Regarding the tests - are they particularly time-consuming? The need to install 3rd party tools is fine, there are already tests which depend on C# or Fortran compilers, for example, which just display "skipped" if the required tool is missing.

@damiendoligez damiendoligez added this to the 4.06.0 milestone Sep 22, 2017
@stedolan
Copy link
Copy Markdown
Contributor Author

@gasche I agree that it would be better to selectively disable instrumentation around the caching logic, but this is more work than I have time for before 4.06. I'd prefer to fix the bug now, and allow more selective instrumentation control later.

An attribute would indeed be a nice way of expressing this, but as you note attributes are lost early. Location information is preserved by a primitive operation Ploc which is semantically the identity function. I suppose a primitive Puninstrumented would work as well, but I'm not convinced this is a nice way to express what's needed.

@dra27 no, the tests are very quick to run. I've added a horrible script to tests/afl-instrumentation

@gasche gasche closed this Sep 27, 2017
@gasche gasche reopened this Sep 27, 2017
Copy link
Copy Markdown
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree in principle with the idea of having a first working solution for 4.06, to be refined later, given that the change only affect users of the afl-fuzz instrumentation.

I reviewed the code to check that, when afl-fuzz instrumentation is not set, it does not change the behavior. I believe the change would be safe to be merged, including in 4.06.0.

On the other hand, I don't understand the class-caching logic, and I could not check that the change is made at the place where it should be made (for example it's not clear to me that we need to set up the cache machinery each time if we are going not to use it, maybe the afl-fuzz codepath could be a larger revert of f209562 ).

(cc @garrigue )

@stedolan
Copy link
Copy Markdown
Contributor Author

stedolan commented Oct 2, 2017

On the other hand, I don't understand the class-caching logic, and I could not check that the change is made at the place where it should be made

The class-caching logic is complicated. There appear to be many different cases depending on the degree to which the class structure depends on the dynamic environment. (The complexity here is what makes it difficult to selectively disable instrumentation around the caching logic).

The change this patch makes is to remove (when instrumentation is enabled) the only point where that code makes a branch based on mutable state (using Lifthenelse (lfield cached 0, ...)). While I don't claim to fully understand the class caching logic, I do believe that this change is sufficient to make all executions of the code have the same behaviour as the first one.

@gasche
Copy link
Copy Markdown
Member

gasche commented Oct 5, 2017

I'd like to merge this in 4.06 -- my reasoning is that the afl-fuzz mode is very experimental anyway, so there is not much risk. @stedolan, could you rebase? (Hopefully that could also un-stuck Travis, restarting manually does not help.)

Disable class initialisation cache when compiling with afl-fuzz
instrumentation enabled. See MPR#7612
@stedolan
Copy link
Copy Markdown
Contributor Author

stedolan commented Oct 5, 2017

Done. Travis seems to be going again, but there's a queue.

@gasche gasche merged commit 35316cc into ocaml:trunk Oct 5, 2017
@gasche
Copy link
Copy Markdown
Member

gasche commented Oct 5, 2017

Merged, and cherry-picked in 4.06 ( 3f6eaad and d36cc37 ).

@gasche
Copy link
Copy Markdown
Member

gasche commented Oct 8, 2017

The test lacked a clean target (this is now fixed), but it also emits noise on stderr because which only has its stdout redirected (will fix). Grumble grumble.

(I haven't found a "grumbling face" emoji in the Unicode list; neither "persevering face", "face with steam from nose" nor "grimacing face" are really appropriate, but I sort of like the symbolism of "person fencing".)

@shindere
Copy link
Copy Markdown
Contributor

The test this PR was supposed to fix is still broken on trunk.

See MPR#7725(https://caml.inria.fr/mantis/view.php?id=7725).

@gasche
Copy link
Copy Markdown
Member

gasche commented Feb 13, 2018

I discussed this with @stedolan in stedolan/ocaml-afl-persistent#2 (comment) . At the time, the current state was that the test does not work if you used ./configure -afl-instrument, but that it works otherwise (if you did not force pervasive instrumentation at configure time).

@shindere
Copy link
Copy Markdown
Contributor

shindere commented Feb 13, 2018 via email

@gasche
Copy link
Copy Markdown
Member

gasche commented Feb 13, 2018

I just checked again and indeed the tests work for trunk. @shindere, feel free to disable it when -afl-instrument was set as configure-time (this can be seen in ocamlc -configure for example).

Of course it would be nicer to change the test so that it always work, but this is difficult and it would require more work than I can spare right now. The test on non-afl-configured systems is checking what it was designed check (the stability impact of method caches).

@gasche
Copy link
Copy Markdown
Member

gasche commented Feb 13, 2018

So from what you write, @gasche, am I correctly understanding that the effect of the configure option is to instrument the compiler itself?

No, the configure-time setting has the effect that all compiler-compiled modules are instrumented (including the standard library).

And that the ability to compile programs with instrumentation is always there and can not be disabled, is this correct?

This is correct.

If you haven't set the configure-time option, in general you will run your instrumented modules on top of a non-instrumented standard library (and other users modules may or may not be instrumented).

@shindere
Copy link
Copy Markdown
Contributor

shindere commented Feb 13, 2018 via email

@gasche
Copy link
Copy Markdown
Member

gasche commented Feb 13, 2018

I don't have the details in memory so I'm not sure I can explain the problem in details (I just tried to summarize the reason why instrumenting the stdlib makes this GPR insufficient for method cache stability, but I realized that the explication I was reconstructing was wrong.)

The problem with this specific test is that part of OCaml's object-oriented semantics is defined as a library (that uses unsafe code), in CamlInternalOO. The compiler generates code that call these functions. If those functions are instrumented by -afl-instrument (in particular when the standard library as a whole is instrumented), then anything non-deterministic in the object semantics (I suspect the generation of fresh object ids) may reflect itself as these functions behaving differently, and then afl-fuzz detects unstability and the test fails. If that was the reason (I'm not completely sure), hardcoding the fact that CamlInternalOO should not be instrumented in asmcomp/afl_instrument.ml (not too hard) could suffice to solve the problem for this test.)

The general problem (the reason why we have stability issues for some OCaml features) is that the way afl-instrumentation is implemented right now, at the cmm level, makes it very hard to say "please ignore the non-determinism in that part of the program, we know it is unimportant", which is what we need to do to get observable-determinism without sacrificing performance. I have spent some time trying to implement a way to get something like that, and this is what stedolan/ocaml-afl-persistent#2 is about (see there for details), but it's a difficult problem that I don't fully understand and requires more work. We could give this as a compilation-oriented implementation (not research) internship to a master student.

@shindere
Copy link
Copy Markdown
Contributor

shindere commented Feb 13, 2018 via email

@vicuna vicuna mentioned this pull request Mar 14, 2019
EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants