libFuzzer support (WIP)#217
libFuzzer support (WIP)#217dvyukov merged 1 commit intodvyukov:masterfrom guidovranken:libfuzzer-mode
Conversation
| return &ast.AssignStmt{ | ||
| Lhs: []ast.Expr{counter}, | ||
| Tok: token.ASSIGN, | ||
| Rhs: []ast.Expr{ &ast.BasicLit{Kind: token.INT, Value: "1"} }, |
There was a problem hiding this comment.
Is this required? Or what's the motivation behind this change?
There was a problem hiding this comment.
Because otherwise each unique amount of branches executed registers as a new coverage signal. This leads to almost every new input getting added to the corpus.
There was a problem hiding this comment.
go-fuzz can operate in two modes (following AFL, as I understand it): either looking at which basic blocks are reached, or looking at a quantized count of how many times basic blocks are reached. See func roundUpCover and this quote from the AFL whitepaper:
In addition to detecting new tuples, the fuzzer also considers coarse tuple
hit counts. These are divided into several buckets:1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+
But I'm still new here, so I might have gotten this wrong.
There was a problem hiding this comment.
In any case, if this is what libFuzzer needs, this behavior should probably be switched by flagLibFuzzer.
There was a problem hiding this comment.
I am still don't understand the full picture -- libfuzzer should support counters, but maybe it doesn't support them via __libfuzzer_extra_counters , or maybe the problem is that you dropped:
for i := range CoverTab {
CoverTab[i] = 0
}
before each test.
Try to zero CoverTab before each test. If it does not help, then let's just switch between increments and setting to 1 depending on -libfuzzer flag.
But do we need to zero CoverTab anyway?... What does libfuzzer expect here?
There was a problem hiding this comment.
I'm afraid that using actual counters (used in standard go-fuzz, and as opposed to a binary CoverTab as implemented in my PR), will lead to a "coverage explosion".
For instance:
for i := 0; i < input1; i++ {
/* CoverTab updating inserted by go-fuzz here */
for j := 0; j < input2; j++ {
/* CoverTab updating inserted by go-fuzz here */
for k := 0; k < input3; k++ {
/* CoverTab updating inserted by go-fuzz here */
for l := 0; l < input4; l++ {
/* CoverTab updating inserted by go-fuzz here */
}
}
}
}With the standard go-fuzz CoverTab updating mechanism (incrementing the counter), this would lead to 2^32 different coverage signals, leading to 2^32 different files written to the corpus directory. This is obviously counter-productive since every unique input leads to new "coverage".
When we use my code that sets CoverTab to 1 if a branch is hit, and leaves it 0 as long it is not, the max code coverage is 4 instead of 2^32, which seems a lot more useful, especially if you consider complex targets that have enough branches of their own to give off a useful coverage signal.
I'm not a proponent of zeroing CoverTab each iteration because this incurs an unnecessary speed penalty that can be simply avoided by doing CoverTab[N] = 1 in libFuzzer mode.
So I'll just make changes to the instrumentation code to insert my customized CoverTab updater.
In the future we can look at using a hybrid in the form of using 2 or 3 bits of coverage data per branch instead of 8 bits (standard go-fuzz) or 1 bit (my PR), depending on A/B testing.
| syscall.Exit(1) | ||
| } | ||
| wr += n | ||
| func Initialize(coverTabPtr unsafe.Pointer, coverTabSize uint64) { |
There was a problem hiding this comment.
Could we just add this function to the package and leave everything else intact?
There was a problem hiding this comment.
I'd like to, but the init function is executed as soon as the binary is executed, and fails
failed to mmap fd = 3 errno = 9
There was a problem hiding this comment.
Yes, we should skip that init in libfuzzer mode.
Perhaps we could put code for normal mode and libfuzzer into separate files and use libfuzzer tag to select one or another in go-fuzz-build.
| } | ||
|
|
||
| int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) { | ||
| uint8_t* datacopy = malloc(size); |
There was a problem hiding this comment.
Why do we need to make a copy of the data rather then pass the data as is?
There was a problem hiding this comment.
Because const uint8_t data may not be passed to a function that does not treat the pointer as const. But implementing LLVMFuzzerTestOneInput in Go and using SliceHeader solves this I think.
| @@ -0,0 +1,38 @@ | |||
| #include <stdio.h> | |||
There was a problem hiding this comment.
This file need the standard copyright header. Checkout the latest version, it has just changed.
| extern void gofuzz_Run(GoSlice p0); | ||
|
|
||
|
|
||
| #ifdef __linux__ |
There was a problem hiding this comment.
This will silently break on other OSes with obscure failure mode. It's better to make the build fail loudly instead.
There was a problem hiding this comment.
Push? I don't see it yet.
There was a problem hiding this comment.
If we are looking at the same source (which is not always clear on github), then there is below:
#else
#error Currently only Linux is supported
|
|
||
| typedef long long GoInt64; | ||
| typedef GoInt64 GoInt; | ||
| typedef struct { void *data; GoInt len; GoInt cap; } GoSlice; |
There was a problem hiding this comment.
Let's pass data pointer/size to Go code instead and reconstruct the slice in Go code using https://golang.org/pkg/reflect/#SliceHeader. The less C code we have, the better.
| #include <stddef.h> | ||
| #include <stdlib.h> | ||
| #include <stdint.h> | ||
| #include <string.h> |
There was a problem hiding this comment.
How is this file involved in the build? I fail to understand how it becomes part of the build.
| extern void gofuzz_Run(GoSlice p0); | ||
|
|
||
|
|
||
| #ifdef __linux__ |
There was a problem hiding this comment.
Push? I don't see it yet.
| return &ast.AssignStmt{ | ||
| Lhs: []ast.Expr{counter}, | ||
| Tok: token.ASSIGN, | ||
| Rhs: []ast.Expr{ &ast.BasicLit{Kind: token.INT, Value: "1"} }, |
There was a problem hiding this comment.
In any case, if this is what libFuzzer needs, this behavior should probably be switched by flagLibFuzzer.
| if *flagLibFuzzer == true { | ||
| archive := c.buildInstrumentedBinary(&blocks, nil) | ||
|
|
||
| if *flagOut == "" { |
There was a problem hiding this comment.
How about we unify the decision about what to use for *flagOut. So above:
if *flagOut == "" {
suffix := ".zip"
if *flagLibFuzzer {
suffix = ".a"
}
*flagOut = c.pkgs[0].Name+"-fuzz"+suffix
}There was a problem hiding this comment.
Then if/when we add the Fuzz method name as context, we won't have to add it in three places. (We already need to add it in go-fuzz-build and go-fuzz.)
There was a problem hiding this comment.
Since libFuzzer doesn't use the literals, let's move this whole thing up higher above the giant comment and make it standalone. Something like:
if *flagLibFuzzer {
var blocks []CoverBlock
archive := ...
os.Rename ...
return
}
There was a problem hiding this comment.
Does the literal finder extract hardcoded strings from the source files?
That might actually be very useful to create a dictionary file from, that libFuzzer can consume via the -dict= parameter. But I'll have to test this.
There was a problem hiding this comment.
Should we also do the last step and run:
clang target.a -fsanitize=fuzzer
``
? This gives me a working fuzzer which is nice.
There was a problem hiding this comment.
This is interesting but won't work with oss-fuzz which requires that you link against -lFuzzingEngine.
There was a problem hiding this comment.
OK, let's leave it as is for now.
We can get back to this if/when somebody will have interest in using it. We could build both, or have an additional flag.
| *flagOut = c.pkgs[0].Name + ".a" | ||
| } | ||
|
|
||
| os.Rename(archive, *flagOut) |
There was a problem hiding this comment.
Check the returned error here
| // GOROOT/pkg/tool and GOROOT/pkg/include. | ||
| // Even better, see if we can avoid making some copies | ||
| // at all, using some combination of env vars and toolexec. | ||
| c.copyDir(filepath.Join(c.GOROOT, "src", "runtime", "cgo"), filepath.Join(c.workdir, "goroot", "src", "runtime", "cgo")) |
There was a problem hiding this comment.
Why? Maybe only do this for libfuzzer?
| return f | ||
| } | ||
|
|
||
| func getExtraBuildFlags() string { |
| } | ||
|
|
||
| func getMainSrc() string { | ||
| if *flagLibFuzzer == false { |
There was a problem hiding this comment.
Remove == false, use ! instead.
| } | ||
| } | ||
|
|
||
| func getMainSrc() string { |
There was a problem hiding this comment.
Or some such. But no leading "get".
| func getMainSrc() string { | ||
| if *flagLibFuzzer == false { | ||
| return mainSrc | ||
| } else { |
| } | ||
|
|
||
| func Main(f func([]byte) int) { | ||
| runtime.GOMAXPROCS(1) // makes coverage more deterministic, we parallelize on higher level |
There was a problem hiding this comment.
If this is going to get upstreamed, we obviously need to find some way not to delete all this code, but instead have it live harmoniously side-by-side. :)
It might even make sense just to make a new package instead, and have go-fuzz-build switch as needed; it doesn't seem like there's much code overlap.
There was a problem hiding this comment.
Could build tags be of any help here? So that if go-fuzz-build runs the compilation process, the tag 'libFuzzer' would use a special version of ```go-fuzz-dep/main.go````, without a build tag it uses the default?
IDK, I'm not really a Go guy.
There was a problem hiding this comment.
Yes, we could use build tags. It depends on how much code overlap there is. If there's substantive code overlap, build tags would be a fine idea. If the code is mostly disjoint, we may as well just have a go-fuzz-dep-libfuzzer instead.
There was a problem hiding this comment.
Let's do 2 files and build tags. go-fuzz-dep is mentioned a bunch of times in go-fuzz-build, so just adding a tag looks simpler. Tag may turn out to be useful in other places as well (maybe even in user code?).
And in the end this is all strictly internal to go-fuzz, so we can change it in future.
Since we currently define gofuzz, let's go gofuzz-libfuzzer (or gofuzz_libfuzzer id dashes are not allowed) to keep all our tags prefixed with gofuzz.
There was a problem hiding this comment.
Now supporting 2 files using build tags.
| return &ast.AssignStmt{ | ||
| Lhs: []ast.Expr{counter}, | ||
| Tok: token.ASSIGN, | ||
| Rhs: []ast.Expr{ &ast.BasicLit{Kind: token.INT, Value: "1"} }, |
There was a problem hiding this comment.
I am still don't understand the full picture -- libfuzzer should support counters, but maybe it doesn't support them via __libfuzzer_extra_counters , or maybe the problem is that you dropped:
for i := range CoverTab {
CoverTab[i] = 0
}
before each test.
Try to zero CoverTab before each test. If it does not help, then let's just switch between increments and setting to 1 depending on -libfuzzer flag.
But do we need to zero CoverTab anyway?... What does libfuzzer expect here?
| syscall.Exit(1) | ||
| } | ||
| wr += n | ||
| func Initialize(coverTabPtr unsafe.Pointer, coverTabSize uint64) { |
There was a problem hiding this comment.
Yes, we should skip that init in libfuzzer mode.
Perhaps we could put code for normal mode and libfuzzer into separate files and use libfuzzer tag to select one or another in go-fuzz-build.
| } | ||
|
|
||
| func Main(f func([]byte) int) { | ||
| runtime.GOMAXPROCS(1) // makes coverage more deterministic, we parallelize on higher level |
There was a problem hiding this comment.
Let's do 2 files and build tags. go-fuzz-dep is mentioned a bunch of times in go-fuzz-build, so just adding a tag looks simpler. Tag may turn out to be useful in other places as well (maybe even in user code?).
And in the end this is all strictly internal to go-fuzz, so we can change it in future.
Since we currently define gofuzz, let's go gofuzz-libfuzzer (or gofuzz_libfuzzer id dashes are not allowed) to keep all our tags prefixed with gofuzz.
| extern void gofuzz_Run(GoSlice p0); | ||
|
|
||
|
|
||
| #ifdef __linux__ |
There was a problem hiding this comment.
If we are looking at the same source (which is not always clear on github), then there is below:
#else
#error Currently only Linux is supported
| } | ||
|
|
||
| //export LLVMFuzzerTestOneInput | ||
| func LLVMFuzzerTestOneInput(data uintptr, size uint64) int { |
There was a problem hiding this comment.
Nice!
This improves whole bunch of things -- no copy of data, no construction of Go slice in C code, less C code.
| } | ||
| return deserialize64(buf[:]) | ||
| func init() { | ||
| CoverTab = (*[CoverSize]byte)(unsafe.Pointer(&CoverTabTmp[0])) |
There was a problem hiding this comment.
It would be super nice to put this directly into go-fuzz-dep package:
// #cgo CFLAGS: -Wall -Werror
/*
#ifdef __linux__
__attribute__((weak, section("__libfuzzer_extra_counters")))
#else
#error Currently only Linux is supported
#endif
unsigned char __go_fuzz_counters[65536];
*/
import "C"
func init() {
CoverTab = (*[CoverSize]byte)(unsafe.Pointer(&C.__go_fuzz_counters[0]))
}
Then would not need bootstrap CoverTab and the Initialize function.
But unfortunately cgo can't be used here:
go-fuzz$ go install ./... && go-fuzz-build -libfuzzer ../go-fuzz-corpus/fmt && clang fmt.a -fsanitize=fuzzer
gopath/src/github.com/dvyukov/go-fuzz/go-fuzz-dep/main.go:9:2: could not import C (no metadata for C)
gopath/src/github.com/dvyukov/go-fuzz/go-fuzz-dep/main.go:24:8: C redeclared in this block
gopath/src/github.com/dvyukov/go-fuzz/go-fuzz-dep/main.go:9:2: other declaration of C
typechecking of ../go-fuzz-corpus/fmt failed
@josharian do you know if it's theoretically possible to support cgo here?
It's not a super big deal because the current scheme works. But if it's easy to do, then it would be nice.
|
I think we are almost there. It has merge conflicts and does not seem to be based on current HEAD. |
+1. And I’ll take a last look then as well. Thanks! |
| if err != nil { | ||
| c.failf("failed to rename file: %v", err) | ||
| } | ||
| return |
There was a problem hiding this comment.
This looks unformatted, please run go fmt ./...
| ) | ||
|
|
||
|
|
||
|
|
There was a problem hiding this comment.
Please remove 2 out of 3 new lines.
| CoverTab = (*[CoverSize]byte)(coverTabPtr) | ||
| } | ||
|
|
||
| func Main(f func([]byte) int) { |
There was a problem hiding this comment.
Remove this only in this file, it does not do anything useful. And can confuse future readers as to what's its purpose.
There was a problem hiding this comment.
If I remove it, I get this:
failed to execute go build: exit status 2
# internal/testlog
/home/jhg/gofuzzpr/go/src/internal/testlog/log.go:69: undefined: gofuzzdep.Main
# errors
/home/jhg/gofuzzpr/go/src/errors/errors.go:20: undefined: gofuzzdep.Main
# math/bits
/home/jhg/gofuzzpr/go/src/math/bits/bits.go:535: undefined: gofuzzdep.Main
/home/jhg/gofuzzpr/go/src/math/bits/bits_tables.go:83: undefined: gofuzzdep.Main
# unicode/utf8
/home/jhg/gofuzzpr/go/src/unicode/utf8/utf8.go:521: undefined: gofuzzdep.Main
# internal/syscall/unix
/home/jhg/gofuzzpr/go/src/internal/syscall/unix/at.go:58: undefined: gofuzzdep.Main
/home/jhg/gofuzzpr/go/src/internal/syscall/unix/at_sysnum_linux.go:13: undefined: gofuzzdep.Main
/home/jhg/gofuzzpr/go/src/internal/syscall/unix/at_sysnum_newfstatat_linux.go:11: undefined: gofuzzdep.Main
/home/jhg/gofuzzpr/go/src/internal/syscall/unix/getrandom_linux.go:46: undefined: gofuzzdep.Main
/home/jhg/gofuzzpr/go/src/internal/syscall/unix/getrandom_linux_amd64.go:9: undefined: gofuzzdep.Main
/home/jhg/gofuzzpr/go/src/internal/syscall/unix/nonblocking.go:17: undefined: gofuzzdep.Main
# unicode
/home/jhg/gofuzzpr/go/src/unicode/casetables.go:20: undefined: gofuzzdep.Main
/home/jhg/gofuzzpr/go/src/unicode/digit.go:13: undefined: gofuzzdep.Main
/home/jhg/gofuzzpr/go/src/unicode/graphic.go:144: undefined: gofuzzdep.Main
/home/jhg/gofuzzpr/go/src/unicode/letter.go:370: undefined: gofuzzdep.Main
/home/jhg/gofuzzpr/go/src/unicode/tables.go:7761: undefined: gofuzzdep.Main
| @@ -0,0 +1,41 @@ | |||
| // Copyright 2015 Dmitry Vyukov. All rights reserved. | |||
There was a problem hiding this comment.
Note: the copyright message has changed on HEAD to a more appropriate one.
|
I will finish this tonight. |
| // Question: Do it here or in copyDir? | ||
|
|
||
| // TODO: See if we can avoid making toolchain copies, | ||
| // TODO: See if we can avoid making toolchain copies, |
There was a problem hiding this comment.
This looks unformatted. Please run go fmt ./...
|
Thanks, Guido! |
I've been waiting for the branch to be rebased and squashed to a single commit before I do another review. I think we're almost there, though. |
libFuzzer C shim: fail if host OS is not Linux Address several issues mentioned in github.com//pull/217 - Use camel case where it is customary - Implement C shim code in Go - Use SliceHeader to convert fuzzer input to Go slice go-fuzz-build: Better function names go-fuzz-build: Remove else in funcMain() go-fuzz-build: Condense testing of flagLibFuzzer go-fuzz-build: Remove else in extraBuildFlags() go-fuzz-build: Deal with os.Rename() return value Move remaining C functionality to Go Use build tags to select customized go-fuzz-dep code for -libfuzzer Build tag libfuzzer -> gofuzz_libfuzzer go-fuzz-build/main.go: go fmt go-fuzz-dep/main_libFuzzer.go: Remove excess newlines Update copyright notice go-fuzz-build populateWorkdir(): Copy cgo directory only in libFuzzer mode
|
Done. |
josharian
left a comment
There was a problem hiding this comment.
This LGTM. There are a few nits remaining, but I can just take care of those myself afterwards, since @guidovranken has been patient enough with review cycles as it is.
Before committing, though, I would like the commit message to be cleaned up a bit and force pushed.
Frustratingly, GitHub doesn't let me write line comments on the commit message (does they not care‽), but:
- Please remove " (WIP)" from the title.
- Instead of having a list of all the messy work we did along the way in the body, please write a short description instead. Something like: "This change adds a -libfuzzer flag to go-fuzz-build. When provided, go-fuzz-build generates an archive file that can be used with libFuzzer. Sample usage: . It also adds a new build tag, gofuzz_libfuzzer, that will be provided when building code for use with libFuzzer."
I will probably take that terminal transcript and send a follow-up change adding something to the README about this.
Thanks again for your patience with the review.
| if *flagLibFuzzer { | ||
| archive := c.buildInstrumentedBinary(&blocks, nil) | ||
|
|
||
| if *flagOut == "" { |
There was a problem hiding this comment.
I still think we should do my first suggestion in this thread, about adjusting *flagOut only once. But we can merge without that and I can do it as a follow-up.
| } | ||
| ` | ||
|
|
||
| var mainSrcLibFuzzer = ` |
There was a problem hiding this comment.
I will probably rewrite these to use text/template at some point in the future, since they are pretty large, and it is hard to easily see the formatting directives. Fine for now, though.
| . "github.com/dvyukov/go-fuzz/go-fuzz-defs" | ||
| ) | ||
|
|
||
| // Bool is just a bool. |
There was a problem hiding this comment.
I'd like to factor the shared code into a third file that is only protected by the gofuzz build tag.
But again, I'm happy to do that myself as a follow-up after this is merged.
Agree. |
|
Btw github allows me to edit commit title/description when squashing: Doh! This is not attributed to @guidovranken. Is it because I created the PR? I kinda created it just to see the diff myself. But since I created it off the @guidovranken tree, all pushes updated the PR so it become the review PR... |
|
Turns out this broke non-libfuzzer builds. I'm working on it now. |
|
Follow-ups: #220 |
See #213
@guidovranken
@josharian please take a look as well, as least from the perspective that you both are touching the same files